Description:
The aim of Behavioral Cloning is to imitate another agent's behavior
based on observation [BS95]. The inputs and outputs
of the agent to be imitated are observed, and then a function (or
model) mapping inputs to outputs is induced from these (input, output)
pairs. This can be useful for transferring behaviors between agents,
or from humans to agents. For instance, Sammut et al. use this
framework to learn from a human (the agent to be cloned) piloting a
flight simulator [SHKM92]. Van Lent and Laird [LL99] built the KNOMIC system, which learns to
imitate an expert from input/output traces in dynamic,
non-deterministic, and noisy domains. The authors claim that KNOMIC
can learn from observing a human in difficult domains such as air
combat and Quake II (a video game). Sklar et al.
[SBP01] and, more recently, Thurau et al.
[TBS04] have applied this framework to computer games like Tron
and Quake. Behavioral Cloning is complicated by the fact that the
internal state and inner workings of the agent to be modeled are not
available. Previous research has tried to work around this problem by
inducing the agent's subgoals and using them to build the model [SB97]. In addition, in many agent worlds the
environment can be observed only partially. Agents working in these
worlds rely on memory. This memory skill should be replicated
somehow in the clone; otherwise its behavior will differ from that of the
original agent. In fact, any cognitive skill available to the
original agent, such as memory, forecasting capabilities, or high-level
planning, should be made available to the clone. Behavioral
Cloning is still an open research field.
Behavioral Cloning is a very general framework that can be applied to
many domains. Here, the RoboCup
domain is considered. The goal of RoboCup is, "by the year
2050, develop a team of fully autonomous humanoid robots that can win
against the human world soccer champion team". In order to set aside
problems directly related to robotic hardware and focus on high-level
control tasks, there is a Robosoccer Simulator that simulates a
soccer field, to which software agents can connect and interact. In order
to clone human behavior, the work carried out by Aler et al. lets a
human player interact with the simulator by means of a special-purpose
interface [AGV05]. Thus, the human can play
Robosoccer like a video game. The interface has been carefully designed
so that the only information displayed to the user is what is
available to the actual agent in the simulated field. A trace is
obtained by observing the human play, with one record per
server cycle. The trace is made of many such "(s, a)" records, where
"s" is the observation made by the agent's sensors (distance to the
ball, angle to the ball, etc.) and "a" is the action carried out by
the human player in that situation (for instance, kicking the ball,
turning, etc.).
Then, Machine Learning techniques can be used to obtain a classifier
that determines which action has to be carried out in a particular
situation. Finally, the classifier must be programmed into a software
agent. If the agent is a proper clone of the human, its gameplay should be
similar to the human's.
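The step from a trace of "(s, a)" records to an action-selecting classifier can be sketched as follows. This is only a minimal illustration, not the method used in [AGV05]: it swaps in a plain 1-nearest-neighbor rule, and the two-feature observations, the angle sign convention, the action labels, and the function names are all assumptions made for the example.

```python
import math
from typing import Callable, List, Sequence, Tuple

# One trace record: (s, a) = (sensor observation, action chosen by the human).
Record = Tuple[Sequence[float], str]


def train_nearest_neighbor(trace: List[Record]) -> Callable[[Sequence[float]], str]:
    """Return a policy that copies the action of the closest recorded observation."""
    if not trace:
        raise ValueError("trace must contain at least one (s, a) record")

    def policy(s: Sequence[float]) -> str:
        # Euclidean distance between the current observation and each recorded one.
        _, action = min(trace, key=lambda record: math.dist(record[0], s))
        return action

    return policy


# Hypothetical trace: s = (distance to ball, angle to ball in degrees).
trace: List[Record] = [
    ((0.5, 0.0), "kick99"),
    ((10.0, 0.0), "dash99"),
    ((10.0, 40.0), "turn-right"),
    ((10.0, -40.0), "turn-left"),
]

# The induced classifier is what would be programmed into the software agent:
# at every server cycle the agent passes its observation and gets an action back.
clone_policy = train_nearest_neighbor(trace)
print(clone_policy((0.7, 2.0)))    # observation close to the ball, facing it
print(clone_policy((9.0, 35.0)))   # ball far away and off to one side
```

Mixing distances and angles in a single Euclidean metric is crude; a real clone would at least rescale the features, and a rule or tree learner would usually be preferred when the model has to be inspected.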
Instances and best known solutions for those instances:
The data provided here has been obtained from a human playing a
striker against 1 goalie and 3 defenders. The human has to dribble past the
opponents and score. Data is supplied in Weka's ARFF format. In
this particular domain (behavioral cloning for the Robosoccer
Simulator), it is difficult to summarize behavior by means of
classification accuracy, because what should be evaluated is the
complete play of the agent, which so far can only be assessed visually
by watching the agent play. However, for comparison purposes, the PART
algorithm implemented in Weka
obtained a 93.56% 10-fold cross-validation accuracy (with standard
parameters) and 92.65% (with 10-fold reduced-error pruning) [AGV05]. The meaning of the attributes is as
follows:
- Distance-ball: Distance from the ball to the agent
- Angle-ball: Angle between the ball and the agent's view line
- Ball-in-view: Whether the ball is within the view cone of the agent
- X, Y: Absolute agent location
- Angle: Angle of the agent's view line
- Distance-opponent1: Distance from the closest opposing player to the agent
- Angle-opponent1: Angle between the agent and the closest opponent
- Opponent1-in-view: It indicates whether the closest opponent could be seen in the last server cycle
- Distance-opponent2: Distance from the second closest opposing player to the agent
- Angle-opponent2: Angle between the agent and the second closest opponent
- Opponent2-in-view: It indicates whether the second closest opponent could be seen in the last server cycle
- Distance-goal: Distance to the opponent's goal
- Angle-goal: Angle between the opponent's goal and the agent
- Goal-in-view: It indicates whether the opponent's goal could be seen during the last server cycle
- Action: Action carried out by the human (dash with strength 99, dash with strength 60, turn 10 degrees left, turn 10 degrees right, kick with strength 99, kick with strength 60)
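As a rough illustration of how such ARFF data could be read and scored by cross-validation, the sketch below parses a small made-up sample and computes a k-fold accuracy. Everything in it is an assumption made for the example: the sample rows, the three-attribute header, the action labels such as "dash99", and the function names. It does not implement PART; a majority-class baseline is swapped in, and the folds are taken in file order, whereas Weka shuffles and stratifies them, so the numbers are not comparable to those reported above.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]

# Hypothetical sample; the real file has more attributes and many more rows.
SAMPLE = """\
% made-up striker data, for illustration only
@relation striker-sample
@attribute Distance-ball numeric
@attribute Ball-in-view {true,false}
@attribute Action {dash99,dash60,turn-left,turn-right,kick99,kick60}
@data
12.0,true,dash99
8.5,true,dash99
0.6,true,kick99
20.0,false,dash99
5.0,true,dash99
0.4,true,kick99
"""


def parse_arff(text: str) -> Tuple[List[str], List[Row]]:
    """Parse a simple, dense subset of ARFF: @attribute lines, then comma-separated rows."""
    names: List[str] = []
    rows: List[Row] = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):    # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            rows.append(dict(zip(names, values)))
        elif lower.startswith("@attribute"):
            names.append(line.split()[1])       # second token is the attribute name
        elif lower.startswith("@data"):
            in_data = True
    return names, rows


def train_majority(training: List[Row], label: str = "Action") -> Callable[[Row], str]:
    """Baseline learner: always predict the most frequent action in the training rows."""
    most_common = Counter(r[label] for r in training).most_common(1)[0][0]
    return lambda row: most_common


def k_fold_accuracy(rows: List[Row],
                    train: Callable[[List[Row]], Callable[[Row], str]],
                    k: int, label: str = "Action") -> float:
    """Unshuffled, unstratified k-fold cross-validation accuracy."""
    if not 2 <= k <= len(rows):
        raise ValueError("k must be between 2 and the number of rows")
    correct = 0
    for fold in range(k):
        held_out = rows[fold::k]                                   # every k-th row
        training = [r for i, r in enumerate(rows) if i % k != fold]
        predict = train(training)
        correct += sum(predict(r) == r[label] for r in held_out)
    return correct / len(rows)


names, rows = parse_arff(SAMPLE)
print(names)
print(k_fold_accuracy(rows, train_majority, k=2))
```

Any learner with the same shape as `train_majority` (training rows in, predictor out) can be passed to `k_fold_accuracy`, which keeps the evaluation loop separate from the choice of classifier.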
Related Papers:
[BS95] M. Bain, C. Sammut. A
Framework for Behavioral Cloning. Machine Intelligence
Agents. Oxford University Press. 1995.
[SHKM92] C. Sammut, S. Hurst,
D. Kedzier, D. Michie. Learning to Fly. Proceedings of the
Ninth International Conference on Machine
Learning. 385-393. 1992.
[LL99] Michael van Lent, John
Laird. Learning Hierarchical Performance Knowledge by
Observation. Proceedings of the Sixteenth International
Conference on Machine Learning. 229-238. 1999.
[SBP01] Elizabeth Sklar,
Alan D. Blair, Jordan B. Pollack. Training Intelligent Agents
Using Human Data Collected on the Internet (chapter 8). Agent
Engineering. 201-226. 2001.
[TBS04] C. Thurau,
C. Bauckhage, G. Sagerer. Imitation learning at all levels of
game-AI. Proc. Int. Conf. on Computer Games, Artificial
Intelligence, Design and Education. 402-408. 2004.
[SB97] D. Suc, I. Bratko. Skill
reconstruction as induction of LQ controllers with
subgoals. IJCAI-97: Proc. 15th International Joint Conference on
Artificial Intelligence. 914-920 (Vol. 2). 1997.
[AGV05] Ricardo Aler, Oscar
Garcia, Jose M. Valls. Correcting and Improving Imitation Models of
Humans for Robosoccer Agents. Conference on Evolutionary Computation
(CEC'05), Evolutionary Computation and Games Session. 2005.