Behavioral Cloning for the Robosoccer Simulator


Description:

      The aim of Behavioral Cloning is to imitate another agent's behavior based on observation [BS95] . The inputs and outputs of the agent to be imitated are observed, and then a function (or model) mapping inputs to outputs is induced from these (input, output) pairs. This can be useful for transferring behaviors between agents, or from humans to agents. For instance, Sammut et al. used this framework to learn from a human (the agent to be cloned) piloting a flight simulator [SHKM92] . van Lent and Laird [LL99] built the KNOMIC system, which learns to imitate an expert from input/output traces in dynamic, non-deterministic, and noisy domains. The authors claim that KNOMIC can learn from observing a human in difficult domains such as air combat and Quake II (a videogame). Sklar et al. [SBP01] , and more recently Thurau et al. [TBS04] , have applied this framework to computer games like Tron and Quake.

      Behavioral cloning is complicated by the fact that the internal state and inner workings of the agent to be modeled are not available. Previous research has tried to work around this problem by inducing the agent's subgoals and using them to build the model [SB97] . In addition, in many agent worlds the environment can only be partially observed, so agents working in these worlds make use of memory. This memory skill must somehow be replicated in the clone; otherwise its behavior will differ from that of the original agent. In fact, any cognitive skill available to the original agent (memory, forecasting capabilities, high-level planning, etc.) should be made available to the clone. Behavioral cloning is still an open research field.

      Behavioral cloning is a very general framework that can be applied to many domains; here, the RoboCup domain is considered. The goal of RoboCup is, "by the year 2050, develop a team of fully autonomous humanoid robots that can win against the human world soccer champion team". In order to set aside problems directly related to robotic hardware and focus on high-level control tasks, there exists a Robosoccer Simulator, which simulates a soccer field where software agents can connect and interact. To clone human behavior, the work carried out by Aler et al. lets a human player interact with the simulator by means of a special-purpose interface [AGV05] ; thus, the human can play Robosoccer like a videogame. The interface has been carefully designed so that the only information displayed to the user is the information available to the actual agent in the simulated field. A trace is obtained by observing the human play, with one record per server cycle. This trace is made of many (s, a) records, where s is the observation made by the agent's sensors (distance to the ball, angle to the ball, etc.) and a is the action carried out by the human player in that situation (for instance, kicking the ball or turning). Machine learning techniques can then be used to obtain a classifier that determines which action has to be carried out in a particular situation. Finally, the classifier must be programmed into a software agent. If the agent is a proper clone of the human, its gameplay should be similar.
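      As a concrete illustration of this pipeline, the sketch below trains a classifier on recorded (s, a) pairs and then uses it to choose actions inside an agent's control loop. It is a minimal sketch only, not the setup of [AGV05]: the trace file, its CSV layout, and the read_trace helper are hypothetical, and a scikit-learn decision tree stands in for whichever learner is actually used.

    # Minimal behavioral-cloning sketch. The trace file and its CSV layout
    # are hypothetical; a decision tree stands in for the actual learner.
    import csv
    from sklearn.tree import DecisionTreeClassifier

    def read_trace(path):
        """Read (s, a) records: sensor readings plus the human's action."""
        states, actions = [], []
        with open(path) as f:
            for row in csv.DictReader(f):
                actions.append(row.pop("action"))  # a: what the human did
                # s: sensor readings (all remaining columns assumed numeric)
                states.append([float(v) for v in row.values()])
        return states, actions

    # 1. Observe: one record per server cycle while the human plays.
    states, actions = read_trace("human_trace.csv")  # hypothetical file

    # 2. Induce: learn a mapping from observations to actions.
    clone = DecisionTreeClassifier().fit(states, actions)

    # 3. Act: the software agent queries the model every server cycle.
    def agent_step(sensors):
        return clone.predict([sensors])[0]  # e.g. "kick_99" or "turn_left_10"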



Instances and best known solutions for those instances:

      The data provided here was obtained from a human playing as a striker against one goalie and three defenders. The human has to dribble past the opponents and score. Data is supplied in Weka's ARFF format. In this particular domain (behavioral cloning for the Robosoccer Simulator), it is difficult to summarize behavior by means of classification accuracy, because what should be evaluated is the complete play of the agent, which so far can only be assessed visually by watching the agent play. However, for comparison purposes, the PART algorithm implemented in Weka obtained a 93.56% 10-fold cross-validation accuracy (with standard parameters) and 92.65% (with 10-fold reduced-error pruning) [AGV05] ; a sketch showing how to run a comparable evaluation is given after the attribute list. The meaning of the attributes is as follows:

  • Distance-ball: Distance from the ball to the agent
  • Angle-ball: Angle between the ball and the agent's view line
  • Ball-in-view: Whether the ball is within the view cone of the agent
  • X, Y: Absolute agent location
  • Angle: Angle of the agent's view line
  • Distance-opponent1: Distance from the closest opponent to the agent
  • Angle-opponent1: Angle between the agent and the closest opponent
  • Opponent1-in-view: Whether the closest opponent could be seen in the last server cycle
  • Distance-opponent2: Distance from the second-closest opponent to the agent
  • Angle-opponent2: Angle between the agent and the second-closest opponent
  • Opponent2-in-view: Whether the second-closest opponent could be seen in the last server cycle
  • Distance-goal: Distance from the agent to the opponent's goal
  • Angle-goal: Angle between the opponent's goal and the agent
  • Goal-in-view: Whether the opponent's goal could be seen in the last server cycle
  • Action: Action carried out by the human (dash with strength 99, dash with strength 60, turn 10 degrees left, turn 10 degrees right, kick with strength 99, kick with strength 60)
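
      For comparison, the following sketch shows one way to run a 10-fold cross-validation on the supplied ARFF file. It is a hedged example, not the published setup: the file name is assumed, attribute names are assumed to match the list above, and a scikit-learn decision tree is used as a stand-in for Weka's PART, so the accuracy will not match the cited figures exactly.

    # 10-fold cross-validation on the ARFF data. "robosoccer.arff" is an
    # assumed file name; DecisionTreeClassifier is a stand-in for Weka's
    # PART, so accuracy will differ from the cited 93.56%.
    import numpy as np
    from scipy.io import arff
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    data, meta = arff.loadarff("robosoccer.arff")
    # Nominal attributes are read as bytes; decode the class label and
    # map the boolean "in-view" flags (true/false encoding assumed) to 0/1.
    y = np.array([v.decode() for v in data["Action"]])
    names = [n for n in meta.names() if n != "Action"]
    X = np.array([[float(row[n]) if meta[n][0] == "numeric"
                   else float(row[n] == b"true")
                   for n in names]
                  for row in data])

    scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=10)
    print("10-fold CV accuracy: %.2f%%" % (100 * scores.mean()))

      The PART result itself can be reproduced directly in Weka, e.g. with java weka.classifiers.rules.PART -t robosoccer.arff (file name again assumed), which reports 10-fold cross-validation results by default.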

Data

Related Papers:

[BS95] M. Bain, C. Sammut. A Framework for Behavioral Cloning. Machine Intelligence Agents. Oxford University Press. 1995.

[SHKM92] C. Sammut, S. Hurst, D. Kedzier, D. Michie. Learning to Fly. Proceedings of the Ninth International Conference on Machine Learning. 385-393. 1992.

[LL99] M. van Lent, J. Laird. Learning Hierarchical Performance Knowledge by Observation. Proceedings of the Sixteenth International Conference on Machine Learning. 229-238. 1999.

[SBP01] E. Sklar, A. D. Blair, J. B. Pollack. Training Intelligent Agents Using Human Data Collected on the Internet (chapter 8). Agent Engineering. 201-226. 2001.

[TBS04] C. Thurau, C. Bauckhage, G. Sagerer. Imitation Learning at All Levels of Game-AI. Proc. Int. Conf. on Computer Games, Artificial Intelligence, Design and Education. 402-408. 2004.

[SB97] D. Suc, I. Bratko. Skill Reconstruction as Induction of LQ Controllers with Subgoals. IJCAI-97: Proc. 15th International Joint Conference on Artificial Intelligence. 914-920 (Vol. 2). 1997.

[AGV05] R. Aler, O. Garcia, J. M. Valls. Correcting and Improving Imitation Models of Humans for Robosoccer Agents. IEEE Congress on Evolutionary Computation (CEC'05), Evolutionary Computation and Games Session. 2005.

