Multi-Agent Systems in Complex, Real-Time Domains


02 Nov 2017


ABSTRACT

Multi-agent systems in complex, real-time domains require agents to act effectively both autonomously and as part of a team. This dissertation addresses multi-agent systems consisting of teams of autonomous agents acting in real-time, noisy, collaborative, and adversarial environments. The engineering of multi-agent systems composed of learning agents brings together techniques from machine learning, game theory, utility theory, and complex systems. A designer must choose carefully which machine-learning algorithm to use, since otherwise the system's behavior will be unpredictable and often undesirable. A growing number of agent-oriented software engineering methodologies have been proposed in recent years. The purpose of these methodologies is to provide methods, models, techniques, and tools so that software development can be carried out in a formal and systematic way. Machine learning is an interesting and promising area to merge with multi-agent systems. It has the potential to provide robust mechanisms that leverage experience to equip agents with a large spectrum of behaviors, ranging from effective individual performance in a team to collaborative achievement of independently and jointly set high-level goals in the presence of adversaries. The foundation of the whole dissertation is the concept of multi-agent systems and the notion of adaptation and learning in different systems. This includes a description of key dimensions for classifying multi-agent systems, as well as key criteria for characterizing single-agent and multi-agent learning as the two principal categories of learning in multi-agent systems.

PROJECT PROPOSAL

Student Name:

Student No.:

Email Address:

Award Name: MS in Software Engineering

Site Name: APIIT India

Title of the Project:

Layered Learning in Multi Agent Systems

Proposed Supervisor (If known):

INTRODUCTION

Learning is an essential element of intelligent behaviour. We know that humans cannot learn an arbitrary piece of knowledge at any time. Instead, an agent is receptive to those ideas that would not be too difficult to learn with a reasonably small amount of effort. Other ideas remain unfathomable and distant until the agent's knowledge develops further, rendering such formerly difficult knowledge simple enough to absorb. In this way, knowledge accumulates incrementally over the lifetime of the agent. This is the starting point for our discussion: knowledge accumulates over the lifetime of the agent, seemingly as a result of a very basic kind of learning mechanism. Our discussion focuses on computational processes, not psychological validity or neurological plausibility [Utgoff and Stracuzzi, 2001].

As knowledge is assimilated, the frontier of receptivity advances, improving the basis for further learning. I explore the idea that knowledge can accumulate incrementally in a virtually unbounded number of layers, and refer to this view and its approaches as multi-layered learning. I proceed with a discussion of why a layered-learning knowledge representation is essential for maximizing knowledge compression and hence generalization.

Multi-agent systems is the emerging subfield of Artificial Intelligence that aims to provide both principles for the construction of complex systems involving multiple agents and mechanisms for the coordination of independent agents' behaviours. As of yet, there has been little work on multi-agent systems that require real-time control in noisy environments. Because of the inherent complexity of this type of multi-agent system, machine learning is an interesting and promising area to merge with multi-agent systems. Machine learning has the potential to provide robust mechanisms that leverage experience to equip agents with a large spectrum of behaviours, ranging from effective individual performance in a team to collaborative achievement of independently and jointly set high-level goals in the presence of adversaries. Learning will also help agents adapt to unforeseen behaviours on the part of other agents, through the use of on-line adaptive methods that may include explicit opponent modelling. My dissertation will focus on learning in this particularly complex class of multi-agent domains.

DISSERTATION AIMS/OBJECTIVES

The approach of this dissertation is to answer the following question:

Dissertation Question

Can layered learning help multiple agents work autonomously and as teammates in a real-time, noisy environment with limited communication?

More specifically, the dissertation contributes an agent structure enabling the use of ML techniques to improve an agent's behavior in domains with the following characteristics:

A need for real-time decision-making: agents must be able to make their own decisions in real time, rather than simply performing as they are programmed to do.

Sensor and actuator noise with hidden state.

The designed framework will then be used to carry out further analysis. This will bring out the key strengths and weaknesses of selected prominent agent-oriented methodologies.

Several independent agents with the same well-defined high-level goal: the agents can work together as team members, sharing decisions among themselves and acting as a well-defined group.

The similarities and differences between selected agent-oriented methodologies will be identified by using a structural analysis of their process and models.

The aim of the author of this dissertation is to conduct research analysis on layered learning in multi-agent systems, so that revolutionary systems can be designed that take the right decisions at the right moment on their own, without any external help, and whose agents can work in a team as well as individually. The general approach to answering the dissertation question is to create an existence proof: a full-fledged, functioning multi-agent system that incorporates learning in a real-time, noisy environment with both teammates and adversaries. The methodology designed will help software companies in the field of software development. New analysis techniques are needed so that software can be developed within the estimated time and meet customer requirements by producing the expected results at the end of development.

The objectives of this research are:

To partially fulfil the requirements of the master's degree in software engineering technologies.

To learn software engineering concepts and apply them in the development of Multi Agent Oriented Software Engineering.

To apply various software engineering principles to the system for ease of learning.

To evaluate and compare the current multi-agent-oriented system engineering methodologies, so that their strengths can be combined in new, more advanced systems and their flaws can be eradicated.

The author's objectives will be fulfilled as follows:

The author is going to organise the dissertation into four parts. First, the dissertation defines a team-member agent architecture within which a flexible team structure is presented, allowing agents to decompose the task space into flexible roles and to smoothly switch roles while acting. Team organization is achieved by the introduction of a locker-room agreement as a collection of conventions followed by all team members. It defines agent roles, team formations, and pre-compiled multi-agent plans.

Second, the dissertation will introduce layered learning, a general-purpose machine learning paradigm for complex domains in which learning a mapping directly from agents' sensors to their actuators is intractable. Given a hierarchical task decomposition, layered learning allows for learning at each level of the hierarchy, with learning at each level directly affecting learning at the next higher level.
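To make the paradigm concrete, the following is a minimal sketch of the bottom-up training loop it implies. The class and method names, and the learner interface with a fit method, are assumptions for illustration, not the dissertation's actual implementation.

```python
# Sketch of layered learning: each layer is trained separately, and the
# behaviours learned at lower layers are used to generate the training
# input for the next layer up. All names here are illustrative.

class Layer:
    def __init__(self, name, learner, make_training_set):
        self.name = name
        self.learner = learner                    # any supervised/RL learner
        self.make_training_set = make_training_set
        self.behavior = None

    def train(self, lower_behaviors):
        # The training set for this layer is produced by running the
        # behaviours already learned at the lower layers.
        examples = self.make_training_set(lower_behaviors)
        self.behavior = self.learner.fit(examples)
        return self.behavior

def layered_learning(layers):
    """Train layers bottom-up; each learned behaviour feeds the next layer."""
    learned = []
    for layer in layers:
        learned.append(layer.train(learned))
    return learned
```

In the soccer instantiation described later in this document, the three layers correspond to ball interception, pass evaluation, and pass selection.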

Third, the dissertation will introduce a new multi-agent reinforcement learning algorithm, namely team-partitioned, opaque-transition reinforcement learning (TPOT-RL). TPOT-RL is designed for domains in which agents cannot necessarily observe the state changes when other team members act. It exploits local, action-dependent features to aggressively generalize its input representation for learning and partitions the task among the agents, allowing them to simultaneously learn collaborative policies by observing the long-term effects of their actions.
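The flavor of such an update rule can be sketched as follows: a value table indexed by a coarse, action-dependent feature, updated directly toward an observed long-term reward with no next-state bootstrapping, since the intervening transitions are opaque to the agent. The names, the epsilon-greedy policy, and the learning-rate scheme are assumptions for illustration, not TPOT-RL's exact formulation.

```python
import random
from collections import defaultdict

class OpaqueTransitionLearner:
    """Illustrative value learner for opaque-transition settings."""

    def __init__(self, actions, alpha=0.1, epsilon=0.1):
        self.q = defaultdict(float)   # (feature, action) -> estimated value
        self.actions = actions
        self.alpha = alpha            # learning rate
        self.epsilon = epsilon        # exploration rate

    def act(self, feature):
        # Epsilon-greedy choice over the coarse feature representation.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(feature, a)])

    def update(self, feature, action, long_term_reward):
        # No bootstrap from a successor state: the agent cannot observe
        # the transitions caused by teammates, so it moves its estimate
        # directly toward the delayed reward it eventually observes.
        key = (feature, action)
        self.q[key] += self.alpha * (long_term_reward - self.q[key])
```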

Fourth, the dissertation will contribute a fully functioning multi-agent system that incorporates learning in a real-time, noisy domain with teammates and adversaries. Detailed algorithmic descriptions of the agents' behaviours, as well as their source code, are included in the dissertation.

Ultimately, this dissertation demonstrates that by learning portions of their cognitive processes, selectively communicating, and coordinating their behaviours via common knowledge, a group of independent agents can work towards a common goal in a complex, real-time, noisy, collaborative, and adversarial environment.

DISSERTATION BACKGROUND

Learning is a process of compressing observations and experiences into a form that can be applied advantageously thereafter. A general statement or hypothesis may explain a great many observations succinctly, and because it exploits regularity to achieve compression, it will likely be an excellent predictor of future events [Rissanen and Langdon, 1979]. To the extent that a hypothesis is a correct theory, it can help the agent to predict consequences, and therefore to improve the agent's projective reasoning. Structural and procedural knowledge can each be compressed, and this has important implications not only for space consumption, but also for time consumption and learnability.
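This compression view can be stated in minimum-description-length terms, in the spirit of the Rissanen line of work cited above: prefer the hypothesis that minimizes the combined code length of the hypothesis itself and of the data encoded with its help. The following is a sketch in standard MDL notation, not a formula from the source:

```latex
% Minimum description length reading of learning-as-compression:
% H is a hypothesis, D the observations, L(.) a code length in bits.
H^{*} = \arg\min_{H} \, \bigl[ L(H) + L(D \mid H) \bigr]
```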

A software agent is a computer program which works toward goals in a dynamic environment on behalf of a human or computational user. Agents work without direct supervision, possibly over an extended period of time. An agent exhibits a significant degree of flexibility in how it seeks to transform goals into action tasks [agdef.htm, 2006].

We are given one world with multiple internal states, one of which is the initial state of the world, and n agents inhabiting this world. We also have one world function. It has two arguments: the first is the current state of the world, while the second is the set of actions of the n agents. The world function returns the new internal state of the world, which results from the current state and from the actions of the n agents.
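Written as a type signature, the setup described above might look as follows; the names WorldFunction and run are illustrative only.

```python
from typing import Callable, Sequence, TypeVar

State = TypeVar("State")
Action = TypeVar("Action")

# The world function: given the current internal state and the set of the
# n agents' simultaneous actions, it returns the new internal state.
WorldFunction = Callable[[State, Sequence[Action]], State]

def run(world: WorldFunction, state: State,
        agents: Sequence[Callable[[State], Action]], steps: int) -> State:
    """Advance the world by collecting one action per agent per step."""
    for _ in range(steps):
        actions = [agent(state) for agent in agents]
        state = world(state, actions)
    return state
```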

The whole research of the dissertation revolves around the concept of multi-agent systems, with the added essence of layered learning. The research requires a large amount of prior knowledge about exactly what this concept deals with, and about how to answer the question of whether layered-learning multi-agent systems can work as team members without being affected by outside interference. These systems are essentially open systems, which can exchange energy with the systems outside them but can also respond independently at decision time.

Software engineering is a vast field with many complexities. Therefore, existing methodologies must be extended to ensure sophisticated and autonomous intelligence. This can be achieved by using agent methodologies, because they inherit properties of reactive systems, simple functional systems, and distributed concurrent systems. Jennings and Wooldridge (1998) suggest three classes of systems utilizing the agent-oriented computing model: open systems, complex systems, and ubiquitous computing systems.

Such systems are, however, quite expensive and hard to build: they require a great deal of effort and, above all, a great deal of research and time.

Agent-Oriented Software Engineering needs tools and methodologies to cover the stages from the problem realization, requirements gathering and architectural design to implementation. All these steps require a life cycle process including verification, testing and re-engineering to prove that the integrated system is healthy [Lin, 2005].

INTELLECTUAL CHALLENGE

The field faces many challenges, some pragmatic and others deeply theoretical. A pragmatic one that seems very important and pressing is the development of an appropriate high-level software infrastructure/framework to support the building of multi-agent systems. At this point, the programming overhead to create a nontrivial multi-agent system is still high and, thus, the number of fielded commercial applications is small. The development of such a framework is timely because of the emerging software infrastructure and standards being developed for mobile computing and interoperability among programs residing at distant sites (e.g., Java) which will simplify the construction of agents. However, this work will only partially solve the problems of building multi-agent systems since it does not deal with high-level coordination issues. There are two possible approaches to building these higher level capabilities.

In addition to this, the other intellectual challenges are:

There has also been a long tradition of work, dating back to the inception of the field, on coordination based on logical reasoning about the beliefs, desires, intentions (BDI) and commitments of agents, and more recent work on the use of market mechanisms for solving multi-agent resource allocation problems. However, there has been little cross-fertilization among these areas and between the self-interested and cooperative camps. The synthesis of ideas from each of these different approaches to coordination holds great potential for future developments in the field.

There are different approaches for translating a multi-agent system specification into code. Consistency between the specification and the resulting code is hard to ensure.

Agent-oriented software engineering methodologies do not support the implementation phases well, and incorporating quality assurance and guideline estimation is not an easy task with them. Research shows that the actual adoption of these methodologies outside of student projects is unclear.

Another issue is how to scale up to agent societies of hundreds and thousands of agents.

The challenge of how to design large-scale agent societies and how to evolve them as the environment changes is rapidly becoming a major issue in the field. Working on this problem will also shed light on many of the basic issues in multi-agent systems research [Huhns, 1993; Lander, 1997].

Understanding the problem domain is essential for agent oriented systems engineering. Therefore, choosing the perfect agent-oriented methodology for building software systems will be a challenge [Wooldridge, 1999].

An important trend in the field is the development of analysis techniques to predict the performance of multi-agent systems. These performance characterizations also relate to the applicability of specific coordination mechanisms. [Corkill, 2003].


RESEARCH PROGRAMME

The research programme provides the action plan for suggesting the new analysis and design technique in agent-oriented software engineering, and describes how the work will be done. The research programme will guide the entire project.

Project Initiation Date:

Project Duration: 24 weeks

Project Completion Date:

WORK BREAKDOWN STRUCTURE (WBS)

Research Analysis (Estimated Time = 3 Weeks)

Task: Introduction and Background (Duration: 7 Days)

1.1.1 Multi-Agent Systems

1.1.2 Genetic Programming

1.1.3 Strategic Games

1.1.4 Existing Systems

Task: Detailed Dissertation Task (Duration: 14 Days)

Research on specification of the dissertation

Identify dissertation task

Research/Further Investigation (Estimated Time = 9 Weeks)

Task: System Investigation and Research (Duration: 6 Days)

The purpose of the evaluation

The evaluation type and procedure

Research on the functional areas of the existing Multi Agent methodologies.

Accumulate research findings

Categorize research modules

Evaluate findings

Task: The Evaluation Framework (Duration: 5 Days)

Concepts

Modeling Language

Process

Pragmatics

Task: Result of the Survey (Duration: 3 Days)

Concepts

Modeling Language

Process

Pragmatics

Task: Result of the Case Study (Duration: 6 Days)

Execution Model

Complexity analysis

Structure of Knowledge

Model of Protocols

Handlers

Task: Structural Analysis – The Commonalities (Duration: 4 Days)

Capturing Goals

The Role of Use Cases in Requirement Analysis

Roles

Social System – Static and Dynamic structure

Individual Agent

Agent Acquaintance Model

Task: Structural Analysis – The Differences (Duration: 6 Days)

Early Requirements in Execution Model

Environmental Model in MAS

Task: A Mature and Complete MAS Methodology (Duration: 6 Days)

Design/Method Notations (Estimated Time = 4 Weeks)

Task: Creation and Interface Design

Requirement Analysis

Architecture Design

Detailed Design

Find out the entities

Build relationship between entities

Creation of rough interface design

Document the system

Implementation (Estimated Time = 4 Weeks)

Testing (Estimated Time = 3 Weeks)

Integrate all the modules

Prepare a test plan

Find the target test users

Testing and debugging the full system

Evaluation and Conclusions (Estimated Time = 1 Week)

Compile system documentation

Finishing

DELIVERABLES

The layered learning approach to multi-agent systems proposed by the author will address existing problems and gaps that the computing world is currently facing. With the assistance of the proposed procedure, the development task will become easier and software will be delivered on time and within budget. The strengths and weaknesses will be assessed through a feature-based evaluation. This will be achieved by using a survey and a case study on the Itinerary Planner System.

The main deliverables for the dissertation are:

Team Member Agent Architecture: The team member agent architecture is suitable for domains with periodic opportunities for safe, full communication among team members, interleaved with periods of limited communication and a need for real-time action. This architecture includes mechanisms for task decomposition and dynamic role assignment as well as for communication in single-channel, low-bandwidth communication environments. It is implemented in both simulated and real robotic soccer.
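As an illustration of what a locker-room agreement might encode, here is a hypothetical data-structure sketch; the field names and the simple round-robin role assignment are assumptions for the example, not the architecture's actual mechanism.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Formation:
    name: str
    roles: List[str]                  # e.g. ["goalie", "defender", ...]

@dataclass
class LockerRoomAgreement:
    formations: Dict[str, Formation]  # formations agreed before the game
    set_plays: Dict[str, List[str]]   # trigger -> ordered role actions
    current_formation: str = "default"

    def role_for(self, agent_id: int) -> str:
        # Dynamic role assignment: the simplest scheme maps an agent id
        # to a role slot in the current formation; in practice agents may
        # also switch roles smoothly while acting.
        roles = self.formations[self.current_formation].roles
        return roles[agent_id % len(roles)]
```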

Layered Learning: Layered learning is a hierarchical ML paradigm that combines multiple machine learning modules, each directly affecting the next. Layered learning is described in general and then illustrated as a set of three interconnecting learned behaviors within a complex, real-time, collaborative and adversarial domain.

Documented Result: The results and other technical specifications will be documented. These documents will show with example, how the new methodology can be applied for the layered learning analysis and how it will benefit others.

RESOURCES

The following is the list of the hardware and software resources that will help in getting the appropriate information about Agents, Software engineering and Object oriented techniques:

A computer/laptop with high speed internet connectivity.

Windows 7, MS Office 2010

Experts of the different areas such as MAS, Software Engineering.

College Library

Online and offline literature which includes articles, journals, publications, white papers etc.

SOLUTIONS OF IDENTIFIED PROBLEMS

Some solutions for the identified problems are listed below. More solutions will be provided after the complete research work, in the final dissertation document.

More learned layers:

The layered learning implementation in this dissertation will consist of three learned layers, ranging from an individual behavior to a team behavior. A possible extension of this implementation is to include another team behavior and an adversarial behavior. Successfully implementing these or other additional learned layers would be an interesting avenue for future research.

Automatic task decomposition:

Layered learning works with a given task decomposition. However, it could be combined with a method for learning task decompositions. Let A be an algorithm for learning task decompositions within a domain. Suppose that A does not have an objective metric for comparing different decompositions. Applying layered learning on the task decomposition and quantifying the resulting performance can be used as a measure of the utility of A's output. The creation of an algorithm A for learning task decompositions is a challenging open research direction.
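One way to realize the measurement loop suggested above is to score each candidate decomposition produced by A by running layered learning on it and evaluating the resulting team. The helper names layered_learning and evaluate_team are hypothetical placeholders.

```python
def score_decompositions(candidates, layered_learning, evaluate_team):
    """Rank task decompositions by the performance layered learning
    achieves on them; the score serves as A's missing objective metric."""
    best, best_score = None, float("-inf")
    for decomposition in candidates:
        behaviors = layered_learning(decomposition)   # train bottom-up
        score = evaluate_team(behaviors)              # e.g. goals scored
        if score > best_score:
            best, best_score = decomposition, score
    return best, best_score
```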

Error propagation:

In this dissertation, each of the learned layers will be validated in a controlled testing scenario, demonstrating the effectiveness of the learned behavior. However, no study is made of the propagation of errors from one layer to the next. It is possible that errors at initial layers could hurt performance at all subsequent layers. However, since learned layers are trained individually, it is also possible that learning at subsequent layers could compensate for earlier errors and thus render them inconsequential. A detailed study of error propagation and compensation within layered learning implementations is a promising area for future research.

Choosing a reward function:

Similarly, the implementations of TPOT-RL in this dissertation vary the agents' reward function only minimally. As with all reinforcement learning methods, the reward function has an important effect on what is learned. It would be interesting to quantify the effects of the reward function on the learned behaviors when using TPOT-RL.
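For illustration, here are two reward designs such a study might compare; both definitions are assumptions made up for the example, not the dissertation's reward functions.

```python
def sparse_reward(event):
    # Reward only terminal outcomes: +1 for a goal scored, -1 conceded.
    return {"goal_for": 1.0, "goal_against": -1.0}.get(event, 0.0)

def shaped_reward(event, ball_advance_m=0.0):
    # Add an intermediate term for advancing the ball toward the goal,
    # trading faster credit assignment against possible bias in what is
    # ultimately learned.
    return sparse_reward(event) + 0.01 * ball_advance_m
```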

LITERATURE REVIEW

INTRODUCTION

Multi-agent systems are often presented as the next major generation of software to cope with the increasing complexity in modern applications. MAS are distributed systems of autonomous, interacting entities named agents. They are possibly large-scale systems, and the agent research community aims at having agents collaborate or compete with one another to achieve their functions in a highly modular and flexible way. Multi-agent systems in complex, real-time domains require agents to act effectively both autonomously and as part of a team. This dissertation addresses multi-agent systems consisting of teams of autonomous agents acting in real-time, noisy, collaborative, and adversarial environments. Because of the inherent complexity of this type of multi-agent system, this dissertation investigates the use of machine learning within multi-agent systems. The dissertation makes four main contributions to the fields of machine learning and multi-agent systems.

The innovations reported in this dissertation are designed primarily for real-time, noisy, collaborative and adversarial domains. As such, simulated robotic soccer, the RoboCup soccer server [Noda et al. 98] in particular, has served as an ideal research test bed. However, the positive results achieved are not limited to this domain. Throughout the dissertation, the extent to which each result generalizes is discussed. In addition, some of the techniques developed in simulated robotic soccer will be applied in two other domains with some similar characteristics: real robotic soccer and network routing.

As the main test bed, all the contributions of this dissertation are originally developed and implemented in simulated robotic soccer. It is a fully distributed, multi-agent domain with both teammates and adversaries. There is hidden state, meaning that each agent has only a partial world view at any given moment. The agents also have noisy sensors and actuators, meaning that they do not perceive the world exactly as it is, nor can they affect the world exactly as intended. In addition, the perception and action cycles are asynchronous, prohibiting the traditional AI paradigm of using perceptual input to trigger actions. Communication opportunities are limited; the agents must make their decisions in real-time; and the actions taken by other agents, both teammates and adversaries, and their resulting state transitions are unknown. We refer to this last quality of unknown state transitions as opaque transitions.

In order to test the generality of our simulator results, we transfer some of our techniques to our real robot system. In particular, portions of the team member agent architecture are implemented in the real robot system as well as in simulation. The real robot system is a completely different domain from the simulator. First, at the most basic level, the I/O is entirely different. While the simulator deals with abstract, asynchronous perceptions and actions, the robotic system processes real-time video images via an overhead camera and outputs motor control commands synchronously (i.e. triggered by perception) via radio. Second, the agents all share the same perception of the world, which makes the robotic system not completely distributed. However, functionally the robots are controlled independently: each is controlled by an independent function call using a turn-taking methodology. They can also be controlled by separate processes with a common sensory input stream. Three other differences between the robots and the simulator are the absence of communication among teammates (which is possible, but not used in our system), the absence of hidden state (agents have a full world view via an overhead camera), and a resulting full knowledge about the state transitions in the world. These domain differences prevent the use of identical agent programs in the two domains, but they do not limit the applicability of the flexible teamwork structure.

While developed within the context of robotic soccer, the multi-agent algorithms presented in this dissertation generalize beyond robotic soccer as well. To support this claim, we implement one of our algorithms in a different domain, namely network routing. We believe several other multi-agent domains are also similar to robotic soccer. It is part of our future work to continue to identify other such multi-agent domains.

Although network routing differs from robotic soccer in many ways, in an abstract sense it is very similar. Even though our network routing simulator does not involve communication, noisy sensors and actuators, or adversaries, it retains the essential characteristics motivating the development of TPOT-RL: a distributed team of agents operating in real-time with opaque transitions.

Table 1.1 summarizes the domain comparison. The remainder of this chapter provides the domain specifications of simulated robotic soccer, real robotic soccer, and network routing as used experimentally in this dissertation. I use this chapter to describe in detail the aspects of the domains that are not part of the dissertation contributions: the substrates upon which the contributions are built.

                                 Simulator   Robots   Network routing
Distributed perception           Yes         No       Yes
Distributed action               Yes         Yes      Yes
Asynchronous perception/action   Yes         No       No
Teammates                        Yes         Yes      Yes
Adversaries                      Yes         Yes      No
Hidden state                     Yes         No       Yes
Noisy sensors                    Yes         Yes      No
Noisy actuators                  Yes         Yes      No
Communication                    Yes         No       No
Real-time                        Yes         Yes      Yes
Opaque transitions               Yes         No       Yes

Table 1.1: A comparison of the experimental domains used in this dissertation.

THE ROBOCUP SOCCER SERVER

The RoboCup soccer server [Noda et al. 98] has served as the basis for successful international competitions [RoboCup 97] and research challenges [Kitano et al. 97]. As one of its first users, I helped to test and tune it over the course of its development, and participated in its first use as the basis for a competition (Pre-RoboCup-96 at IROS-96). Experiments reported in this dissertation are conducted in several different versions of the simulator, ranging from version 1 to the current version 4. This section describes the current simulator. The soccer server is a complex and realistic domain. Unlike many AI domains, the soccer server embraces as many real-world complexities as possible. It models a hypothetical robotic system, merging characteristics from different existing and planned systems as well as from human soccer players. The server's sensor and actuator noise models are motivated by typical robotic systems, while many other characteristics, such as limited stamina and vision, are motivated by human parameters.

OVERVIEW OF SIMULATORS

The simulator, acting as a server, provides a domain and supports users who wish to build their own agents (also referred to as clients or players). Client programs connect to the server via UDP sockets, each controlling a single player. The soccer server simulates the movements of all of the objects in the world, while each client acts as the brain of one player, sending movement commands to the server. The server causes the player being controlled by the client to execute the movement commands and sends sensory information from that player's perspective back to the client.
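A minimal client sketch follows. It assumes the server listens on UDP port 6000 on the local machine and speaks the parenthesized message syntax of that era; exact messages, team names, and defaults vary across server versions, so treat this as illustrative rather than a working client.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server = ("127.0.0.1", 6000)               # assumed default server port

sock.sendto(b"(init MyTeam (version 4))", server)   # join as one player
reply, server = sock.recvfrom(4096)        # server answers from a
                                           # dedicated per-client address
print(reply.decode(errors="replace"))

sock.sendto(b"(move -10 0)", server)       # place the player on the field
sock.sendto(b"(dash 50)", server)          # then issue actions each cycle
```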

When a game is to be played, two teams of 11 independently controlled clients connect to the server. The object of each team is to direct the ball into one of the goals at the ends of the field, while preventing the ball from entering the other goal.

The simulator includes a visualization tool, pictured in Figure 1.1. Each player is represented as a two-halved circle. The light side is the side towards which the player is facing. In Figure 1.1, all of the 22 players are facing the ball, which is in the middle of the field. The black bars on the left and right sides of the field are the goals. The simulator also includes a referee, which enforces the rules of the game. It indicates changes in play mode, such as when the ball goes out of bounds, when a goal is scored, or when the game ends. It also enforces the offside rule. As in real soccer, a player is offside if it is in the opponent's half of the field and closer to the opponent's goal line (the line along which the goal is located) than all, or all but one, of the opponent players when the ball is passed to it. The crucial moment for an offside call is when the ball is kicked, not when it is received: a player can be behind all of the opponent defenders when it receives a pass, but not when a teammate kicks the ball towards it.

Figure 1.1: The soccer server display. Each player is represented as a two-halved circle. The light side is the side towards which the player is facing. All players are facing the ball, which is in the middle of the field. The black bars on the left and right sides of the field are the goals. (Re-drawn based on [Stone, 1998]).
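The kick-time offside test described above can be written as a small predicate. The coordinate convention (x increasing toward the opponent goal line, halfway line at x = 0) is an assumption for illustration.

```python
def is_offside(receiver_x, opponent_xs, halfway_x=0.0):
    """Offside check evaluated at the moment the ball is kicked."""
    if receiver_x <= halfway_x:
        return False                  # must be in the opponent's half
    # Count opponents at least as close to their own goal line as the
    # receiver; the rule requires at least two (typically goalie + one).
    defenders_behind = sum(1 for x in opponent_xs if x >= receiver_x)
    return defenders_behind < 2
```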

One of the real-world complexities embraced by the soccer server is asynchronous sensing and acting. Most AI domains use synchronous sensing and acting: an agent senses the world, acts, senses the result, acts again, and so on. In this paradigm, sensations trigger actions. On the other hand, both people and complex robotic systems have independent sensing and acting rates. Sensory information arrives via different sensors at different rates, often unpredictably (e.g. sound). Meanwhile, multiple actions may be possible in between sensations or multiple sensations may arrive between action opportunities.
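One way to realize this decoupling is to queue sensations as they arrive while the action loop runs at its own fixed rate. The 150 ms and 100 ms periods echo the soccer server's default visual and action cycles, but the code itself is an illustrative sketch, not the server's implementation.

```python
import queue
import threading
import time

perceptions = queue.Queue()

def sensor_thread():
    # Sensations arrive on their own schedule, independent of acting.
    while True:
        perceptions.put({"t": time.time()})   # stand-in for a sensation
        time.sleep(0.15)                      # e.g. visual info every 150 ms

def action_loop(cycles=10):
    state = None
    for _ in range(cycles):
        while not perceptions.empty():        # drain whatever has arrived
            state = perceptions.get()
        act(state)                            # may act on a stale state
        time.sleep(0.10)                      # e.g. one action every 100 ms

def act(state):
    print("acting on", state)

threading.Thread(target=sensor_thread, daemon=True).start()
action_loop()
```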

AGENT ARCHITECTURE

In order to create a coherent team of agents, the entire agent architecture must be designed with the team in mind: "Collaboration must be designed into systems from the start; it cannot be patched on" [Grosz 96].

A multi-agent system which involves several agents that collaborate towards the achievement of a joint objective is viewed as a team of agents. Most proposed teamwork structures (e.g. joint intentions [Cohen et al. 99], shared plans [Grosz 96]) rely on agents in a multi-agent system to negotiate and/or contract with each other in order to initiate team plans. However, in dynamic, real-time domains with unreliable communication, complex negotiation protocols may take too much time and/or be infeasible due to communication restrictions.

Simulated robotic soccer provides a time-critical environment in which agents in a team alternate between periods of limited and unlimited communication. Before games and at half-time, the team can effectively communicate with no limitations: each agent can be given the entire internal decision-making mechanisms of all of its teammates. However, as described in the next section, during the course of a game the agents must act independently in a dynamic, real-time, low-bandwidth communication environment: if the agents take the time to fully synchronize while playing, they may miss critical action opportunities and concede an advantage to the opponents. The agent architecture described in this chapter defines a complete agent, including perception, cognition, and action. It is fully implemented as a simulated robotic soccer team.

RELATED WORK

This dissertation contributes to the fields of Multi-Agent Systems (MAS), Machine Learning (ML), and a subfield of ML, Reinforcement Learning (RL). In each of these areas there is an immense body of literature. The dissertation also contributes to the growing body of research within the robotic soccer domain. In Section 9.1, I review the intersection of MAS and ML; in particular, I give an overview of MAS with emphasis on multi-agent learning approaches [Stone and Veloso 97]. In a later section, I review research within the robotic soccer domain. Throughout, I review the prior work in these areas that is most related to my dissertation research.

MAS FROM AN ML PERSPECTIVE

There are many possible ways to divide MAS and the related field of Distributed Artificial Intelligence (DAI). Overviews and taxonomies are numerous [Decker 87, Bond and Gasser 88, Durfee et al. 89, Durfee 92, Lesser 95, Parunak 96, Stone and Veloso 97, Jennings et al. 98, Sycara 98], each with its own way to organize the field. This chapter is organized along two main dimensions: agent heterogeneity and amount of communication among agents. Agents are homogeneous if they are physically and behaviorally identical; they are heterogeneous if they differ in some way. Communication is direct interaction among agents in the world. Beginning with the simplest multi-agent scenario, homogeneous non-communicating agents, the full range of possible multi-agent systems, through highly heterogeneous communicating agents, is considered. Because of the inherent complexity of MAS, there is much interest in using ML techniques to help deal with this complexity [Weiß and Sen 96, Sen 96, Weiß 97]. Many existing ML techniques can be directly applied in multi-agent scenarios by delimiting a part of the domain that only involves a single agent. However, multi-agent learning is more concerned with learning issues that arise because of the multi-agent aspect of a given domain. As described by Weiß, multi-agent learning is "learning that is done by several agents and that becomes possible only because several agents are present" [Weiß 95].

In multi-agent systems, there are multiple agents who model each other's goals and/or actions. In the fully general multi-agent scenario, there may be direct interaction among agents via communication. Although this interaction could be viewed as environmental stimuli, we present inter-agent communication as being separate from the environment. From an individual agent's perspective, the environment's dynamics can be affected by other agents. In addition to the uncertainty that may be inherent in the domain, other agents intentionally affect the environment in unpredictable ways. Thus, all multi-agent systems can be viewed as having dynamic environments.

Figure 1.2: The general multi-agent scenario. Agents model each other's goals, actions, and domain knowledge, which may differ as indicated by the different fonts. They may also interact directly (communicate), as indicated by the arrows between the agents. (Re-drawn based on [Stone, 1998]).

The simulated robotic soccer domain can be used to study all of the different multi-agent scenarios presented in this section. Throughout the section, I discuss how the different issues are reflected in this domain.

CONCLUSION

Motivated by the challenges inherent in the soccer server, a simulated robotic soccer domain, this dissertation contributes several techniques for building successful agents in real-time, noisy, collaborative and adversarial environments. This chapter reviews the dissertation's scientific contributions to the fields of Multi-Agent Systems and Machine Learning and then discusses promising directions for future research in this challenging class of domains.

CONTRIBUTIONS OF DISSERTATION

The four main contributions of this dissertation are summarized as follows.

The team member agent architecture presented in Chapter 3 is suitable for PTS domains: domains with team synchronization opportunities interleaved with periods of real-time, dynamic action and limited communication. The team member agent architecture incorporates a locker-room agreement, which includes the definition of a flexible teamwork structure including mechanisms for task decomposition and dynamic role assignment. Roles, team formations, and multi-agent set-plays are defined. The team member agent architecture is implemented both in simulated robotic soccer and in real robotic soccer. Empirical results demonstrate the effectiveness of the teamwork structure and communication paradigm implementations.

Layered learning is a hierarchical ML paradigm applicable in complex domains in which learning a mapping directly from sensors to actuators is intractable. Given hierarchical task decomposition, layered learning allows for learning at each level of the hierarchy, with learning at each level directly affecting learning at the next higher level. Layered learning is applied in simulated robotic soccer as a set of three interconnecting learned behaviors. At the lowest layer, ball interception, an individual skill, is learned. This learned individual skill is used as part of the training behavior for learning pass evaluation, a multi-agent behavior (Chapter 6). The learned multi-agent behavior is then used to create the input space for learning pass selection, a team behavior. All of the learned behaviors are validated empirically in controlled testing scenarios.

TPOT-RL is a new multi-agent reinforcement learning method. TPOT-RL is designed for complex domains in which agents cannot necessarily observe the state transitions caused by their own or other agents' actions. It exploits local, action-dependent features to aggressively generalize its input representation for learning, and partitions the task among the agents, allowing them to simultaneously learn collaborative policies by observing the long-term effects of their actions. TPOT-RL is applicable in domains with large state spaces and limited training opportunities. TPOT-RL is developed and tested in the simulated robotic soccer domain. It is also successfully applied in another multi-agent domain, namely network routing.

The CMUnited simulated robotic soccer system is a fully implemented and operational team of simulated robotic soccer agents which has performed successfully at international tournaments. This dissertation contributes algorithmic details of the CMUnited implementation, as well as its source code.

BIBLIOGRAPHIC REFERENCES

Sorin Achim, Peter Stone, and Manuela Veloso. Building a dedicated robotic soccer system. In Proceedings of the IROS-96 Workshop on RoboCup, pages 41-48, Osaka, Japan, November 1996. (pp. 43, 46, 49, 87, 171, 203)

P. Agre and D. Chapman. Pengi: An implementation of a theory of activity. In Proceedings of the National Conference on Artificial Intelligence, pages 268-272, 1987. (p. 20)

Tomohito Andou. Refinement of soccer agents' positions using reinforcement learning. In Hiroaki Kitano, editor, RoboCup-97: Robot Soccer World Cup I, pages 373-388. Springer Verlag, Berlin, 1998. (pp. 97, 167, 206)

David Andre and Astro Teller. Evolving team Darwin United. In Minoru Asada and Hiroaki Kitano, editors, RoboCup-98: Robot Soccer World Cup II. Springer Verlag, Berlin, 1999. (pp. 100, 206)

Elisabeth Andre, Gerd Herzog, and Thomas Rist. On the simultaneous interpretation of real world image sequences and their natural language description: The system SOCCER. In Proc. of the 8th ECAI, pages 449-454, Munich, 1988. (p. 202)

David Andre, Emiel Corten, Klaus Dorer, Pascal Gugenberger, Marius Joldos, Johan Kummeneje, Paul Arthur Navratil, Itsuki Noda, Patrick Riley, Peter Stone, Tomoichi Takahashi, and Tralvex Yeap. Soccer server manual, version 4.0. Technical Report RoboCup-1998-001, RoboCup, 1998. At URL http://ci.etl.go.jp/~noda/soccer/server/Documents.html. (pp. 27, 30, 31, 32, 35, 37, 39, 43)

Elisabeth Andre, Gerd Herzog, and Thomas Rist. Generating multimedia presentations for RoboCup soccer games. In Hiroaki Kitano, editor, RoboCup-97: Robot Soccer World Cup I, pages 200-215. Springer Verlag, Berlin, 1998. (p. 204)

Sachiyo Arai, Kazuteru Miyazaki, and Shigenobu Kobayashi. Generating cooperative behavior by multi-agent reinforcement learning. In Sixth European Workshop on Learning Robots, Brighton, UK, August 1997. (p. 167)

Ronald C. Arkin. Towards Cosmopolitan Robots: Intelligent Navigation in Extended Man-Made Environments. PhD dissertation, University of Massachusetts, September 1987. (p. 20)

R. C. Arkin. Cooperation without communication: Multi-agent schema based robot navigation. Journal of Robotic Systems, 9(3):351-364, April 1992. (p. 20)

Tucker Balch. Behavioral Diversity in Learning Robot Teams. PhD dissertation, College of Computing, Georgia Institute of Technology, 1998. (pp. 98, 188, 192, 206)

R. A. Barman, S. J. Kingdon, J. J. Little, A. K. Mackworth, D. K. Pai, M. Sahota, H. Wilkinson, and Y. Zhang. DYNAMO: Real-time experiments with multiple mobile robots. In Intelligent Vehicles Symposium, pages 261-266, Tokyo, July 1993. (pp. 19, 185, 202)

Alan H. Bond and Les Gasser. An analysis of problems and research in DAI. In Alan H. Bond and Les Gasser, editors, Readings in Distributed Artificial Intelligence, pages 3-35. Morgan Kaufmann Publishers, San Mateo, CA, 1988. (p. 181)

Mike Bowling, Peter Stone, and Manuela Veloso. Predictive memory for an inaccessible environment. In Proceedings of the IROS-96 Workshop on RoboCup, pages 28-34, Osaka, Japan, November 1996. (p. 68)

J. A. Boyan and M. L. Littman. Packet routing in dynamically changing networks: A reinforcement learning approach. In J. D. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural Information Processing Systems 6. Morgan Kaufmann Publishers, 1994. (pp. 49, 167)

Justin A. Boyan and Andrew W. Moore. Generalization in reinforcement learning: Safely approximating the value function. In G. Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural Information Processing Systems, volume 7. The MIT Press, 1995. (p. 166)

V. Braitenberg. Vehicles: Experiments in Synthetic Psychology. MIT Press, 1984. (p. 224)

Rodney A. Brooks. A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation, RA-2:14-23, 1986. (pp. 20, 101)


