CSCE 475/875

Handout 9: Search Algorithms for Agents

October 1, 2007

 

This handout is based on Chapter 4 of G. Weiss, (Ed.), Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence, MIT Press, 1999.

 

Introduction

 

This handout covers search algorithms that are useful for problem solving by multiple agents.

 

In search problems, the sequence of actions required for solving a problem cannot be known a priori from the system’s viewpoint but must be determined by a trial-and-error exploration of alternatives.  Search is one of the most important techniques in AI.

 

Three classes: (1) path-finding problems, (2) constraint satisfaction problems, and (3) two-player games.

 

Constraint Satisfaction

 

Definition of a CSP

 

A constraint satisfaction problem (CSP) is the problem of finding a consistent assignment of values to variables that take their values from finite, discrete domains.

 

Formally, a CSP consists of n variables, $x_1$, $x_2$, …, $x_n$, whose values are taken from finite, discrete domains $D_1$, $D_2$, …, $D_n$, respectively, and a set of constraints on their values.

 

A constraint is defined by a predicate.  That is, the constraint $p_k(x_{k1}, \ldots, x_{kj})$ is a predicate that is defined on the Cartesian product $D_{k1} \times \cdots \times D_{kj}$.  This predicate is true iff the value assignment of these variables satisfies this constraint.

 

IMPORTANT

Since constraint satisfaction is NP-complete in general, a trial-and-error exploration of alternatives is inevitable.  (For example, given only four colors, is it possible to color all 48 contiguous U.S. states without two neighboring states sharing the same color?)

 

Assume that the variables of a CSP are distributed among agents.  Solving a CSP in which multiple agents are involved (a distributed CSP) can be considered as achieving coherence among the agents.  Many application problems in DAI, such as interpretation, assignment, and multiagent truth maintenance tasks, can be formalized as distributed CSPs.

 

In the following algorithms, each process corresponds to a variable, and the processes act asynchronously to solve a CSP.  In addition, the following communication model is assumed:

·        Processes communicate by sending messages.  A process can send messages to other processes if and only if the process knows the addresses/identifiers of other processes.

·        The delay in delivering a message is finite, though random (may not be true).

·        For the transmission between any pair of processes, messages are received in the order in which they were sent (not always true).

 

Algorithms:  Filtering Algorithm, Hyper-Resolution-Based Consistency Algorithm, Asynchronous Backtracking Algorithm, Asynchronous Weak-Commitment Search

 

Filtering Algorithm

 

The filtering algorithm (Waltz 1975) is a type of consistency algorithm.  It achieves 2-consistency (or arc-consistency): every value of a variable has at least one consistent value in the domain of each neighboring variable.

Consistency algorithms can be classified by the notion of k-consistency. A CSP is k-consistent if and only if the following condition is satisfied:  Given any instantiation of any k-1 variables satisfying all the constraints among those variables, it is possible to find an instantiation of any kth variable such that these k variable values satisfy all the constraints among them.

In this algorithm, each process communicates its domain to its neighbors and then removes values that cannot satisfy constraints from its domain.  More specifically, a process $x_i$ performs the revise procedure for each neighboring process $x_j$. If some value of the domain is removed by performing the revise procedure, then $x_i$ sends the new domain to neighboring processes.  If it receives a new domain from a neighboring process, then it calls the revise procedure again.
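
The revise loop described above can be sketched in Python as follows.  This is only a sequential stand-in for the message passing between processes (a queue replaces the messages), and the names revise, filtering, and different, as well as the three-variable coloring example, are assumptions made for illustration and do not come from the chapter.

```python
from collections import deque

def revise(domains, constraint, xi, xj):
    """Remove every value of xi that has no consistent value in the domain of xj."""
    removed = False
    for vi in list(domains[xi]):
        if not any(constraint(xi, vi, xj, vj) for vj in domains[xj]):
            domains[xi].remove(vi)
            removed = True
    return removed

def filtering(domains, neighbors, constraint):
    """Repeat revise until no domain changes any more."""
    queue = deque((xi, xj) for xi in domains for xj in neighbors[xi])
    while queue:
        xi, xj = queue.popleft()
        if revise(domains, constraint, xi, xj):
            # xi "sends its new domain": every neighbor must revise against xi again
            for xk in neighbors[xi]:
                queue.append((xk, xi))
    return domains

def different(xi, vi, xj, vj):
    """Graph-coloring constraint: neighboring variables take different colors."""
    return vi != vj

domains = {"x1": {"red"}, "x2": {"red", "blue"}, "x3": {"red", "blue", "green"}}
neighbors = {"x1": ["x2", "x3"], "x2": ["x1", "x3"], "x3": ["x1", "x2"]}
result = filtering(domains, neighbors, different)
print(result)
```

In this small example filtering happens to leave exactly one value per variable; in general it only shrinks the domains.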

 

The filtering algorithm cannot, in general, solve a problem by itself, but it can reduce the remaining search work.  So, it is often used as a preprocessing step.

 

A k-consistency algorithm transforms a given problem into an equivalent k-consistent problem.  If the problem is k-consistent and also j-consistent for all j < k, the problem is called strongly k-consistent.  If there are n variables in a CSP and the CSP is strongly n-consistent, then a solution can be obtained immediately without any trial-and-error exploration, since for any instantiation of k-1 variables, we can always find at least one consistent value for the k-th variable.

Hyper-Resolution-Based Consistency Algorithm

 

In this algorithm (de Kleer 1989), all constraints are represented as nogoods, where a nogood is a prohibited combination of variable values.  For example, in a graph-coloring problem with the two colors red and blue, a constraint between $x_1$ and $x_2$ can be represented as two nogoods {$x_1$ = red, $x_2$ = red} and {$x_1$ = blue, $x_2$ = blue}.
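
As a small illustration of the nogood representation, here is a sketch in Python.  It assumes a not-equal (coloring) constraint between two variables and the two colors red and blue; the helper names not_equal_nogoods and violates are made up for this example.  It shows only the representation, not the hyper-resolution rule itself.

```python
def not_equal_nogoods(xi, xj, colors):
    """One nogood per color: both variables taking that color is prohibited."""
    return [frozenset({(xi, c), (xj, c)}) for c in colors]

def violates(assignment, nogood):
    """An assignment violates a nogood if it contains every (variable, value) pair in it."""
    return all(assignment.get(var) == val for var, val in nogood)

nogoods = not_equal_nogoods("x1", "x2", ["red", "blue"])
ok = {"x1": "red", "x2": "blue"}
bad = {"x1": "red", "x2": "red"}
print(any(violates(ok, n) for n in nogoods))   # False
print(any(violates(bad, n) for n in nogoods))  # True
```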

 

CAUTION

The hyper-resolution rule can generate a very large number of nogoods.  If we restrict the application of the rules so that only nogoods whose lengths (the length of a nogood is the number of variables that constitute the nogood) are less than k are produced, the problem becomes strongly k-consistent.

 

Asynchronous Backtracking

 

The asynchronous backtracking algorithm (Yokoo et al. 1992) is an asynchronous version of a backtracking algorithm, which is a standard method for solving CSPs:  Pick one alternative, and then move on.  If the current partial assignment can no longer be extended, backtrack one step, pick another alternative, and so on.   In this algorithm, the priority order of variables/processes is determined, and each process communicates its tentative value assignment to neighboring processes.  The priority order is arbitrary but consistent, for example the alphabetical order of the variable identifiers.
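
For reference, the ordinary (sequential) backtracking method that asynchronous backtracking builds on can be sketched as below.  Note that this is not the asynchronous algorithm itself: there are no processes or messages here, and the function names and the three-variable coloring example are assumptions for illustration.  The fixed variable order plays the role of the priority order.

```python
def consistent(var, value, assignment, constraints):
    """Check value against every already-assigned variable that shares a constraint."""
    for (a, b), pred in constraints.items():
        if a == var and b in assignment and not pred(value, assignment[b]):
            return False
        if b == var and a in assignment and not pred(assignment[a], value):
            return False
    return True

def backtrack(variables, domains, constraints, assignment=None):
    """Return one consistent assignment as a dict, or None if there is none."""
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(variables):
        return dict(assignment)
    var = variables[len(assignment)]          # fixed (static) priority order
    for value in domains[var]:                # pick one alternative, then move on
        if consistent(var, value, assignment, constraints):
            assignment[var] = value
            solution = backtrack(variables, domains, constraints, assignment)
            if solution is not None:
                return solution
            del assignment[var]               # backtrack one step
    return None

def ne(a, b):
    return a != b

variables = sorted(["x1", "x2", "x3"])        # alphabetical priority order
domains = {v: ["red", "blue", "green"] for v in variables}
constraints = {("x1", "x2"): ne, ("x1", "x3"): ne, ("x2", "x3"): ne}
solution = backtrack(variables, domains, constraints)
print(solution)
```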

 

IMPORTANT

One limitation of the asynchronous backtracking algorithm is that the process/variable ordering is statically determined.  If the value selection of a higher priority process is bad, the lower priority processes need to perform an exhaustive search to revise the bad decision. 

 

Asynchronous Weak-Commitment Search

 

The Asynchronous Weak-Commitment Search (AWCS) algorithm (Yokoo 1995) introduces a method for dynamically ordering processes so that a bad decision can be revised without an exhaustive search.

 

Here we use heuristics such as the min-conflict heuristic:  when a variable value is to be selected, a value that minimizes the number of constraint violations with other variables is preferred.  Although this heuristic has been found to be very effective, it cannot completely avoid bad decisions.   A priority value is thus determined for each variable, and the priority order among processes is determined using these priority values.
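
A minimal sketch of the min-conflict heuristic, assuming a graph-coloring problem in which a violation means that two neighboring variables share a color.  The function names and the example data are assumptions; the full AWCS value selection additionally distinguishes higher and lower priority processes, which this sketch does not.

```python
def count_conflicts(var, value, assignment, neighbors):
    """Number of neighbors of var whose current value clashes with value."""
    return sum(1 for other in neighbors[var] if assignment.get(other) == value)

def min_conflict_value(var, domain, assignment, neighbors):
    """Pick the value with the fewest constraint violations (first one wins ties)."""
    return min(domain, key=lambda value: count_conflicts(var, value, assignment, neighbors))

neighbors = {"x1": ["x2", "x3", "x4"]}
assignment = {"x2": "red", "x3": "red", "x4": "blue"}
choice = min_conflict_value("x1", ["red", "blue", "green"], assignment, neighbors)
print(choice)
```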

 

Differences between AWCS and ABA:

·        The priority order is determined using the communicated priority values (dynamic).  IMPORTANT:  If the current value is not consistent with the local_view (i.e., some constraint with variables of higher priority processes is not satisfied), the agent changes its value using the min-conflict heuristic, i.e., it selects a value that is not only consistent with the local_view, but also minimizes the number of constraint violations with variables of lower priority processes.

·        When $x_i$ cannot find a value consistent with its local_view, $x_i$ sends nogood messages to other processes, and increments its priority value.  (This is a pretty self-confident agent!)  IMPORTANT:  If $x_i$ cannot generate a new nogood, $x_i$ will not change its priority value but will wait for the next message.  This procedure is needed to guarantee the completeness of the algorithm.  Just in case: wait until new changes come around.

 

Path-Finding Problem

 

Definition

 

A path-finding problem consists of (1) a set of nodes N, each representing a state, (2) a set of directed links L, each representing an operator available to a problem solving agent given a particular state.  There is a unique node s called the start node or the initial node.  There exists a set of nodes G, each representing a goal state.

 

For each link, the weight of the link is defined, representing the costs of applying the operator.  This weight is sometimes known as the distance between the nodes.

 

If a node is directly linked from node i, then that node is a neighbor of i.

 

Algorithms: Asynchronous Dynamic Programming, Learning Real-Time A* Algorithm, Real-Time A* Algorithm, Moving Target Search Algorithm, Real-Time Bidirectional Search Algorithms, Real-Time Multiagent Search Algorithms.

 

Asynchronous Dynamic Programming

 

In a path-finding problem, the principle of optimality holds.  In short, the principle of optimality states that a path is optimal if and only if every segment of it is optimal.

 

Let us represent the shortest distance from node i to goal nodes as $h^*(i)$.  From the principle of optimality, the shortest distance via neighboring node j is given by $f^*(j) = k(i,j) + h^*(j)$, where $k(i,j)$ is the cost of the link between i and j.  If node i is not a goal node, the path to a goal node must visit one of the neighboring nodes.  Therefore, $h^*(i) = \min_j f^*(j)$ holds.  (So this equation says that finding the optimal path requires looking ahead; it cannot be based on local information alone!)

 

If $h^*$ is given for each node, the optimal path can be obtained by repeating the following procedure:  For each neighboring node j of the current node i, compute $f^*(j) = k(i,j) + h^*(j)$.  Then, move to the j that gives $\min_j f^*(j)$.

Asynchronous dynamic programming (Bertsekas 1982) computes $h^*$ by repeating the local computations of each node:

(1)   For each node i, there exists a process corresponding to i.

(2)   Each process records $h(i)$, which is the estimated value of $h^*(i)$.  (Heuristic!!!  h is a heuristic function; it is called admissible if it never overestimates $h^*$.)  The initial value of $h(i)$ is arbitrary except for goal nodes.

(3)   For each goal node g, $h(g)$ is 0.

(4)   Each process can refer to h values of neighboring nodes (via shared memory or message passing)

 

In the above situation, each process updates $h(i)$ by:

 

For each neighboring node j, compute $f(j) = k(i,j) + h(j)$, where $h(j)$ is the current estimated distance from j to a goal node, and $k(i,j)$ is the cost of the link from i to j.  Then, update $h(i)$ as follows: $h(i) \leftarrow \min_j f(j)$.
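
The update above can be tried out with a small sketch.  Here the "processes" are simulated sequentially, and a shuffled update order stands in for asynchrony; the function name async_dp, the sweep count, and the four-node example graph are assumptions.  The sketch also assumes every non-goal node has at least one outgoing link.

```python
import random

def async_dp(links, goals, sweeps=100, seed=0):
    """links[i] = {j: k(i, j)}.  Returns the h estimates after repeated local updates."""
    rng = random.Random(seed)
    nodes = list(links)
    h = {i: 0.0 for i in nodes}          # arbitrary initial values; h(g) = 0 for goals
    for _ in range(sweeps):
        rng.shuffle(nodes)               # arbitrary order stands in for asynchrony
        for i in nodes:
            if i in goals:
                continue
            # h(i) <- min_j f(j), with f(j) = k(i, j) + h(j)
            h[i] = min(cost + h[j] for j, cost in links[i].items())
    return h

links = {"s": {"a": 1, "b": 4}, "a": {"b": 1, "g": 5}, "b": {"g": 1}, "g": {}}
h = async_dp(links, goals={"g"})
print(h)
```

On this graph the estimates settle at the true shortest distances: 3 from s, 2 from a, and 1 from b.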

Disadvantages: In reality, we cannot use asynchronous dynamic programming for a reasonably large path-finding problem.  The number of nodes can be huge, and we cannot afford to have processes for all nodes. 

 

Learning Real-Time A*

 

How does an agent choose which nodes to perform local computations for?

 

One way is to choose the current node where the agent is located.  First, the agent updates the h value of the current node, and then moves to the best neighboring node.  This procedure is repeated until the agent reaches a goal state.  This method is called the Learning Real-Time A* (LRTA*) algorithm (Korf 1990).
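
A minimal sketch of one LRTA* trial, under the assumptions that unknown h values start at 0 and that the graph is given as a dictionary links[i] = {j: k(i, j)}.  The function name, the max_steps guard, and the example graph are made up for illustration.

```python
def lrta_star(links, start, goals, h=None, max_steps=1000):
    """Run one trial; returns (path, h), where h holds the learned estimates."""
    h = {} if h is None else h
    current, path = start, [start]
    for _ in range(max_steps):
        if current in goals:
            return path, h
        # f(j) = k(i, j) + h(j) for each neighbor j of the current node i
        f = {j: cost + h.get(j, 0.0) for j, cost in links[current].items()}
        best = min(f, key=f.get)
        h[current] = f[best]             # update the h value of the current node
        current = best                   # then move to the best neighbor
        path.append(current)
    return None, h                       # gave up after max_steps moves

links = {"s": {"a": 1, "b": 4}, "a": {"b": 1, "g": 5}, "b": {"g": 1}, "g": {}}
path, h = lrta_star(links, "s", {"g"})
print(path, h)
```

Passing the returned h into a second trial lets the agent reuse what it learned, which is where the "learning" in the name comes from.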

 

Real-Time A*

 

Real-Time A* (RTA*) updates the value of $h(i)$ in a different way from LRTA*.  Instead of setting $h(i)$ to the smallest value of $f(j)$ for all neighbors j, the second smallest value is assigned to $h(i)$ (but which j?  TYPO in the book.)  Thus, is this more aggressive?  RTA* learns more efficiently than LRTA* but can overestimate heuristic costs; it sort of forces the system to have second thoughts.
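
The difference from LRTA* is a single line in the update step, as the sketch below tries to show.  The function name rta_star_step is hypothetical, unknown h values are assumed to start at 0, and storing infinity when a node has only one neighbor is an assumption of this sketch rather than something stated in this handout.

```python
def rta_star_step(links, current, h):
    """Update h(current) with the second smallest f value, then return the best neighbor."""
    f = sorted((cost + h.get(j, 0.0), j) for j, cost in links[current].items())
    best_f, best = f[0]
    # LRTA* would store best_f here; RTA* stores the second smallest f instead
    h[current] = f[1][0] if len(f) > 1 else float("inf")
    return best

links = {"s": {"a": 1, "b": 4}, "a": {"b": 1, "g": 5}, "b": {"g": 1}, "g": {}}
h = {}
next_node = rta_star_step(links, "s", h)
print(next_node, h)
```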

 

Moving Target Search

 

IMPORTANT

Heuristic search algorithms like those above assume that the goal state is fixed and does not change during the course of the search.

 

Moving Target Search (MTS) is a generalization of LRTA* to the case where the target can move.  MTS must acquire heuristic information for each target location.  Thus, MTS maintains a matrix of heuristic values, representing the function $h(x, y)$ for all pairs of states x and y.  Conceptually, all heuristic values are read from this matrix, which is initialized to the values returned by the static evaluation function.  Over the course of the search, these heuristic values are updated to improve their accuracy.

 

There are now two movements: the problem solver moving from node to node, and the target goal moving.  The task is accomplished when the problem solver arrives at the same node as the target.
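
The matrix of heuristic values can be sketched as a dictionary keyed by (x, y) pairs.  The update used in solver_move below (unit move cost, raising h(x, y) to the best neighbor's value plus one) follows the way MTS is commonly presented and is an assumption here, since this handout does not spell out the update rules; the function names and the five-state line world are also made up.

```python
def init_matrix(states, static_eval):
    """h(x, y) for all pairs of states, initialized from the static evaluation function."""
    return {(x, y): static_eval(x, y) for x in states for y in states}

def solver_move(h, neighbors, x, y):
    """Problem solver at x, target at y; every move is assumed to cost 1."""
    best = min(neighbors[x], key=lambda n: h[(n, y)])
    h[(x, y)] = max(h[(x, y)], h[(best, y)] + 1)   # improve the estimate
    return best

states = range(5)                                  # a line of five states 0..4
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 4] for i in states}
h = init_matrix(states, lambda x, y: 0 if x == y else 1)
new_x = solver_move(h, neighbors, 0, 3)
print(new_x, h[(0, 3)])
```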

 

Real-Time Bidirectional Search

 

Bidirectional search basically says in addition to me starting from the initial state, you start from the goal state.  Hopefully we get to find the path sooner that way.  For example, in movies, a couple running towards each other beats out one standing still, one running! 

 

If two robots are to be brought to meet each other, how can they do this efficiently?  Should they negotiate their actions, or make decisions independently?  Is the two-robot organization really superior to a single robot one?

Why is RTBS efficient for n-puzzles but not for mazes?
 
Real-time bidirectional search (RTBS) differs from unidirectional search in its problem space!  Let x and y be the locations of 2 problem solvers.  We call a pair of locations $(x, y)$ a p-state, and the problem space consisting of p-states a combined problem space.  When the number of states in the original problem space is n, the number of p-states in the combined problem space becomes $n^2$.
 
Let i and g be the initial and goal states; then $(i, g)$ becomes the initial p-state in the combined problem space (assuming only one goal state!).  The goal p-state requires both problem solvers to share the same location.  Since there are n locations where they can meet, there are n goal p-states!
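
The counting argument can be checked directly with a few lines of Python.  The four state names are arbitrary placeholders chosen for this example.

```python
from itertools import product

states = ["a", "b", "c", "d"]                  # n = 4 states in the original space
p_states = list(product(states, repeat=2))     # every pair (x, y) of locations
goal_p_states = [(x, y) for x, y in p_states if x == y]   # both solvers meet
initial_p_state = ("a", "d")                   # (initial state, goal state)

print(len(p_states), len(goal_p_states))       # n*n and n
```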
 
The performance of real-time search is sensitive to the topography of the problem space, especially to heuristic depressions, i.e., a set of connected states with heuristic values less than or equal to those of the set of immediate and completely surrounding states (this is because, in real-time search, erroneous decisions seriously affect the consequent problem solving behavior, so we have to be cautious.)  If heuristic depressions are shallow, the problem is easier to solve.  If heuristic depressions are deep, then the problem is harder to solve.  (Deep heuristic depressions mean we do not understand the topography well, or we cannot afford to be aggressive!)
 
In n-puzzles, heuristic depressions are shallow.  In mazes, they are deep. (Do you know why?)

In RTBS, two problem solvers starting from the initial and goal states move toward each other.  The coordination cost is expected to be limited within some constant time.

 

Think about the Marco Polo game!

 

There are two variants.  In centralized RTBS, the best action is selected from among all possible moves of the two problem solvers, and that move is then made.  In decoupled RTBS, the two problem solvers independently make their own decisions.

 

The evaluation results show that, in clear situations (i.e., heuristic functions return accurate values), decoupled RTBS performs better than centralized RTBS, while in uncertain situations (i.e., heuristic functions return inaccurate values), the latter becomes more efficient.  “Yes, this makes sense: if my sensing is good, then decoupled RTBS is better.  But if my sensing is not good, I better incorporate both, then centralized is better. Not just faster, but the path is shorter too!!”

 

Real-Time Multiagent Search

 

If there exist multiple agents, how can these agents cooperatively solve a problem?  Organization!  For example, multiple agents share the same problem space with a single fixed goal.  Each agent executes the LRTA* algorithm independently, but they share the updated h values.  In this case, when one of the agents reaches the goal, the objective of the agents as a whole is satisfied.

 

To see why this organization is efficient, we look at (1) the effects of sharing experiences among agents (efficiency), and (2) the effects of autonomous decision making (more robust).

 

Two-Player Games

 

Formalization of Two-Player Games

 

We use a game tree to show the moves of 2 players.  The player who plays first is MAX, and the opponent is MIN.  A node that shows MAX’s turn is a MAX node, and a node that shows MIN’s turn is a MIN node.  There is a unique initial node called the root node.  If a node m can be reached in a single move from a node n, then we say that m is a child node of n, and n is a parent of m.  Any node from which n can be reached by a sequence of moves is an ancestor of n.


Now, how do we traverse this MIN-MAX tree?

 

Minimax Procedure

 

In the minimax procedure, we (1) generate a part of the game tree, (2) evaluate the merit of the nodes on the search frontier using a static evaluation function, and (3) use these values to estimate the merit of ancestor nodes.

 

The key is in the evaluation function:  A node favorable to MAX has a large evaluation value, a node favorable to MIN has a small evaluation value.  So, we can assume that MAX will choose the move that leads to the node with the max evaluation value, while MIN will do the opposite.

 

(1)   The evaluation value of a MAX node is equal to the maximum value of any of its child nodes.

(2)   The evaluation value of a MIN node is equal to the minimum value of any of its child nodes.

 

I will assume that you will pick the worst move for me.  Then I will pick the best move out of that.  This is the logic.
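
The two rules above translate almost directly into a recursive function.  In this sketch a game tree is a nested Python list whose leaves are the static evaluation values of the frontier nodes; this representation and the example tree are assumptions for illustration.

```python
def minimax(node, is_max):
    """node is either a number (frontier evaluation) or a list of child nodes."""
    if not isinstance(node, list):
        return node
    values = [minimax(child, not is_max) for child in node]
    # MAX node: maximum over children; MIN node: minimum over children
    return max(values) if is_max else min(values)

# Root is a MAX node with three MIN children, each with three frontier nodes
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
value = minimax(tree, True)
print(value)  # 3: the MIN nodes evaluate to 3, 2, and 2, and MAX picks the largest
```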

 

Alpha-Beta Pruning

 

Alpha-beta pruning is used to speed up the minimax procedure without any loss of information.  This algorithm prunes a part of a tree that cannot influence the evaluation value of the root node.  More specifically, for each node, the following values are recorded and updated:

 

·        $\alpha$ value: represents the lower bound of the evaluation value of a MAX node (how well can you do?)

·        $\beta$ value: represents the upper bound of the evaluation value of a MIN node (how badly can you do?)

 

While visiting nodes in a game tree from the root node by a depth-first order to a certain depth, these values are updated by the following rules:

 

·        The $\alpha$ value of a MAX node is the maximum value of any of its child nodes visited so far.

·        The $\beta$ value of a MIN node is the minimum value of any of its child nodes visited so far.

 

IMPORTANT:

We can prune a part of the tree if one of the following conditions is satisfied:  (An elegant algorithm!)

·        $\alpha$-cut:  If the $\beta$ value of a MIN node is smaller than or equal to the maximum $\alpha$ value of its ancestor MAX nodes, we can use the $\beta$ value as the evaluation value of the MIN node, and can prune a part of the search tree under the MIN node.  In other words, the MAX player never chooses a move that leads to the MIN node, since there exists a better move for the MAX player.

·        $\beta$-cut:  If the $\alpha$ value of a MAX node is larger than or equal to the minimum $\beta$ value of its ancestor MIN nodes, we can use the $\alpha$ value as the evaluation value of the MAX node, and can prune a part of the search tree under the MAX node.  In other words, the MIN player never chooses a move that leads to the MAX node, since there exists a better move for the MIN player.
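
The two pruning conditions can be sketched as follows.  As an assumption of this sketch, a game tree is a nested Python list whose leaves are static evaluation values, and the optional visited list only records which frontier nodes were actually evaluated so that the effect of pruning is visible.

```python
import math

def alphabeta(node, is_max, alpha=-math.inf, beta=math.inf, visited=None):
    """Depth-first minimax with alpha-beta pruning over a nested-list game tree."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)          # record evaluated frontier nodes
        return node
    if is_max:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)     # best value MAX can guarantee so far
            if alpha >= beta:             # beta-cut: MIN above will avoid this node
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)           # best value MIN can guarantee so far
        if beta <= alpha:                 # alpha-cut: MAX above will avoid this node
            break
    return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited = []
value = alphabeta(tree, True, visited=visited)
print(value, visited)
```

On this tree the second MIN node is cut off after its first child (2 is already no better than the 3 that MAX has in hand), so the frontier nodes 4 and 6 are never evaluated.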

 

IMPORTANT:  But propagating values upwards and downwards costs time.  We must remember this when we consider alpha-beta pruning.  Determining when to stop, however, is quite straightforward.

 

References

 

Bertsekas, D. P. (1982).  Distributed Dynamic Programming, IEEE Trans. Automatic Control, AC-27(3):610-616.

de Kleer, J. (1989).  A Comparison of ATMS and CSP Techniques, in Proceedings of the 11th International Joint Conference on Artificial Intelligence, 290-296.

Korf, R. E. (1990).  Real-Time Heuristic Search, Artificial Intelligence, 42(2-3):189-211.

Lesser, V. R. and D. D. Corkill (1981).  Functionally Accurate, Cooperative Distributed Systems, IEEE Transactions on Systems, Man, and Cybernetics, 11(1):81-96.

Waltz, D. (1975).  Understanding Line Drawing Scenes with Shadows, in P. Winston (ed.) The Psychology of Computer Vision, 19-91, McGraw-Hill.

Yokoo, M. (1995).  Asynchronous Weak-Commitment Search for Solving Distributed Constraint Satisfaction Problems, in Proceedings of the 1st International Conference on Principles and Practice of Constraint Programming, 88-102.

Yokoo, M., E. H. Durfee, T. Ishida, and K. Kuwabara (1992).  Distributed Constraint Satisfaction for Formalizing Distributed Problem Solving, in Proceedings of the 12th IEEE International Conference on Distributed Computing Systems, 612-621.