CSCE 475/875
Handout 9: Search Algorithms for Agents
October 1, 2007
This handout is based on Chapter 4 of G. Weiss, (Ed.), Multiagent
Systems: A Modern Approach to Distributed Artificial Intelligence, MIT
Press, 1999.
This handout covers search algorithms that are useful for problem solving by multiple agents.
In search problems, the sequence of actions required for solving a problem cannot be known a priori from the system’s viewpoint but must be determined by a trial-and-error exploration of alternatives. Search is one of the most important techniques in AI.
Three classes: (1) path-finding problems, (2) constraint satisfaction problems, and (3) two-player games.
Constraint Satisfaction
Definition of a CSP
A CSP is the problem of finding a consistent assignment of values to variables that take their values from finite, discrete domains.
Formally, a CSP consists of n variables, $x_1$, $x_2$, …, $x_n$, whose values are taken from finite, discrete domains $D_1$, $D_2$, …, $D_n$, respectively, and a set of constraints on their values.
A constraint is defined by a predicate. That is, the constraint $p_k(x_{k1}, \ldots, x_{kj})$ is a predicate that is defined on the Cartesian product $D_{k1} \times \cdots \times D_{kj}$. This predicate is true iff the value assignment of these variables satisfies this constraint.
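The definition above can be made concrete with a small sketch. The variable names, the two-color domains, and the helper `is_consistent` are illustrative assumptions, not part of the handout or the book.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical three-variable graph-coloring CSP (names are illustrative).
domains: Dict[str, List[str]] = {
    "x1": ["red", "blue"],
    "x2": ["red", "blue"],
    "x3": ["red", "blue"],
}

# Each constraint is (scope, predicate). The predicate is defined on the
# Cartesian product of the domains of the variables in its scope.
Constraint = Tuple[Tuple[str, ...], Callable[..., bool]]
constraints: List[Constraint] = [
    (("x1", "x2"), lambda a, b: a != b),
    (("x2", "x3"), lambda a, b: a != b),
]

def is_consistent(assignment: Dict[str, str]) -> bool:
    """Return True iff every constraint whose variables are all assigned holds."""
    for scope, predicate in constraints:
        if all(v in assignment for v in scope):
            if not predicate(*(assignment[v] for v in scope)):
                return False
    return True
```

A complete assignment for which `is_consistent` returns True is a solution of this toy CSP.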
Since constraint satisfaction is NP-complete in general, a
trial-and-error exploration of alternatives is inevitable. (For example, given only four colors, is it
possible to color all 49 contiguous
Assume that the variables of a CSP are distributed among agents. Solving a CSP in which multiple agents are involved (a distributed CSP) can be considered as achieving coherence among the agents. Many application problems in DAI, such as interpretation, assignment, and multiagent truth maintenance tasks, can be formalized as distributed CSPs.
In the following algorithms, each process corresponds to a variable, and the processes act asynchronously to solve a CSP. In addition, the following communication model is assumed:
· Processes communicate by sending messages. A process can send messages to other processes if and only if the process knows the addresses/identifiers of other processes.
· The delay in delivering a message is finite, though random (may not be true).
· For the transmission between any pair of processes, messages are received in the order in which they were sent (not always true).
Algorithms: Filtering Algorithm, Hyper-Resolution-Based Consistency Algorithm, Asynchronous Backtracking Algorithm, Asynchronous Weak-Commitment Search
Filtering Algorithm
The filtering algorithm (Waltz 1975) is a type of consistency algorithm. It achieves 2-consistency (arc consistency): for any value of a variable, there is at least one consistent value of each neighboring variable.

In this algorithm, each process communicates its domain to its neighbors and then removes values that cannot satisfy constraints from its domain. More specifically, a process $x_i$ performs revise($x_i$, $x_j$) for each neighboring process $x_j$. If some value of the domain is removed by performing the revise procedure, then $x_i$ sends the new domain to neighboring processes. If it receives a new domain from a neighboring process, then it calls the revise procedure again.
The filtering algorithm cannot solve a problem by itself, but it can reduce the remaining search work. It is therefore often used as a preprocessing step.
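The revise-and-resend loop can be sketched as follows. This is a sequential simulation under stated assumptions: a queue stands in for the messages between processes, a single symmetric binary `constraint` function is used for every pair of neighbors, and all names are hypothetical.

```python
from collections import deque

def revise(domains, constraint, xi, xj):
    """Remove values of xi that have no consistent value in xj's domain.

    Returns True if xi's domain changed.
    """
    supported = [vi for vi in domains[xi]
                 if any(constraint(vi, vj) for vj in domains[xj])]
    changed = len(supported) != len(domains[xi])
    domains[xi] = supported
    return changed

def filtering(domains, neighbors, constraint):
    """Sequential simulation of filtering; the queue plays the role of messages."""
    queue = deque((xi, xj) for xi in domains for xj in neighbors[xi])
    while queue:
        xi, xj = queue.popleft()
        if revise(domains, constraint, xi, xj):
            # xi "sends" its new domain, so its neighbors must revise again.
            queue.extend((xk, xi) for xk in neighbors[xi])
    return domains

# Toy graph-coloring chain x1 - x2 - x3 in which x1 is already fixed to red.
example_domains = {"x1": ["red"], "x2": ["red", "blue"], "x3": ["red", "blue"]}
example_neighbors = {"x1": ["x2"], "x2": ["x1", "x3"], "x3": ["x2"]}
filtered = filtering(example_domains, example_neighbors, lambda a, b: a != b)
```

In this toy case filtering happens to leave exactly one value per variable. In general it only shrinks the domains, and a search step is still needed afterwards.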
Hyper-Resolution-Based Consistency Algorithm
In this algorithm (de Kleer 1989), all constraints are represented as nogoods. A nogood is a prohibited combination of variable values. For example, in the graph-coloring problem with the two colors red and blue, a constraint between $x_1$ and $x_2$ can be represented as two nogoods {$x_1$ = red, $x_2$ = red} and {$x_1$ = blue, $x_2$ = blue}.
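As a minimal sketch of the nogood representation only (not of the hyper-resolution rule itself), a nogood can be stored as a mapping from variables to prohibited values. The two-color example and the helper `violates` are assumptions made for illustration.

```python
def violates(assignment, nogood):
    """True iff the assignment contains every variable-value pair of the nogood."""
    return all(assignment.get(var) == val for var, val in nogood.items())

# A "different colors" constraint between x1 and x2, assuming colors red and blue.
nogoods = [
    {"x1": "red", "x2": "red"},
    {"x1": "blue", "x2": "blue"},
]

def is_allowed(assignment):
    """An assignment is allowed iff it violates none of the nogoods."""
    return not any(violates(assignment, ng) for ng in nogoods)
```

A partial assignment that does not yet mention every variable of a nogood does not violate it.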
The hyper-resolution rule can generate a very large number of nogoods. If we restrict the application of the rules so that only nogoods whose lengths (the length of a nogood is the number of variables that constitute the nogood) are less than k are produced, the problem becomes strongly k-consistent.
Asynchronous Backtracking
The asynchronous backtracking algorithm (Yokoo et al. 1992) is an asynchronous version of a backtracking algorithm, which is a standard method for solving CSPs: Pick one alternative, and then move on. If it doesn’t work anymore, backtrack one step, and then pick another alternative, and so on. In this algorithm, the priority order of variables/processes is determined, and each process communicates its tentative value assignment to neighboring processes. The priority order can be arbitrary as long as it is consistent, for example the alphabetical order of the variable identifiers.
One limitation of the asynchronous backtracking algorithm is that the process/variable ordering is statically determined. If the value selection of a higher priority process is bad, the lower priority processes need to perform an exhaustive search to revise the bad decision.
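For reference, the sequential backtracking that the asynchronous algorithm builds on can be sketched as below. This is not the asynchronous, message-passing version. The function names and the toy graph-coloring instance are assumptions, and the variable order is fixed in advance, which mirrors the static priority order discussed above.

```python
def backtrack(variables, domains, consistent, assignment=None):
    """Depth-first backtracking over variables in a fixed (static) order."""
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(variables):
        return dict(assignment)
    var = variables[len(assignment)]          # next variable in priority order
    for value in domains[var]:
        assignment[var] = value               # pick one alternative
        if consistent(assignment):
            result = backtrack(variables, domains, consistent, assignment)
            if result is not None:
                return result
        del assignment[var]                   # backtrack one step
    return None

def different_colors(edges):
    """Build a consistency check for graph coloring on the given edges."""
    def consistent(assignment):
        return all(assignment[u] != assignment[v]
                   for u, v in edges if u in assignment and v in assignment)
    return consistent

variables = ["x1", "x2", "x3"]
domains = {v: ["red", "blue"] for v in variables}
chain_solution = backtrack(variables, domains,
                           different_colors([("x1", "x2"), ("x2", "x3")]))
```

With a third edge between x1 and x3 (a triangle) and only two colors, the same call exhausts all alternatives and returns None.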
Asynchronous Weak-Commitment Search
The Asynchronous Weak-Commitment Search (AWCS) algorithm (Yokoo 1995) introduces a method for dynamically ordering processes so that a bad decision can be revised without an exhaustive search.
Here we use heuristics such as the min-conflict heuristic: when a variable value is to be selected, a value that minimizes the number of constraint violations with other variables is preferred. Although this heuristic has been found to be very effective, it cannot completely avoid bad decisions. A priority value is thus determined for each variable, and the priority order among processes is determined using the priority values by the following rules.
Differences between AWCS and Asynchronous Backtracking
· The priority order is determined using the communicated priority values (dynamic). IMPORTANT: If the current value is not consistent with the local_view (i.e., some constraint with variables of higher priority processes is not satisfied), the agent changes its value using the min-conflict heuristic, i.e., it selects a value that is not only consistent with the local_view, but also minimizes the number of constraint violations with variables of lower priority processes.
· When $x_i$ cannot find a value consistent with its local_view, $x_i$ sends nogood messages to other processes, and increments its priority value. (This is a pretty self-confident agent!) IMPORTANT: If $x_i$ cannot resolve a new nogood, $x_i$ will not change its priority value but will wait for the next message. This procedure is needed to guarantee the completeness of the algorithm. Just in case. Wait until new changes come around.
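The basic min-conflict heuristic mentioned above can be sketched as follows. This shows only the value-selection step, not the full AWCS protocol with priorities and nogood messages. The function names and the `conflicts` predicate are hypothetical.

```python
def min_conflict_value(var, domain, assignment, conflicts):
    """Pick the value of `var` that violates the fewest constraints
    with the values currently assigned to the other variables."""
    def count(value):
        return sum(1 for other, other_value in assignment.items()
                   if other != var and conflicts(var, value, other, other_value))
    return min(domain, key=count)

# Toy graph coloring: x1 is linked to x2 and x3, which are both red right now.
neighbors = {"x1": {"x2", "x3"}, "x2": {"x1"}, "x3": {"x1"}}

def same_color_conflict(var, value, other, other_value):
    """Two linked variables conflict when they have the same color."""
    return other in neighbors[var] and value == other_value

choice = min_conflict_value("x1", ["red", "blue"],
                            {"x2": "red", "x3": "red"}, same_color_conflict)
```

In AWCS the candidate values would first be restricted to those consistent with higher priority processes, and the count would then be taken over lower priority processes only.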
Path-Finding Problems
Definition of a Path-Finding Problem
A path-finding problem consists of (1) a set of nodes N, each representing a state, (2) a set of directed links L, each representing an operator available to a problem solving agent given a particular state. There is a unique node s called the start node or the initial node. There exists a set of nodes G, each representing a goal state.
For each link, the weight of the link is defined, representing the costs of applying the operator. This weight is sometimes known as the distance between the nodes.
If a node is
directly linked from node i, then that node is a neighbor of i.
Algorithms: Asynchronous Dynamic Programming, Learning Real-Time A* Algorithm, Real-Time A* Algorithm, Moving Target Search Algorithm, Real-Time Bidirectional Search Algorithms, Real-Time Multiagent Search Algorithms.
Asynchronous Dynamic Programming
In a path-finding problem, the principle of optimality holds. In short, the principle of optimality states that a path is optimal if and only if every segment of it is optimal.
Let us denote the shortest distance from node i to the goal nodes as $h^*(i)$. From the principle of optimality, the shortest distance via neighboring node j is given by $f^*(j) = k(i,j) + h^*(j)$, where $k(i,j)$ is the cost of the link between i and j. If node i is not a goal node, the path to a goal node must visit one of the neighboring nodes. Therefore, $h^*(i) = \min_j f^*(j)$ holds. (So, actually this equation says that the optimal path has to look ahead, can’t be based locally!)
If $h^*$ is given for each node, the optimal path can be obtained by repeating the following procedure: For each neighboring node j of the current node i, compute $f^*(j) = k(i,j) + h^*(j)$. Then, move to the j that gives $\min_j f^*(j)$.
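The procedure above can be sketched as follows, assuming exact shortest distances are already known. The dictionary `cost` plays the role of k(i,j) and `h_star` the role of h*. The small graph and all names are made up for illustration, and link costs are assumed to be positive so that the walk terminates.

```python
def follow_optimal_path(start, goals, neighbors, cost, h_star):
    """Repeatedly move to the neighbor j that minimizes k(i,j) + h*(j)."""
    path = [start]
    node = start
    while node not in goals:
        node = min(neighbors[node], key=lambda j: cost[(node, j)] + h_star[j])
        path.append(node)
    return path

# Hypothetical directed graph with start node s and goal node g.
neighbors = {"s": ["a", "b"], "a": ["g", "b"], "b": ["g"], "g": []}
cost = {("s", "a"): 1, ("s", "b"): 4, ("a", "g"): 5, ("a", "b"): 1, ("b", "g"): 1}
h_star = {"s": 3, "a": 2, "b": 1, "g": 0}   # exact shortest distances to g

optimal_path = follow_optimal_path("s", {"g"}, neighbors, cost, h_star)
```

Here the walk goes s, a, b, g with total cost 3, which matches h*(s).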
Asynchronous dynamic programming (Bertsekas 1982) computes $h^*$ by repeating the local computations of each node:
(1) For each node i, there exists a process corresponding to i.
(2) Each process records $h(i)$, which is the estimated value of $h^*(i)$. (Heuristic!!! h is a heuristic function; it is called admissible if it never overestimates.) The initial value of $h(i)$ is arbitrary except for goal nodes.
(3) For each goal node g, $h(g)$ is 0.
(4) Each process can refer to h values of neighboring nodes (via shared memory or message passing).
In the above situation, each process updates $h(i)$ by: For each neighboring node j, compute $f(j) = k(i,j) + h(j)$, where $h(j)$ is the current estimated distance from j to a goal node, and $k(i,j)$ is the cost of the link from i to j. Then, update $h(i)$ as follows: $h(i) \leftarrow \min_j f(j)$.
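The update rule can be sketched as a sequential simulation in which the "processes" take turns. This is a simplification of the truly asynchronous setting. The graph, the zero initial estimates, and the `sweeps` limit are assumptions, and every non-goal node is assumed to have at least one neighbor.

```python
def asynchronous_dp(nodes, goals, neighbors, cost, sweeps=100):
    """Repeat the local update h(i) <- min_j [k(i,j) + h(j)] until nothing changes."""
    h = {i: 0.0 for i in nodes}        # arbitrary initial estimates; goals stay 0
    for _ in range(sweeps):
        changed = False
        for i in nodes:                # each "process" updates in turn
            if i in goals:
                continue
            new_h = min(cost[(i, j)] + h[j] for j in neighbors[i])
            if new_h != h[i]:
                h[i], changed = new_h, True
        if not changed:
            break
    return h

# Hypothetical directed graph with goal node g.
nodes = ["s", "a", "b", "g"]
neighbors = {"s": ["a", "b"], "a": ["g", "b"], "b": ["g"], "g": []}
cost = {("s", "a"): 1, ("s", "b"): 4, ("a", "g"): 5, ("a", "b"): 1, ("b", "g"): 1}

h = asynchronous_dp(nodes, {"g"}, neighbors, cost)
```

On this small graph the estimates settle after a few sweeps at the true shortest distances to g.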
Disadvantages: In reality, we cannot use asynchronous dynamic programming for a reasonably large path-finding problem. The number of nodes can be huge, and we cannot afford to have processes for all nodes.
Learning Real-Time A*
How does an agent choose which nodes to perform local computations for?
One way is to choose the current node where the agent is located. First, the agent updates the h value of the current node, and then moves to the best neighboring node. This procedure is repeated until the agent reaches a goal state. This method is called the Learning Real-Time A* (LRTA*) algorithm (Korf 1990).
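A minimal sketch of this loop, under the assumptions that the graph is small enough to hold in dictionaries, that `h` contains an initial estimate for every node, and that `max_steps` is an arbitrary safety limit. The names are illustrative, not taken from Korf's paper.

```python
def lrta_star(start, goals, neighbors, cost, h, max_steps=1000):
    """Update h at the current node, then move to the best neighbor."""
    node, path = start, [start]
    for _ in range(max_steps):
        if node in goals:
            return path
        f = {j: cost[(node, j)] + h[j] for j in neighbors[node]}
        best = min(f, key=f.get)
        h[node] = f[best]          # update: h(i) <- min_j f(j)
        node = best                # move to the best neighboring node
        path.append(node)
    return None                    # step limit reached without finding a goal

# Hypothetical directed graph with start node s and goal node g.
neighbors = {"s": ["a", "b"], "a": ["g", "b"], "b": ["g"], "g": []}
cost = {("s", "a"): 1, ("s", "b"): 4, ("a", "g"): 5, ("a", "b"): 1, ("b", "g"): 1}
h = {"s": 0, "a": 0, "b": 0, "g": 0}   # initial estimates that never overestimate

path = lrta_star("s", {"g"}, neighbors, cost, h)
```

The dictionary `h` is modified in place, so a second trial on the same problem starts from the improved estimates.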
Real-Time A*
Real-Time A* (RTA*) updates the value of $h(i)$ in a different way from LRTA*. Instead of setting $h(i)$ to the smallest value of $f(j)$ for all neighbors j, the second smallest value is assigned to $h(i)$ (but which j? TYPO in the book.) Thus, is this more aggressive? RTA* learns more efficiently than LRTA* but can overestimate heuristic costs; sort of to force the system to have second thoughts.
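A single RTA* step might be sketched as below. The function name and data layout are hypothetical, and storing infinity when the node has only one neighbor is an assumption of this sketch rather than something stated in the handout.

```python
import math

def rta_star_step(node, neighbors, cost, h):
    """Store the second smallest f value in h[node] and return the best neighbor."""
    ranked = sorted(neighbors[node], key=lambda j: cost[(node, j)] + h[j])
    best = ranked[0]
    if len(ranked) > 1:
        second = ranked[1]
        h[node] = cost[(node, second)] + h[second]
    else:
        h[node] = math.inf         # assumption: no alternative to fall back on
    return best

# From s the candidates are f(a) = 1 + 0 = 1 and f(b) = 4 + 0 = 4.
neighbors = {"s": ["a", "b"]}
cost = {("s", "a"): 1, ("s", "b"): 4}
h = {"s": 0, "a": 0, "b": 0}

next_node = rta_star_step("s", neighbors, cost, h)
```

After this step the agent moves to a, and h(s) holds 4, the estimated cost of the best alternative if the agent ever comes back to s.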
Moving Target Search
Heuristic searches like those above assume that the goal state is fixed and does not change during the course of the search.
Moving Target Search (MTS) is a generalization of LRTA* to the case where the target can move. MTS must acquire heuristic information for each target location. Thus, MTS maintains a matrix of heuristic values, representing the function $h(x, y)$ for all pairs of states x and y. Conceptually, all heuristic values are read from this matrix, which is initialized to the values returned by the static evaluation function. Over the course of the search, these heuristic values are updated to improve their accuracy.
There are now two movements: the problem solver moving from node to node, and the target goal moving. The task is accomplished when the problem solver arrives at the same node as the target.
Real-Time Bidirectional Search
Bidirectional search basically says in addition to me starting from the initial state, you start from the goal state. Hopefully we get to find the path sooner that way. For example, in movies, a couple running towards each other beats out one standing still, one running!
If two robots are to be brought to meet each other, how can they do this efficiently? Should they negotiate their actions, or make decisions independently? Is the two-robot organization really superior to a single robot one?

In RTBS, two problem solvers starting from the initial and goal states move toward each other. The coordination cost is expected to be limited within some constant time.
Think about the Marco Polo game!
There are two variants. In centralized RTBS, the best action is selected from among all possible moves of the two problem solvers, and that move is then made. In decoupled RTBS, the two problem solvers independently make their own decisions.
The evaluation results show that, in clear situations (i.e., heuristic functions return accurate values), decoupled RTBS performs better than centralized RTBS, while in uncertain situations (i.e., heuristic functions return inaccurate values), the latter becomes more efficient. “Yes, this makes sense: if my sensing is good, then decoupled RTBS is better. But if my sensing is not good, I better incorporate both, then centralized is better. Not just faster, but the path is shorter too!!”
Real-Time Multiagent Search
If there exist multiple agents, how can these agents cooperatively solve a problem? Organization! For example, multiple agents share the same problem space with a single fixed goal. Each agent executes the LRTA* algorithm independently, but they share the updated h values. In this case, when one of the agents reaches the goal, the objective of the agents as a whole is satisfied.
To see how efficient this organization is, we look at (1) the effects of sharing experiences among agents (efficiency), and (2) the effects of autonomous decision making (robustness).
Two-Player Games
Formalization of Two-Player Games
We use a game tree to show the moves of two players. The player who plays first is MAX, and the opponent is MIN. A node that shows MAX’s turn is a MAX node, and the other kind is a MIN node. There is a unique initial node called the root node. If a node m can be reached in a single move from a node n, then we say that m is a child node of n, and n is a parent of m. Any nodes that reach n after a sequence of moves are the ancestors of n.
Now, how do we traverse this MIN-MAX tree?
Minimax Procedure
In the minimax procedure, we (1) generate a part of the game tree, (2) evaluate the merit of the nodes on the search frontier using a static evaluation function, and (3) use these values to estimate the merit of ancestor nodes.
The key is in the evaluation function: A node favorable to MAX has a large evaluation value, a node favorable to MIN has a small evaluation value. So, we can assume that MAX will choose the move that leads to the node with the max evaluation value, while MIN will do the opposite.
(1) The evaluation value of a MAX node is equal to the maximum value of any of its child nodes.
(2) The evaluation value of a MIN node is equal to the minimum value of any of its child nodes.
I will assume that you will pick the worst move for me. Then I will pick the best move out of that. This is the logic.
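The two rules can be sketched as a short recursive function. The tree, the leaf values, and the callback names `children` and `evaluate` are made up for illustration. A node with no children is treated as a frontier node and scored by the static evaluation function.

```python
def minimax(node, is_max, children, evaluate):
    """Return the minimax value of `node`; MAX and MIN alternate by level."""
    kids = children(node)
    if not kids:                       # frontier node: static evaluation
        return evaluate(node)
    values = [minimax(k, not is_max, children, evaluate) for k in kids]
    return max(values) if is_max else min(values)

# Hypothetical two-level game tree: root is a MAX node, A and B are MIN nodes.
tree = {"root": ["A", "B"], "A": ["A1", "A2"], "B": ["B1", "B2"]}
leaf_values = {"A1": 3, "A2": 5, "B1": 2, "B2": 9}

root_value = minimax("root", True,
                     lambda n: tree.get(n, []),
                     lambda n: leaf_values[n])
```

MIN would hold A to 3 and B to 2, so MAX prefers the move to A and the root value is 3.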
Alpha-Beta Pruning
Alpha-beta pruning is used to speed up minimax without any loss of information. This algorithm prunes a part of a tree that cannot influence the evaluation value of the root node. More specifically, for each node, the following values are recorded and updated:
· $\alpha$ value: represents the lower bound of the evaluation value of a MAX node (how well can you do?)
· $\beta$ value: represents the upper bound of the evaluation value of a MIN node (how bad can you do?)
While visiting nodes in a game tree from the root node by a depth-first order to a certain depth, these values are updated by the following rules:
· The $\alpha$ value of a MAX node is the maximum value of any of its child nodes visited so far.
· The $\beta$ value of a MIN node is the minimum value of any of its child nodes visited so far.
IMPORTANT: We can prune a part of the tree if one of the following conditions is satisfied: (An elegant algorithm!)
· $\alpha$-cut: If the $\beta$ value of a MIN node is smaller than or equal to the maximum $\alpha$ value of its ancestor MAX nodes, we can use the $\beta$ value as the evaluation value of the MIN node, and can prune a part of the search tree under the MIN node. In other words, the MAX player never chooses a move that leads to the MIN node, since there exists a better move for the MAX player.
· $\beta$-cut: If the $\alpha$ value of a MAX node is larger than or equal to the minimum $\beta$ value of its ancestor MIN nodes, we can use the $\alpha$ value as the evaluation value of the MAX node, and can prune a part of the search tree under the MAX node. In other words, the MIN player never chooses a move that leads to the MAX node, since there exists a better move for the MIN player.
IMPORTANT: The propagation of values upwards and downwards costs time. We must remember this when we consider alpha-beta pruning. Still, it is quite straightforward for the algorithm to determine when to stop.
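The pruning rules can be sketched as a depth-first search that passes the best ancestor bounds down as `alpha` and `beta`. The tree, the leaf values, and the `visited` list (kept only to show which nodes were examined) are illustrative assumptions.

```python
import math

def alphabeta(node, is_max, children, evaluate,
              alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value of `node`, skipping subtrees that cannot affect the root."""
    if visited is not None:
        visited.append(node)
    kids = children(node)
    if not kids:                       # frontier node: static evaluation
        return evaluate(node)
    if is_max:
        value = -math.inf
        for k in kids:
            value = max(value, alphabeta(k, False, children, evaluate,
                                         alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:          # beta-cut under this MAX node
                break
        return value
    value = math.inf
    for k in kids:
        value = min(value, alphabeta(k, True, children, evaluate,
                                     alpha, beta, visited))
        beta = min(beta, value)
        if beta <= alpha:              # alpha-cut under this MIN node
            break
    return value

# Hypothetical two-level game tree: root is a MAX node, A and B are MIN nodes.
tree = {"root": ["A", "B"], "A": ["A1", "A2"], "B": ["B1", "B2"]}
leaf_values = {"A1": 3, "A2": 5, "B1": 2, "B2": 9}

visited = []
root_value = alphabeta("root", True, lambda n: tree.get(n, []),
                       lambda n: leaf_values[n], visited=visited)
```

After A has been evaluated as 3, the first child of B already bounds B by 2, so the alpha-cut applies and B2 is never visited. The root value is the same as plain minimax would return.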
Bertsekas, D. P. (1982). Distributed Dynamic Programming, IEEE Trans. Automatic Control, AC-27(3):610-616.
de Kleer, J. (1989). A Comparison of ATMS and CSP Techniques, in Proceedings of the 11th International Joint Conference on Artificial Intelligence, 290-296.
Korf, R. E. (1990). Real-Time Heuristic Search, Artificial Intelligence, 42(2-3):189-211.
Lesser, V. R.
and
Waltz, D. (1975). Understanding Line Drawing Scenes with Shadows, in P. Winston (ed.) The Psychology of Computer Vision, 19-91, McGraw-Hill.
Yokoo, M. (1995). Asynchronous Weak-Commitment Search for Solving Distributed Constraint Satisfaction Problems, in Proceedings of the 1st International Conference on Principles and Practice of Constraint Programming, 88-102.
Yokoo, M., E. H. Durfee, T. Ishida, and K. Kuwabara (1992). Distributed Constraint Satisfaction for Formalizing Distributed Problem Solving, in Proceedings of the 12th IEEE International Conference on Distributed Computing Systems, 612-621.