Search Paths

In the last reading, we learned about how to check whether there is a path from node A to node B.

In this reading, we'll do additional things:

  1. return the path (a list of nodes) found instead of just True/False
  2. learn BFS, an alternative to DFS that is guaranteed to find the shortest path

Review

Let's revisit the graph search from the end of the last reading (only change is addition of __repr__):

Getting the Path

Rather than just determine if there is a path, let's output what it is. We'll modify the find method. If there's a path, it should return it as a list (or tuple!) of nodes traversed. If there is not a path, it should return None.

The logic works like this: if the X node is a child of the W node, and we've found an (X,Y,Z) path, then (W,X,Y,Z) is also a valid path. On line 11, we're using tuple concatenation to create a new tuple based on the one the child found, but with the parent added to the beginning.

Multiple Paths

What if there are multiple paths? Which will we return?

Nice, it found the shortest one! But what if we build the same graphs, but adding the edges in a different order?

Yuck! If Google Maps gave me these directions for getting from city A to E, then I might just switch to Apple Maps.

The search pattern we're using is known as "depth first search", or DFS. It explores one child completely (including all that child's descendents) before checking if the other children know a path.

The alternative to DFS is BFS, or "breadth first search". This algorithm will explore all children before exploring grandchildren, before exploring great grandchildren, etc.

The algorithm for BFS is not recursive, but many find it less intuitive that recursive DFS. The strategy is to keep a "TODO" list of all nodes that need to be visited. The list is used as a queue, meaning the first nodes added to the list will be the first ones visited. When a child is explored, we generally discover grandchildren, but rather than explore those grandchildren immediately (as in DFS), we add those grandchildren to the end of the TODO queue, to be processed eventually after the other children.

Let's try it:

Cool! We're searching one level at a time. But did you notice we lost some functionality? We aren't keeping track of how we got to the final node. There are often multiple ways to reach a node, so we should add a back attribute to backtrack how we got there.

Let's make a tougher test case, with the following:

  1. cycles
  2. short and long paths to the same target

Now there are two A-to-Z paths:

We ought to find the second one!

Just what we wanted.

Conclusion

We have lots of experience using data structures like lists and dicts to keep track of data. But data structures can also be used to keep track work that a program needs to do. This strategy is used by BFS and many other systems.

Although more complicated to implement, BFS guarantees the shortest path will be found (in contrast to DFS). Of course, significant memory might be required to maintain the queue of nodes to visit if the graph is large. Depending on the shape of the graph, this memory expense may be greater or less than the memory needed for the stack frames necessarly during a recursive depth-first search.