Dict and Graph Search

In this reading, we'll do two things:

  1. see how we could implement a class that works the same as dict (maps keys to values)
  2. see how we can search more general graphs (that may have cycles)

You can download the .ipynb form of this reading here.

Review

Carefully read the following code. It's similar to the BST (binary search tree) implementation in the last reading, except that add and contains have been turned into methods in Node (instead of just functions).

Part 1: BST-based Dictionary

The above BST (binary search tree) is basically acting like a set. We can add values and efficiently check whether certain values have already been added.

What if we want to make it act like a dict? There are a couple things we should do:

  1. have both key and val attributes in Node
  2. the sort order should be based on key, not val
  3. we should implement __getitem__ and __setitem__

Let's do it:

Our BST-based dictionary may have some advantages over the regular dict. Specifically, which features will be easier to add to our version?

  1. getting the values corresponding to all keys in some range
  2. providing a way to loop over the keys in ascending order
  3. finding the key corresponding to a specific value
Answers (1) yes. Substrees correspond to ranges of keys, so we can efficiently search for all keys in a range (not possible with a regular dict).
(2) yes. We would need to write some tricky code with generators, but the data is already arranged in a sorted order.
(3) no. The same value may appear multiple times anywhere in the tree. There is no way to efficiently find all nodes with a given value without checking all the nodes.

Checking whether a tree has a value is an example of a search problem.

What if we want to search a more general graph for a particular value, from a particular starting node? There are many use cases for this:

  1. looking for a route between two cities
  2. looking through different move choices in a game (each board state is a node; an edge represents a legal move)
  3. cycle detection, perhaps representing incoherent preferences we want to identify (do people claim to like chocolate more than vanilla, vanilla more than strawberry, and strawberry more than chocolate?)

We can force a loop with our Node objects, complicating the search problem.

It worked there, but we're going to have a problem if we look for something we can't find. Searching A will make us search B. Searching B will make us search A.

This will cause infinite recusion:

Graph Search, Done Right

To avoid getting trapped in cycles, we'll need to keep track of which nodes we have visited in our search and make sure we don't visit them more than once.

Let's write a Node class for a general directed graph (each node can have any other node as a child) and a Graph class that can keep graph-wide state, like a set of nodes that have been visited during a recursive search.

Challenge: can you modify the find method, so that instead of returning False or True, it returns None (could not find from the starting point) or a list (indicating the path from the src to the dst)?

Conclusion

There are many ways to search through graph structures to find specific nodes/values. In this reading, we learned two new techniques:

  1. if the graph is a binary search tree, we can efficiently find what we want, without checking each node. This is because we know an upper/lower bound on a subtree, so we can skip it if it can't contain what we want.
  2. if a graph has cycles, it's easy to write search code that results in infinite recursion. The trick is to keep a set of visited nodes, to make sure we don't keep searching over the same ones infinitely.