CS8001 - ODA Notes
Problem Sets - what to expect
- The questions will be quite similar to the reality checks, consisting primarily of multiple choice questions and code snippets.
- Expect a fair number of questions, roughly one to two dozen.
- Each question will indicate whether your answer is correct or incorrect.
Table of Contents
M3 - Stacks/Queue ADTs
Priority Queues
We will very briefly discuss a tangentially related concept to Queues, called the Priority Queue. The topic is placed here to help broaden your awareness of the ideas that can emerge from the topic. Thinking about data structures in terms of their behavior is an important mindset for designing new structures. We will not discuss their typical implementation until Module 6.
Priority Queues: A Linear Model?
In the video, we briefly mentioned that Priority Queues are a linear ADT. This is partially true since we can think of the data being placed linearly in sorted order based on priority. For queues, this was precisely the case, except that the priority was the order in which data was enqueued. However, in practice, a linear model doesn't work very well for implementing Priority Queues. Let's consider the idea of implementing a Priority Queue by maintaining a sorted list. We have two options to consider, an array-backed sorted list or a linked sorted list.
For the array-backed version, it would be similar to an ArrayQueue where the element with the highest priority could be at the front, while the element with lowest priority could be at the back. With this model, dequeuing and peeking would be 𝑂(1) operations like in the ArrayQueue. However, in general, enqueuing would be an 𝑂(𝑛) procedure, even if we consider amortizing the resize cost. This is because if we wanted to enqueue, we'd need to maintain the sorted order of the data, so we'd need to shift data around to place it. There are other variations we could use such as flipping the priority order and keeping the data zero-aligned, but they all lead us to similar conclusions.
For the linked version, it would again be similar to a LinkedQueue where the element with highest priority would be at the head while the element with lowest priority would be at the tail. Once again, peeking and dequeuing operations are 𝑂(1), but enqueuing would be an 𝑂(𝑛) operation since we need to find the location to insert the data.
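To make the enqueue cost concrete, here is a minimal sketch (names are illustrative, not a course reference implementation) of a sorted, array-backed priority queue. It keeps the highest-priority (smallest) element at the back of the array so that dequeue/peek stay O(1), while enqueue must shift elements and is O(n):

```java
import java.util.NoSuchElementException;

// Minimal sketch of a sorted, array-backed min-priority queue (names are illustrative).
// The data is kept sorted so the highest-priority (smallest) element sits at the BACK
// of the array, which keeps dequeue/peek O(1) while enqueue stays O(n).
public class SortedArrayPQ<T extends Comparable<T>> {
    private T[] data;
    private int size;

    @SuppressWarnings("unchecked")
    public SortedArrayPQ() {
        data = (T[]) new Comparable[10];
    }

    // O(n): we must shift elements to keep the array sorted by priority.
    @SuppressWarnings("unchecked")
    public void enqueue(T item) {
        if (size == data.length) {                       // amortized resize
            T[] bigger = (T[]) new Comparable[data.length * 2];
            System.arraycopy(data, 0, bigger, 0, size);
            data = bigger;
        }
        int i = size;
        // Largest (lowest-priority) values live near index 0, smallest near the back.
        while (i > 0 && data[i - 1].compareTo(item) < 0) {
            data[i] = data[i - 1];                       // shift to make room
            i--;
        }
        data[i] = item;
        size++;
    }

    // O(1): the highest-priority element is always at index size - 1.
    public T dequeue() {
        if (size == 0) {
            throw new NoSuchElementException("Priority queue is empty.");
        }
        T front = data[--size];
        data[size] = null;
        return front;
    }

    // O(1)
    public T peek() {
        if (size == 0) {
            throw new NoSuchElementException("Priority queue is empty.");
        }
        return data[size - 1];
    }
}
```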
So, with our current range of ideas, it appears that implementing a Priority Queue efficiently is out of reach. We will begin developing the ideas that we need starting in Module 6!
Deques (double-ended queue “deck”)
We end the section by discussing a natural extension of both Stacks and Queues, the Deque ADT. Rather than specializing in a specific add and remove behavior like Stacks and Queues, Deques support add and remove operations from either end of the data. The techniques involved in implementing the deque have already been discussed, so we will simply upcycle the ideas to show how it can be done.
Found in real systems such as online purchasing systems.
Java's % Operator
CSVIS
M4 - BST Introduction
Introduction to Tree ADT
- Stacks, queues and deques are limited by their ADT operations (linear)
- Trees are a linked data structure; unlike linked lists, each node can reference multiple children
- Trees are typically not bidirectional
- Node with children are internal nodes, nodes with no children are leaf nodes (or external nodes)
- No concept of front and back since they aren’t linear
Was that an ADT?
In the first few modules, we made a large distinction between ADTs and data structures. To be clear, ADTs are high-level descriptions of a model, whereas a data structure is the low-level, detail-oriented implementation of a model. Up until now, the distinction was stark because there were multiple implementations for each ADT, and it served as a nice way of introducing the concepts.
However, moving forward, we'll be much looser with that distinction. We do this for multiple reasons, however, the primary reason is that the distinction would inhibit learning more than help it in the topics to come. Some of the upcoming topics can be difficult to grasp. We would prefer for you to focus on these topic complexities, rather than worry about the semantics of whether a Tree is an ADT or a data structure (it can be either depending on the language used). We will still distinguish between the ADTs in the diagrams shown in the module review sections, but they are not as important as they were in previous modules.
Binary Trees
- Each node has at least three components: the data and references to its left and right children.
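For reference, a minimal sketch of what such a node might look like in Java (class and method names are illustrative):

```java
// Minimal sketch of a binary tree node: data plus left/right child references.
public class BSTNode<T extends Comparable<? super T>> {
    private T data;
    private BSTNode<T> left;
    private BSTNode<T> right;

    public BSTNode(T data) {
        this.data = data;
    }

    public T getData() { return data; }
    public void setData(T data) { this.data = data; }
    public BSTNode<T> getLeft() { return left; }
    public void setLeft(BSTNode<T> left) { this.left = left; }
    public BSTNode<T> getRight() { return right; }
    public void setRight(BSTNode<T> right) { this.right = right; }
}
```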
Binary Search Trees
- All data in the left subtree must be smaller than the parent's data; all data in the right subtree must be greater.
- Big-O:
- Average: BST divides data in half perfectly at each level. Insertion + search + deletion are all O(log n).
- Worst: If BST doesn’t divide the data in half (i.e., it degenerates into a linked list). In this case, insertion, search, and deletion are all O(n).
CSVIZ
Preorder Traversal
- There are three different depth-first binary tree traversals that all share a common strategy for visiting every piece of data.
- Each traversal is a variation of the same recursive strategy of traversing to the depths of each path.
- Traversals essentially accomplish what an Iterable/Iterator in Java does. All that is different is the ordering of the same actions.
- Preorder is a depth-first algorithm
- Useful when making an exact copy of a tree.
(when left mark is reached, record data)
Postorder Traversal
- Also depth-first
- Useful when removing leaves from a tree.
- Recurses first before looking at the data, whereas preorder looks at the data first.
(when right mark is reached, record data)
Inorder Traversal
- Depth-first
- Unique for BSTs since it returns a sorted list.
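A minimal sketch of the three depth-first traversals described above, assuming the BSTNode sketch from earlier; note that only the position of the "look at the data" step changes:

```java
import java.util.List;

// Sketch of the three depth-first traversals, assuming the BSTNode sketch from earlier.
// Each method appends the visited data to the provided list.
public class DepthTraversals {

    public static <T extends Comparable<? super T>> void preorder(BSTNode<T> node, List<T> out) {
        if (node == null) {
            return;
        }
        out.add(node.getData());              // look at the data FIRST
        preorder(node.getLeft(), out);
        preorder(node.getRight(), out);
    }

    public static <T extends Comparable<? super T>> void inorder(BSTNode<T> node, List<T> out) {
        if (node == null) {
            return;
        }
        inorder(node.getLeft(), out);
        out.add(node.getData());              // look at the data BETWEEN the recursive calls
        inorder(node.getRight(), out);
    }

    public static <T extends Comparable<? super T>> void postorder(BSTNode<T> node, List<T> out) {
        if (node == null) {
            return;
        }
        postorder(node.getLeft(), out);
        postorder(node.getRight(), out);
        out.add(node.getData());              // look at the data LAST
    }
}
```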
Levelorder Traversal
- Breadth-first traversal strategy
- There are other variations of this concept, but they are all similar to the levelorder traversal.
- Unlike the depth traversals that utilize a Stack (the recursion stack), the levelorder traversal utilizes a Queue (with a while loop) to access the data in order of closeness to the root, i.e., by level.
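A minimal sketch of the levelorder traversal, again assuming the BSTNode sketch from earlier; the Queue plays the role the recursion stack plays for the depth-first traversals:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

// Sketch of the levelorder traversal, assuming the BSTNode sketch from earlier.
public class LevelOrder {
    public static <T extends Comparable<? super T>> List<T> levelorder(BSTNode<T> root) {
        List<T> out = new ArrayList<>();
        Queue<BSTNode<T>> queue = new LinkedList<>();
        if (root != null) {
            queue.add(root);
        }
        while (!queue.isEmpty()) {
            BSTNode<T> current = queue.remove();   // closest-to-root node remaining
            out.add(current.getData());
            if (current.getLeft() != null) {
                queue.add(current.getLeft());
            }
            if (current.getRight() != null) {
                queue.add(current.getRight());
            }
        }
        return out;
    }
}
```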
Choosing a Traversal
We have covered four different traversals for Binary Trees in this module. It is important to note that these traversal concepts can easily be extended to general trees. After all, three of the traversals are derived from simply changing up the order the "look at node" operation is performed with respect to the recursive calls. The procedures within each traversal are defined for Binary Trees. However, the properties of the traversals are exemplified when specifically examining the data returned from BSTs.
Let's take the time to highlight some of their uses and distinguishing properties. So, what makes each of these traversals unique?
- Preorder: The preorder traversal uniquely identifies a BST, when adding the data to an empty tree in the order given by this traversal. Although we haven't discussed how the add method works in BSTs, you can double check this feature of preorder traversal in the visualization tool. Most definitely, check this feature out after you've learned about the adding procedure. The preorder traversal is a hybrid depth approach that biases towards giving you data closer to the root faster than data in leaf nodes.
- Inorder: The most notable property of the inorder traversal is that if implemented on a BST, the data is returned in a sorted order. However, please note that the inorder traversal is the only one of the four traversals that does not uniquely define the BST. If the traversal is operating on a general Binary Tree, then it will give you the data on a left-to-right basis (if the tree is properly spaced out when drawn).
- Postorder: The postorder traversal is similar to preorder in that it uniquely identifies the BST. One useful feature of the postorder traversal is that if you were to remove data from this ordering, you would only remove leaf nodes. This particular property will come in handy when you learn the remove procedure of a BST. The postorder traversal is a hybrid depth approach that biases towards giving you data in leaf nodes faster than data closer to the root.
- Levelorder: This traversal is the black sheep in implementation, but it's also the easiest to understand conceptually. The levelorder traversal gives you the data in an ordering sorted based on the depth of the data. Thus, it is the fully realized version of the preorder traversal where it gives you full control of getting internal nodes before deeper leaves. As you might have guessed, it also uniquely defines a BST.
We hope that this helps you keep track of what each traversal does. But more importantly, we hope that you look at these four traversals and think about what else you can change up to get the behavior and ordering that you want. For example, you could try traversing right first before traversing left in an inorder fashion to get the list in descending order. The procedures we've given you are the bare-bones skeletons, which you can start modifying and tailoring to your needs as they arise, so have some fun experimenting!
M5 - BST Operations
We are going to explore the power of the tree data structure, specifically Binary Search Trees, BSTs. We discuss the motivation for BSTs and how they relate to the binary search algorithm. We present efficient procedures for searching, adding, and removing in a BST. Similar to how binary search works, BSTs work by splitting the data into subtrees, each with about half of the data of the level above it, yielding 𝑂(log(𝑛)) running times on average.
At the end of the module, we introduce an example of a probabilistic data structure known as a SkipList. Though the basis of SkipLists and their structure is different from BSTs, they are actually very similar in how their search procedure works, which is why we've chosen to introduce them here.
Learning Objectives
- Extend their understanding of tree structures and their impact on efficient search operations.
- Learn efficient procedures for searching, adding, and removing from BSTs.
- Further their understanding of the pointer reinforcement restructuring recursion technique by applying it to the add and remove methods.
- Learn about a simple probabilistic data structure as a gateway to the world of randomization.
BST Searching
- Because the tree is ordered, the search space is cut roughly in half at each level.
- Efficiency: Avg: O(log n), Worst: O(n), Best: O(1)
Best and Average Case Complexity Analysis
BST Adding
- Similar to BST search
- Use ‘pointer reinforcement’, which differs from ‘look-ahead’: look-ahead keeps track of the parent instead of the current node and never recurses to a null node.
- Pointer reinforcement uses the return value to restructure the tree. The restructuring happens after the recursive call returns.
Pointer Reinforcement Pseudocode
- Public wrapper + private helper method.
- Make sure to return the current node at the end to reinforce the pointer.
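A minimal sketch of add with pointer reinforcement (class and method names are illustrative, not the official skeleton):

```java
// Sketch of a BST add using pointer reinforcement, assuming the BSTNode sketch from earlier.
public class BST<T extends Comparable<? super T>> {
    private BSTNode<T> root;
    private int size;

    public void add(T data) {            // public wrapper
        root = addHelper(root, data);
    }

    private BSTNode<T> addHelper(BSTNode<T> current, T data) {
        if (current == null) {
            size++;
            return new BSTNode<>(data);  // the new node is returned and linked in by the parent
        }
        int cmp = data.compareTo(current.getData());
        if (cmp < 0) {
            current.setLeft(addHelper(current.getLeft(), data));   // reinforce left pointer
        } else if (cmp > 0) {
            current.setRight(addHelper(current.getRight(), data)); // reinforce right pointer
        }
        // cmp == 0: duplicate, do nothing in this sketch
        return current;                  // hand the (possibly unchanged) subtree root back up
    }
}
```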
BST Removing
- Must find element before removing.
- Three removal cases: zero children, one child, and two children (the two-child case uses either the predecessor or the successor).
- Zero-child case (easiest): just remove the node. No restructuring needed.
- One-child case: set the parent node's pointer (e.g., node 13) to the removed node's child (e.g., node 8).
- Two-child case, predecessor: find the “predecessor” of the node to be deleted and replace the removed data with it. The predecessor is the rightmost node in the left subtree.
- Two-child case, successor: find the “successor” of the node to be deleted and replace the removed data with it. The successor is the leftmost node in the right subtree.
- Resulting tree differs based on using either predecessor or successor strategy.
- The two data variables (one in the public wrapper, one in the private helper) are different variables.
Pointer Reinforcement Remove Pseudocode
- A dummy node serves as storage for the removed data passed through the recursive calls.
- pointer reinforcement logic same as add
- Two base cases: the node is null, or the node's data matches the data to remove.
- If no children: return null, which sets the parent's child pointer to null.
- If one child: return the (non-null) child node.
- If two children: make another recursive call to remove the successor (or predecessor).
- Get the data from the successor by making another recursive call that removes the successor node and returns its data. Then set the current node's data to the successor's data.
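A minimal sketch of remove with pointer reinforcement using the successor strategy, meant to live in the BST class sketched earlier (names are illustrative):

```java
// Sketch of remove with pointer reinforcement (successor strategy).
// The dummy nodes carry data out of the recursion.
public T remove(T data) {
    BSTNode<T> dummy = new BSTNode<>(null);              // will hold the removed data
    root = removeHelper(root, data, dummy);
    size--;
    return dummy.getData();
}

private BSTNode<T> removeHelper(BSTNode<T> current, T data, BSTNode<T> dummy) {
    if (current == null) {
        throw new java.util.NoSuchElementException("Data not found.");
    }
    int cmp = data.compareTo(current.getData());
    if (cmp < 0) {
        current.setLeft(removeHelper(current.getLeft(), data, dummy));
    } else if (cmp > 0) {
        current.setRight(removeHelper(current.getRight(), data, dummy));
    } else {
        dummy.setData(current.getData());                // found it: record the removed data
        if (current.getLeft() == null && current.getRight() == null) {
            return null;                                 // 0 children: parent's pointer becomes null
        } else if (current.getLeft() == null) {
            return current.getRight();                   // 1 child: parent points to the child
        } else if (current.getRight() == null) {
            return current.getLeft();
        } else {
            BSTNode<T> dummy2 = new BSTNode<>(null);     // 2 children: pull up the successor's data
            current.setRight(removeSuccessor(current.getRight(), dummy2));
            current.setData(dummy2.getData());
        }
    }
    return current;                                      // reinforce pointer on the way back up
}

private BSTNode<T> removeSuccessor(BSTNode<T> current, BSTNode<T> dummy) {
    if (current.getLeft() == null) {
        dummy.setData(current.getData());                // leftmost node of the right subtree
        return current.getRight();                       // splice it out
    }
    current.setLeft(removeSuccessor(current.getLeft(), dummy));
    return current;
}
```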
END OF PROBLEM SET 1
M6 - Heaps
Let's look at the "cousin" to BSTs in the Binary Tree group, (binary) Heaps. Heaps are a binary tree-type data structure that prioritize access to either the smallest or largest item in the data structure depending on the heap type. In a sense, heaps are a specialist data structure most commonly coupled with the Priority Queue ADT, whereas BSTs are a generalist data structure useful for many different operations.
An interesting caveat to heaps is that unlike BSTs, they are commonly implemented using backing arrays. There is an additional shape property constraint for heaps on top of the binary tree shape which lends itself to an array implementation. This allows us the opportunity to explore how an array implementation may work for a tree-type data structure. We'll explore this through three operations: adding, removing, and building a heap from a group of unordered data.
Learning Objectives
Students will:
- Learn about heaps, the most common data structure for implementing the Priority Queue ADT.
- Break down and understand efficient algorithms for adding, removing, and building a heap.
- Expand their knowledge of different tree-type structures and orderings, along with the benefits (and difficulties) such constraints bring.
- Experience what it's like to back a tree-type data structure using an array rather than in a linked fashion.
Intro to (Binary) Heaps
Here we present an introduction to a new data structure, (binary) heaps. The term binary is in parentheses because there are multiple types of heaps, and we are discussing one particular kind. Binary heaps are the most common type of heap, so when a heap is mentioned, it is usually referring to a binary heap.
Heaps are another binary tree type of data structure with the additional shape property of completeness. We will study this concept in this lesson. Completeness is a very nice property that lends itself to an array-implementation, which is usually faster at a low-level when compared to a tree node implementation. Heaps also have a different order property when compared to BSTs. The heap order property changes the way we need to look at basic operations such as add and remove.
- Heaps must be “complete” - filled left to right with no gaps.
Order Property
In a heap, the parent node is always larger (in the case of a maxheap) or smaller (in the case of a minheap) than its children. There is no relationship between the children.
Shape Property
A heap is always complete. At any point, each level of the tree is completely filled, except for the last level, which may be partially filled from left to right with no gaps.
- Min Heap - Smallest element at the root
- Max heap - Largest element at the root
- Backing array doesn’t have any gaps
Heap Operations
- Not designed for search!
- Index 0 never used (makes math easier)
- Only consider add and remove.
- Key Idea: Enforce shape property, then order property. “Up-heaping”.
- The left child of an item in index i would then be at index 2i, and the right child would be at index 2i + 1.
- The parent of an item in index i would be at index i / 2 (assuming integer division).
Adding
- Upheaping (i.e., swapping the added node with its parent) continues until the order property is restored or the root is reached.
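A minimal sketch of add with upheaping for a 1-indexed, array-backed min-heap (class and field names are illustrative):

```java
// Sketch of adding to a 1-indexed, array-backed min-heap.
public class MinHeap<T extends Comparable<? super T>> {
    private T[] backingArray;
    private int size;

    @SuppressWarnings("unchecked")
    public MinHeap() {
        backingArray = (T[]) new Comparable[13];   // index 0 is left unused
    }

    @SuppressWarnings("unchecked")
    public void add(T data) {
        if (size + 1 == backingArray.length) {     // resize if the array is full
            T[] bigger = (T[]) new Comparable[backingArray.length * 2];
            System.arraycopy(backingArray, 0, bigger, 0, backingArray.length);
            backingArray = bigger;
        }
        size++;
        backingArray[size] = data;                 // shape property: fill the next open slot
        int child = size;
        // Order property: upheap by swapping with the parent (index child / 2) while smaller.
        while (child > 1 && backingArray[child].compareTo(backingArray[child / 2]) < 0) {
            T temp = backingArray[child / 2];
            backingArray[child / 2] = backingArray[child];
            backingArray[child] = temp;
            child /= 2;
        }
    }
}
```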
Removing
- You can only remove from the root
- Remove the item at the root, and move the item in the last slot of the heap to the root. This might break the order property, but that is fine.
- Compare the item with the item in the left child and the item in the right child. If either the left child or the right child is smaller (in the case of a minheap) or larger (in the case of a maxheap), swap the item with the smaller/larger of the two children.
- Repeat the previous step until you don’t make a swap or you reach the bottom of the heap.
- Re-ordering takes the most time (like upheaping).
- If the order property is violated, we perform a downheap (the example shown is a max heap implementation).
- Continue until order property is satisfied.
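A minimal sketch of remove with downheaping for the min-heap sketched above:

```java
// Sketch of removing the root from the min-heap sketched earlier (1-indexed array).
public T remove() {
    if (size == 0) {
        throw new java.util.NoSuchElementException("Heap is empty.");
    }
    T removed = backingArray[1];
    backingArray[1] = backingArray[size];          // move the last item to the root
    backingArray[size] = null;
    size--;
    int parent = 1;
    while (2 * parent <= size) {                   // while a left child exists
        int child = 2 * parent;                    // start by assuming the left child is smaller
        if (child + 1 <= size && backingArray[child + 1].compareTo(backingArray[child]) < 0) {
            child++;                               // right child exists and is smaller
        }
        if (backingArray[child].compareTo(backingArray[parent]) >= 0) {
            break;                                 // order property restored
        }
        T temp = backingArray[parent];             // swap and keep downheaping
        backingArray[parent] = backingArray[child];
        backingArray[child] = temp;
        parent = child;
    }
    return removed;
}
```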
Performance
New Operations? New Heaps - Food for Thought
Build Heap Algorithm
- Need to build from bottom up
- Index size / 2 gives you the last parent (the last node with a child).
- Start with the node at index size / 2 (32). Swap the parent with its smaller child (32 with 5) and perform a downheap. The downheap then looks at the swapped child node; if it is a leaf, the subroutine terminates. If a child exists, the swapping continues.
- 71 is swapped with its smallest child's data; continue the downheap.
- Swap the smallest child with 71. The downheap is complete since 71 is now at a leaf.
- Reach the root node, both children meet the ordering requirement. Terminate downheap.
- Time complexity is actually O(n)
- Leaf nodes have O(1) cost, and only a few of the higher nodes have O(log n) cost.
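A minimal sketch of the build-heap procedure for the min-heap sketched above; the downheap loop is factored into a helper here (an assumption of this sketch, not required by any particular skeleton):

```java
// Sketch of the O(n) build-heap procedure for the min-heap sketched earlier. The unordered
// data is copied into the backing array, then every internal node is downheaped, starting
// from index size / 2 (the last node with a child) and moving toward the root.
@SuppressWarnings("unchecked")
public MinHeap(java.util.List<T> data) {
    size = data.size();
    backingArray = (T[]) new Comparable[2 * size + 1];
    for (int i = 0; i < size; i++) {
        backingArray[i + 1] = data.get(i);      // 1-indexed copy; order property not yet fixed
    }
    for (int i = size / 2; i >= 1; i--) {
        downheap(i);
    }
}

// Same downheap loop used in remove(), factored into a helper for reuse.
private void downheap(int parent) {
    while (2 * parent <= size) {
        int child = 2 * parent;
        if (child + 1 <= size && backingArray[child + 1].compareTo(backingArray[child]) < 0) {
            child++;                            // pick the smaller of the two children
        }
        if (backingArray[child].compareTo(backingArray[parent]) >= 0) {
            break;                              // subtree already satisfies the order property
        }
        T temp = backingArray[parent];
        backingArray[parent] = backingArray[child];
        backingArray[child] = temp;
        parent = child;
    }
}
```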
M7 - Hashmaps
Now, we take a look at HashMaps (HashTables), one of the most commonly used data structures in computer science. HashMaps are very useful when we want to search for specific data in our data structure. This is helpful when checking for membership of data (the Set ADT), or when using the searched data as a unique identifier for some other data (in which case the searched data is called a key, and the implemented ADT is the Map ADT).
Looking back at our toolbox, the best time complexity we have for this problem so far is on average 𝑂(log(𝑛)) via BSTs. Remarkably, we can do better, with 𝑂(1) search times on average! In this module, we'll be introducing the big idea of how we achieve this incredible efficiency, known as hashing. We won't be rigorously proving the average efficiency, but we hope to provide you with an understanding of the mechanisms of hashing, the flexibility such a technique brings, as well as some intuition for why the idea works.
Learning Objectives
Students will:
- Learn about the Set ADT and the Map ADT.
- Explore the idea of hash functions, along with how they're used in HashMaps to achieve
𝑂(1) average search times.
- Understand the challenges of collisions, a necessary consequence of how HashMaps work.
- Observe multiple collision resolution strategies at work: external chaining, linear probing, quadratic probing, and double hashing.
Introduction to HashMaps
- Note: backing array of 100
- Mod the hashcode by 100 to compress the keys
- Collisions may occur after the modulo - this is inevitable, but there are ways to handle them
The Map and Set ADTs
Let's first recap what we know about the Map ADT. Maps are defined by multiple key-value pairs (denoted <K,V>), satisfying the following properties:
- Keys are unique and immutable, essentially allowing keys to act as search objects for their associated values.
- Values are tied to keys, but they do not need to be unique. There may be multiple key value pairs where the value is the same.
- The keys do not need to have any sort of ordering to them, though they are allowed to. If ordering is desired, then it becomes something called an Ordered Map, which we will not be delving into in this course.
Now, what operations do maps typically support? There are other operations such as returning the size or clearing the data structure that might be useful, but we omit them for brevity.
- V put(K key, V value): This method adds the key-value pair to the map. If the key is already present in the map, then the value is replaced with the new one passed into the method, and the old value is returned. If the key is not a duplicate, then a null reference is returned.
- V remove(K key): This method removes a key-value pair from the map and returns the value associated with the key.
- V search(K key): This method looks for the key in the map and returns the associated value. A commonly used modification of this operation is to just check whether a key is in the map (typically called containsKey).
- Set<K> keySet(): This method returns a Set of all of the keys in the map. We will discuss what a set is below.
- List<V> values(): This method returns a list of all the values in the map. The list is in no particular order.
Now, what exactly is a set? The Set ADT can be thought of as exactly what a Map is, except we only have keys as our data and no values. If you're familiar with what a set is in math, then you can think of them as much the same thing here, except that these sets are typed to hold only one type of data. Similarly, you may sometimes hear people say the term dictionary to refer to a Map; in particular, Python uses the term dictionary to refer to this idea in the language.
If you think back to some of our earlier data structures such as BSTs, you can think of them as implementing the Set ADT or the Map ADT. If we search in a BST to check for membership of data, then it's a Set. If we're looking up some identifier tied to other data (such as an id number tied to a user record), then it's a Map. For most of the examples in this module, we will omit the values and only look at the keys. This is because there is very little restriction imposed on the values, so any search algorithm will depend solely on the keys.
Spec'ing our HashMap
In the video in the previous section, we touched briefly on the idea of collisions, and as you progress through the module, you will find that collisions are the source of most inefficiencies in HashMaps. Most of our time will be spent on how to resolve collisions (collision resolution strategies), but the truth of the matter is that much of the work of a good HashMap occurs even before all of these collision strategies! The old adage "Prevention is the best medicine" applies here very well, so let's take a quick look at some things to consider when designing a HashMap.
Picking Hash Functions
The first thing we're going to look at is hash functions. Formally, a hash function is a function ℎ:𝐾→𝑍, which means a mapping from keys to integers. This step is necessary because our keys might not even be numerical like the phone number example from the video. For example, if our key is a String, then we need a function to convert from a String to an Integer.
As for the requirements of this function, we technically only require that if two keys, 𝑘1 and 𝑘2, are equal, then their hash function evaluations, ℎ(𝑘1) and ℎ(𝑘2) are equal. So, technically, the zero-function, where we map every possible key to 0 is a valid hash function. However, it's not a very good hash function. We want the reverse to be as close to true as possible as well: if the hash function evaluations are equal, then the keys are also equal. In some cases, this may be possible, but in many cases, it may not be possible without infinite memory (or may not be possible at all). After all, what good is a perfect hash function if we can't store it in memory?
Let's take a look at our example of converting String keys into Integers. One reasonable way to do this is to just assign each character a numerical value, then add those values up. For example, if we assigned the letter A the value of 1 and the letter B the value 2, then the string ABAB would have the hash value of 1+2+1+2=6. This is an okay hash function, but it has a weakness. It doesn't take the ordering of characters into account at all, so any string of length 4 with an equal number of A and B characters will collide into the same hash value. One way we can fix this is by taking a page out of our number system and multiplying these values by something based on their position before adding it up. For example, if we multiplied it by powers of 10 based on the position from right to left, then we get 1×103+2×102+1×101+2×100=1212.
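A small sketch of both versions of this String hash function (letter values A = 1, B = 2, … are assumed as in the example):

```java
// Sketch of the two String hash functions discussed above, with A = 1, B = 2, ..., Z = 26.
public final class StringHashes {

    // Order-insensitive: "ABAB" and "BABA" collide (both hash to 6).
    public static int sumHash(String key) {
        int hash = 0;
        for (char c : key.toCharArray()) {
            hash += (c - 'A' + 1);
        }
        return hash;
    }

    // Position-sensitive: weight each character by a power of 10 from right to left,
    // so "ABAB" -> 1*10^3 + 2*10^2 + 1*10 + 2 = 1212.
    public static int positionalHash(String key) {
        int hash = 0;
        for (char c : key.toCharArray()) {
            hash = hash * 10 + (c - 'A' + 1);
        }
        return hash;
    }

    public static void main(String[] args) {
        System.out.println(sumHash("ABAB"));        // 6
        System.out.println(sumHash("BABA"));        // 6 -- collision
        System.out.println(positionalHash("ABAB")); // 1212
        System.out.println(positionalHash("BABA")); // 2121 -- no collision
    }
}
```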
This was just one example of a hash function from Strings to Integers, but there are plenty of other reasonable ones as well. In Java, hash functions are written in the hashCode() method, which is why Java always tells you to override it when you override the equals() method!
TLDR on .hashCode():
- .hashCode() returns a numeric representation of an object.
- .hashCode() is an instance method of any object in Java.
- In the ideal (unrealistic) scenario, every distinct object has a distinct hashcode value.
- In the real scenario, some objects that are different have the same hashcode, which is called a collision.
- Objects that are equal by .equals() have the same hashcode.
There are strategies to create a hashCode() implementation that aims at minimizing collisions. To see more about it, take a look at the class below (credit for the file to our colleagues in CS 1331). The Vehicle class includes an overridden .hashCode() method aimed at minimizing collisions.
Vehicle.java
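The actual Vehicle.java file is not reproduced here, but a typical equals()/hashCode() pairing looks something like the following sketch (the fields and class contents are illustrative, not the real file's contents):

```java
import java.util.Objects;

// Illustrative sketch (not the actual CS 1331 file) of a class that overrides equals()
// and hashCode() together, so equal objects are guaranteed to share a hashcode.
public class Vehicle {
    private final String vin;
    private final String make;
    private final int year;

    public Vehicle(String vin, String make, int year) {
        this.vin = vin;
        this.make = make;
        this.year = year;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Vehicle)) {
            return false;
        }
        Vehicle v = (Vehicle) other;
        return year == v.year && Objects.equals(vin, v.vin) && Objects.equals(make, v.make);
    }

    @Override
    public int hashCode() {
        // Objects.hash combines the fields with a 31-based polynomial to spread out values.
        return Objects.hash(vin, make, year);
    }
}
```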
Other Considerations
Once we've picked our hash function, we also have three other things to consider: the compression function, the table size, and the load factor. The compression function's job is to shrink and shift the output of the hash function so that it fits into the range of the backing table (array). This is because for most applications, it's not reasonable to have a table whose capacity is the largest possible hash value. The most common compression function you will see is to simply mod by the table length, though there can be others.
Since the compression function often uses the table size in order to "spread out" our data in the specified range, it's also common to have the table size be a prime number in order to minimize collisions due to compression. There are, again, other choices for table sizes, but most often, HashMap implementations will "lazily attempt" to use a prime number for the table length without actually checking whether it's prime. Another popular choice for table sizes is to use a power of two since it makes the modulo operation more efficient (this is out of scope of the course, but prime numbers are the most expensive for divide-type operations, whereas powers of two are efficient thanks to bitmasks).
Finally, we have the load factor, which is defined as the size divided by the capacity. The related parameter is the maximum load factor: how high we're willing to let the load factor become before we resize the table, and it varies quite a bit across implementations. It's also very dependent on how we decide to handle collisions. The intuition here is that as the load factor increases, the table becomes inundated with entries, and collisions become more likely. If we keep the threshold low, then we can almost guarantee 𝑂(1) search times, but we may have to resize more often, and we will use more memory than we may want to.
As you can see, a lot goes into designing a good HashMap. There are countless papers and studies out there looking at what hash functions work best in practice and in theory, so it's a very interesting topic to look at! It also speaks to how important these ideas are; these ideas are quite flexible and general, so they can be applied in data structures other than HashMaps.
External Chaining
Now begins our exploration into various collision handling policies. These policies can be broadly categorized into two different types:
- Closed Addressing: A policy where all keys that collide into the same location are stored at that location by some means.
- Open Addressing: A policy where additional, colliding keys can be stored at a different location based on the original location.
We will discuss one closed-addressing policy and three different open addressing policies. The first policy we discuss is the simplest one in this lesson, external chaining, a form of closed addressing.
What is External Chaining?
- Backing structure: array of LinkedLists → each index can store multiple entries
- “Closed addressing” strategy - entries are put at the exact index calculated (see index 61 in image below).
- To check for duplicate keys, you need to traverse the entire LinkedList at that index
Resizing with External Chaining
- When using external chaining, we don’t run out of space. However, long chains impact runtimes.
- Given a backing array of capacity 10 where index 0 holds 5 K-V pairs in its LinkedList: load factor = size / capacity = 5 / 10 = 50% (Tip: size counts K-V pairs, not filled indices).
Linear Probing, Example 1 (Adding)
What is probing?
- Backing structure - Array → each index can store only one entry.
- “Open addressing” strategy - entries may not end up at the original index calculated.
What is Linear Probing?
- Linear Probing: If a collision occurs at a given index, increment the index by one and check again.
Index = (h + origIndex) % backingArray.length
(h = number of times probed [0 ... N])
Wraparound technique: 50 ends up at first index (idx + 6 steps % 7).
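A small sketch of how the probe sequence above can be computed (the key and table length are just example values):

```java
// Sketch of how a key is turned into a probe sequence with linear probing.
// hashCode() -> compression (mod table length) -> probe offsets h = 0, 1, 2, ...
public final class LinearProbeDemo {
    public static void main(String[] args) {
        int tableLength = 7;
        int key = 50;

        int origIndex = Math.abs(Integer.hashCode(key)) % tableLength;  // compression
        for (int h = 0; h < tableLength; h++) {
            int index = (h + origIndex) % tableLength;   // wraparound via the final mod
            System.out.println("probe " + h + " -> index " + index);
        }
    }
}
```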
Linear Probing, Examples 2 and 3 (Searching + Removing)
The next operations we take a look at are searching and removing from a HashMap. As it turns out, removing from a HashMap doesn't work if we remove items from the array normally. So, we perform what we call a soft removal. Soft removals are where we leave the entry there, but mark it with a flag to denote that the entry is removed. Such entries are called DEL markers (some sources refer to them as tombstones). These are entries that exist in the array in order to assist in HashMap operations, but from the user's perspective, looking at the HashMap as a black box, these entries do not exist at all since the user has already removed them.
Removing from HashMap
- Move through the array and find the value to remove. Pop the value and replace it with a DEL marker.
- DEL markers are typically set as a boolean flag on the entry at that index. We don't use null because empty array indices are the true nulls.
Putting into DEL Index
Probing Scenarios:
- Valid (not null or deleted) and unequal key
- Valid and equal key
- Deleted (DEL)
- Null
Example steps:
- Start at the hashed index of the key (idx = H(1) % 7 = 1) and begin incrementing the index.
- Save the first DEL index if one is encountered.
- Continue moving through the array to see if the key is already present.
- If it is not, place the new entry into the saved DEL index.
- If a null entry is found and no DEL index was encountered, place the entry in that first null index.
Resizing Backing Array (for Linear Probing)
When backing array surpasses threshold:
- Create a new backing array of capacity 2N+1
- Loop through old backing array
- Rehash all cells to new backing array
- Skip over all DEL markers. They no longer serve their purpose.
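A minimal sketch of this resize, assuming a hypothetical MapEntry helper class with getKey() and isRemoved():

```java
// Sketch of resizing a linear-probing backing array: rehash live entries into a
// 2N + 1 table and drop DEL markers. MapEntry here is an illustrative helper class.
private void resizeBackingTable() {
    MapEntry<K, V>[] old = table;
    @SuppressWarnings("unchecked")
    MapEntry<K, V>[] bigger = (MapEntry<K, V>[]) new MapEntry[2 * old.length + 1];
    table = bigger;

    for (MapEntry<K, V> entry : old) {
        if (entry == null || entry.isRemoved()) {
            continue;                                  // skip empty slots and DEL markers
        }
        int index = Math.abs(entry.getKey().hashCode()) % table.length;
        while (table[index] != null) {                 // no DEL markers exist in the new table
            index = (index + 1) % table.length;        // linear probe for a free slot
        }
        table[index] = entry;
    }
}
```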
The Efficiency of DEL Markers
One question to keep in mind, as you consider the use of DEL markers for removing, is how these DEL markers impact efficiency. The put/remove/search operations all have termination conditions for whether the key is found in the map or not. If the key is in the map, then all is well. However, the more complex problem is when the key is not in the map. There are two termination conditions for when the key is not there, whichever comes first:
- We find a null entry in the table to tell us to stop searching.
- We probe table.length times, meaning that we've checked every spot in the table already.
The second case is especially pathological, but it can in fact happen even if the max load factor threshold is set reasonably. As an example, suppose we have a table of length 13, and one by one, we add and remove the keys 0,1,…,11,12 from the table. This leaves us with a final table of size 0 with no null entries, all while never having the load factor be larger than 0.08. This configuration is terrible because it guarantees that any put/remove/search has 𝑂(𝑛) performance, which just goes to show that DEL markers can worsen the efficiency of our operations. This is also why we get rid of any DEL markers when we resize.
There are ways to try to avoid this, such as including DEL Markers in the load factor calculation, or triggering an intentional resize or reconstruction of the table if we know beforehand we'll be removing a lot. We won't be delving into more detail than this, but it is worth keeping in mind the cost that DEL Markers have on performance.
Quadratic Probing
Previously, we saw the most basic of open addressing techniques, linear probing. If you do some examples with it, you will find that linear probing has the problem of developing contiguous blocks of data. These blocks of data can be problematic as they get larger, because any search for data in these indices will necessarily continue until we hit the end of that block.
For example, in the diagram above, indices 3 - 7 are occupied by either data or DEL markers. If a put/remove/search operation occurs at any of these indices, then the operation continues until it reaches index 8, the green cell, giving degenerate linear performance. This problem is known as primary clustering (or turtling), and the effect can be mitigated by quadratic probing, which is seen in this lesson.
What is Quadratic Probing?
- Quadratic Probing: If a collision occurs at a given index, add h^2 to the original index and check again. Why? Breaks up clusters created by linear probing.
Index = (h^2 + origIndex) % backingArray.length
(h = number of times we've probed [0 ... N]).
Example 1
Solutions to Infinite Probing
What happens when quadratic probing keeps finding occupied indices (infinite probing)? (See value 22 in Example 1)
- Solution 1: Continually resize until a spot is eventually found
- Solution 2: Impose a set of conditions on the table to ensure that this scenario never occurs.
The Complexities of Quadratic Probing
As we saw in the video, while quadratic probing is a simple modification of the linear probing scheme, it brings with it some additional complexities and complications. For one, it was proposed to solve the problem of primary clustering, but quadratic probing suffers from something called secondary clustering, where keys that hash to the same index follow the same quadratic probe sequence and keep colliding with each other. It isn't as obvious as in linear probing since there isn't a huge block of data, but it does still exist.
The most pervasive issue of quadratic probing is the infinite probing issue seen in the video. There are two approaches to this problem:
- If table.length probes have been made, but a spot still hasn't been found, then resize the hash table and try again.
- Construct the table so that infinite probing will never occur.
The first one is a natural solution to the problem, but it's not without its potential issues as well. A resize operation will involve rehashing keys into a new table, but rehashing can again cause the infinite probing issue. There are some pathological cases where you can set up resizes within resizes, nested however many times you like, though this is quite rare.
The second option will have to be a bit mysterious since a number theory background is not a prerequisite for this course, but some hashing schemes can eliminate the infinite probing case entirely. For example, if we enforce that our table length is a prime number, and we maintain a max load factor of 0.5, then we can guarantee that it will not happen.
Double Hashing
What is Double Hashing?
- Double Hashing: If a collision occurs at a given index, add a multiple of c to the original index and check again.
- Why? Breaks up clusters created by linear probing
Index = (c * h + origIndex) % backingArray.length
(h = number of times we've probed)
(c = result of the second hash function; linear probing if c = 1)
- Hashing strategy:
- First Hash: H(k), used to calculate origIndex
- Second hash: D(k), used to calculate probing constant.
D(k) = q - (H(k) % q)
- q = some prime number, and q < N
- Delete flag still required when removing.
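A small sketch of the double hashing probe calculation with example values (H(k) is taken to be the key's hashCode compressed by the table length):

```java
// Sketch of the double hashing probe calculation: H(k) gives the original index,
// D(k) = q - (H(k) % q) gives the probing constant c, and probes step by multiples of c.
public final class DoubleHashDemo {
    public static void main(String[] args) {
        int tableLength = 13;
        int q = 7;                                     // prime, q < table length
        int key = 25;

        int origIndex = Math.abs(Integer.hashCode(key)) % tableLength;  // first hash H(k)
        int c = q - (Math.abs(Integer.hashCode(key)) % q);              // second hash D(k)

        for (int h = 0; h < tableLength; h++) {
            int index = (c * h + origIndex) % tableLength;
            System.out.println("probe " + h + " -> index " + index);
        }
    }
}
```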
Example 1
More thoughts on double hashing
As we saw in the example, double hashing can be a powerful framework, so long as it is implemented carefully and efficiently. With a good secondary hash function (one that spreads out the keys and is relatively independent of the primary hash function), even pathological cases where we have many collisions can be prevented unlike linear and quadratic probing.
We would like to make a few remarks that are a bit more subtle (some of which are out of the scope of the course) before moving on:
- Linear probing is a special case of double hashing, but quadratic probing is not. The secondary hash function depends on the key as input, whereas if we wanted the same effect as in quadratic probing, we'd need the probe count as the input. The takeaway here is that the secondary hash function is truly a hash function since its only input is the key, not the probe count.
- Double hashing might be more computationally expensive than the other strategies depending on what the secondary hash function is. At a low level, computers often have specialized instruction sets that do things like incrementing (which we'd use in linear probing), making those strategies faster if collisions aren't that pathological.
This brings us to the end of the collision strategies we'll be covering in this course. In the next lesson, we'll be doing a wrap-up to help compile all the things we've discussed in this module!
Collision Handling: A Summary
We've covered a lot of ground in this module, the basics of HashMaps, in addition to the four different collision handling strategies. We presented them in the following order: external chaining, linear probing, quadratic probing, and double hashing. This ordering was chosen for two reasons: (1) we went from simpler concepts to more complicated concepts and (2) we went in (rough) order of popularity.
In practice, you will mostly see external chaining and linear probing (or some variation of them). External chaining is very nice because it is an extremely simple idea. If there are multiple collisions, then just attach the new item to a chain at that index. The primary reason you would not want to use it is something we discussed a while back, locality of reference. Computers are designed to access adjacent/nearby memory locations quickly, which the open addressing schemes can take advantage of. On the other hand, external chaining necessarily involves dynamic allocation for the chaining, so it cannot take advantage of this. Linear probing is the simplest scheme to take advantage of this, so it's another popular choice.
So, what about quadratic probing and double hashing? We made a few arguments for why they might be useful in their respective lessons, but now it's time to discuss why they might not be very popular.
- Their implementations can be tricky and need to be done carefully. We've had semesters in the past where students implemented quadratic probing. If you ask one of them what the experience was like, they'll probably respond negatively, and for good reason! These schemes can be efficient in terms of collision handling, but their complicated implementations make them unlikely to be used in practice.
- Aim to avoid collisions rather than fixing them. We mentioned this previously, but it's important, so we'll say it again. If you are running into cases where you have lots of collisions, then you're not using HashMaps correctly! The 𝑂(1) performance of HashMaps happens when there are few collisions, not when we have many collisions. So, if you have lots of collisions, work on the hash function and table sizing rather than the collision strategy.
- The number of collisions is not everything, much happens at a lower level. For example, incrementing is an instruction that is used very often, so computers are optimized to do them very quickly. Arithmetic like multiplication and modulo are also relatively fast, but they're still slower than incrementing. Another example is the idea of locality of reference we mentioned above. Quadratic probing and double hashing "skip" around the array unlike linear probing. So, while they still get a speedup since the memory locations are "close," they're not "adjacent" like they are in linear probing, so it's not as good of a speedup.
- Configuration of the table might be constrained to avoid the infinite probing problem. Both quadratic probing and double hashing can run into the infinite probing problem if not implemented carefully. If you try to implement it in a way that completely avoids this, then your table length needs to be a prime number, and in quadratic probing (and in some cases of double hashing), your max load factor must be no more than 0.5. These are rather non-trivial constraints! Putting aside the problem of finding a suitably large prime number as well as resizing, a max load factor of 0.5 means that you have to allocate an array where half of the space is never used, which can be a non-starter in many contexts.
Hopefully this helps conceptualize the different collision strategies, giving you an understanding of what techniques are out there, and why they are (or are not) used in practice. If you're a theory-oriented student, you might find the theory of 𝑘-independent hashing interesting. This concept analyzes the contexts in which we can guarantee 𝑂(1) performance (at least in expectation). But enough theory-crafting, let's take a look at what some popular programming languages use in their standard implementations for HashMaps.
Food for Thought: What do actual programming languages use?
Let's look at three different languages that have some standard implementations: Java, Python, and C++.
- Java's implementation is the HashMap, which uses a variation of external chaining. It starts off with external chaining, but if a chain reaches a certain threshold in size, then it will instead use a balanced BST (we'll cover self-balancing BSTs in the next course), sorting based on the hashcodes and the natural ordering imposed by Comparable if implemented. This threshold approach is used because for small chains, the overhead of a BST is not worth it since the performance costs are similar for small 𝑛. The max load factor is by default 0.75.
- Python's equivalent is the dictionary. There are variants, so let's look at the CPython implementation. This implementation uses a variation of an open addressing strategy like linear probing. It's difficult to describe in detail without getting into the nitty gritty, but it uses a linear recurrence (rather than just incrementing like in linear probing) to determine the probe sequence. Then, it adds a "perturbation" to the recurrence to change how it probes for common failure cases. The table also uses powers of two rather than prime numbers, and the max load factor is 2/3. The link above is worth taking a look at if you're interested; the documentation gives some interesting commentary on the design decisions made in the implementation.
- For C++, we have the unordered_map, which uses external chaining. An interesting thing about this is that the max load factor defaults to 1.0, which is a nice reminder that external chaining is the only strategy we've seen where the load factor can exceed 1.
Just looking at these three implementations gives us some interesting strategies and design choices made by the developers, which reminds us that these strategies are not set in stone. It's also good to note that these designs and standards can change over time, so it's possible these will change with more data and later version numbers.
Food for Thought: A Hashing Application: Bloom Filters
The only example of a hashing based data structure/algorithm that will be seen in this course sequence is what has been covered thus far for HashMaps in this module. However, hashing is a very popular and flexible idea which is used in many different applications. We would be remiss if we did not look at any of these applications. Let's take a brief glimpse at a data structure that uses hashing, bloom filters.
We want to set the scene first to motivate bloom filters. Suppose we are given a database that hosts a lot of tagging information. This database takes in some text input, then it returns, say the first 10 results with that tag sorted by some metric, back to the user. This database is hosted on some server that is relatively far away from you, and your internet speed is slow.
In this context, performing a search can be an expensive operation (the delay to send search request, search the database, and receive the reply may take a while). We would like to avoid sending the request if the search would yield 0 results. Is this possible without hosting the entire database closer to you? The answer is yes! (But you knew that already didn't you?)
Rather than answering the question of "Is this tag in the database?" correctly all the time, what if we allow incorrect answers rarely? More specifically, let's say we allow false positives, which is when the tag is not in the database, but we incorrectly output that it is. A bloom filter does precisely that.
In the flow chart above, we have three possible outcomes. If the bloom filter returns no, then we can be sure the answer is correct. If the bloom filter returns yes, then it could be wrong, so we will go ahead and send the search request. In the true positive case, the actual cost is high, but that cost was necessary, so the opportunity cost is low. In the false positive case, the actual cost is high when the search was not necessary, so the opportunity cost is also high. If we can make false positives rare, then on average, we'll be sending requests mostly when necessary, and not sending requests for most negative answers, which is good!
How does a bloom filter achieve this? Why, hashing of course! Since our database is very large, we don't want our backing table to actually store 𝑛 items for our bloom filter. Instead, we'll use a bit array of length 𝑚, where 𝑚 can be set independently of the number of items, and the array starts with all entries as 0. Additionally, rather than one hash function, we'll need 𝑘 hash functions (again, 𝑘 can be set as you want). To make things more precise, these hash functions are ℎ1,…,ℎ𝑘, where ℎ𝑖:𝐷→{0,1,…,𝑚−1} for all 𝑖=1,…,𝑘, and 𝐷 is our data space.
When adding a new tag to the database, we compute the 𝑘 hash functions for the tag, then we set all of those indices to 1. For querying to see if the tag exists, we compute the 𝑘 hash functions, and we check if all of those indices are 1. If any are 0, then we know for sure the tag isn't in the database. If they're all 1, then a false positive is possible, since an index could've been flipped to 1 by different data. Removing from a bloom filter is possible, but it requires other modifications.
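A toy sketch of a bloom filter in Java; the k "hash functions" here are simple salted variations of hashCode() chosen purely for illustration:

```java
import java.util.BitSet;

// Toy sketch of a bloom filter with m bits and k hash functions.
public class BloomFilter<T> {
    private final BitSet bits;
    private final int m;   // number of bits
    private final int k;   // number of hash functions

    public BloomFilter(int m, int k) {
        this.bits = new BitSet(m);
        this.m = m;
        this.k = k;
    }

    private int index(T item, int i) {
        // Mix the item's hashCode with the salt i to simulate k different hash functions.
        int h = item.hashCode() * 31 + i * 0x9E3779B9;
        return Math.floorMod(h, m);
    }

    public void add(T item) {
        for (int i = 0; i < k; i++) {
            bits.set(index(item, i));          // flip all k indices to 1
        }
    }

    // false -> definitely not present; true -> probably present (false positives possible)
    public boolean mightContain(T item) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(index(item, i))) {
                return false;
            }
        }
        return true;
    }
}
```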
We won't go into performance here, but with the proper setting of 𝑚 and 𝑘, along with well-chosen hash functions, bloom filters do a very good job with less than a 1% false positive error rate. Hopefully, this demonstrates to you just how powerful and flexible an idea hashing is!
CS1332 - Secure Computing Considerations
Deletion versus Purging & New Memory Acquisition
Associated with Module 1: Arrays and ArrayLists
The most basic operations for many of our data structures involve adding elements, accessing those elements, and (eventually) deleting those elements. Many of our students don't realize that many complex systems often "delete" data elements simply by removing the user's ability to access those items through the normal interface, while the actual data still resides in memory. Since the actual data might be sensitive, it's important to understand the difference between deletion (i.e., simply removing access) versus purging (i.e., deliberately overwriting and/or modifying the sensitive data so that it is effectively unreadable if accessed later by unauthorized parties). This can be demonstrated in many of our homework assignments where we require the students to overwrite the to-be-deleted values before modifying the access structures themselves. We could further expand on this by having the students measure the increased number of steps needed to effect purging versus deletion, and discuss how this impacts the overall efficiency of the data structure operations.
This can also be tied into the acquisition of new memory. In many cases, a request is made to the operating system when new memory is needed. One example is the Java Memory Model, where new() requests return a reference to a block of memory with the requested size, but only if such a portion of memory is free. From a security and functionality perspective, it's important to ensure that a new block has actually been assigned before attempting to dereference the results. Also, the Java Memory Model normally purges any potentially sensitive data by overwriting the contents of the portion of memory being returned with zeros.
Pointers and References
Associated with Module 2: Pointers, References & Singly Linked Lists
We make a general distinction between pointers and references. Pointers refer directly to locations in memory. In languages (e.g., C) that allow use of "generic pointers", the flexibility of being able to dereference arbitrary memory locations also allows potential misuse by programs and processes that (attempt to) access memory locations for which they are not authorized. Some languages (e.g., Java) don't support this "generic pointer" approach, and instead provide references. With references, only three general types of operations are allowed: (1) an object reference can be assigned "null", which effectively means that there is not any reference being made to a valid object; (2) an object reference can be assigned to a new() object, where the operating system creates the new instance of the object in memory (barring any exceptions or other anomalous conditions); or, (3) an object reference can be assigned to another existing object reference (normally of the same type), such that the object will then have two or more references (i.e., "aliases"). The intent of references is to avoid some of the security- and memory management-related problems of allowing generic pointers.
Hashing
Associated with Module 6: Introduction to HashMaps and External Chaining (Closed Addressing)
We use the term "hashing" to define the general process of using some set of mathematical transformations (e.g., formulas) to convert the value of an element (i.e., "key") into the address where that key should be stored for later access/lookup. We will discuss how the term "hashing" is also used to refer to a different set of transformations for a given element. More specifically, users are often advised to "check the hash" of a file that they've downloaded to ensure that file's authenticity. In these cases, a hash function is used to generate a "signature" for the given element, which is often a file, and where the file is often significantly larger in size than a key. Also, the desired properties of the "security-based" hash result are different. For the "addressing-based" hash formula, the main concern is generating as uniformly distributed a set of addresses as possible, to ensure that elements are distributed as evenly as possible across the address space and to minimize collision effects as well as the average access/lookup times. For the "security-based" hash formula, the main concern is that the hash result should be extremely sensitive to any changes in the file, and yield a (clearly and distinctly) different result if the file's contents have been modified. Given that many Computer Science terms are "overloaded" in practice, it would be worthwhile for students to recognize this distinction in usage.
Randomness
Associated with Module 7: SkipLists & Probabilistic Data Structures
The algorithms that we study are mainly deterministic - each step of the algorithm leads to a single, distinct set of changes to the computation state. We will discuss how the properties of randomness can be leveraged in the implementation of various algorithms (e.g., SkipLists). From a performance standpoint, randomness can be used to ensure that the elements being stored are distributed across the data structure to support a relatively fast O(log n) access/lookup time.
And from a security standpoint, the use of randomness makes it significantly more difficult for an adversary to insert elements into the data structure in a way that will deliberately degrade the performance. For example, inserting elements into a Binary Search Tree (BST) structure in sequential (or reverse sequential) order will cause the access/lookup performance of the BST to degrade towards that of a list-like structure - from O(log n) to O(n). Similar "attacks" against a Skiplist, however, would be unsuccessful.
M8 - AVLs
Module Overview
The biggest drawback that we encountered for BST operations was that the height of a BST can degenerate to 𝑂(𝑛) due to imbalance in the tree. We left the idea of a "balanced tree" intentionally vague back then because there are different ways to define balancing.
In this module, we'll be covering a sub classification of BSTs called AVL Trees, which are self-balancing trees. The name of the tree comes from its creators, Adelson-Velsky and Landis. AVLs formally define what a "balanced tree" is, and they impose the additional restriction that by the end of any operation, the tree must remain balanced. For us, we'll be looking at how this applies to the add and remove operations specifically, and how the pointer reinforcement technique helps us implement the needed mechanisms (known as rotations) in a clean and intuitive way.
Learning Objectives
Students will:
- Refresh their knowledge of BST operations as well as pointer reinforcement, a technique to implement simple modifier operations cleanly.
- Gain familiarity with how AVL trees formally define their balancing scheme along with what a balanced tree is/is not under this definition.
- Learn about the various rotation types of an AVL, and how these rotations fit into the general add/remove operations of a BST.
- Better their maturity of recursive concepts by implementing rotations using pointer reinforcement.
Intro to AVLs
Now, we can begin to explore AVL trees by looking at how we define the structure of an AVL. Since AVLs are a sub classification of BSTs, they share the same order and shape properties. However, AVLs have an additional shape property to restrict the height of the tree. Let's take a look at how AVLs define balancing through the concepts of heights and balance factors.
Tips:
- You never need to use children’s balance factors.
- The leniency (allowing balance factors of -1, 0, or 1) is there because a perfectly balanced binary tree is rare.
- Positive BF means node is left heavy, negative BF means node is right heavy.
The Definition of Balanced for AVL Trees
In the video, we saw that balanced trees for an AVL satisfy the following definition:
|node.balanceFactor|≤1 for every node. We also saw a brief justification as to why we chose 1 rather than 0. There are many configurations where it's just not possible to have a perfectly balanced tree. For example,
As we see in the diagram above, we have a perfectly balanced tree with 𝑛=3 (it's so beautiful!). However, this configuration can only happen when 𝑛 is of the form 𝑛 = 2^𝑘 − 1. Obviously, adding one more piece of data leads to an unbalanced tree somewhere. Thus, we must accept some imbalance in the world of AVLs. The goal is to minimize the sum of the depths of all nodes, which in turn minimizes the imbalance encountered in the tree.
In the diagram below, we add 2 to the original tree. The tree on the left is a valid BST. However, applying the balancing found in AVLs leads us to the tree on the right.
There you go! Perfectly balanced, as all things should be. But wait. At what cost? We got a perfectly balanced tree by minimizing the sum of the node depths, but look at what happened. All the data noted in red are data that moved while maintaining the order property of a BST. The balancing process made a single add an 𝑂(𝑛) operation just to update the tree, which is exactly what we were trying to avoid in the first place.
So, maybe minimizing the imbalance isn't the best thing to do. If we tolerate a little more imbalance, then we begin approaching the definition of balanced for an AVL. AVLs allow the depths of leaves to differ by at most 1 (which is what the balance factor definition says). If we "lazily update" by adding to the natural spot in a BST, then we can significantly save cost because updates will not occur as often. When an update does occur, it will be very efficient as is shown in the next lesson. AVL trees hit that sweet spot between allowing some imbalance while keeping operations efficient. It can in fact be shown that the height of an AVL tree is at most 1.44 log(𝑛), which is a pretty tight tree!
AVL Single Rotations
Properties
- Left rotation: Node BF is -2, and right child has a balance factor of 0 or -1.
- Result: the top node becomes the left child of the middle node.
- Right rotation: Node BF is 2, and left child has balance factor of 0 or +1
- Result: the top node becomes the right child of the middle node.
Tips:
- You must always update the height & BF of the former parent (e.g. node A) before doing the same for the new parent.
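A minimal sketch of the left rotation with pointer reinforcement, assuming an AVLNode class that stores its height and balance factor (the right rotation is the mirror image):

```java
// Sketch of an AVL left rotation using pointer reinforcement. AVLNode is assumed to
// store its height and balance factor; updateHeightAndBF recomputes both from the children.
private AVLNode<T> rotateLeft(AVLNode<T> a) {
    AVLNode<T> b = a.getRight();       // b moves up to become the new subtree root
    a.setRight(b.getLeft());           // b's left subtree becomes a's right subtree
    b.setLeft(a);                      // a becomes b's left child
    updateHeightAndBF(a);              // update the former parent FIRST
    updateHeightAndBF(b);              // then the new parent
    return b;                          // returned node is reinforced into the caller
}

private void updateHeightAndBF(AVLNode<T> node) {
    int leftHeight = height(node.getLeft());
    int rightHeight = height(node.getRight());
    node.setHeight(Math.max(leftHeight, rightHeight) + 1);
    node.setBalanceFactor(leftHeight - rightHeight);
}

private int height(AVLNode<T> node) {
    return node == null ? -1 : node.getHeight();   // null children have height -1
}
```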
Rebalancing after removing a node
- Use standard BST method to remove a node and repair links
- Update BF of its parent.
- Determine if the BF is imbalanced (BF > 1 or BF < -1) and needs a left/right rotation.
- bf < -1 = left rotation
- bf > 1 = right rotation
- Apply rotation.
- Set parent node’s left/right pointer to child node’s left/right child (depending on rotation method)
- Set new parent node’s left/right pointer to former parent node
- Update height and BF of affected nodes.
Tips:
- Keep in mind that for the 2-child case, you may need to update and possibly rebalance on the way back up along the predecessor or successor path. Doing this ensures balance as we bubble up the tree.
Step 1 - Remove node 1, update BF and Height of parent.
Step 2 - BF of parent node is < -1. Apply left rotation
Step 3: Link parent node 2 to left child (node 3) of child node (node 4)
Step 4 - Link child node 4’s left child to be its former parent. Update heights and BFs.
AVL Double Rotations
Properties
- Left-right Rotation: Node BF is 2 and left child’s BF is -1
- Result: top node becomes the right child of the bottom node, and the middle node becomes the left child of the bottom node.
- Right-left Rotation: Node BF is -2 and right child has balance factor of 1.
- Result: top node becomes the left child of the bottom node, and the middle node becomes the right child of the bottom node.
- If a rotation is done, the heights and BF of involved nodes are updated again.
Right-Left Rotation: If the balance factor of our unbalanced node is -2 and the balance factor of its right child is +1.
- Right rotation on child
- Left rotation on parent node
Left-Right Rotation: BF of unbalanced node >1 and left child BF < 0.
- Left rotation on child
- Right rotation on parent node
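A sketch of the double rotations as compositions of the single rotations sketched earlier (method names are again our own illustration):

    private AVLNode<T> leftRightRotate(AVLNode<T> node) {
        // Unbalanced node is left heavy (BF = 2) but its left child is right heavy (BF < 0),
        // so a single right rotation would not fix the imbalance.
        node.left = leftRotate(node.left);   // first rotation on the child
        return rightRotate(node);            // second rotation on the unbalanced node
    }

    private AVLNode<T> rightLeftRotate(AVLNode<T> node) {
        // Mirror case: BF = -2 with a left-heavy right child (BF > 0).
        node.right = rightRotate(node.right);
        return leftRotate(node);
    }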
Properties:
- Original height of an unbalanced subtree will be restored after a rotation due to adding data.
Time Complexity:
- Single rotation: O(1), since they’re only concerned with updating 3 nodes at most (unbalanced node, its child, and grandchild)
- Double rotation: O(1), since they’re comprised of 2 single O(1) rotations
Example: add(54)
1) Add 54 to BST
2) Update imbalance
- Leaf nodes will have BF of 0
- Increase height of parent node by +1 and BF by +1 (if only one child exists)
3) Update balance
We observe that node 50 is unbalanced.
4) Apply a double rotation if the parent node’s |BF| is >= 2.
Parent node is -2
Left rotate on 50.
Update node 44’s BF + height
Right rotate on 68
Update BF + Height
Node 62 moves up, 50 becomes left child.
Node 62 moves up, and 68 becomes right child.
Update BF + height. Return node 62 as node 44’s right child.
Rotation Table
In the past, students have told us that the table presented in the video for rotation types is very useful as a study tool. We provide it here so that you can reference it as needed. Do keep in mind that the shape shown is the simplest case; there will likely be children nodes or entire subtrees in an actual implementation that need to be properly managed.
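The table boils down to a dispatch on the unbalanced node's BF and the relevant child's BF. A sketch of that dispatch, reusing the rotation helpers sketched earlier (handling only the simplest cases, as the note above warns):

    private AVLNode<T> rebalance(AVLNode<T> node) {
        if (node.balanceFactor > 1) {
            // left heavy: left child BF >= 0 means a single right rotation, otherwise left-right
            return node.left.balanceFactor >= 0 ? rightRotate(node) : leftRightRotate(node);
        } else if (node.balanceFactor < -1) {
            // right heavy: right child BF <= 0 means a single left rotation, otherwise right-left
            return node.right.balanceFactor <= 0 ? leftRotate(node) : rightLeftRotate(node);
        }
        return node;   // |BF| <= 1: no rotation needed
    }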
Qs: What needs to change in a BST for it to become an AVL tree?
- Where does the update code go in the add and remove methods?
- After the add / remove action is made, after the recursive call.
- What is returned when pointer reinforcement is used? What is returned from each rotation?
- How is the balancing information updated? What differences are there in terms of the heights compared to BSTs?
- BF update after each rotation.
- New root of the rotated subtree?
Logistics of an AVL
There are some final logistics behind the functionality of an AVL tree based on the rotations that need to be covered.
The Number of Rotations per Add/Remove
Let's get a better handle on how these rotations and updates occur in an AVL tree. Based on the algorithm, we recurse down the tree to find the location to add/remove. Once we've done the standard BST add/remove operation, on the way back up the tree (unwinding the recursion), we perform updates to the parent nodes. The height and balance factor are updated for every node on the return path up, but a rotation won't be needed at every node. Each rotation is
𝑂(1), and so is every update, but it might be a good idea to see how often these rotations actually occur. We know that the number of rotations can be at most 𝑂(log(𝑛)) since the height is 𝑂(log(𝑛)), but perhaps the number of rotations is far less than this.
That is precisely what ends up happening with the add operation. As it turns out, each add operation will trigger at most one rotation. This is because when adding data, any rotation will restore the unbalanced subtree to its original height. On the other hand, a remove operation can trigger 𝑂(log(𝑛)) rotations in the worst case; remove operations do not have the same nice property. When a rotation is performed due to a removal, the unbalanced subtree may not regain its original height, propagating height changes up the tree.
Let's try out an example with the add method. The steps involved here will be detail oriented, so take things one step at a time and make sure you understand each step of the reasoning before continuing to read! Consider the diagram below, where we are seeing the first rotation triggered by an add operation up the tree. We can do a similar reasoning for the other rotation types, so let's keep it simple and assume it was a right rotation. We will derive that the heights and balance factors must be as shown by only assuming that the height of node 𝐶 is ℎ+1 and that a right rotation is necessary (so 𝐶's balance factor is 2).
We immediately know that the height of node 𝐵 is ℎ and the height of 𝑇4 is ℎ−2 due to the balance factor and height of 𝐶. Additionally, this imbalance was caused by the data we added; if any subtree's height changed, it was because the height increased. This also means that the new data must be in the subtree of 𝐵 rather than 𝑇4. The balance factor of 𝐵 can be 0 or 1, but we've written in the image that it must be 1. Why is this? Well, we only added a single data value. This means that only the height of 𝐴 or the height of 𝑇3 could have increased, not both, so the balance factor of 𝐵 cannot be 0 and must be 1 (if it were 0, the subtree that didn't grow would already have had height ℎ−1, and the tree would've been unbalanced before the add). This implies that the new data must be in the subtree of 𝐴 (if it was in the subtree of
𝑇3, then the tree would've been unbalanced before the add), giving heights of ℎ−1 for 𝐴 and ℎ−2 for 𝑇3.
After the rotation, 𝑇1,𝑇2,𝑇3, and 𝑇4 are completely unmodified, and, thus, their heights are unmodified. The heights of 𝑇3,𝑇4 determine the height and balance factor of node 𝐶, which is now ℎ−1 and 0 respectively. The height of 𝐴 is also unchanged. This allows us to compute the height and balance factors of the new root 𝐵, which are ℎ and 0 respectively.
This is exactly what we were looking for because the height of this unbalanced subtree was ℎ+1, and before we added the new data, the height was one less, which is ℎ. We've restored the original height, so the rest of the nodes up the tree will not change heights and balance factors!
The Efficiency of an AVL Tree
AVL trees guarantee that the height of the tree is 𝑂(log(𝑛)), so operations like search, add, and remove now have a worst case of 𝑂(log(𝑛)) since the balancing is all done in 𝑂(log(𝑛)) time along the same path taken to get to the location. Contrast this with the worst case of 𝑂(𝑛) for those same operations in a normal BST. Additionally, since the height is stored in each AVL node in order to maintain balancing information, computing the height of an AVL tree is now 𝑂(1) as opposed to 𝑂(𝑛) like before. This all seems like a straight improvement compared to a BST, but there are some caveats and asterisks here.
We need to keep in mind that when analyzing the efficiency of some tree operation, we need to be cognizant of how it might differ between an AVL and a BST, both efficiency and correctness. For example, traversals would be
𝑂(𝑛) for both since the balancing structure doesn't change the algorithm or efficiency. However, if an operation would modify the tree structure in some way, then it may be harder to do in an AVL since we must maintain proper balancing by the end of the operation. For example, removing all leaves in a BST can be done reasonably easily in
𝑂(𝑛) time using a single traversal. For an AVL, this may not be so obvious anymore since removing nodes may trigger rotations, which will change the leaves of the tree midway. In other words, correctness may be harder to achieve due to rebalancing, which may worsen the time complexity.
If the main things we're doing are adding, searching, removing, and traversals though, then we're golden!
Food for Thought - Self-Balancing BSTs
Self-Balancing BSTs: There's Two!?
When you look up self-balancing BSTs, you will most likely see two types of BSTs come up: AVL trees and Red Black trees. This course does not cover Red Black trees (perhaps in the future if there is sufficient demand, but there are currently no plans for it), primarily for two reasons:
- We see two types of trees in this course with complex operations and balancing schemes (AVLs and (2, 4) trees that we'll see next module). If you can make it through both of these types of trees and understand their operations, then you shouldn't have much trouble learning Red Black trees. Many of the concepts such as rotations will carry over, so we leave it to you if you are interested.
- We want to keep the material paced well for learning while preventing course length from exploding too much. If we wanted to also cover Red Black trees, we'd probably want to break these three tree types up to prevent the course from becoming too monotonous, and that'd probably extend the data structures part of the course by another 2-3 modules. If you make it through all of these modules, you've probably learned enough so that you can probably go off on your own to learn other interesting data structures!
We will talk a bit about how Red Black trees work, but we'll defer that discussion to the next module because Red Black trees are heavily related to (2, 4) trees. For this reason, if you want to look into Red Black trees, we highly suggest you wait until after the next module.
We mentioned earlier that AVL trees can be shown to have a worst case height of around 1.44 log(𝑛). You may wonder how Red Black trees fare against this. Red Black trees can be shown to have a worst case height of around 2 log(𝑛). In other words, the balancing of a Red Black tree is looser than an AVL tree's. Because of this, if you know you'll just be performing a lot of lookups, you'd prefer AVL trees of the two.
AVL trees usually store balancing information like heights or balance factors, whereas Red Black trees only store one bit, which is the "color" of the node, red or black. This may give Red Black trees the edge in memory, but AVL trees can be optimized to just store the balance factor, which is 2 bits. Red Black trees tend to be better in terms of general purpose, which is why they are the more commonly implemented of the two, but the differences between the two are not that significant in terms of performance for most cases.
A performance analysis by Ben Pfaff back in 2003 studied 4 different BST variants in a variety of settings, so you might find that interesting. It's an experimentation paper, so it's fairly readable, and it's interesting to see in what contexts each BST type excels. We'll cite it in the module review.
You may have noted that we said 4 BST variants. These are the regular BST, AVL, Red Black, and Splay trees. We'll also briefly discuss splay trees in the next module, which are one of the more interesting BST variants because they yield efficient operations without maintaining strict balancing!
M99 - SkipLists
Errata: Slide should say, “If data > right, then continue moving right” and “If data < right, move down a level”
- Add 25 between 20 and 45 at bottom level
- Once found, LinkedList removal removes 45 from all levels.
Efficiency:
- Search/Add/Remove: Worst case O(n) is when all nodes live on level 0. Essentially becomes a LinkedList.
- Space: Worst case O(n log n) when all nodes are promoted to the top level, resulting in a massive grid.
- Average case O(n) because the expected number of nodes across levels converges as a geometric series.
Errata: Best case is O(1) for all, if data we're looking for is the leftmost entry of the SkipList that has been promoted O(log(n)) times.
- In the case of adding, if nothing has been promoted, the head is at level 0, and we add a new minimum data.
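The probabilistic costs above come from how levels are assigned. A minimal sketch of the usual coin-flip promotion (the method name, Random source, and maxLevel cap are our own illustration, not the course's required interface):

    // Keep "promoting" the new entry while a fair coin comes up heads. Each extra level
    // happens with probability 1/2, which is where the geometric series in the expected
    // space and search costs comes from.
    private int randomLevel(java.util.Random coin, int maxLevel) {
        int level = 0;
        while (level < maxLevel && coin.nextBoolean()) {
            level++;
        }
        return level;
    }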
The Purpose of a SkipList
If you are like most students, you may come out of this lesson thinking "Cool, but what is the point of SkipLists, and why are they placed here categorically?" This line of thinking is certainly valid and one we hope that you take as you try to piece together the big picture of this course. To answer these questions, we give a few different answers, which we hope will stimulate some interesting thought and discussion.
- SkipLists can be useful for simple concurrent access operations: Concurrent access is the scenario where you have two users/machines/agents working with the same data structure simultaneously. In a BST, imagine if one user called a remove operation, while another user was searching for some data. If the remove operation begins, and it ends up being a two-child case, then while the pointers are being relinked, the search may occur in the tree while it is being updated. The search may end up yielding the wrong result due to looking at the tree when it wasn't ready for searching.
SkipLists work decently well with this problem because as long as we move pointers in the correct order, even if we are in the middle of updating the SkipList, the end result of a search will be correct since we won't skip data by accident. However, we should note that for many reasons, SkipLists are not often the tool of choice when it comes to concurrent access operations; we just think it's a cool way to introduce the idea of concurrent access if you've never seen it before.
- SkipLists are simple to implement: We will see later on in the course multiple variations of BSTs that have some kind of balancing scheme in order to help search times. However, these balancing schemes can be tricky to implement, whereas SkipLists are relatively simple to implement and understand. It still has a worst case behavior similar to BSTs, but its reliance on probability can often allow it to perform comparably to these balancing schemes.
- SkipLists are a nice introduction to randomization in computing: This is probably the truest answer for why these are taught in many introductory courses and placed in a similar manner. The fact is that SkipLists are one of the more niche data structures taught in this course. This is in contrast to the rest of the course, which takes a generalist approach, allowing you to build up your toolbox of sledgehammers that you can bring out in a variety of situations and contexts.
Randomized computing is an extremely interesting area that is actively being researched and used today, but as you may expect, much of it is locked behind a decently hefty mathematical background and maturity. So, although we won't get to see much randomization in this course, SkipLists are an example of a randomization concept that is simple enough to see in an introductory course like this.
Food for Thought - Randomness In Computing
As we mentioned in the previous SkipList unit, we won't be able to spend much time on randomized computation due to insufficient mathematical background. We would like to take the time to question the use of randomness in computing at all. Randomness is a concept used to model real-world phenomena such as coin tosses or dice rolls. It makes sense to us that if we flip a coin that is equally weighted, we expect about half of the flips to be heads, which we call a "probability of 1/2."
But does it make sense to assume that computers should have access to randomness? To put it another way, is it fair to assume that computers can model random behavior like flipping a coin? The reason we're doubting this is because computers operate in a deterministic way, meaning that the code they execute will always proceed the same way (no randomness). So, can computers model true randomness?
Another related question is whether or not randomness "helps" computation. That is, if we assume that we have access to randomness, can we come up with solutions to problems that are "better" than purely deterministic ones (i.e. faster or more accurate)? We hope to address these two questions here in a limited capacity.
- Access to randomness in computing: This is an open question in computer science, whether or not computers can generate truly random results. If you call a random number generator (RNG) in most languages, it will use what is called a Pseudo-RNG (PRNG), which generates numbers in a completely deterministic way that appears random for most applications. Usually, they have some base number called a seed that is used in large and complex arithmetical operations to "spread out" the numbers in order to appear random. The problem with this approach is that since it's completely deterministic, if you know the seed and the model used, then you can reverse-engineer all of the randomness the PRNG gives.
Some RNGs make use of real-world phenomena that are very complex in nature such as atmospheric noise or thermal fluctuations in the air as a source of randomness. Whether these phenomena can truly be called random is a valid question, but they avoid the reverse engineering problem that PRNGs have.
- Usefulness of randomness in computing: Yet again, this question is a major open question in computer science. There are many examples where problems have nice, simple, efficient, and elegant randomized solutions which have seemingly no deterministic counterpart. For example, consider the problem of primality testing: checking whether or not a number is a prime number. We've known for a while of several randomized solutions that are simple to implement and efficient, but it was a major hurdle to find a deterministic solution. Such a solution was eventually found, but it's much more difficult to implement and not as efficient.
For many of these problems, it appears that randomness plays a key role in solving them efficiently. Perhaps we haven't found the right ideas, or haven't thought out of the box to come up with a deterministic solution. We have nothing that proves for sure whether or not randomness improves or doesn't improve the power of computing. Thus, it remains an open problem.
M9 - (2-4) Trees
In this module, we continue our exploration of trees by taking a look at a very different type of tree from what we've seen before, a non-binary tree type known as (2, 4) trees. These (2, 4) trees inherit a similar order property found in BSTs, but (2, 4) trees have flexible nodes that allow for multiple data and multiple children per node. (2, 4) trees have a stricter shape property, requiring that every leaf node resides at the same depth, which makes the depth of (2, 4) trees logarithmic.
(2,4) trees are the last type of tree we will be covering in the course sequence, and it also marks the end of the data structures half of the sequence! With the completion of this module, you'll have built up a formidable toolkit of data structures that you can draw from when problem solving in your programming work. Of course, we cannot cover every data structure that might be useful. But with the knowledge and maturity you have built from this course sequence, we know you have the skills to tackle other data structures on your own as you encounter them in your work. Nevertheless, this module also features some optional "Food for Thought" glimpses at splay trees and Red Black trees, which are other types of BSTs that maintain their operations through various rotations.
Learning Objectives
Students will:
- Encounter their first tree type that is not binary, (2, 4) trees.
- Gain familiarity with the order, shape, and node properties of (2, 4) trees, and why they lead to logarithmic search times.
- Diagram the add operation and its procedure, along with how it fixes overflow violations via promotion/split operations.
- Understand the complex inner workings of the remove operation, along with how it fixes underflow violations via transfer and fusion operations.
Intro to 2-4 Trees
We begin this lesson by looking at the properties of (2, 4) trees (sometimes referred to as 2-3-4 trees). These trees are a generalization of BSTs because they allow multiple data to be stored in sorted order in a node, anywhere from 1 to 3 data. (2, 4) trees are a special case of multiway search trees, which allow any number of data to be stored in a node. They're called multiway search trees because if we have 𝑥1<𝑥2<⋯<𝑥𝑚 in a node, then this defines
𝑚+1 possible areas for data to be. So, a node with 𝑚 data can have 𝑚+1 children, which hold data in the specified range (the namesake of (2, 4) trees is that every node can have between 2 and 4 children).
Multiway search trees that enforce a strict balancing scheme are known as B-trees. As we will see, (2, 4) trees are always perfectly balanced in terms of tree height, so they are also a special case of B-trees.
Strict shape property achieves O(log(n)) operations.
Balancing in a (2, 4) Tree
In the AVL module, we saw that we cannot always maintain perfect balancing of heights in a BST due to the problem we see below. Perfect balancing can only occur for certain values of 𝑛.
Well, (2, 4) trees solve this problem by allowing more than 1 data to be in a node. Naturally, the time complexity of searching now depends on both the height as well as the number of data allowed per node. Allowing up to 3 data is a nice compromise between the two costs while allowing us to perfectly balance the tree efficiently. Of course, there will be imbalances in the sense that some nodes will be more filled than others, but from the perspective of heights, the tree is perfectly balanced.
Searching in a (2, 4) Tree
In a BST node, there was only one data, 𝑥. The way we decided whether to go left or right was to check if our new data
𝑦<𝑥 or 𝑦>𝑥. Searching in a (2, 4) tree follows the same principle. Suppose we have a 4-node as shown below. (The 4-node contains 3 data and has 4 children.)
The direction in which we search for the data is determined by "area" of the search where the data may be located. The decision process is similar for 3-nodes, and, of course, 2-nodes. Let's look at an example. Suppose we have the (2, 4) tree given below and want to look for the data 16. Then, the path followed in the diagram is precisely the search path we would take. The red numbers denote where we compared the data, and the green numbers denote where we found the data.
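A rough sketch of that decision process in code, using a hypothetical node layout (data kept in ascending order, with child i sitting in the "area" between data i-1 and data i); this only illustrates the direction logic, not a full (2, 4) tree:

import java.util.ArrayList;
import java.util.List;

class TwoFourNode<T extends Comparable<? super T>> {
    List<T> data = new ArrayList<>();                     // 1 to 3 values, kept in sorted order
    List<TwoFourNode<T>> children = new ArrayList<>();    // data.size() + 1 children, or empty for a leaf
}

class TwoFourSearch {
    static <T extends Comparable<? super T>> boolean search(TwoFourNode<T> node, T target) {
        if (node == null) {
            return false;                                  // fell off the tree: not present
        }
        int i = 0;
        while (i < node.data.size()) {
            int cmp = target.compareTo(node.data.get(i));
            if (cmp == 0) {
                return true;                               // found inside this node
            } else if (cmp < 0) {
                break;                                     // target belongs in the area left of data i
            }
            i++;                                           // otherwise keep scanning rightwards in the node
        }
        TwoFourNode<T> next = node.children.isEmpty() ? null : node.children.get(i);
        return search(next, target);                       // descend into the chosen area
    }
}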
Food for Thought - B-Trees: Querying Large Databases
As we mentioned, (2, 4) trees are a special case of B-trees, restricted to at most 4 children per node. Another way of saying this is that a (2, 4) tree is a B-tree of order 4. For those curious about the name of B-trees, there is actually no official consensus for what the B stands for, so assume it to be whatever you like!
As we mentioned, B-trees have a tradeoff that they inherently work with. The higher the order of the tree, the shorter the tree. However, the higher the order of the tree, the more data per node, and therefore more operations to process each node. In short, we're just shifting the cost of searching from traversal of the tree to a sorted list-like structure in a node. That begs the question, what good is that? As it turns out, B-trees are very commonly used for large databases!
Suppose we have a very large database that we'd like to perform many queries on. This database is much larger than what we can expect to store in active memory, so we're going to have to store this somewhere else which has a much larger storage capacity. Historically, this is traditionally done with disk drives (though in more recent times, SSDs are becoming more and more popular/viable). Reading/writing to a disk is orders of magnitude slower than doing the same with RAM, so we want to minimize how many reads we need to query the database.
As with many things in computer science, hierarchical structure is what makes B-trees so useful. B-trees organize data hierarchically into blocks (the nodes at different levels). As it turns out, while reading things from a disk is on the order of milliseconds (yes, this is slow for a computer!), this is actually the time it takes to read an entire block of memory rather than a specific memory location. So, it will be much faster to query data in the same block than to query different blocks (such as in a BST) for each query. What order of B-tree is used will naturally depend on the system, so the order has to be tuned for B-trees to be used optimally.
As time moves forward, technologies improve, and what works best before might not work as well now. As we mentioned, SSDs are becoming much more commonplace, and reading/writing to SSDs is indeed faster compared to disk drives. However, this is still much slower than working directly with RAM. Additionally, there are plenty of things that benefit from the hierarchical nature of B-trees, so it's unlikely that B-trees will be going away anytime soon. Not to mention that adoption of newer practices is surprisingly slow in a space like this where technology improves so quickly!
Adding and Overflow
In this lesson, we will leverage how to search in a (2, 4) tree in order to add data to a (2, 4) tree. Similarly to BSTs, we always add new data to a leaf node. Unlike BSTs, the (2, 4) add operation places the data within the leaf node, rather than add a new child node to the leaf node. This is how (2, 4) trees preserve the tree shape and order properties. However, the node size property may be violated as the node "overflows" with data. If the node size property is violated, then a promotion (sometimes called a split) is performed to resolve the violation. Promotion propagates the overflow upwards into parent nodes until a valid (2, 4) tree is formed.
Overflow occurs. Either of the two middle data values can be promoted.
20 is promoted, rest is split left and right into child nodes
Overflow occurs. Third data is promoted, and node is split in two.
12 is promoted. Note how each child sits less than, between, or greater than its parent's data.
Overflow occurs. Third data value (45) is promoted
Overflow when 36 is added.
36 is promoted. Root now hits overflow.
Second promotion occurs for 36 and left right sets are split.
Add the data 1, 2, …, 10 in sorted order. Use the 2nd data value to promote. How much of each node is filled? What do you notice about the balance of data (in terms of fill percentage)? What will happen as you continue to add in sorted order?
Use 3rd data promotion to add the final 10:
Removing and Underflow, Example 1
The most complex remove of all data structures presented thus far is discussed in this lesson. We will examine how removing works in a (2, 4) tree and the complications that occur. Just as there was overflow when adding, we have underflow conditions when removing. Underflow is remedied by two operations, transfer and fusion. Transfer is the easier remedy of the two and it is always preferred, if possible. The remove procedure can appear very complicated, so take time to look at the process one step at a time. This is where you will see the value of the vistool. Spend lots of time playing with the vistool in the lab for this lesson to get a handle of what is happening.
No underflow, just remove value.
Depending on implementation, use the successor or predecessor to fill in the removed data.
Successor replaces removed data. This causes underflow (*). In an underflow state, the empty node checks whether adjacent nodes have more than 1 data in it.
To preserve order of values, data is rotated from adjacent node to parent, and parent data moves to empty node.
Removing 31 triggers underflow, but adjacent node only has one entry. This triggers a fusion: parent node’s data and adjacent node’s data is fused to empty child node. Note: Fusion can potentially cause a chain of events.
Removing 49 triggers underflow; adjacent node only has one and parent only has one data.
In this case, we demote the parent node data to empty child node. Parent node is now empty.
Child nodes are fused together, and empty parent node is collapsed.
This triggers a chain of events where root node must handle underflow. 36 is demoted and adjacent nodes are fused.
Root collapses, and [5,36] now becomes the root. We end with a valid 2-4 tree.
Removing and Underflow, Example 2
Replace with successor, triggers underflow (empty former 32 node).
Fusion: Bring down 36. Fuse 36, 40 together. Parent node is now empty.
Root node (32) moves down to empty node.
Bring up sibling data that’s closest to the (former) parent (32), which is 20.
Child leaf of 24, 28 transfers to new parent (32)
2-4 Removing Flow Chart
Food for Thought - Link between (2, 4) Trees and Red Black Trees
In an optional section of the previous module, we discussed the advantages and disadvantages between AVL trees and Red Black trees. We also promised a brief discussion on the workings of a Red Black tree, which we will resolve here. As it turns out, (2, 4) trees and Red Black trees are intimately linked (this is in fact by design by the creators of Red Black trees). So much so that often, when students learn Red Black trees, they will first learn (2, 4) trees as a primer to ease them into it. We will discuss briefly the relationship between the two here, and we'll leave the inner workings/rotations of a Red Black tree for you to learn on your own.
So, what exactly is a Red Black tree? A Red Black tree is a BST that keeps track of the color of a node, allowing it to be red or black. These colored nodes must satisfy certain properties to be a Red Black tree, which we state below. These trees are often taught using sentinel/phantom nodes, which are nodes that have null in them. With this in mind, all leaves are sentinel/phantom nodes.
- Each node must have a color, either red or black.
- All leaf nodes (sentinels) are black nodes.
- If a node is red, then both of its children must be black.
- Every path from a node to one of its descendant leaves (sentinel nodes) must have the same number of black nodes.
Notice that there was nothing in the requirements about the tree being balanced! Unlike an AVL which derives its logarithmic height from a definition of balanced, Red Black trees derive their logarithmic height directly from (2, 4) trees. For every (2, 4) tree, there are multiple Red Black trees that are equivalent to it, and for every Red Black tree, there is a single (2, 4) tree that it corresponds to. Roughly, each black node and its red children form a node in a (2, 4) tree (there is some nuance to this, which we omit for brevity). This relationship is what guarantees that every Red Black tree has height at most 2 log(𝑛). In a sense, the add/remove operations of a (2, 4) tree are also very much related to the operations of Red Black trees.
Food for Thought - Splay Trees
For many data structures, we study how they perform in the average case when the input distribution is uniformly random (each ordering is equally likely). However, this is almost never the case in the real world. A very good example of this is caches.
Here's an example of a cache, a web cache. When you want to go to a webpage using your computer, your computer must somehow get your request to see the content on that page to its destination. This is not done magically; your computer forwards your request to a router first. (Disclaimer: This is a simplification of how this works. Networking is a vast topic, and you could take several classes learning how computers communicate to solve tasks like this one!) Each router does the same thing in turn, forwarding the request until we reach the final router, which has access to the destination.
This process is generally pretty short (to us), but it can actually take a while in computer time, long enough that we can notice the delay. Web searching is something that we expect to be instantaneous, so we want to minimize this as much as possible. This is where a web cache comes in.
A web cache keeps track of content from websites that have been requested previously. If a request is made for new content, then the web cache has no choice but to go through the standard forwarding process. However, once it's received this content, it can store it in case another request for the content comes by later. This is obviously a lot of data, so web caches will only keep track of recent data.
So, now if we request something, the first router will check the web cache to see if it has what we're looking for. If it does, then it can just immediately give us what we're looking for. This on average saves quite a lot of time!
So, now that we have an idea of what one type of caching is, let's look at one potential data structure. Splay trees are another type of BST that performs rotations to maintain its structure. However, unlike AVL trees and (2, 4) trees, it doesn't attempt to maintain a strict balancing scheme. Instead, it works under the following principle: "If data is queried in the BST, then we want to perform a series of rotations so that the data is now at the root." These rotations are referred to as splaying, and the different types are zig, zig-zig, and zig-zag.
So, what does this accomplish exactly? Over time, data that is queried often (popular data) will be kept closer to the root, while data that is queried very rarely will sink to the bottom of the tree. Data kept closer to the root have shorter search times, so popular requests will be answered quickly. In other words, if there is a bias in the input distribution (how often something is queried), then splay trees will do very well.
What about if the input distribution is drawn uniformly at random? Then splay trees won't do as well as their balanced counterparts since splay trees don't have strict balancing (the degenerate tree is still possible). However, while the worst case of a single search operation is 𝑂(𝑛), we can show that any sequence of search operations will average 𝑂(log(𝑛)) time per operation. So, it does have a worst case of amortized 𝑂(log(𝑛))! We haven't seen amortization outside of the context of array resizing, so if you want to see amortization in action with a more complex procedure, then look into splay trees!
END OF PROBLEM SET 2
Module 10 - Sorting
Module Overview
This module marks the start of the second half of the course sequence, algorithms! An algorithm is a finite procedure/mechanism used to solve a problem or perform some computation. In a sense, much of this course sequence has already been about algorithms; however, we have only encountered algorithms with respect to data structures. Now we are flipping perspectives, where we will examine how to solve problems using all of the tools (data structures) that we have added to our toolbox.
Our first meeting with algorithms will be sorting algorithms. Sorting algorithms will span both modules 10 and 11. In this module, we cover the simpler, "naïve" sorts to get our feet wet. We will refer to these sorting algorithms as iterative sorts since they all inherently rely on making multiple iterations over the data, fixing unsorted pairs. Once you're familiar with basic sorting algorithms, we'll dive into more complex and efficient algorithms in the next module.
Learning Objectives
Students will:
- Encounter the naïve O(n^2) sorts: bubble sort, insertion sort, selection sort, and cocktail shaker sort.
- Understand the inherent limitations of the iterative strategy used by the naïve sorts with respect to their time complexities.
- Reason about properties of sorting algorithms beyond time complexity: stability, memory placement, and adaptability.
- Learn about the idea of algorithms as solutions to problems in their own right, rather than details of data structures.
Introduction to Sorting
Algorithmic Problems and Solutions
Now that we've begun our journey into the algorithm half of the course, it's worth taking a moment to consider what the goal is. For the first half of the course sequence, we built up a large toolkit of techniques and data structures that could be used in a variety of contexts.
In this second half of the course, we turn to solutions for specific problems rather than general data structures. As such, for each problem we try to solve, it's a good idea to formally state the format of the problem. For example, when we say "sorting," we're referring to the following goal:
Input: An array of 𝑛 data that have some predefined consistent ordering (i.e. the data is Comparable).
Output: The same array of 𝑛 data that has been permuted to be in ascending order as determined by the predefined ordering.
This specification may appear unnecessary, but it is essential to understand the nature of the problem. For example, we have specified that the output array is to be the same array in memory as the input array, or that we have specified the sorted order to be ascending. Neither of these specifications make the sorting problem more difficult than before, but we must be sure of what exactly is the goal and how the input is specified. For example, if we require that the input is an array of integers instead, then the problem becomes easier (in a sense) as we'll see in module 11.
Analyzing Sorting Algorithms
- Time Complexity
- Stability
- Adaptivity
- In-place vs. out-of-place (memory)
To become a problem solver, there are core skills one needs to develop. This development in skills comes from taking the time to understand
- the general behavior of an algorithm
- the edge cases that the algorithm must solve
- the issues that arise during the implementation
Bubble Sort
Key Idea: Start from the front and compare the first two items. If they're in the wrong order, swap them; if not, move on. Keep incrementing the index and comparing adjacent pairs until reaching the end of the array.
Optimizations
Optimization: Track whether a swap was made during the pass with a boolean flag; if no swap occurred, the array is already sorted and we can stop early.
Optimization: Track where the last swap occurred; everything after that index is already sorted, so the next pass can stop there.
From Saikrishna slides:
Time Complexity of Bubble Sort
- Best case: Array already sorted. Just loop once.
- Worst case: Reverse sorted array.
- Double-nested loop must run all the way through, O(n) * O(n) = O(n^2)
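Putting the key idea and both optimizations together, a sketch in Java (the stopIndex name mirrors the example below; the code itself is our own illustration):

    static <T extends Comparable<? super T>> void bubbleSort(T[] arr) {
        int stopIndex = arr.length - 1;
        while (stopIndex > 0) {
            int lastSwapped = 0;                       // if no swap happens this pass, this stays 0 and we stop
            for (int i = 0; i < stopIndex; i++) {
                if (arr[i].compareTo(arr[i + 1]) > 0) {
                    T temp = arr[i];                   // swap the adjacent out-of-order pair
                    arr[i] = arr[i + 1];
                    arr[i + 1] = temp;
                    lastSwapped = i;                   // everything after the last swap is already in order
                }
            }
            stopIndex = lastSwapped;                   // shrink the search space to the last swap
        }
    }

On an already sorted array the outer loop exits after a single pass, which is the O(n) best case noted above.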
Bubble Sort Example
Example 1
First outer loop
Orange: Cells swapped
Blue: Sorted cells
- At end of while loop, everything after cell 5 is in order, so we set stopIndex at cell 5.
- Outer index is still set to 0
Second outer loop
Third + Fourth outer loop
Example 2
- No sorting necessary, but still needs one full iteration to determine that array is sorted.
Insertion Sort
Pseudocode
From Saikrishna slides:
Key Idea: Maintain a sorted prefix at the front of the array; take the next unsorted item and shift it backwards into its correct position within that prefix.
- Assumes first item is sorted.
- Unlike bubble sort, the outer loop always runs a fixed number of iterations for insertion sort.
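A sketch of the procedure as a static helper like the bubble sort sketch above (our own illustration of the pseudocode's structure):

    static <T extends Comparable<? super T>> void insertionSort(T[] arr) {
        for (int i = 1; i < arr.length; i++) {          // arr[0] on its own is trivially "sorted"
            T current = arr[i];
            int j = i - 1;
            while (j >= 0 && arr[j].compareTo(current) > 0) {
                arr[j + 1] = arr[j];                    // shift larger items one slot to the right
                j--;
            }
            arr[j + 1] = current;                       // drop the item into the gap
        }
    }

The outer loop always runs n - 1 times, but on an already sorted array each inner while loop exits immediately, giving the O(n) best case.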
Time Complexity
Examples
Food for Thought - Beyond Traditional Computation: Streaming Algorithms
Insertion sort has a somewhat interesting property that we'd like to point out here. In the modern world, datasets are starting to become very large, to the point that it might not be realistic to expect a machine to be able to store the entire dataset in active memory to work with. So, what if rather than being given all of our input at once, we instead receive data in a streaming fashion? There are various proposed models, but we give one below.
It's worth taking some time digesting the streaming model and the nuances involved. To reiterate, the input size might be very large, so even storing it is prohibitively expensive. So, the best that we could hope for is to "summarize" or "sketch" the important characteristics of the data as we receive the data in the stream. Ideally, the computation time for each item (the update time) should be small too, hopefully not much more expensive than 𝑂(1). Then, at the end of the stream, we use our sketch to finish the computation and solve our problem. If you reflect on various problems you know how to solve, then you'll soon find that this model is very difficult to work in and is an active area of research!
For example, suppose the problem is "How many distinct items are in the dataset, assuming there can be duplicates?" In a traditional model of computation, we can solve this in 𝑂(𝑛) on average (you have the tools to solve this problem if you've completed courses I and II, so try to think of a solution!) As it turns out, if we want to solve this problem exactly, then we need Ω(𝑛) memory too, which is no good in the streaming model! We can do better, but only if we allow some error in our answer. Giving approximate answers and using randomization are common occurrences in the literature of streaming algorithms.
Let's circle back to insertion sort, which is what triggered our discussion of streaming algorithms in the first place. In this model, the problem of "Sort the dataset." doesn't make too much sense anymore since we can't store the entire dataset anyway. Instead, let's relax our problem to answering the question "Is the dataset in streaming order sorted?" This is a significantly easier problem: we just store the maximum item seen so far, and if the next data is smaller, we know the dataset is not sorted. This is reminiscent of insertion sort, except that we don't bubble backwards to insert the data in the correct position. In fact, if we remove the space restriction and only look at the data stream portion of the model, then insertion sort is the only algorithm we'll see in the course that can sort the data as the stream progresses.
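A sketch of that streaming check (names are illustrative): one pass, O(1) extra space, which is exactly the kind of "sketching" the streaming model asks for.

    static boolean arrivesInSortedOrder(Iterable<Integer> stream) {
        Integer maxSoFar = null;                 // the only thing we remember about the stream
        for (int item : stream) {
            if (maxSoFar != null && item < maxSoFar) {
                return false;                    // an item arrived smaller than one seen before it
            }
            maxSoFar = item;
        }
        return true;
    }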
To the best of our knowledge, streaming algorithms were first discussed by Munro and Paterson in 1978, but the idea and model was popularized in 1996 by Alon, Matias, and Szegedy. We'll cite these papers in the module review section later if you're interested in taking a peek, though keep in mind that they are rather theoretically-oriented. Streaming algorithms have many applications, but they are often useful for networking applications, in monitoring network traffic for abnormalities or other properties of interest.
Selection Sort
Pseudocode
Max approach
Min approach
Saikrishna slides
(Uses minimum index approach)
Key Idea: Start from back and push max to the end.
- outer loop goes backwards from the end
- inner loop runs from index 0 up to the outer loop's current index
Time complexity
- Best case is NOT O(n)
- Not adaptive: no information from previous passes is kept, so the double nested loop always runs fully, giving O(n^2) in all cases. It is also unstable due to long-distance swaps (see the example below).
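A sketch of the max approach described by the key idea above (the slides use the mirror-image minimum-index approach):

    static <T extends Comparable<? super T>> void selectionSort(T[] arr) {
        for (int end = arr.length - 1; end > 0; end--) {
            int maxIndex = 0;
            for (int i = 1; i <= end; i++) {
                if (arr[i].compareTo(arr[maxIndex]) > 0) {
                    maxIndex = i;                  // remember where the current maximum lives
                }
            }
            T temp = arr[end];                     // long-distance swap: this is what breaks stability
            arr[end] = arr[maxIndex];
            arr[maxIndex] = temp;
        }
    }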
Example
Example of long-swaps - why this algo is unstable
Food for Thought - Beyond Traditional Computation: Sublinear Algorithms
Selection sort is the first sorting algorithm we've seen where the best case is not
𝑂(𝑛). In a sense, bubble sort and insertion sort have optimal best cases because it takes
Ω(𝑛) to read the input! We've seen some operations that do better, but that's only if we assume strong things about our input (for example, binary search takes 𝑂(log(𝑛)), but a sorted input is a rather strong assumption). Like in streaming algorithms, this lower bound based on reading the input is only true in the traditional model of computation. What if our data set is large enough that even reading the entire dataset is too expensive? Then, we need to look at sublinear algorithms, which have time complexity smaller than 𝑛. In this note, we'll be looking at a specific class of sublinear algorithms, known as property testers.
A property tester is an algorithm that checks if the input satisfies some property. For example, we may ask whether our array is sorted or not. Since we're not even reading the entire input, we cannot hope to answer correctly for every possible input. So, we need some parameter
𝜖 to quantify what inputs our property tester should work well for as well as some metric for how "far" our input is from satisfying the property. For example, our distance function might be "How many swaps are needed at a minimum to sort the array?" and 𝜖 = 1/2 would be saying our algorithm should be able to distinguish a sorted array from an array that requires half of all possible swaps to sort with high probability.
Similar to streaming algorithms, this is a very difficult model of computation to work with, and it is a relatively newer area of research. After all, we are only reading a small fraction of the input and trying to determine whether the input satisfies a certain property from that small amount of information. By the nature of things, many properties seem to require reading the entire input to test for them well, though research is actively working to better understand where this line in the sand is. This answer is rather unfortunate, but it is not entirely unexpected. There are many properties out there where there is no "locality" information; learning about data in one location may tell us very little about the rest of the data.
Cocktail Shaker Sort
Pseudocode
(With last swapped optimization)
Key Idea: Modification of bubble sort
- Each outer iteration consists of two bubble sort passes: one bubbling the maximum to the end of the array, and one bubbling the minimum to the front of the array.
- By doing so, we mitigate some of the more pathological cases of bubble sort.
- Outer loop
end > start
- Inner loop - 2 loops, front to end + end to front
Time Complexity
Best case: Already sorted array, O(n) - one iteration and done.
Worst case: Reverse sorted array, O(n^2). Same as bubble sort.
- Benefit over bubble sort: consider an array that is sorted except that the last index holds the minimum.
- Bubble sort will take O(n^2)
- Cocktail shaker will get this done in two iterations, since it goes forwards and backwards.
- First iteration - Backward swap will continually move the min to the front of array.
- Second iteration - check all is sorted.
- Example: Array size of 5 (2,3,4,5,1)
- Bubble sort: comparison count = 10
- Cocktail shaker sort: comparison count = 9
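A sketch of cocktail shaker sort with the last-swap optimization in both directions (our own illustration; running it on the (2,3,4,5,1) array above matches the 9 comparisons counted):

    static <T extends Comparable<? super T>> void cocktailShakerSort(T[] arr) {
        int start = 0;
        int end = arr.length - 1;
        while (start < end) {
            int lastSwapped = start;
            for (int i = start; i < end; i++) {          // forward pass: bubble the max to the end
                if (arr[i].compareTo(arr[i + 1]) > 0) {
                    swap(arr, i, i + 1);
                    lastSwapped = i;
                }
            }
            end = lastSwapped;                           // nothing after the last swap needs revisiting
            lastSwapped = end;
            for (int i = end; i > start; i--) {          // backward pass: bubble the min to the front
                if (arr[i - 1].compareTo(arr[i]) > 0) {
                    swap(arr, i - 1, i);
                    lastSwapped = i;
                }
            }
            start = lastSwapped;                         // nothing before the last swap needs revisiting
        }
    }

    private static <T> void swap(T[] arr, int i, int j) {
        T temp = arr[i];
        arr[i] = arr[j];
        arr[j] = temp;
    }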
Example
2nd nested loop starts from index 7 (where swap occurred)
End of iteration 1 - last location of swap is index 4. 2nd iteration’s start index is 4.
Last swap was index 6 so backward starts there.
Can set a terminating condition, or just let it run to its end.
Iterative Sorts: Review
Bubble sort bubbles the maximum item to the end of the subarray. The subarray's search space decreases based on where the last swap occurred.
Insertion sort maintains two parts, the left part is relatively sorted, and the right part has not been looked at yet. The border between the two sections moves rightward in each iteration, and the item that has switched sides is inserted into its correct relative position in the sorted part.
Selection sort looks for the maximum item in the subarray and swaps it into its correct position. The search space decreases by 1 each time.
Cocktail shaker sort is a modified bubble sort, where each iteration is composed of two bubble sort iterations in opposite directions. The first iteration bubbles the maximum to the end, and the second does the opposite. The search space's bounds are modified based on where the last swap occurred in each half iteration.
Takeaways:
- Bubble and Insertion Sorts are considered adaptive, in that the algorithms will recognize if the array is already sorted and terminate early.
- Selection Sort perfectly demonstrates how inefficient an algorithm can be.
The Usage of the Iterative Sorts
The four iterative sorts that we've covered in this module are more demonstrative than anything else. They are rarely ever used in practice because quadratic performance on average is expensive for sorting. There are two cases where they might be used.
- We need a simple implementation of a sorting algorithm, and the amount of data is small enough that quadratic performance isn't bad.
- The size of the data is small enough that the overhead of implementing more complex sorting algorithms is not worth the asymptotic performance gain.
In most cases, one of the sorts in the next module (or a variation) are used instead since they are more efficient. However, it is worth it to take the time and see why the iterative sorts don't perform very well. Each of them are simple and intuitive, but it's very common that the first ideas you come up with that are clean and simple will be inefficient and naïve. It is as important to see algorithms with instability and quadratic efficiency in order to recognize these characteristics in other algorithms and avoid them.
Each of these algorithms has the problem that in the worst case, the search space decreases by 1. In other words, we sort everything together, meaning that each iteration must go through most of the data. So, what if instead, we tried to split up the data, and sort the smaller parts? This way, while we'll have to sort multiple parts, these parts will be smaller, so each sort will have to inspect less data. Then, we can combine the sorted parts together in an efficient way. This is precisely a divide and conquer approach, which we'll be exploring in the next module!
Module 11 - Divide and Conquer Sorts
Key Idea:
- Use recursion to tackle sorting (instead of iterative)
- Recursion avoids looking at the same data too many times.
Learning Objectives
Students will:
- Apply the divide and conquer paradigm to develop sorting algorithms that beat the quadratic time complexity of the iterative sorts.
- Learn about a host of more advanced sorting algorithms: Merge Sort, Quick Sort, and LSD Radix Sort.
- See how algorithmic techniques for one problem might be flexible enough to apply to others (Quick Sort v. Quick Select).
- Realize the limitations of comparison based sorting and how other ideas might aid us in getting more efficient algorithms.
Merge Sort
Key Idea: Break up problem into halves recursively, then merges them back together as it bubbles up.
Pseudocode
Part 1: Partitioning
- Base case: Length = 1. Just return.
- Find midpoint
- Split array into two halves
- Apply same method on two halves
Part 2: Merging
- Set pointers for subarrays (highlighted)
Part 3: Emptying out the remaining subarray once the end of the other subarray is reached.
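Putting the three parts together, a sketch of merge sort (our own illustration; java.util.Arrays.copyOfRange does the splitting copies, and i/j are the merge pointers described above):

    static <T extends Comparable<? super T>> void mergeSort(T[] arr) {
        if (arr.length <= 1) {
            return;                                     // base case: nothing to sort
        }
        int mid = arr.length / 2;
        T[] left = java.util.Arrays.copyOfRange(arr, 0, mid);
        T[] right = java.util.Arrays.copyOfRange(arr, mid, arr.length);
        mergeSort(left);
        mergeSort(right);

        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length) {   // merge while both halves still have data
            if (left[i].compareTo(right[j]) <= 0) {     // <= keeps the sort stable
                arr[k++] = left[i++];
            } else {
                arr[k++] = right[j++];
            }
        }
        while (i < left.length) {                       // empty out whichever half remains
            arr[k++] = left[i++];
        }
        while (j < right.length) {
            arr[k++] = right[j++];
        }
    }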
CSVizTool:
Saikrishna Slides
Time Complexity
- Non-adaptive: No way to detect if array is already sorted
- Out-of-place: Data is being copied over to new array. Not good if low memory or accessing memory is slow.
Example
Left side:
Right side:
Merging example using i and j
another example one level above
Example from CSViztool (2,3,4,5,1)
- See how it handles sub-array that are out of order (2,3 & 1,4,5). It essentially moves i, j pointers and does comparison.
The Efficiency of Merge Sort
An O(𝑛log(n)) time complexity is a bit difficult to visualize, so let's take a closer look at where this time complexity comes from.
Merge sort has two phases per recursive call, the splitting and merging phases. Splitting occurs before we make the recursive calls, while merging occurs after recursive calls end, which is the "unraveling" of the structure. When analyzing the time complexity, we are interested in the entire tree. Each "node" in the tree contributes a cost to the time complexity, and our job is to do the bookkeeping of summing it all up.
In the splitting phase, suppose we're working with a subarray of length 𝑚. Then, we take 𝑚/2 time to copy the left half of the subarray into a new array, and we take another 𝑚/2 time to copy the right half. In total, this is m time for a length 𝑚 subarray. However, if we sum the total lengths of subarrays on each level, this total is 𝑛. So, each level of this tree contributes 𝑂(𝑛) time to the total cost. There are 𝑂(log(𝑛)) levels to the splitting side of the tree, so this becomes 𝑂(𝑛 log(𝑛)) for our total time complexity.
In the merging phase, it's very similar in analysis. For a length 𝑚 subarray, it takes 𝑚/2+𝑚/2=𝑚 time to merge the left and right halves together since they're already sorted. Once again, there are 𝑛 data on each level, and there are 𝑂(log(𝑛)) levels, giving a time complexity of 𝑂(𝑛 log(𝑛)). Summing these two phases together just increases the constant factor, giving an overall 𝑂(𝑛 log(𝑛)).
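If it helps, the same bookkeeping can be written as the standard recurrence (a sketch, with c standing for whatever constant covers the per-element splitting and merging work):

T(n) = 2\,T(n/2) + cn, \qquad T(1) = c
\Rightarrow\; T(n) = 2^k\,T(n/2^k) + c\,k\,n
\text{setting } k = \log_2 n:\quad T(n) = cn + cn\log_2 n = O(n \log n)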
What about the space complexity? We can actually get away with having one extra length 𝑛 array as a buffer to copy elements into/out of. This gives us a space complexity of 𝑂(𝑛). In fact, even if we had allocated two new length 𝑚/2 arrays at the splitting step of a length 𝑚 array, we'd still only have 𝑂(𝑛) space complexity. There is a bit of nuance to this, so we will defer this discussion to the quick sort lesson, where it will be more meaningful.
In-Place Quicksort
Note: Version taught in course: randomized in-place quick sort using the Hoare partition scheme
- Rather than explicitly splitting into halves and merging afterwards, quick sort picks a pivot element 𝑝. Then, it partitions the array so that all data 𝑑<𝑝 is to the left of the pivot, and all data 𝑑>𝑝 is to the right of the pivot.
Algorithm
Part 1: Pivot Selection
Base case: length is 1 or less
- End index can be inclusive/exclusive (course teaches inclusive)
- Pivot - element to sort the array by
- Selected by random index between start & end.
- Auto swapped with element at front of array.
Part 2: Partitioning
Takeaway: Move items greater than pivot to right of array, smaller items go left.
- "Crossed" means i has moved strictly past j
Part 3: Placement of pivot and recursive calls
- Recursion excludes the pivot
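A sketch of the whole procedure (our own illustration of the scheme described above, reusing the swap helper from the cocktail shaker sketch; call it as quickSort(arr, 0, arr.length - 1)):

    private static final java.util.Random RAND = new java.util.Random();

    static <T extends Comparable<? super T>> void quickSort(T[] arr, int start, int end) {
        if (end - start < 1) {
            return;                                   // base case: zero or one element
        }
        int pivotIndex = start + RAND.nextInt(end - start + 1);
        T pivot = arr[pivotIndex];
        swap(arr, start, pivotIndex);                 // move the pivot out of the way, to the front

        int i = start + 1;
        int j = end;
        while (i <= j) {                              // stop once the pointers have crossed
            while (i <= j && arr[i].compareTo(pivot) <= 0) {
                i++;                                  // items <= pivot stay on the left
            }
            while (i <= j && arr[j].compareTo(pivot) >= 0) {
                j--;                                  // items >= pivot stay on the right
            }
            if (i <= j) {
                swap(arr, i++, j--);                  // both pointers found out-of-place items
            }
        }
        swap(arr, start, j);                          // j now marks the pivot's final position
        quickSort(arr, start, j - 1);                 // recurse on both sides, excluding the pivot
        quickSort(arr, j + 1, end);
    }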
CSViztool
Saikrishna Slides
Time Complexity
Best case: Pivot splits the array roughly in half each time, O(n log(n))
Worst case: Pivot is always the min/max value of the subarray. Degenerates into selection sort, O(n^2). Made unlikely (though not impossible) by randomizing pivot selection.
Example
(Complicated, refer to CSViztool QuickSort to understand)
- How many comparisons are made on the input array [1,2,3,4,5,6]? 10
- How many comparisons are made on the input array [6,5,4,3,2,1]? 11
- How many comparisons are made on the input array [4,3,6,5,2,1]? 12
The Space Complexity of Algorithms
Space complexity is a bit more nuanced when we discuss algorithms rather than data structures. Generally, when people talk about the space complexity of an algorithm, they are talking about what is called the auxiliary space complexity. This is the space complexity of an algorithm if we don't count the space needed for the input and also don't count the space needed for the output. So, in other words, we're looking at the working memory for the algorithm to work beyond the necessary input/output.
This may seem strange at first, but there's a very good reason for this distinction, one we can see looking at the various sorting algorithms. If we included the input and output, then all of these sorting algorithms would have space complexity 𝑂(𝑛), which is quite a heavy price. This measure doesn't distinguish between sorts like bubble sort which have minimal memory footprint and merge sort, which uses a lot of memory due to the extra space needed for array copies. There would also be no way to talk about algorithms that use sublinear space if we didn't make this distinction.
Another nuance that might seem strange is that the space complexity is the maximum amount of space used at any given time during execution of the algorithm. The other option would be to consider all space used throughout an algorithm. We define space complexity this way because from the computer's perspective, if memory is not being used by the algorithm, then that memory is "freed" for the rest of the system to use. This means that the space complexity of merge sort is indeed 𝑂(𝑛) rather than 𝑂(𝑛log(𝑛)) since the maximum memory used at any given time is 𝑂(𝑛), not the 𝑂(𝑛log(𝑛)) total allocated across all recursive calls.
Finally, there is one other nuance that has to do with lower level details. Recursion and method calls in general are not a free resource in terms of space. The program needs some way of remembering where execution stopped when a new method call was invoked, so it stores whatever state information is necessary to remember this on the call stack. So, the recursion depth of an algorithm contributes to the space complexity as well. In the case of merge sort, the max recursion depth is 𝑂(log(𝑛)), which is much smaller than 𝑂(𝑛), so we didn't need to worry about it.
However, quick sort is in-place and allocates no auxiliary arrays, so this recursion cost is the dominant contribution to its space complexity. On average, quick sort's recursive depth will be 𝑂(log(𝑛)), and in the pathological worst case, it will be
𝑂(𝑛). So, the space complexity of quick sort is on average logarithmic, and in the worst case linear.
(New) HeapSort
Key Idea:
- Build a min heap (either by adding naively one by one or by using the BuildHeap algorithm).
- Remove (dequeue) from the heap until the heap is empty, which gives us the smallest item remaining in the heap after each call to remove.
- Building heap: O(n) using BuildHeap
- Time complexity: O(n log (n)) for all cases.
- Adaptive: No
- Stable: No (can’t guarantee relative ordering)
- Significance: A lot of sort algorithms are O(n log(n))
Class implementation uses PriorityQueue (MinHeap) to sort the data.
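Since the class implementation isn't reproduced in these notes, here is a minimal Java sketch of the idea using java.util.PriorityQueue as the min heap (the class and method names here are illustrative assumptions, not the course's API):

import java.util.PriorityQueue;

public class HeapSort {
    // Sort an int array by enqueuing everything into a min heap and dequeuing in order.
    public static void heapSort(int[] arr) {
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int x : arr) {       // O(n log n); the BuildHeap approach mentioned above would be O(n)
            heap.add(x);
        }
        for (int i = 0; i < arr.length; i++) {
            arr[i] = heap.poll(); // each removal is O(log n), giving O(n log n) overall
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 4, 2, 3};
        heapSort(data);
        System.out.println(java.util.Arrays.toString(data)); // [1, 2, 3, 4, 5]
    }
}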
Time Complexity
Example
Limits of Comparison-Based Sorting
Key Idea:
"For any deterministic, comparison-based sorting algorithm A that is always correct, there exists some input x such that A(x) takes Ω(nlog(n)) comparisons to execute, and therefore its worst case time complexity is Ω(nlog(n))."
- Statement applies to any deterministic, comparison-based sort, and always successfully sorts the array.
- Not about the algorithm, but the fact that there is always going to be some input that will result in Ω(nlog(n))
- This is called a lower bound statement about a class of algos.
High-Level Idea Summary:
- Each comparison provides limited information about the input's configuration.
- The algorithm must distinguish between all possible configurations to sort correctly.
- There are n! possible configurations for n distinct objects, due to n choices for the first position, n−1 for the second, and so on. This results in a significant number of configurations.
- Each comparison contributes a small amount of information.
- It requires at least Ω(nlog(n)) comparisons to identify the input configuration and sort correctly.
Final takeaway: We can prove a stronger statement. Highlighted part is important.
"For any (potentially randomized) comparison-based sorting algorithm that is always correct, the number of comparisons it must make on average is at least Ω(nlog(n)), meaning the average case is Ω(nlog(n))."
- “Randomized” - allows inclusion of quicksort
- “On average” - not just worst case.
LSD Radix Sort
Pseudocode
(Designed for sorting numbers in base 10)
CSViztool
Saikrishna Slides
Notes:
- Buckets: If sorting only positive integers, there’d only be 10 buckets. If including negative, it is 19 [-9 to 9]
for iteration <- 0 to k, where k = count of digits in the longest number (largest magnitude)
- 1st loop: Iterate through data in array and adds each to their respective buckets based on current digit.
- Calculating digit (base 10): 1st digit is the least significant digit (one’s place). Buckets sorted by ones place, then ten’s place and so on...
- How to calculate digit? Use combo of division and mod operators.
- Modding n by 10 = result always the digit in the one’s place
- Dividing n by 10 = the one’s place is removed
- Dividing n by 100 = both one’s and ten’s places are removed
- Tip: Creating a power of ten variable outside of loop and incrementing as necessary may be helpful.
- Once digit is obtained, place data value into the correct bucket.
- Tip: Think about why it would be necessary to add nine when determining the bucket...
- 2nd loop: After placing all values into their proper buckets, loop through each LinkedList and add its items back into the array in bucket order.
- Inner while loop allows skipping empty buckets.
- Tip: Make sure to add in the order they were placed into the bucket.
Tip: “One common mistake students make with radix sort is the termination condition when implementing base 10 LSD radix sort. Students notice that in many cases, if all items falls into the 0 bucket, then the result is sorted. However, this is not always the case!”
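A minimal Java sketch of base-10 LSD radix sort as described above (19 buckets so negatives are handled, a power-of-ten variable, and k = digit count of the largest-magnitude number); the helper names are assumptions for illustration:

import java.util.LinkedList;

public class LSDRadixSort {
    public static void radixSort(int[] arr) {
        @SuppressWarnings("unchecked")
        LinkedList<Integer>[] buckets = new LinkedList[19];   // indices 0..18 for digits -9..9
        for (int i = 0; i < 19; i++) {
            buckets[i] = new LinkedList<>();
        }
        int k = digitCount(maxMagnitude(arr));
        long pow = 1;                                          // power-of-ten variable kept outside the loop
        for (int iteration = 0; iteration < k; iteration++) {
            for (int value : arr) {                            // 1st loop: distribute into buckets
                int digit = (int) ((value / pow) % 10);        // ranges from -9 to 9
                buckets[digit + 9].add(value);                 // adding 9 shifts negatives into valid indices
            }
            int idx = 0;
            for (LinkedList<Integer> bucket : buckets) {       // 2nd loop: collect back in bucket order
                while (!bucket.isEmpty()) {
                    arr[idx++] = bucket.removeFirst();
                }
            }
            pow *= 10;
        }
    }

    private static int maxMagnitude(int[] arr) {
        long max = 0;
        for (int value : arr) {
            max = Math.max(max, Math.abs((long) value));       // long guards against abs(Integer.MIN_VALUE)
        }
        return (int) Math.min(max, Integer.MAX_VALUE);
    }

    private static int digitCount(int n) {
        int count = 1;
        while (n >= 10) {
            n /= 10;
            count++;
        }
        return count;
    }
}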
Time Complexity
Takeaways:
- Works best on large arrays of integers where k (count of digits in the longest number) is small compared to n (number of integers)
- Doesn't work well when k is large relative to n (i.e. few numbers, each with many digits)
Example
(Note: The example only uses positive integers, so 10 buckets are used; use 19 if negatives are included.)
Iteration 1
Step 1) Put each index item in buckets (LinkedList) according to 1st digit.
Step 2) From 0, overwrite items in original array with items from buckets.
- For buckets with more than one item, it acts like a queue and adds to next index in original array.
- Empty buckets are skipped over.
Iteration 2
Step 3) Put each index item in buckets according to 2nd digit.
- Now each bucket is ordered by up to the 2nd digit
Step 4) From 0, overwrite items in original array with items from buckets.
Iteration 3
Step 5) Put each index item in bucket according to 3rd digit.
Step 6) Empty buckets. Array is now sorted.
Food for Thought - MSD Radix Sort
- MSD = Most significant digit sort; Starts from left side of integers.
- Applied recursively - For each iteration i:
- From left to right in our array, move the integers into our buckets based on the digit value of what's in the 𝑖th position from the left.
- For each bucket, recursively call MSD radix sort on that bucket for iteration 𝑖+1.
- Take the sorted data in each bucket and dequeue all items like in LSD radix sort from left to right into our array.
- Higher footprint than LSD because buckets need to be created for each stage of recursion.
- Properties:
- MSD is somewhat adaptive - It can stop sorting early if the numbers are of varying lengths.
- MSD radix sort can give you an "almost sorted" array by doing a few iterations. Helpful if we only need a rough sort based on order of magnitude.
- This version of MSD radix sort is stable, though there are other variants that use less memory at the cost of stability.
Quick Select (Kth selection problem)
Key idea: Use a derivative of an algorithm that solves one problem in order to solve a different problem.
K-th selection problem:
- Input: An array comparable data of length n
- Output: k-th smallest data in the array.
- If duplicates present, they count in the numbering.
- Input array may or may not be modified.
Pseudocode
Base case: If the recursion narrows all the way down to QuickSort's base case (a single-element subarray), that element is the k-th smallest.
Time Complexity
Example
Goal: Find 3rd smallest item
Iteration 1: First do same process as QuickSort
Once i and j cross, QuickSelect deviates:
- Compare j to k-1. If k-1 < j, recurse left. If k-1 > j, recurse right.
Once k-1 is found (e.g. 3 - 1 = 2), we terminate.
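A minimal Java sketch of QuickSelect with random pivots following the steps above; the Hoare-style partition mirrors the QuickSort partitioning covered earlier (names are illustrative):

import java.util.Random;

public class QuickSelect {
    private static final Random RAND = new Random();

    // Returns the k-th smallest element (k is 1-based, so k = 3 asks for the 3rd smallest).
    // The input array may be modified.
    public static int kthSmallest(int[] arr, int k) {
        return select(arr, 0, arr.length - 1, k - 1);    // convert k to a 0-based target index
    }

    private static int select(int[] arr, int left, int right, int target) {
        if (left >= right) {
            return arr[left];                             // one element left: must be the answer
        }
        int j = partition(arr, left, right);              // j = final index of the pivot
        if (target == j) {
            return arr[j];
        } else if (target < j) {
            return select(arr, left, j - 1, target);      // recurse left of the pivot
        } else {
            return select(arr, j + 1, right, target);     // recurse right of the pivot
        }
    }

    // Pick a random pivot, swap it to the front, move i and j toward each other until they
    // cross, then swap the pivot into its final position j.
    private static int partition(int[] arr, int left, int right) {
        int pivotIdx = left + RAND.nextInt(right - left + 1);
        swap(arr, left, pivotIdx);
        int pivot = arr[left];
        int i = left + 1;
        int j = right;
        while (i <= j) {
            while (i <= j && arr[i] <= pivot) i++;
            while (i <= j && arr[j] >= pivot) j--;
            if (i < j) swap(arr, i, j);
        }
        swap(arr, left, j);                               // pivot lands at index j
        return j;
    }

    private static void swap(int[] arr, int a, int b) {
        int tmp = arr[a];
        arr[a] = arr[b];
        arr[b] = tmp;
    }
}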
Food for Thought - Deterministic Pivot Selection
Q: Instead of random pivots, why not use deterministic pivot selection?
- Example: Last item in subarray - if sorted, we will encounter worst case.
- Example: “median of 3” - take first, last and middle item of subarray, calculate median.
- Example: “median of medians”:
- Group the 𝑛 elements of the subarray into groups of size 5. There are roughly 𝑛/5 of them.
- Find the medians of each of these size 5 groups.
- Compute the median of these 𝑛/5 medians computed from those groupings. This part is done recursively.
- Use the computed "median of medians" as the pivot for the next iteration of quick sort / quick select.
- Pros: Worst case is only O(n). Cons: High overhead
Module 12 - Pattern Matching Algorithms
Problem Statement
Input: A small string that is called the pattern and a longer string called the text. The pattern is of length 𝑚, and the text is of length 𝑛. In some special cases, we may also receive the alphabet with which the characters in the pattern and string are drawn from of size 𝑠.
Output: There are two primary variations of this problem. For the single occurrence version, we output the location (text index) of any occurrence of the pattern as a substring of the text. In the all occurrences version, we output a list of all locations (text indices) for occurrences of the pattern as substrings in the text. These occurrences can overlap, so if the pattern were 𝑎𝑏𝑎, then the text 𝑎𝑏𝑎𝑏𝑎 would have two occurrences to output.
Brute Force Algorithm
Pseudocode
- Align the pattern
- Compare characters
- If no match, shift
- If match, compare next characters
- If all characters match, pattern is found.
Example
Time Complexity of Brute Force
Note: Main cases covered in course:
- Pattern doesn’t exist in text
- Pattern exists in text + we want to find a single occurrence
- Pattern exists in text + we want to find all occurrences
(Course does not focus on average case)
Boyer-Moore Preprocessing
- Key idea: Preprocesses the pattern, but not the text
- Pro: Performs best when there are many characters in the text that do not occur in the pattern.
- Boyer-Moore uses “bad character rule” - Skips past character not found in pattern.
- Course covers the “simplified” BM algo. Skips over “good suffix rule”.
Last Occurrence Table
Diagram Example:
- Hash map contains the index of the last occurrence of each character in the pattern
- Asterisk represents all characters not in pattern
Pseudocode
CSVizTool
Building Last Occurrence Table:
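Since the pseudocode itself isn't captured here, a minimal Java sketch of building the last occurrence table as a HashMap; characters not in the pattern (the "*" entry in the diagram) are simply absent and treated as -1 on lookup:

import java.util.HashMap;
import java.util.Map;

public class LastOccurrenceTable {
    // Map each character in the pattern to the index of its last (rightmost) occurrence.
    public static Map<Character, Integer> buildLastTable(CharSequence pattern) {
        Map<Character, Integer> last = new HashMap<>();
        for (int i = 0; i < pattern.length(); i++) {
            last.put(pattern.charAt(i), i);   // later occurrences overwrite earlier ones
        }
        return last;
    }
}

A lookup like last.getOrDefault(c, -1) then returns -1 for any character not in the pattern.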
Boyer-Moore Tracing
Key Idea:
- Create last occurrence table to optimize shifts past mismatches.
- Moves right to left in the pattern.
- If there is a match, continue comparing text and pattern.
- If there is a mismatch:
- If character is in the last occurrence table, shift to the right to align the pattern with the text
- If the character is NOT in the pattern (not in the last occurrence table), shift past the mismatched text character entirely
- If aligning the pattern results in a backwards shift, just manually shift forward by 1.
Pseudocode
- Instantiate last occurrence table
- Align index 0 of pattern with index 0 of text
- While loop: Start comparisons from index m-1
- Keep decrementing j. If j drops below 0, a pattern match is found.
- If no match, query last occurrence table with char in text that mismatched.
- “shift” indicates index where shift should take place, then set i to shift
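A minimal Java sketch of the simplified Boyer-Moore search (bad character rule only) for the all-occurrences case, reusing the buildLastTable sketch above; names are illustrative:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class BoyerMoore {
    // i is the alignment of the pattern within the text; j is the pattern index, compared right to left.
    public static List<Integer> search(String text, String pattern) {
        List<Integer> matches = new ArrayList<>();
        int n = text.length();
        int m = pattern.length();
        if (m == 0 || m > n) {
            return matches;
        }
        Map<Character, Integer> last = LastOccurrenceTable.buildLastTable(pattern);
        int i = 0;
        while (i <= n - m) {
            int j = m - 1;                                 // start comparisons from index m-1
            while (j >= 0 && text.charAt(i + j) == pattern.charAt(j)) {
                j--;
            }
            if (j < 0) {
                matches.add(i);                            // full match found at alignment i
                i++;                                       // shift by 1 to keep looking for overlaps
            } else {
                int shift = last.getOrDefault(text.charAt(i + j), -1);
                if (shift < j) {
                    i += j - shift;                        // align the last occurrence with the mismatch
                } else {
                    i++;                                   // avoid a backwards shift: move forward by 1
                }
            }
        }
        return matches;
    }
}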
CSViztool
Pattern shift cases:
Example 1
- Example of case 3 shifting: If text contains character not in pattern, shift the length of the pattern to the right.
- Reaching index 0 means a pattern match is found.
Example 2
Time Complexity
- Works well with large alphabets. Increases chance of case 3 shifts.
- Best case (all occurrences): Scenario where we are making lots of case 3 shifts.
- Worst case: We end up shifting by only one for each character in the text.
KMP Preprocessing - Failure Table
Key idea: Preprocesses the pattern by building a failure table (the text itself is not preprocessed)
Failure Table
Creating Failure Table by Inspection
Pseudocode - KMP Failure Table
3 cases in while loop: Compare pattern at index i to pattern at index j
- Case 1: Character in p[i] equals p[j]
- Set f[j] to i+1 (length of prefix)
- Case 2: Character in p[i] doesn’t equal p[j] and i at 0
- i.e. Characters don't match and no prefix has been built up
- Case 3: p[i] does not equal p[j] but i is not at 0
- i.e. Characters don’t match but already building up a prefix
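A minimal Java sketch of failure table construction following the three cases above (names are illustrative):

public class KMPFailureTable {
    // f[j] holds the length of the longest proper prefix of the pattern that is also a
    // suffix of pattern[0..j].
    public static int[] buildFailureTable(String pattern) {
        int m = pattern.length();
        int[] f = new int[m];
        if (m == 0) {
            return f;
        }
        f[0] = 0;
        int i = 0;                 // length of the prefix currently being matched
        int j = 1;                 // index whose failure value we are computing
        while (j < m) {
            if (pattern.charAt(i) == pattern.charAt(j)) {
                f[j] = i + 1;      // case 1: extend the prefix
                i++;
                j++;
            } else if (i == 0) {
                f[j] = 0;          // case 2: mismatch with no prefix built
                j++;
            } else {
                i = f[i - 1];      // case 3: fall back to the next shorter candidate prefix
            }
        }
        return f;
    }

    public static void main(String[] args) {
        // The tip below walks through "abaabab"; expected table: [0, 0, 1, 1, 2, 3, 2]
        System.out.println(java.util.Arrays.toString(buildFailureTable("abaabab")));
    }
}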
Example
Complicated, watch video: https://gatech.instructure.com/courses/406950/pages/kmp-failure-table-algorithm?module_item_id=4026224
CSViztool
Tip:
In the example presented in the video, the case where 𝑝[𝑖] != 𝑝[𝑗] and 𝑖 != 0 may not be usefully illustrated. The pseudocode tells us that in this case, we set 𝑖 ← 𝑓[𝑖−1], but why is that? This is because there can be shorter sub-prefixes that may still be a potential match. To see this in action, try to compute the failure table of the pattern 𝑎𝑏𝑎𝑎𝑏𝑎𝑏 by the algorithm. Your final result should be [0,0,1,1,2,3,2], which you can confirm by inspection using the definition of what the failure table entries should represent.
Time Complexity
Efficiency:
- j moves forwards m times
- j stays in place at most m times
Big-O: Failure table construction is O(m + m) → O(m)
KMP Search Algorithm
Comparing Boyer-Moore and KMP
(Errata: "To clarify, the first bullet point on the right should read pattern index - 1")
Pseudocode
Handling multiple matches
- If we want to modify it for all occurrences, then upon finding a match, we set patternIndex←𝑓[𝑚−1].
- The reason for setting it to 𝑓[𝑚−1] is that a match for the entire string can be thought of as a mismatch at the imaginary pattern index 𝑚 since we know indices 0,…,𝑚−1 all match. This just extends the final case in the pseudocode with 𝑗=𝑚.
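A minimal Java sketch of KMP search for all occurrences, reusing the failure table sketch from earlier; after a full match, patternIndex is reset to f[m−1] exactly as described above:

import java.util.ArrayList;
import java.util.List;

public class KMPSearch {
    // textIndex never moves backwards, which is why KMP also works on streaming input.
    public static List<Integer> search(String text, String pattern) {
        List<Integer> matches = new ArrayList<>();
        int n = text.length();
        int m = pattern.length();
        if (m == 0 || m > n) {
            return matches;
        }
        int[] f = KMPFailureTable.buildFailureTable(pattern);
        int textIndex = 0;
        int patternIndex = 0;
        while (textIndex < n) {
            if (text.charAt(textIndex) == pattern.charAt(patternIndex)) {
                patternIndex++;
                textIndex++;
                if (patternIndex == m) {
                    matches.add(textIndex - m);     // the match ends right before textIndex
                    patternIndex = f[m - 1];        // continue as if mismatching at index m
                }
            } else if (patternIndex == 0) {
                textIndex++;                        // no partial match to reuse
            } else {
                patternIndex = f[patternIndex - 1]; // reuse the longest prefix/suffix overlap
            }
        }
        return matches;
    }
}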
CSVizTool
Example
Complicated - watch video: https://gatech.instructure.com/courses/406950/pages/kmp-tracing-example?module_item_id=4026230
Time Complexity
- KMP is linear in worst case (unlike Boyer-Moore)
Rabin-Karp
Key idea:
- improving the brute force algorithm by adding a "screening" step to decide if we should do a brute force comparison or not between the pattern and text window.
- This is accomplished by comparing hashes of the pattern and the text window. In particular, the algorithm uses a very robust hash function known as the Rabin Fingerprint rolling hash, which can efficiently update the hash along the text as the window moves along.
- Used to detect plagiarism
Pseudocode
Rabin-Karp Rolling Hash
- h(x) - Typically use ASCII value for hashing
- 3rd bulletpoint - used to prevent collisions
Example:
Updating the Hash:
Example:
Summary:
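A minimal Java sketch of the Rabin fingerprint rolling hash; the BASE constant is an arbitrary choice for illustration (not necessarily the lecture's constant), and integer overflow is simply allowed to wrap, which is why rare collisions remain possible:

public class RabinKarpHash {
    private static final int BASE = 113;   // assumed constant for this sketch

    // hash(s) = s[0]*BASE^(m-1) + s[1]*BASE^(m-2) + ... + s[m-1], using the characters' numeric values.
    public static long initialHash(CharSequence s, int m) {
        long hash = 0;
        for (int i = 0; i < m; i++) {
            hash = hash * BASE + s.charAt(i);
        }
        return hash;
    }

    // Slide the window one character to the right in O(1): remove the old leading character's
    // contribution, shift everything up by one power of BASE, then add the new character.
    public static long roll(long oldHash, char oldChar, char newChar, long basePowMinus1) {
        return (oldHash - oldChar * basePowMinus1) * BASE + newChar;
    }

    public static void main(String[] args) {
        String text = "abcd";
        int m = 3;
        long basePow = 1;
        for (int i = 0; i < m - 1; i++) basePow *= BASE;           // BASE^(m-1)

        long h = initialHash(text, m);                              // hash of "abc"
        long rolled = roll(h, text.charAt(0), text.charAt(m), basePow); // hash of "bcd"
        System.out.println(rolled == initialHash(text.subSequence(1, 4), m)); // true
    }
}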
Rabin-Karp Tracing
Key Idea: Rolling hash; if the hash of the current text window matches the pattern's hash (e.g. "cab" vs "acb" colliding), do an inner brute-force comparison loop to confirm.
(To record all occurrences, just store the indexes where a match occurs while looping)
Efficiency
- Assumes that computing the hash for a single character is O(1)
- “We should also note that though the Rabin-Fingerprint rolling hash is a very good hash function, there is still the problem of integer overflow, so there can still be collisions. They would be extremely unlikely, but that is why the worst case is as stated. In most contexts though, the average case would instead be linear. In fact, the original paper was in reference to the uniform average case, which they did show was indeed linear.”
Pattern Matching Summary
Boyer-Moore
- Industry standard
- Efficient for most cases, O(m + n/m)
- This is sublinear in the length of text
- Degenerates toward brute force for smaller alphabets
- Versatile (lots of mods and variations)
- Works best in scenario with large alphabets where we can skip a lot of them (e.g. finding a word in text for a language with a large alphabet)
KMP
- First worst case linear time pattern matching algo.
- Works better than BM if alphabet is small.
- Can be used in streaming since it never goes backwards
Rabin-Karp
- Efficiency: Worst case O(mn) but generally linear using a good hashing algorithm
- Space: Small memory footprint. BM and KMP require storing tables of length O(m) or O(s)
Module 12 Review
Big-O Summaries
M13 Graph Algorithms
Terminology
- Binary trees and LinkedLists are graphs
- Graph: G = (V, E)
- Order(G) = |V| . Number of vertices
- Size(G) = |E|. Number of edges
- Degree of a vertex: deg(v). How many edges a vertex has.
- Edge: e = (u,v) where u, v are vertices
- "Incident on a vertex" - directly connected to that vertex
- Directed: All edges are one-way
- Undirected: All edges are two-way
- "Path" - sequence of edges that connect a pair of vertices
- “Length of a path” - number of edges traversed
- “Simple path” - path does not repeat vertices
- Edge weights: Quantify cost of traversing an edge
- Dense - number of edges is close to maximal number of edges
- Sparse - number of edges is minimal
- Disconnected - a vertex is disconnected from rest of graph
- Connected - all vertices are connected.
- “Simple” - doesn’t have self-loops or parallel edges
- “Non-simple” - has self-loop, parallel edges
- "Cyclic" - the graph contains a cycle (a path that starts and ends at the same vertex)
- Can go through paths in a cycle
- “Acyclic” - path has no cycles
- Tree is a connected, acyclic graph
Graph Terminology and Conventions
Undirected v. Directed Graphs
- Undirected: edges in E are defined as unordered pair {u, v} where u, v are vertices in a vertex set V
- {u, v}, and {v, u} represent same edge object
- Directed (digraphs): edges in E are distinct ordered pairs (u, v) where u is the origin and v is the destination.
- (u, v) and (v, u) represent different edge objects
Paths, Trails, Cycles, Circuits, and Walks
Path: sequence of non-repeated adjacent vertices (aka “simple path”)
Trail: repeated vertices allowed, but edges can't repeat
Walk: Trail where edge can also repeat.
Cycle: Path where 1st and last vertices are adjacent (and can connect)
Circuit: Trail where 1st and last vertices are adjacent.
Simple Graphs, Self-Loops, and Parallel Edges
Now let's look at different types of edges and see how they affect a graph:
- self-loop is an edge where the start vertex and the destination vertex are the same.
- Parallel edges are where there exist multiple edges between two vertices with the same orientation.
- A graph is simple if there are no self-loops and no parallel edges, and when we say the term "graph," we are usually referring to a simple graph.
- When we want to specifically allow self-loops and parallel edges, we instead refer to it as a multigraph.
Graph Complexity
- Growth functions are in terms of |𝑉| and |𝐸| (vertex set / edge set)
- Some sources will forego the cardinality notation and just use 𝑉,𝐸 directly to refer to the number of vertices and edges.
- Another common convention is to define 𝑛 = |𝑉| and 𝑚 = |𝐸|.
Connectedness in Graphs
- In an undirected graph, the graph is connected if for every pair of vertices, there exists a path between the two.
- For digraphs
- Path from 𝑢 to vertex 𝑣 in a digraph does not imply that a path from 𝑣 to 𝑢 exists.
- If a digraph is connected in the undirected sense, where the edge orientations are removed, then the graph is connected and it is referred to as weakly-connected.
- If the graph is connected in the sense that every pair of vertices has a path between them, then the graph is considered strongly-connected.
Dense v. Sparse Graphs
- A graph is dense if m = Ω(n²)
- A graph is sparse if m = O(n)
To motivate these definitions, let's first consider a simple graph where the number of edges is maximized, a complete graph Kn.
- The number of edges this graph has is a combinatorial problem. We have 𝑛 vertices, and we need to count how many ways to pair them up, ignoring the ordering of the pairing. This is precisely (n choose 2) = n(n−1)/2 ≈ n² edges.
- The sparse definition gives us a way of determining whether the number of vertices is the dominant growth term or the number of edges is.
Putting this in the context of graph complexity:
- When the graph is not simple, then 𝑚 and 𝑛 can be independent from each other since we can always add parallel edges or self-loops to increase 𝑚.
- When the graph is simple, there is in fact a hard combinatorial relationship between the two, namely that in general, m = O(n²).
- If the graph is dense, then we have that m = Θ(n²)
- If the graph is sparse, we have that m = O(n).
- A disconnected graph is always sparse.
Common Graph Families
- Top-left: complete graph (also called clique) 𝐾𝑛 on 𝑛 vertices.
- The complete graph is just an undirected graph where every edge is present.
- If directions are added to each of the edges in the complete graph, then the graph is called a tournament.
- The upper right graph in the figure below is a cycle graph 𝐶𝑛 on 𝑛 vertices.
- In the video, we defined a cyclic graph as a graph allowing a cycle, a very loose definition. However, in other areas, the term "cyclic graph" is interpreted as the cycle graph specifically.
- The lower left graph in the figure above is a tree
- In graph theory, a tree is an acyclic, connected graph.
- In a sense, a tree is a graph for which we've maintained connectedness with the minimum number of edges. One can prove that any tree of 𝑛 vertices will have 𝑚=𝑛−1 edges.
- If we drop the requirement of connectedness and just maintain that the graph is acyclic, this is also called a forest.
- Lower right graph in the figure is the Petersen graph.
- The Petersen graph will not be important for this course, but if you're a theory student interested in learning graph theory, this graph is a nice small graph to have around because it serves as a counterexample for many graph problems. The Petersen graph has multiple copies of C5 as subgraphs; try to find them!
Graph Representations
Graph ADT Procedures
Graph Representation - Considerations
- Efficient data storage - need auxiliary data structures when adding / removing vertices and edges
Adjacency Matrix
- Space Complexity - O(|V|^2)
- Not efficient if vertices added
Adjacency List
- Space complexity: O(|V| + |E|)
- Works well when graph is sparse
- (Type used in this course)
Edge List
- Space complexity: O(|E|)
- Efficient way if only edges need to be considered.
- Completely omits isolated vertices
Depth-first Search (DFS)
Implementations:
- Non-recursive - similar to BFS
- Recursive
Non-recursive DFS
(algorithm not covered in course)
CSVizTool
Recursive DFS
Pseudocode
- Uses a wrapper method that calls the above
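A minimal Java sketch of recursive DFS with a wrapper method; the adjacency-list type (a Map from vertex to neighbor list) is a simplifying assumption for illustration, not the course's Graph class:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DepthFirstSearch {
    // Neighbors are assumed to be stored in the order ties should be broken (e.g. alphabetical).
    public static List<Integer> dfs(Map<Integer, List<Integer>> adjList, int start) {
        List<Integer> visitOrder = new ArrayList<>();
        Set<Integer> visited = new HashSet<>();
        dfsHelper(adjList, start, visited, visitOrder);   // wrapper call into the recursion
        return visitOrder;
    }

    private static void dfsHelper(Map<Integer, List<Integer>> adjList, int vertex,
                                  Set<Integer> visited, List<Integer> visitOrder) {
        visited.add(vertex);
        visitOrder.add(vertex);
        for (int neighbor : adjList.getOrDefault(vertex, List.of())) {
            if (!visited.contains(neighbor)) {
                dfsHelper(adjList, neighbor, visited, visitOrder);   // go deep before going wide
            }
        }
    }
}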
CSVizTool
Example
CSVIsTool
Efficiency
- Dependent on data structure used
- Assume that the visited set and recursive call stack implemented in the algorithm have 𝑂(1) standard operations.
- Actions in DFS:
- Visit each vertex V
- Consider all edges E
- Efficiency: O(∣V∣+∣E∣)
Notes
- Suppose that we are given the following vertex list that is the output of DFS on some unknown, undirected, connected graph: [𝐴,𝐸,𝐹,𝐷,𝐻,𝐵,𝐶,𝐺]. Briefly describe what the graph could look like. Note that there may be more than 1 possible graph. HINT: What data structure would the graph resemble?
- A: Graph would resemble a tree
- Recursion vs Stack method - Given same starting point yields different order of retrieved lists
- Why? Because stack method places vertices onto a stack and popped from the end (LIFO style). The last neighbor pushed will be the first visited when the stack is popped, which can lead to a different visitation order.
- How to keep same order: When using a stack, push the neighbors of each vertex in the reverse order of how you would explore them recursively. Ensure that the order in which neighbors are considered is consistent between both implementations.
Applications of DFS
- Use to detect if a graph is connected.
- If a graph is connected, DFS can find if there exists a path from one vertex to another
- Use for cycle detection.
- Mixed with no.1, can detect if graph is a tree.
- Obtain a spanning tree of the graph, ie. a subgraph of the original for which all vertices are connected with the smallest number of edges.
- Detect if a graph is bipartite, ie. we can partition vertices into two sets where there are no edges between vertices in the same set.
- Simulate decision for AI for games with structure.
- Topological sorting on directed acyclic graphs (DAGs). Can use DFS to get ordering of vertices based on edge orientations.
- Obtain a meta-graph of strongly connected components in a digraph.
- We can treat each strongly-connected components as a single vertex of a new meta-graph.
Breadth-first Search (BFS)
Key Idea: Visit all vertices reachable through connections from some indicated start vertex.
- Both DFS and BFS are exhaustive search algorithms. If there exists a path to a vertex, both BFS and DFS will find it.
- In this course, ties are broken by alphabetical order
- Note: Not all edges will be traversed.
Requirements: 1) Queue, 2) Visited set, 3) Starting vertex
Pseudocode
CSVizTool
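A minimal Java sketch of BFS using the three requirements above (queue, visited set, starting vertex), with the same simplified adjacency-list assumption as the DFS sketch:

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

public class BreadthFirstSearch {
    public static List<Integer> bfs(Map<Integer, List<Integer>> adjList, int start) {
        List<Integer> visitOrder = new ArrayList<>();
        Set<Integer> visited = new HashSet<>();
        Queue<Integer> queue = new ArrayDeque<>();

        visited.add(start);                 // mark when enqueued, not when dequeued
        queue.add(start);
        while (!queue.isEmpty()) {
            int vertex = queue.remove();
            visitOrder.add(vertex);
            for (int neighbor : adjList.getOrDefault(vertex, List.of())) {
                if (!visited.contains(neighbor)) {
                    visited.add(neighbor);
                    queue.add(neighbor);    // closer vertices are dequeued before farther ones
                }
            }
        }
        return visitOrder;
    }
}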
Efficiency
- Assume visited set and queue implemented in the algorithm have O(1) standard operations
- Considerations are 1) visit each vertex, and 2) consider all edges
- Efficiency: O(|V| + |E|)
BFS vs DFS
BFS when to use:
- If it is known that the vertex is relatively close by compared to the size of the graph.
- If we are doing a traversal of a tree from some root node and the depth of the tree is large
DFS when to use:
- If each node has many neighbors, then a BFS will quickly blow up in space usage since we need to store all of these neighbors in a queue. Relevant to AI where the decision space is enormous.
Summary:
- The heuristics we have may tell us where to search, in which case we may switch between the two strategies depending on the graph's structure.
- Main challenge with graphs is scale (ie. social networks)
- As you learn more about algorithms and graphs once you finish the course and are out in the algorithmic world, you may find a slant towards DFS over BFS.
M14 - Dijkstra’s Shortest Path
Algorithm
Key Idea: Solve the single source shortest paths problem in a graph.
- Dijkstra's algorithm solves the shortest paths problem where all weights are non-negative, and there is a single source.
- Warning: Only works on non-negative weights
There are variations of the shortest paths problem, based on certain parameters:
- Are there negative edge weights present in the graph?
- Are we trying to find the shortest path from a single vertex to another known vertex? (Single source, single destination)
- Are we trying to find the shortest path from a single vertex to all other reachable vertices? (Single source)
- Do we want the shortest path for all pairs of vertices? (All pairs)
Requirements: 1) Priority queue, 2) Visited set, 3) Map, 4) Source vertex
Pseudocode
- Tip: The priority queue always prioritizes the vertex with the shortest known path.
- Tip: If the PQ has vertices with equal path lengths, it doesn't matter which you visit first.
- Unvisited (disconnected) vertices get a distance of infinity
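A minimal Java sketch of Dijkstra's algorithm using the four requirements above; the Edge record and adjacency-list type are assumptions for illustration, not the course's classes:

import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

public class Dijkstra {
    // Hypothetical edge record for this sketch: "to" is the neighbor, "weight" is the edge cost.
    record Edge(int to, int weight) { }

    // Returns the shortest distance from source to every reachable vertex; unreachable
    // vertices simply never appear in the map (the notes treat their distance as infinity).
    public static Map<Integer, Integer> shortestPaths(Map<Integer, List<Edge>> adjList, int source) {
        Map<Integer, Integer> distances = new HashMap<>();
        Set<Integer> visited = new HashSet<>();
        // Priority queue of (vertex, distance-so-far) pairs ordered by distance.
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> Integer.compare(a[1], b[1]));

        pq.add(new int[] {source, 0});
        while (!pq.isEmpty()) {
            int[] current = pq.remove();
            int vertex = current[0];
            int dist = current[1];
            if (visited.contains(vertex)) {
                continue;                       // an older, longer path to an already-final vertex
            }
            visited.add(vertex);
            distances.put(vertex, dist);        // distance is finalized once dequeued
            for (Edge edge : adjList.getOrDefault(vertex, List.of())) {
                if (!visited.contains(edge.to())) {
                    pq.add(new int[] {edge.to(), dist + edge.weight()});  // cumulative path length
                }
            }
        }
        return distances;
    }
}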
Dijkstra's Algorithm: Vertex Distance Updates
Once a vertex becomes visited, is that vertex's distance ever updated again? Why or why not?
Key Concepts
- Greedy Algorithm: Dijkstra's algorithm operates on a greedy method, selecting the optimal choice at each step with the goal of finding the most optimal solution to the problem as a whole.
- Relaxation Process: Throughout its execution, Dijkstra's algorithm maintains and updates the shortest known distance from the source to each vertex. This process, known as relaxation, adjusts the distances based on newly discovered paths that offer a shorter route to a vertex.
- Finalization of Shortest Distances: When a vertex is visited, meaning it is removed from the priority queue, its shortest distance from the source is considered finalized. This is because the algorithm ensures that by the time a vertex is visited, all possible paths to it have been considered, and the shortest one has been chosen.
- Assumption of Non-negative Weights: Dijkstra's algorithm assumes that all edge weights in the graph are non-negative. This assumption is crucial for the algorithm's guarantee that once a vertex's shortest distance is determined, it will not be updated or reduced further by any other path.
Why Distances to Visited Vertices Are Not Updated
- Optimal Path Guarantee: By the time a vertex is visited, Dijkstra's algorithm has already guaranteed that the shortest path to that vertex has been found. This is due to the greedy nature of the algorithm and the process of relaxation.
- Non-negative Edge Weights: The absence of negative edge weights ensures that discovering a new path to a visited vertex will not provide a shorter distance than what has already been determined.
- Efficiency and Correctness: Not revisiting vertices or updating their distances once finalized ensures the algorithm's efficiency and correctness, preventing unnecessary computations and ensuring that the shortest paths are correctly identified.
Advantages of Updating Distance Map in Dijkstra's Algorithm
Dijkstra's algorithm will update the distance map whenever a shorter path is found, regardless of whether or not the vertex has been enqueued or not. What is the advantage of doing this?
Dynamic Path Optimization
- Immediate Shortest Path Updates: Continuously updating the distance map ensures that the shortest path information is always current, reflecting the most efficient paths discovered up to that point.
Enhanced Accuracy
- Guaranteed Optimal Paths: By updating distances whenever shorter paths are found, the algorithm guarantees the accuracy of the final shortest path calculations, ensuring they are indeed the shortest.
Improved Efficiency
- Reduction of Redundant Exploration: This approach minimizes unnecessary exploration of longer paths, focusing computational resources on promising paths, thereby improving the algorithm's overall efficiency.
Priority Queue Effectiveness
- Optimized Vertex Selection: Updating the distance map enhances the priority queue's ability to accurately prioritize vertices for exploration, directing the search more effectively towards the shortest paths.
Algorithmic Flexibility
- Adaptive Path Finding: The ability to dynamically update path lengths allows the algorithm to adapt to new information, refining its search strategy as it progresses through the graph.
Application on Digraphs
Is there any reason that Dijkstra's algorithm would fail for digraphs compared to undirected graphs?
- Dijkstra's algorithm is fully capable of handling digraphs, provided that all edge weights are non-negative.
Motivating Dijkstra’s Algorithm
- Assumption: The shortest path between a pair of vertices must be composed of the shortest path to a neighbor plus the incident edge.
- Math:
- Given a source v1 and a destination vk, let the shortest path be (v1, …, vk)
- Let d(v) = shortest path distance from v1 to v
- Let w(va, vb) = weight of the edge between adjacent vertices va and vb
- Dijkstra's assumption: d(vk) = d(vk−1) + w(vk−1, vk)
- How assumption is used: Processes into 3 subsets, “visited”, “frontier”, “unexplored”
- “Visited” = out of priority queue. Algo guarantees if vertex exited PQ, it found shortest path to that vertex.
- “Frontier” = vertices in priority queue
- “Unexplored” = vertices that we know exist, but not in priority queue
Efficiency of Dijkstra
Given assumptions:
- Visited set and distance map are backed by a HashSet and HashMap, which have O(1) operations.
- Priority queue is backed by binary heap.
Space complexity:
- PQ could contain O(|E|) entries.
- Each time a new path is added to the priority queue, it corresponds to a new edge extending out from the dequeued vertex.
- No new paths are added to the priority queue, if the vertex is already visited.
- A visited vertex means we found a smaller distance earlier in the search, and it guarantees we do not reuse edges.
- The PQ has add and remove operations: there are O(|E|) of each, and each costs O(log |E|).
- Thus, efficiency = O(|E| log |E|)
Side Note: Some sources may list the time complexity as either 𝑂(|𝑉|2) or 𝑂(|𝐸|+|𝑉|log|𝑉|). The first expression is what happens if an adjacency matrix is used; the cost of going through neighbors dominates the log factors. The second expression is what happens if a Fibonacci heap is used since the amortized cost of adding/updating in a Fibonacci heap is Θ(1). Please keep in mind that Fibonacci heaps are almost never used in practice and are largely for theoretical purposes of bettering the time complexity.
Optimization
Decoupling “visited” vertices from updates to distance map:
- Rather than only updating the distance map when we visit a vertex and achieve the optimal shortest path to that vertex, we can update the distance map as we find new paths and add them to the priority queue.
- This way, we can use the distance map as a criteria for adding to the priority queue to shrink the number of paths we add to the priority queue.
- This can also let us do away with the visited set if needed, reducing the amount of space by a constant factor.
Update priority queue entries with a new priority (decrease-key):
- If this optimization is done, then the priority queue's size will never exceed O(|V|), which is an improvement over the previous optimization.
Negative Edge Weights and Cycles
Negative Edge Weights
- Dijkstra's Algorithm Limitation: Dijkstra's algorithm cannot handle graphs with negative edge weights effectively.
- Negative Cycles vs. Negative Weights: It's crucial to distinguish between graphs with negative cycles and those with merely negative edge weights but no negative cycles.
Negative Cycles
- Definition: A negative cycle exists in a graph if there's a cycle where the sum of the edge weights is negative.
- Impact on Shortest Path Problem: The presence of a negative cycle makes the shortest path problem unsolvable in a meaningful way, as traversing the cycle can indefinitely decrease the path length.
- Directed vs. Undirected Edges: Negative edge weights in undirected graphs inherently lead to negative cycles, making them problematic.
Handling Negative Edge Weights
- Directed Graphs Without Negative Cycles: For directed graphs that have negative edge weights but no negative cycles, alternative algorithms to Dijkstra's must be used.
- Bellman-Ford Algorithm: Efficient for single-source shortest paths in graphs with negative edge weights, capable of detecting negative cycles. Time complexity: O(|V||E|).
- Floyd-Warshall Algorithm: Suitable for all-pairs shortest paths, works with negative edge weights and also detects negative cycles. Time complexity: O(|V|^3).
Summary
- Dijkstra's Algorithm: Not suitable for graphs with negative edge weights due to its inability to handle negative cycles and incorrect path calculations.
- Alternative Algorithms:
- Bellman-Ford: Good for single-source shortest paths with negative edge weights, includes negative cycle detection.
- Floyd-Warshall: Applicable for all-pairs shortest paths, handles negative edge weights and identifies negative cycles.
- Efficiency: Both Bellman-Ford and Floyd-Warshall are less efficient than Dijkstra's but necessary for certain graphs with negative edge weights.
Food for Thought - Heuristics and Search Algorithms
Foundations
- Dijkstra's Algorithm: Basis for several search algorithms.
- Impact on AI: Influences algorithms like UCS and A* for navigating state spaces.
AI Applications
- Graphs: Represent AI states and decisions.
- Search: AI uses these algorithms to reach goal states, e.g., maze exit.
Key Algorithms
- UCS: Focuses on reaching a single destination. It's a simpler form of Dijkstra's for large graphs.
- Heuristic Search: Uses heuristics to guide the search, notably in A*.
Heuristics
- Role: Guide search algorithms towards the goal.
- Example: Manhattan distance estimates grid distances, giving a general direction toward the exit. This aids A*, which minimizes c(S) + h(S) (cost function + heuristic function).
A* Algorithm
- Heuristics: Combines cost so far and estimated cost to goal.
- Performance: Depends on the heuristic. A* can outperform UCS and Dijkstra's with a good heuristic.
Conclusion
- Evolution: From Dijkstra's to heuristic-based searches like A*.
- Heuristics Importance: Crucial for efficient, goal-directed search in AI.
Efficiency in Graphs - Review
M15 Minimum Spanning Trees (MSTs)
MSTs
- Tree - acyclic connected graph
- Spanning tree - subgraph that is a tree while connecting every vertex in the graph
- Minimum spanning tree - spanning tree where sum of edge weights is the smallest possible
- Can be multiple spanning trees in a graph
- (Minimum spanning forest - when MST is applied to disconnected graph)
- Subgraph - Subset S of a graph G whose V and E are all subsets of G
Prim’s Algorithm
Key Idea
Key Idea: Very similar to Dijkstra, but only considers shortest distance of immediate edge (not cumulative like Dijkstra)
Errata:
- The fourth bullet point in the left box should state that the algorithm builds the MST outwards from a single component, one vertex at a time.
- A clarification to the third bullet point in the left box, MSTs in general do not exist for unconnected graphs. The third bullet point in the right box should state "shortest edge," not "shortest path."
Observing Graphs
Graph cut - takes a subset of the vertices; the cut edges are the edges connecting that subset to the rest of the graph
Requirements for Prim’s Algorithm
Pseudocode
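The lecture's pseudocode isn't reproduced here; below is a minimal Java sketch of Prim's algorithm in the same style as the Dijkstra sketch, except the priority queue orders by single edge weight rather than cumulative distance (the Edge record and adjacency-list type are assumptions for illustration):

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

public class Prim {
    // Hypothetical record for this sketch: an undirected edge (u, v) with a weight.
    record Edge(int u, int v, int weight) { }

    // adjList.get(v) lists the edges incident on v (each undirected edge appears in both lists).
    public static List<Edge> mst(Map<Integer, List<Edge>> adjList, int source) {
        List<Edge> mstEdges = new ArrayList<>();
        Set<Integer> visited = new HashSet<>();
        PriorityQueue<Edge> pq = new PriorityQueue<>((a, b) -> Integer.compare(a.weight(), b.weight()));

        visited.add(source);
        pq.addAll(adjList.getOrDefault(source, List.of()));
        while (!pq.isEmpty()) {
            Edge edge = pq.remove();                           // cheapest edge crossing the cut
            int next = visited.contains(edge.u()) ? edge.v() : edge.u();
            if (!visited.contains(next)) {
                visited.add(next);
                mstEdges.add(edge);
                for (Edge out : adjList.getOrDefault(next, List.of())) {
                    int other = (out.u() == next) ? out.v() : out.u();
                    if (!visited.contains(other)) {
                        pq.add(out);                           // only edges that still cross the cut
                    }
                }
            }
        }
        return mstEdges;
    }
}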
Efficiency
Time Complexity
- Similarity to Dijkstra's: Time complexity for both algorithms will be the same, 𝑂(|𝐸|log|𝐸|).
- Priority Queue Analysis: The analysis hinges on the priority queue's implementation, typically a binary heap with O(|E|) entries and removals.
- Efficiency with Decrease-Key: With the decrease-key operation, the complexity is O((∣V∣+∣E∣)log∣V∣).
Java's PriorityQueue
- Lack of Decrease-Key: Java's PriorityQueue does not support the decrease-key operation
Cut Property
- Foundation of Prim's: The cut property states that any Minimum Spanning Tree (MST) must include the minimum cost edge crossing any cut of the graph.
- Algorithm Mechanism: Prim's algorithm operates by maintaining a set of visited vertices, creating a cut between visited and unvisited vertices. The priority queue, or "frontier," contains edges across this cut, prioritizing the minimum cost edge to expand the MST.
Key Points
- Prim's algorithm is efficient for constructing MSTs, leveraging a priority queue and the cut property.
- The algorithm's time complexity is closely related to that of Dijkstra's, with the primary difference being the focus on building an MST rather than finding shortest paths.
- The absence of the decrease-key operation in Java's PriorityQueue affects the algorithm's presentation and efficiency.
Example
Watch video:
Final MST connects every vertex using the minimum total edge weight.
Warning: We'd like to make clear that the shortest path problem solved by Dijkstra's algorithm and the MST problem solved by Prim's algorithm are mostly unrelated to each other, even though the pseudocode of the two algorithms appears similar.
Greedy Paradigm
Definition
- Greedy Algorithm: Makes decisions based on local optimal choices at each step.
Examples
- Prim's and Dijkstra's: Both are greedy algorithms focusing on local optimality for MST and shortest paths, respectively.
- Kruskal's Algorithm: Another example, clearly demonstrating the greedy approach by selecting the smallest edge not forming a cycle.
Characteristics
- Simplicity: Conceptually straightforward, focusing on the best current option.
- Efficiency: Often provide polynomial time solutions.
Limitations
- Not Always Optimal: Local choices may not lead to the best global solution.
- Traveling Salesman Problem (TSP): Illustrates the failure of greedy strategies to guarantee optimal solutions.
Greedy Strategy in TSP
- Approach: Choose the nearest unvisited neighbor at each step.
- Outcome: Can result in suboptimal or even the worst possible route.
Advantages Despite Limitations
- Foundation for Complex Strategies: Greedy methods can be a starting point for more sophisticated algorithms.
- Good Approximations: Often yield solutions close to optimal in practice.
- Efficiency: Provide a balance between solution quality and computational cost.
Conclusion
- Greedy algorithms, while not always yielding optimal solutions, offer a simple and efficient framework for approaching a variety of problems. They serve as a valuable tool in algorithm design, especially when combined with other strategies to improve approximation quality.
Kruskal's Algorithm
Overview
Pseudocode
CSVizTool
Example
Video:
- Edges in orange are included in MST
- Edges in black were in the min priority queue, but never considered.
Algorithm Steps
- Add every edge of the graph into a priority queue
- While the PQ != empty && MST size is not reached, dequeue an edge from PQ. If dequeued edge doesn’t form a cycle, add the edge to the MST
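A minimal Java sketch of these steps; it reuses the hypothetical Edge record from the Prim sketch above and the DisjointSet class sketched later in the Disjoint Set section:

import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

public class Kruskal {
    public static List<Prim.Edge> mst(Set<Integer> vertices, List<Prim.Edge> edges) {
        List<Prim.Edge> mstEdges = new ArrayList<>();
        DisjointSet<Integer> clusters = new DisjointSet<>(vertices);
        // Enqueue every edge; the course's BuildHeap approach would make this step O(|E|).
        PriorityQueue<Prim.Edge> pq = new PriorityQueue<>((a, b) -> Integer.compare(a.weight(), b.weight()));
        pq.addAll(edges);

        // Stop when the PQ is empty or the MST already has |V| - 1 edges.
        while (!pq.isEmpty() && mstEdges.size() < vertices.size() - 1) {
            Prim.Edge edge = pq.remove();
            // If both endpoints already share a representative, this edge would form a cycle.
            if (!clusters.find(edge.u()).equals(clusters.find(edge.v()))) {
                clusters.union(edge.u(), edge.v());
                mstEdges.add(edge);
            }
        }
        return mstEdges;
    }
}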
Motivation
- Cycle property of MSTs: "If we consider any cycle in the graph and look at the edge of max weight in the cycle, then that edge won't be contained in any MST."
- How cycle property proves correctness: Any edge that creates a cycle in the MST, based on cycle property, is excluded from the MST.
- Any edge dequeued before a cycle's maximum-weight edge necessarily had a lower weight, consistent with the cycle property of MSTs.
Motivation for Disjoint Sets
- Visited sets don’t work: Visited sets, as used in algorithms like Depth-First Search (DFS), Breadth-First Search (BFS), Dijkstra's, and Prim's, are designed to track the nodes that have been explored from a single source point, ensuring that the algorithm does not revisit nodes and potentially create cycles or redundant paths. These algorithms build their solution outward from this single source, expanding the frontier of visited nodes until the desired goal (e.g., spanning tree, shortest path) is achieved.
- Kruskal’s use of multiple origin sources: Kruskal's algorithm does not build the MST from a single source. Instead, it considers all edges in the graph, sorted by weight, and adds them to the MST one by one, provided they do not create a cycle. This process does not inherently rely on a concept of "visited" nodes in the same way.
- Kruskal’s use of disjoint sets: Instead of using visited sets to prevent cycles, Kruskal's algorithm uses a Disjoint Set (Union-Find) data structure. This structure efficiently tracks which vertices are in which components of the growing forest. When considering an edge, if both vertices belong to the same set (component), adding that edge would create a cycle, so it is skipped.
- Edge vs Vertex Focus: Kruskal's algo is more edge-centric while the other algos are vertex-focused.
Example of where a visited set will fail (DF edge will create a cycle)
Disjoint Sets
Disjoint Set (Union-Find) Data Structure
- Purpose: Addresses the MST cycle issue by managing a partition of a universe set U, ensuring efficient updates as subsets merge.
- Partition Definition: A collection of disjoint subsets of U, where no element appears in more than one subset.
- Example: Given U = {1, 2, 3, 4}, a valid partition is {{1}, {2, 3}, {4}}, while {{1, 2}, {2, 3}} is invalid due to overlap.
Kruskal's Algorithm Context:
- Universe Set (V): The set of vertices in the graph.
- Initial Partition: Each vertex is in its own subset, representing the state before any edges are added to the MST.
- Edge Addition: As edges are added, they connect vertices, effectively merging the subsets (clusters) those vertices belong to.
- Goal: Efficiently manage these merges to ensure no cycles are formed in the MST, using the Disjoint Set structure to track connected components.
Tree-based Solution
Data Structure Requirements:
- Check if 2 vertices are connected
- Merge/union connected components together efficiently so that all vertices belong to the same component.
Operations:
find(v): Returns the "representative vertex" of v's component; two vertices are connected exactly when their representatives are the same.
- How: To find the representative of 𝑣, follow the outgoing edge path to each ancestor until the root is reached. The root is the component's representative, so return the root. To implement this procedure, we will need efficient access to where 𝑣 is located in the disjoint set forest. This is accomplished by using a HashMap from vertex objects in the graph to node objects in the disjoint set forest.
union(u, v): Merge the components of u and v so that they share a single representative vertex.
- How: To merge the sets that 𝑢 and 𝑣 belong to, first compute 𝑢𝑟𝑜𝑜𝑡←find(u) and 𝑣𝑟𝑜𝑜𝑡←find(v) to obtain their roots. Next, make either 𝑢𝑟𝑜𝑜𝑡 a child of 𝑣𝑟𝑜𝑜𝑡 or vice versa. This will ensure that the representative of the component is unique, and, as more and more clusters are unioned together, the number of children will continually increase.
How the operation works:
- Top left: Graph G
- Top right: Disjoint sets representing clusters in the graph
- Top row: Illustrates adding of an edge between vertex C and A to the MST, where C and A aren’t in the same cluster.
- Bottom row: Illustrates how clusters are merged.
Efficiency
- In the worst case, it is 𝑂(|𝑉|) for both find and union operations.
- Merging the trees once the roots have been found is 𝑂(1), but finding the roots can be up to 𝑂(|𝑉|)
Path Compression and Rank
- Goal: Improve find operation efficiency in the Disjoint Set.
- Method: During a find operation, make each node on the path from a vertex to the root point directly to the root. This reduces the path length to 1 for future operations.
- Benefits: Significantly speeds up subsequent find operations by flattening the tree structure.
- Non-Recursive Alternatives: Path splitting and path halving (not covered in this course).
Union by Rank
- Goal: Optimize the union operation to maintain a balanced tree structure.
- Method: Use the height (or "rank") of trees to decide which root becomes the parent during a union. The root of the tree with the larger height becomes the parent of the root with the smaller height.
- Rank: An efficient approximation of the tree's height, used because path compression makes accurate height maintenance cumbersome.
- Implementation Changes:
- Initial Rank: Set to 0 for all nodes.
- Union Operation: Compare ranks to decide the parent-child relationship. If ranks are equal, choose either, but increment the rank of the new parent.
New find and union
Summary
- Path Compression and Union by Rank are key optimizations for Disjoint Sets, enhancing the performance of find and union operations.
- These optimizations help maintain a more flat and balanced tree structure, ensuring efficient operation of the Disjoint Set, especially in applications like Kruskal's algorithm for finding the Minimum Spanning Tree (MST).
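A minimal Java sketch of the Disjoint Set with path compression and union by rank, matching the description above (generic over the item type; this is the DisjointSet referenced in the Kruskal sketch earlier):

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

public class DisjointSet<T> {
    private final Map<T, T> parent = new HashMap<>();
    private final Map<T, Integer> rank = new HashMap<>();

    // Each item starts as its own root with rank 0.
    public DisjointSet(Collection<T> items) {
        for (T item : items) {
            parent.put(item, item);
            rank.put(item, 0);
        }
    }

    // find with path compression: point every node on the path directly at the root.
    public T find(T item) {
        T p = parent.get(item);
        if (!p.equals(item)) {
            T root = find(p);
            parent.put(item, root);
            return root;
        }
        return item;
    }

    // union by rank: the taller tree's root becomes the parent; equal ranks bump the new root's rank.
    public void union(T a, T b) {
        T rootA = find(a);
        T rootB = find(b);
        if (rootA.equals(rootB)) {
            return;                              // already in the same cluster
        }
        int rankA = rank.get(rootA);
        int rankB = rank.get(rootB);
        if (rankA < rankB) {
            parent.put(rootA, rootB);
        } else if (rankA > rankB) {
            parent.put(rootB, rootA);
        } else {
            parent.put(rootB, rootA);
            rank.put(rootA, rankA + 1);
        }
    }
}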
Efficiency of Kruskal's Algorithm
Disjoint Set Optimizations: The find and union operations have an amortized cost of O(α(|V|)), where α is the inverse Ackermann function. This function grows extremely slowly, making these operations nearly constant time, O(1), for all practical purposes.
Kruskal's Algorithm Efficiency:
- Cycle Detection: O(1)
- Adding edges to priority queue: Use BuildHeap algo which is O(|E|). If we end up dequeueing all edges from PQ, then worst case time complexity is O(|E| log |E|).
Overall Complexity: Since dequeuing and cycle detection dominate, the overall time complexity of Kruskal's algorithm is O(|E| log |E|).
Simplification for Simple Graphs: If the graph is simple (no self-loops or parallel edges), then |E| = O(|V|^2), so log |E| = O(log |V|) and the bound simplifies to O(|E| log |V|).
Prim vs Kruskal’s Algorithm
Time Complexity
- Prim's Algorithm:
- General: O(|E| log |E|)
- With decreaseKey: O((|V| + |E|) log |V|)
- Kruskal's Algorithm: O(|E| log |E|)
Dense Graphs
- Both Algorithms: O(|E| log |E|) for dense simple graphs.
- Prim's Advantage: Performs better in dense graphs, especially non-simple ones, due to fewer global edge dequeues and cycle checks.
Sparse Graphs
- Prim's: Becomes O(|V| log |V|) due to the |V| term dominating.
- Kruskal's: Remains O(|E| log |E|). Generally performs better for sparse graphs, but struggles with disconnected graphs by processing every edge.
Special Cases
- Pre-sorted Edges: Kruskal's efficiency improves to O(|E|).
- Disconnected Graphs: Kruskal's naturally handles minimum spanning forests (MSFs); Prim's requires adjustments.
- Memory Considerations: Prim's may be preferred for its lower active memory requirement.
Other Considerations
- Borůvka's Algorithm (wiki): Not as well-known but significant for parallel computing.
Conclusion
- Choice Depends on Graph Density and Structure: Prim's is generally better for dense graphs, while Kruskal's is suited for sparse or when edges are pre-sorted.
- Other Algorithms: Exist and may be more suitable for specific scenarios, such as parallel computing.
M16 - Dynamic Programming
Key Idea: Store the solutions to subproblems so that when they are needed for reuse, we do not recompute the subproblem
Naive Fibonacci:
DP Fibonacci:
- typically, you find the recursive solution, then apply DP to it.
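Since the lecture's code isn't captured here, a minimal Java sketch of the naive, top-down (memoized), and bottom-up versions:

import java.util.HashMap;
import java.util.Map;

public class Fibonacci {
    // Naive recursion: recomputes overlapping subproblems, O(2^n) time.
    public static long naiveFib(int n) {
        if (n <= 1) {
            return n;
        }
        return naiveFib(n - 1) + naiveFib(n - 2);
    }

    // Top-down DP (memoization): check the store before recomputing, O(n) time and space.
    public static long topDownFib(int n, Map<Integer, Long> memo) {
        if (n <= 1) {
            return n;
        }
        if (memo.containsKey(n)) {
            return memo.get(n);
        }
        long result = topDownFib(n - 1, memo) + topDownFib(n - 2, memo);
        memo.put(n, result);
        return result;
    }

    // Bottom-up DP: iterate from the base cases upward, O(n) time and space
    // (space drops to O(1) if only the last two values are kept).
    public static long bottomUpFib(int n) {
        if (n <= 1) {
            return n;
        }
        long[] fib = new long[n + 1];
        fib[0] = 0;
        fib[1] = 1;
        for (int i = 2; i <= n; i++) {
            fib[i] = fib[i - 1] + fib[i - 2];
        }
        return fib[n];
    }

    public static void main(String[] args) {
        System.out.println(bottomUpFib(10));                    // 55
        System.out.println(topDownFib(10, new HashMap<>()));    // 55
    }
}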
Dynamic Programming: More Formally
Dynamic Programming (DP) Overview:
- DP is a generalization of divide and conquer.
- Examples include the failure table construction for KMP and Dijkstra's algorithm.
- DP is effective for problems with overlapping subproblems, unlike divide and conquer algorithms like merge sort.
Fibonacci Example:
- Computing the nth Fibonacci number recursively leads to exponential time complexity (O(2^n)) due to overlapping subproblems.
- DP addresses this by storing solutions to subproblems (memoization), turning a recursive solution into a top-down DP approach.
- Top-down DP involves checking a stored array for the solution before computing; if not found, compute recursively, store, and return the solution.
- Bottom-up DP computes subproblems iteratively in order of complexity, starting with base cases, which is often more efficient than recursion.
Top-Down vs. Bottom-Up DP:
- Top-down is intuitive and useful when only a subset of subproblems needs to be computed.
- Bottom-up is preferred for its efficiency due to iteration over recursion.
- The choice depends on the specific problem and the need to compute all or some subproblems.
Nuances and Benefits of DP:
- DP solutions are easy to implement but identifying and combining subproblems is the challenge.
- Optimal substructure is key: solving optimal subproblems leads to solving the larger problem optimally.
- DP increases space complexity but significantly reduces time complexity, which is often a worthwhile trade-off.
- Turns O(2^n) time and O(1) space algo to a O(n) time and O(n) space algo.
- Space is generally cheaper than time, so this is often not an issue.
Underlying Implicit DAG in DP:
- Viewing subproblems as a directed acyclic graph (DAG) helps in understanding and applying DP.
- Edges represent dependencies (recursive calls), and nodes represent subproblems.
- A DAG ensures there's an order to compute subproblems, crucial for both top-down and bottom-up approaches.
- Top-down DP performs a depth-first search on the DAG, storing solutions, while bottom-up pre-sorts subproblems for efficient computation.
- This perspective aids in algorithm development, not implementation.
Key Takeaway: Dynamic Programming is a powerful technique for solving problems with overlapping subproblems and optimal substructure, using either a top-down (with memoization) or bottom-up approach to efficiently compute solutions while managing increased space complexity.
The P = NP Problem
Introduction
- Context: The P = NP problem is a central question in computer science, particularly in complexity theory.
- Importance: It distinguishes between "easy" problems (with polynomial time solutions) and "hard" problems (where no polynomial time solution is known).
Definitions
- Decision problem: a problem that has a yes or no answer.
- E.g. "Is the number composite?" or "Is there an occurrence of the pattern in the text?" are decision problems.
- P = NP is defined in terms of decision problems.
- P: Class of decision problems solvable in polynomial time.
- NP (nondeterministic polynomial time): Class of decision problems where, if the answer is YES, it can be verified in polynomial time.
- E.g. For the question "Is the number 𝑛 composite," a proof could be the prime factorization of 𝑛, and the verifying algorithm would just be to multiply the factorization together and check if we got back 𝑛.
- NP-Complete: Problems in NP for which every problem in NP reduces to them. Solving any NP-complete problem efficiently would solve all NP problems efficiently.
- NP-Hard: Problems as hard as the NP-complete problems but not necessarily in NP (may not be decision problems).
- If a problem B is NP-complete, then for any problem A in NP, we have A ≤ B (A reduces to B).
- In plain-speak: These are problems where, if we obtained a polynomial-time algorithm for it, then we'd automatically have a polynomial-time algorithm for every problem in NP.
- If a problem satisfies the requirements of NP-completeness, but is not in NP (for example not being a decision problem), then it is called an NP-hard problem.
Why do we care
- NP is interesting because every problem in NP can be solved in exponential time: a YES answer has a polynomial-size proof, so there are only exponentially many candidate proofs to check by brute force.
- P = NP problem is asking: "Are these classes of problems the same?"
- We should see that P ⊆ NP because if the answer is YES, then the algorithm we used to solve the problem can be used to verify correctness.
- The more interesting question is whether P ⊇ NP.
- This direction is asking: "If we have an efficient way to check if an answer is correct, does that mean there exists an efficient way to compute the answer?"
Hard vs. Easy Problems
- Reductions: Used to compare problem difficulties. A problem A reduces to B if solving B (assumed easy) helps in solving A.
- Efficiency: An algorithm is considered efficient if it runs in polynomial time. However, not all polynomial time algorithms are practically efficient.
The P = NP Question
- Essence: Asks whether every problem whose solution can be quickly verified (NP) can also be quickly solved (P).
- Implications: A proof that P = NP could revolutionize computing, affecting fields like cryptography. Conversely, proving P ≠ NP would confirm the inherent difficulty of certain problems.
Consequences of P = NP
- Cryptography: Most modern cryptography relies on P ≠ NP. If P = NP, many encryption methods would become insecure.
- Computational Limits: Proving P ≠ NP would clarify the limits of what can be computed efficiently.
Perspectives
- Majority View: Most computer scientists believe P ≠ NP, suggesting a fundamental gap between problem-solving and solution verification.
- Philosophical Reflection: Scott Aaronson highlights the profound implications of P = NP for creativity, problem-solving, and human achievement.
Conclusion
- The P = NP problem remains unsolved, with significant implications for computer science, cryptography, and our understanding of computational complexity.
Exploratory Lab: Modeling Dynamic Programming
Problem 1: Largest Contiguous Sum
Suppose we have an array 𝐴 of integers of length 𝑛. We'd like to compute the largest contiguous sum of the array, which is a block of numbers for which the sum is maximized. For example, suppose our array is [−1,52,−1,4,5,−3,1,3]. Our largest contiguous sum is denoted having a value of 5. In the worst case, you can always choose the empty sum, which gives a value of 0. Come up with a dynamic programming algorithm to compute the value of the sum.
- Hint 1: The straightforward solution we have in mind has a time and space complexity of 𝑂(𝑛). However, the space usage can easily be reduced to 𝑂(1).
- Hint 2: Rather than trying to have the subproblems 𝑆[𝑖] represent the largest contiguous sum present in the subarray 𝐴[0,…,𝑖], instead make it a more local measure that more easily allows for combining subproblems.
- Hint 3: For the subproblems, let 𝑆[𝑖] represent the largest contiguous sum ending at index 𝑖 in the subarray𝐴[0,…,𝑖].
Solution:
Scratch Pad:
- Loop over sets of 2 to n sets where n is sum of entire array.
- Each smaller loop keeps the sum of 2 saved, which is then used by loop of 3 sets
- Keep an additional data structure storing the set of numbers with the greatest max so far.
More explanation of solution:
- Definition of S[i]: S[i] represents the largest contiguous sum ending at index i in the subarray A[0..i]. This means S[i] is the maximum sum of a subarray that ends exactly at position i.
- Recurrence Relation: The key part of the solution is the recurrence relation: S[i+1] = max{S[i] + A[i+1], A[i+1]}.
- This relation states that the largest sum ending at i is either:
- the element at i itself (if adding the previous largest sum ending at i-1 would decrease the sum), or
- it's the sum of the element at i and the largest sum ending at i-1.
- Bottom-Up Computation: The solution is computed in a linear, bottom-up fashion, starting from the beginning of the array and moving towards the end. This means you calculate S[0], then S[1], and so on, using the recurrence relation to build up the solution.
- Returning the Result: The final result, which is the largest contiguous sum in the entire array, is the maximum value in the S array. This is because the largest sum subarray could end at any index.
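A minimal Java sketch of this bottom-up solution with the O(1) space reduction mentioned in Hint 1 (the sample array in main is my own, not the course's):

public class LargestContiguousSum {
    // endingHere plays the role of S[i]; only the previous value is needed, so space is O(1).
    public static int largestContiguousSum(int[] arr) {
        int best = 0;          // the empty sum is always allowed, so the answer is at least 0
        int endingHere = 0;
        for (int value : arr) {
            endingHere = Math.max(endingHere + value, value);   // S[i+1] = max{S[i]+A[i+1], A[i+1]}
            best = Math.max(best, endingHere);
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(largestContiguousSum(new int[] {-1, 3, -2, 4, -6, 1}));  // 5 (from 3, -2, 4)
    }
}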
Problem 2: Longest Increasing Subsequence
Suppose that we have an array 𝐴 of length 𝑛 as input consisting of integers. A subsequence of the array is a sequence of some of the array entries in the order that they appear. For example, suppose that we have the array [1,7,3,5,2,8,10,24,−1,−5,4]. Then, 3,2,−1,−5,4 is a valid subsequence since the numbers appear in the sequence in array order, whereas 1,7,10,8 is not a valid subsequence. An increasing subsequence is a subsequence where the numbers go in (strictly) increasing order. Come up with a dynamic programming algorithm to compute the length of the longest increasing subsequence of the array.
- Hint 1: The solution we have in mind has a time complexity of 𝑂(𝑛^2) and a space complexity of 𝑂(𝑛).
- Time Complexity (O(n^2)): This suggests that the algorithm likely involves a nested loop where for each element in the array, you're performing some operation that involves the other elements as well, leading to a quadratic time complexity.
- Space Complexity (O(n)): This indicates that you need an auxiliary array (or similar data structure) of size proportional to the input array to store information useful for solving the problem, such as the length of the longest increasing subsequence up to each index.
- Hint 2: Unlike the largest contiguous sum problem, we have some more leeway in allowing for potentially costly operations to combine subproblems. Consider non-𝑂(1) ways to combine subproblems.
- This hint suggests that unlike some dynamic programming problems where combining subproblems is straightforward and can be done in constant time, in this problem, combining subproblems might involve operations that take more than constant time, possibly involving iterations or comparisons that are dependent on the size of the subproblems.
- Hint 3: For the subproblems, let 𝑆[𝑖] represent the longest increasing subsequence for the subarray 𝐴[0,…,𝑖]. How can we compute 𝑆[𝑖+1] in 𝑂(𝑛) time using subproblems 𝑆[𝑗] for 𝑗 ≤ 𝑖?
- Defining S[i]: The hint defines S[i] as the length of the longest increasing subsequence for the subarray ending at index i. This is a classic approach in dynamic programming where you solve smaller subproblems (in this case, finding the longest increasing subsequence up to each index) and use those solutions to build up to the final solution.
- Computing S[i] in O(i) Time: To compute S[i], you need to look at all previous S[j] values where j < i. For each j, you check if you can extend the longest increasing subsequence ending at j by including the element at i (i.e., if the element at i is greater than the last element of the subsequence ending at j). This involves iterating over all previous indices to find the best S[j] to extend, which takes O(i) time for each i.
- Combining Subproblems: The process of finding the maximum S[j] that can be extended by the current element involves comparing the current element with elements at previous indices and possibly updating S[i] based on those comparisons. This is the "potentially costly operation" mentioned in Hint 2, as it involves iterating over previous subproblems to find the optimal one to extend.
Solution:
Scratch pad:
- Initial thought would be to approach the problem recursively, similar to merge sort.
- Loop i starts at A[0] and continues forward.
- Loop j accumulates integers from i until the array stops increasing (A[j-1] > A[j]).
- The longest series starting from A[i] is saved.
- Continue doing this, checking whether the current running series is longer than the saved max, and replace the max if it is.
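Following Hint 3 (rather than the contiguous-run idea in the scratch pad above), a minimal 𝑂(𝑛^2)-time, 𝑂(𝑛)-space sketch might look like this (the function name is illustrative):

def longestIncreasingSubsequence(A):
    if not A:
        return 0
    # S[i] is the length of the longest increasing subsequence ending exactly at index i.
    S = [1] * len(A)
    for i in range(1, len(A)):
        for j in range(i):
            # Extend the subsequence ending at j whenever A[i] can come after A[j].
            if A[j] < A[i]:
                S[i] = max(S[i], S[j] + 1)
    # The longest increasing subsequence could end at any index.
    return max(S)

On the example array [1,7,3,5,2,8,10,24,−1,−5,4], this returns 6 (one such subsequence is 1,3,5,8,10,24).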
Problem 3: Rod Cutting
We have a rod of length 𝑛 centimeters. We'd like to chop up this rod into smaller segments to sell at the market, but the market only accepts rods of lengths ℓ1,…,ℓ𝑘 as marketable items. All of these lengths are integral in nature. Rods of length ℓ𝑖 sell for cost 𝑐𝑖. Come up with a dynamic programming algorithm to maximize our potential profit.
- Hint 1: The solution we have in mind has a worst case time complexity of at most 𝑂(𝑘𝑛) and a space complexity of 𝑂(𝑛).
- Hint 2: Unlike the largest contiguous sum problem, we have some more leeway in allowing for potentially costly operations to combine subproblems. Consider non-𝑂(1) ways to combine subproblems. Additionally, a top-down approach may be easier to come up with here.
- Hint 3: For the subproblems, let 𝑆[𝑖] represent the maximum profit we can make with a rod of length 𝑖.
Solution:
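One way to realize the hints is the bottom-up sketch below. It assumes the marketable lengths and their costs are given as parallel lists, and that leftover material of a non-marketable length simply earns nothing; the function and parameter names are illustrative:

def rodCutting(n, lengths, costs):
    # S[i] is the maximum profit obtainable from a rod of length i.
    S = [0] * (n + 1)
    for i in range(1, n + 1):
        # Option 1: leave a centimeter unsold and keep the profit of the shorter rod.
        S[i] = S[i - 1]
        # Option 2: cut off (and sell) a marketable piece of length lengths[j].
        for j in range(len(lengths)):
            if lengths[j] <= i:
                S[i] = max(S[i], S[i - lengths[j]] + costs[j])
    return S[n]

The nested loops give the 𝑂(𝑘𝑛) time and 𝑂(𝑛) space from Hint 1; per Hint 2, a top-down version that memoizes the same recurrence works equally well.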
Longest Common Subsequence
Definitions
How do we find the LCS?
Pseudocode
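As a sketch of the standard table fill that the traces below walk through (the names lcsTable, X, and Y are illustrative; row 0 and column 0 correspond to the "empty" character noted in the first example):

def lcsTable(X, Y):
    m, n = len(X), len(Y)
    # L[i][j] is the length of the LCS of X[0..i-1] and Y[0..j-1].
    # Row 0 and column 0 stay 0: the LCS with an empty string is empty.
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                # Matching characters extend the LCS diagonally.
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                # Otherwise keep the better of dropping a character from X or from Y.
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L

L[m][n] is the length of the LCS, and filling the (m+1) x (n+1) table is where the O(mn) time and space in the Module 15 review come from; the backward trace in the examples below recovers an actual subsequence from this table.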
CSVizTool:
Uses:
- Used in revision control (Git, Google Docs).
- Used to protect the privacy of patients when patient data is used in research.
- Widely used in DNA sequencing.
Brief Example of LCS
Tracing Forward (orange arrow):
- Note the “empty” character at row 0 column 0.
Tracing Backward (blue arrow): LCS is [A, D]
More Complex Example
Case where there are multiple LCS
Tracing Forward:
- To help visualize the trace, outline the regions where the values change.
Tracing Backward: LCS is [T, O, A]
In-Depth Example of LCS
This example drives home the point that the longest common subsequence is NOT necessarily unique.
Multiple paths can be traced in this example:
Outlining the blocks where the values change makes it easier to visualize where the LCS backtrace can go.
- Any time a diagonal movement occurs, we add a character to the LCS.
One path, [C, A, C, E]
Another path, [C, A, D, E]
Another path, [C, A, D, E] - same LCS but different path.
and more....
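A minimal sketch of that backward trace, assuming L is the filled table from the sketch earlier (ties are broken arbitrarily here, which is exactly why several different paths, and sometimes several different LCSs, are possible):

def traceBackLCS(X, Y, L):
    # Walk from the bottom-right corner of the table back toward row 0 / column 0.
    i, j = len(X), len(Y)
    lcs = []
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            # A diagonal movement adds a character to the LCS.
            lcs.append(X[i - 1])
            i, j = i - 1, j - 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1  # arbitrary tie-break; moving the other way can yield a different LCS
        else:
            j -= 1
    return list(reversed(lcs))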
Food-for-thought: Knapsack Problem
What is Knapsack (0-1 Knapsack version)
- We're a burglar shoplifting in a store (don't do this at home, kids!).
- Want to make as much money as we can from our shoplifting endeavor, so we've brought along a large knapsack to store the items.
- Our knapsack has a weight capacity of 𝑊, and the items have integral weights of 𝑤1,…,𝑤𝑛.
- The item with weight 𝑤𝑖 can be sold for value 𝑣𝑖.
- There is only one of each item (hence the name 0-1 knapsack).
- Q: What is the maximum value we can obtain from our burglary attempt?
Solution
- Tabulate a table 𝑆 of dimension (𝑊+1)×(𝑛+1), where 𝑆[𝑖][𝑗] represents the maximum value we can obtain with a knapsack of capacity 𝑖 and using only the first 𝑗 items. This makes our base case: 𝑆[𝑖][0]=0 for all 0≤𝑖≤𝑊.
- For convenience of notation, if we try to access 𝑆[𝑖][𝑗] and 𝑖 becomes negative, treat that entry as −∞ so that option is never chosen (an item heavier than the remaining capacity cannot be taken).
- For our optimal substructure, we have 𝑆[𝑖][𝑗]=max{𝑆[𝑖][𝑗−1],𝑆[𝑖−𝑤𝑗][𝑗−1]+𝑣𝑗} since we can either take item 𝑗 or not. This can be computed easily bottom-up (or top-down), and we can return 𝑆[𝑊][𝑛] as our answer. This solution has a time complexity of 𝑂(𝑛𝑊).
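A minimal bottom-up sketch of this tabulation (knapsack01 and the parallel lists weights and values are illustrative names; items heavier than the remaining capacity are simply skipped, matching the convention above):

def knapsack01(W, weights, values):
    n = len(weights)
    # S[i][j] is the maximum value with capacity i using only the first j items.
    S = [[0] * (n + 1) for _ in range(W + 1)]
    for j in range(1, n + 1):
        for i in range(W + 1):
            # Option 1: skip item j.
            S[i][j] = S[i][j - 1]
            # Option 2: take item j, if it fits in the remaining capacity.
            if weights[j - 1] <= i:
                S[i][j] = max(S[i][j], S[i - weights[j - 1]][j - 1] + values[j - 1])
    return S[W][n]

Both loops together touch each of the (W+1)(n+1) cells once, giving the O(nW) runtime mentioned above.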
How is this NP-Complete?
- Polynomial-Time Algorithm for NP-Complete Problems: The initial confusion arises from presenting a polynomial-time algorithm for a problem that is considered NP-complete. This seems to contradict the widely held belief that NP-complete problems do not have polynomial-time solutions unless P = NP.
- Distinction Based on Input Representation: The subtlety lies in how the input size is represented and understood. For many algorithms, including the one discussed, the complexity is polynomial in terms of the numerical value of the input (e.g., the weight capacity of the knapsack, (W)), not in the length of the input's representation (the number of digits or bits needed to represent (W)).
- Exponential in the Size of Input Representation: The algorithm's runtime is polynomial in the value of (W), but since (W) is represented in binary (or any base greater than 1), writing down (W) only takes about (log W) bits. A (b)-bit input therefore allows (W ≈ 2^b), so an (O(nW)) runtime is (O(n * 2^b)): the algorithm is exponential in the size of the input representation, not polynomial.
- Weakly-NP-Complete Problems: The knapsack problem is highlighted as an example of a weakly-NP-complete problem. This category includes problems for which polynomial-time algorithms exist but only when the problem is expressed in terms of the numerical value of the inputs, not the length of their representation. This is a crucial distinction because it means that while these problems can be solved in polynomial time for small numerical values, the solution times can grow exponentially with the size of the input representation, maintaining their NP-complete status under standard definitions.
- Implications for P = NP: The existence of polynomial-time algorithms for weakly-NP-complete problems does not imply that P = NP. This is because the polynomial time is relative to the numerical value of the input, not the length of the input's representation, which is the standard measure used in complexity theory.
Bellman-Ford Algorithm
Pseudocode
The underlying principle of the Bellman-Ford algorithm is simple: every shortest path between two vertices uses at most |V|−1 edges (assuming there are no negative cycles, so shortest paths are well-defined). With this in mind, here's the algorithm.
def bellmanFord(source):
    # dist maps each vertex to the best distance from source found so far.
    Initialize dist, a mapping from vertices to distances.
    for each vertex v:
        dist[v] = ∞
    dist[source] = 0
    # Relax every edge |V| - 1 times; after pass i, dist is correct for paths of at most i edges.
    for i = 1, 2, ..., |V| - 1:
        for each directed edge (u, v, w):
            if dist[v] > dist[u] + w:
                dist[v] = dist[u] + w
    return dist
- The algorithm goes through all |𝐸| edges |𝑉|−1 times, updating the shortest path found up to that point in the algorithm.
- By doing so, on iteration 𝑖 the algorithm computes the shortest paths from the source to all other vertices using at most 𝑖 edges.
- Since every shortest path will use at most |𝑉|−1 edges, by the end, we will have computed all of the shortest paths.
- In terms of dynamic programming, this algorithm is formulated bottom-up, where the subproblems are defined by the destination and the max edge length of the path.
- Complexity:
- Time: O(|V| * |E|) - nested loop
- Space: O(|V|) - space needed to store the distance map
Negative Cycle Detection
- How: Run the algorithm for |𝑉| iterations rather than |𝑉|−1 iterations. If the distance map is still updated in the final iteration because a shorter path was found, a negative cycle is present.
- Why does this work? A shorter path found in that iteration uses |V| edges, which is only possible if the path repeats at least one vertex, i.e., it contains a cycle; since going around that cycle made the path shorter, the cycle's total weight must be negative.
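A minimal sketch of that extra pass (hasNegativeCycle is an illustrative name; it assumes edges is the list of directed (u, v, w) triples and dist already holds the result of the |V| − 1 relaxation rounds):

def hasNegativeCycle(edges, dist):
    # If any edge can still be relaxed after |V| - 1 rounds, distances are still improving,
    # which can only happen when a negative cycle is reachable from the source.
    for (u, v, w) in edges:
        if dist[u] + w < dist[v]:
            return True
    return False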
Example
Below is an example worked out on a small graph. As you can see, on iteration 𝑖, we find the shortest paths that use at most 𝑖 edges, updating an entry whenever a shorter path is found.
Food-for-thought: Application: Routing Protocols
Routing protocols built on algorithms like Dijkstra's and Bellman-Ford play a crucial role in small internal networks. Networks are modeled as weighted graphs with routers as vertices and physical links as edges; the weights can represent distance, cost, etc.
The goal of routing algorithms is to help routers determine which link to send packets along based on the packet's routing information (source, destination, etc.). Each router has only a local view of the network, seeing only the routers it is physically linked to. Forwarding is usually done with forwarding tables, which tell the router where to send a packet based on that information. The problem is: how can each router make informed decisions about where to forward packets with only a local view of the network? Below are two approaches.
Centralized Approach (Link-State)
- Mechanism: Each router maintains a global network view.
- Algorithm: Utilizes Dijkstra's algorithm for routing tables.
- Maintenance: Global view updated through broadcasts or centrally at a computing center.
- Challenges: Keeping the network view updated.
Decentralized Approach (Distance-Vector)
- Mechanism: Routers operate without a global network view.
- Algorithm: Employs Bellman-Ford algorithm for finding shortest paths.
- Operation: Routers update routing tables iteratively based on shortest paths.
- Challenges: Ensuring path accuracy and timeliness without a global view.
Both algorithms, despite their simplicity, are vital for efficient network routing, each fitting different network management strategies.
Floyd-Warshall Algorithm
What is it:
- Solves the all-pairs shortest paths problem and works with negative edge weights
Defining the Algorithm
Key traits:
- Works under the assumption that the graph is represented using an adjacency matrix
- Underlying principle is simple:
- Any shortest path can be split at some intermediate vertex into two shortest paths: one from the start to the intermediate vertex, and one from the intermediate vertex to the destination.
- If either half were not itself a shortest path, we could swap in a shorter one and improve the overall path, contradicting that it was a shortest path.
- Even if the shortest path is a direct edge (𝑢,𝑣,𝑤), we can decompose it as the shortest path from 𝑢 to 𝑢 of length 0 followed by the shortest path from 𝑢 to 𝑣 of length 𝑤.
How it works:
- Defines a distance matrix to store all shortest distances between pairs of vertices found so far, initialized to the adjacency matrix.
- Subproblems defined in terms of what vertices are allowed to be intermediate points.
- E.g. For convenience, let's label the vertices 0,1,…,|𝑉|−1 so that we don't need to worry about mapping vertices to integers. In the first iteration, we compute the shortest paths going through only vertex 0 as an intermediate point. In the second iteration, we compute the shortest paths going through only vertices {0,1} as intermediate points.
- The process is repeated for |V| iterations until in iteration |V|, we compute shortest paths going through any vertex as an intermediate point.
- Key point: splitting shortest paths along the current intermediate vertex costs O(1) per pair per iteration.
def floydWarshall(adjMatrix):
    Initialize dist, a |V| x |V| matrix that starts as a copy of adjMatrix.
    for each vertex vj:
        # vj is the only new intermediate vertex allowed in this iteration
        for each vertex vi:
            for each vertex vk:
                if dist[vi][vk] > dist[vi][vj] + dist[vj][vk]:
                    dist[vi][vk] = dist[vi][vj] + dist[vj][vk]
    return dist
Complexity
- The algorithm iterates through all pairs of vertices |V| times.
- There are O(|V|^2) pairs (including degenerate pairs of the same vertex), so the time complexity is O(|V|^3).
- The space complexity is O(|V|^2).
- This seems expensive, but the problem has a lower bound of Ω(|V|^2) just to write down the output, so the cubic runtime is not as far from optimal as it first appears.
- In terms of dynamic programming techniques, this algorithm is formulated bottom-up, where the subproblems are defined by the set of intermediate vertices allowed and the pair for the shortest path.
Negative Cycle Detection
- If there is a negative cycle, then some negative distance path will be found for some vertex pair (𝑖,𝑖). These are the values along the diagonal of the distance matrix.
- If a diagonal value ever changes and becomes negative, that signals to the algorithm that there is a negative cycle.
- If we let the algorithm keep running in perpetuity, the distances for paths that can pass through the negative cycle would keep getting shorter forever.
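A minimal sketch of that diagonal check on the finished distance matrix (the function name is illustrative):

def hasNegativeCycleFW(dist):
    # A negative diagonal entry means some vertex can reach itself with negative total weight.
    return any(dist[i][i] < 0 for i in range(len(dist)))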
Example
Hamilton and Euler: "Food for Thought"
Hamilton Cycles and Euler Circuits
- Hamilton Cycle: A cycle that passes through all vertices exactly once. Determining if a graph has a Hamilton cycle is an NP-complete problem.
- Euler Circuit: A circuit that passes through all edges exactly once. A graph has an Euler circuit if all vertices have an even degree and the graph is connected, which can be checked in polynomial time, making this problem in P.
Key Differences
- Complexity: Euler circuit problems are solvable in polynomial time, while Hamilton cycle problems are NP-complete.
- Criteria for Existence: Euler circuits have a simple, easily verifiable criterion (even degree and connectivity), unlike Hamilton cycles.
Similar Formulations, Different Difficulties
- Despite their similar formulations, these problems differ significantly in difficulty. This highlights that similar-looking problems can have vastly different computational complexities.
- Shortest vs. Longest Path: Another example of similarly formulated problems with different difficulties. The shortest path problem is in P, while the longest path problem is NP-Hard.
Module 15 Review
Key Idea: One useful property of dynamic programming is that, when applicable, it can turn simple exponential-time algorithms into polynomial-time ones.
DP Algorithms we covered:
- Dijkstra
- KMP
- LCS
- Bellman-Ford
- Floyd-Warshall
Big O - Complexity
- Given strings of length (m) and (n), the LCS algorithm has a time complexity of (O(mn)) and a space complexity of (O(mn)).
- Given a graph (G=(V,E)), the Bellman-Ford algorithm solves the single source shortest path problem with a time complexity of (O(|V| * |E|)) and a space complexity of (O(|V|)).
- Given a graph (G=(V,E)), the Floyd-Warshall algorithm solves the all-pairs shortest path problem with a time complexity of (O(|V|^3)) and a space complexity of (O(|V|^2)).