CS8001 - ODA Notes

Problem Sets - what to expect

Table of Contents


M3 - Stacks/Queue ADTs

Priority Queues

We will very briefly discuss a concept tangentially related to Queues, called the Priority Queue. The topic is placed here to broaden your awareness of the ideas that can emerge from this area; thinking about data structures in terms of their behavior in order to create new structures is an important growth-mindset technique. We will not discuss their typical implementation until Module 6.
Screen Shot 2024-05-29 at 11.28.11 PM.png
Screen Shot 2024-05-29 at 11.29.05 PM.png
Screen Shot 2024-05-29 at 11.29.16 PM.png
Screen Shot 2024-05-29 at 11.29.46 PM.png
Screen Shot 2024-05-29 at 11.30.12 PM.png
Screen Shot 2024-05-29 at 11.30.57 PM.png
Screen Shot 2024-05-29 at 11.31.29 PM.png

Priority Queues: A Linear Model?

In the video, we briefly mentioned that Priority Queues are a linear ADT. This is partially true since we can think of the data being placed linearly in sorted order based on priority. For queues, this was precisely the case, except that the priority was the order in which data was enqueued. However, in practice, a linear model doesn't work very well for implementing Priority Queues. Let's consider the idea of implementing a Priority Queue by maintaining a sorted list. We have two options to consider, an array-backed sorted list or a linked sorted list.

For the array-backed version, it would be similar to an ArrayQueue where the element with the highest priority could be at the front, while the element with the lowest priority could be at the back. With this model, dequeuing and peeking would be 𝑂(1) operations like in the ArrayQueue. However, in general, enqueuing would be an 𝑂(𝑛) procedure, even if we amortize the resize cost. This is because enqueuing must maintain the sorted order of the data, so we'd need to shift data around to place the new element. There are other variations we could use, such as flipping the priority order or keeping the data zero-aligned, but they all lead to similar conclusions.
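To make that 𝑂(𝑛) enqueue concrete, here is a minimal sketch of a sorted, array-backed min-priority queue; the class name and layout (highest priority kept at the back, so dequeue needs no shifting or wraparound) are our own assumptions rather than the course's implementation.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

/**
 * Sketch: a min-priority queue backed by an array kept in descending order,
 * so the smallest (highest-priority) element sits at the back of the array.
 */
public class SortedArrayMinPQ {
    private int[] data = new int[8];
    private int size;

    /** O(n) even with amortized resizing: we must shift elements to keep sorted order. */
    public void enqueue(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, data.length * 2);
        }
        int i = size - 1;
        while (i >= 0 && data[i] < value) { // walk back past smaller elements
            data[i + 1] = data[i];          // shifting each one right by one slot
            i--;
        }
        data[i + 1] = value;
        size++;
    }

    /** O(1): the highest-priority element is always the last occupied slot. */
    public int dequeue() {
        if (size == 0) {
            throw new NoSuchElementException("Priority queue is empty.");
        }
        return data[--size];
    }

    /** O(1) peek at the highest-priority element. */
    public int peek() {
        if (size == 0) {
            throw new NoSuchElementException("Priority queue is empty.");
        }
        return data[size - 1];
    }
}
```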

For the linked version, it would again be similar to a LinkedQueue where the element with highest priority would be at the head while the element with lowest priority would be at the tail. Once again, peeking and dequeuing operations are 𝑂(1), but enqueuing would be an 𝑂(𝑛) operation since we need to find the location to insert the data.

So, with our current range of ideas, it appears that implementing a Priority Queue efficiently is out of reach. We will begin developing the ideas we need starting in Module 6, in the near future!

Deques (double-ended queue “deck”)

We end the section by discussing a natural extension of both Stacks and Queues, the Deque ADT. Rather than specializing in a specific add and remove behavior like Stacks and Queues, Deques support add and remove operations from either end of the data. The techniques involved in implementing the deque have already been discussed, so we will simply upcycle the ideas to show how it can be done.
Screen Shot 2024-05-29 at 11.51.08 PM.png
Screen Shot 2024-05-29 at 11.51.59 PM.png
Screen Shot 2024-05-29 at 11.51.21 PM.png
Screen Shot 2024-05-29 at 11.52.21 PM.png
Screen Shot 2024-05-29 at 11.52.32 PM.png
Screen Shot 2024-05-29 at 11.53.20 PM.png
Screen Shot 2024-05-29 at 11.54.26 PM.png
Screen Shot 2024-05-29 at 11.55.14 PM.png
Deques are found in online purchasing systems.

Java's % Operator

Screen Shot 2024-05-29 at 11.44.15 PM.png
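For reference, Java's % is a remainder operator whose result takes the sign of the dividend, which matters when computing wraparound indices for a circular backing array. The snippet below demonstrates the behavior and two common fixes; the helper name is our own.

```java
public class ModDemo {
    /** Maps any index (even a negative one) into the range [0, length). */
    private static int wrap(int index, int length) {
        return ((index % length) + length) % length;
    }

    public static void main(String[] args) {
        System.out.println(5 % 3);                // 2
        System.out.println(-1 % 7);               // -1, not 6: % keeps the dividend's sign
        System.out.println(wrap(-1, 7));          // 6
        System.out.println(Math.floorMod(-1, 7)); // 6, the built-in equivalent
    }
}
```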

CSVIS

Screen Shot 2024-05-29 at 11.56.31 PM.png
Screen Shot 2024-05-30 at 12.04.14 AM.png
Screen Shot 2024-05-30 at 12.04.37 AM.png

Screen Shot 2024-05-30 at 12.05.03 AM.png

M4 - BST Introduction

Introduction to Tree ADT

Screen Shot 2024-05-30 at 12.53.05 AM.png
Screen Shot 2024-05-30 at 12.53.18 AM.png
Screen Shot 2024-05-30 at 12.55.29 AM.png
Screen Shot 2024-05-30 at 12.57.38 AM.png

Was that an ADT?

In the first few modules, we made a large distinction between ADTs and data structures. To be clear, ADTs are high-level descriptions of a model, whereas a data structure is the low-level, detail-oriented implementation of a model. Up until now, the distinction was stark because there were multiple implementations for each ADT, and it served as a nice way of introducing the concepts.

However, moving forward, we'll be much looser with that distinction. We do this for multiple reasons, but the primary one is that the distinction would inhibit learning more than help it in the topics to come. Some of the upcoming topics can be difficult to grasp. We would prefer for you to focus on those complexities, rather than worry about the semantics of whether a Tree is an ADT or a data structure (it can be either depending on the language used). We will still distinguish between the ADTs in the diagrams shown in the module review sections, but they are not as important as they were in previous modules.

Binary Trees

Screen Shot 2024-05-30 at 1.00.14 AM.png
Screen Shot 2024-05-30 at 1.01.49 AM.png
Screen Shot 2024-05-30 at 1.02.56 AM.png
Screen Shot 2024-05-30 at 1.03.18 AM.png
Screen Shot 2024-05-30 at 1.03.57 AM.png

Binary Search Trees

Screen Shot 2024-05-30 at 1.05.56 AM.png
Screen Shot 2024-05-30 at 1.07.24 AM.png
Screen Shot 2024-05-30 at 1.07.47 AM.png

CSVIZ

Screen Shot 2024-05-30 at 1.11.28 AM.png

Screen Shot 2024-05-30 at 1.12.14 AM.png

Screen Shot 2024-05-30 at 1.12.46 AM.png

Preorder Traversal

Screen Shot 2024-06-01 at 6.12.28 PM.png
Screen Shot 2024-06-01 at 6.14.17 PM.png
Screen Shot 2024-06-01 at 6.16.17 PM.png
(when left mark is reached, record data)

Postorder Traversal

Screen Shot 2024-06-01 at 6.17.22 PM.png
Screen Shot 2024-06-01 at 6.19.39 PM.png
Screen Shot 2024-06-01 at 6.21.54 PM.png
(when right mark is reached, record data)

Inorder Traversal

Screen Shot 2024-06-01 at 6.22.54 PM.png
Screen Shot 2024-06-01 at 6.24.48 PM.png
Screen Shot 2024-06-01 at 6.26.07 PM.png

Levelorder Traversal

Screen Shot 2024-06-01 at 6.27.50 PM.png
Screen Shot 2024-06-01 at 6.30.01 PM.png
Screen Shot 2024-06-01 at 6.31.36 PM.png

Choosing a Traversal

We have covered four different traversals for Binary Trees in this module. It is important to note that these traversal concepts can easily be extended to general trees. After all, three of the traversals are derived from simply changing up the order the "look at node" operation is performed with respect to the recursive calls. The procedures within each traversal are defined for Binary Trees. However, the properties of the traversals are exemplified when specifically examining the data returned from BSTs.
Let's take the time to highlight some of their uses and distinguishing properties. So, what makes each of these traversals unique?
We hope that this helps you keep track of what each traversal does. More importantly, we hope that you look at these four traversals and think about what else you can change to get the behavior and ordering that you want. For example, you could traverse right before left in an inorder fashion to get the data in descending order (sketched below). The procedures we've given you are bare-bones skeletons, which you can modify and tailor to your needs as they arise, so have some fun experimenting!
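Here is a hedged sketch of the three recursive traversals, the queue-based levelorder, and the "reverse inorder" variant mentioned above; the Node class and method names are our own assumptions.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

public class Traversals {
    public static class Node<T> {
        T data;
        Node<T> left;
        Node<T> right;
        Node(T data) { this.data = data; }
    }

    /** Preorder: record data before either recursive call. */
    public static <T> void preorder(Node<T> node, List<T> out) {
        if (node == null) { return; }
        out.add(node.data);
        preorder(node.left, out);
        preorder(node.right, out);
    }

    /** Inorder: record data between the calls; for a BST this yields ascending order. */
    public static <T> void inorder(Node<T> node, List<T> out) {
        if (node == null) { return; }
        inorder(node.left, out);
        out.add(node.data);
        inorder(node.right, out);
    }

    /** Postorder: record data after both calls. */
    public static <T> void postorder(Node<T> node, List<T> out) {
        if (node == null) { return; }
        postorder(node.left, out);
        postorder(node.right, out);
        out.add(node.data);
    }

    /** Inorder with right visited first; for a BST this yields descending order. */
    public static <T> void reverseInorder(Node<T> node, List<T> out) {
        if (node == null) { return; }
        reverseInorder(node.right, out);
        out.add(node.data);
        reverseInorder(node.left, out);
    }

    /** Levelorder uses a queue rather than recursion. */
    public static <T> void levelorder(Node<T> root, List<T> out) {
        Queue<Node<T>> queue = new ArrayDeque<>();
        if (root != null) { queue.add(root); }
        while (!queue.isEmpty()) {
            Node<T> node = queue.remove();
            out.add(node.data);
            if (node.left != null) { queue.add(node.left); }
            if (node.right != null) { queue.add(node.right); }
        }
    }
}
```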

M5 - BST Operations

We are going to explore the power of the tree data structure, specifically Binary Search Trees, BSTs. We discuss the motivation for BSTs and how they relate to the binary search algorithm. We present efficient procedures for searching, adding, and removing in a BST. Similar to how binary search works, BSTs work by splitting the data into subtrees, with each subtree holding about half of the data of the level above it, yielding 𝑂(log⁡(𝑛)) running times on average.
At the end of the module, we introduce an example of a probabilistic data structure known as a SkipList. Though the basis of SkipLists and their structure is different from BSTs, they are actually very similar in how their search procedure works, which is why we've chosen to introduce them here.

Learning Objectives

BST Searching

Screen Shot 2024-06-01 at 7.07.09 PM.png
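As a companion to the screenshots, here is a minimal sketch of recursive BST search, using our own Node definition; the average 𝑂(log⁡(𝑛)) cost comes from discarding roughly half of the remaining tree at each comparison.

```java
public class BSTSearch {
    public static class Node<T> {
        T data;
        Node<T> left;
        Node<T> right;
        Node(T data) { this.data = data; }
    }

    /** Returns true if target is present in the subtree rooted at node. */
    public static <T extends Comparable<? super T>> boolean contains(Node<T> node, T target) {
        if (node == null) {
            return false;                       // fell off the tree: not found
        }
        int cmp = target.compareTo(node.data);
        if (cmp < 0) {
            return contains(node.left, target); // smaller data lives in the left subtree
        } else if (cmp > 0) {
            return contains(node.right, target);
        } else {
            return true;
        }
    }
}
```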

Best and Average Case Complexity Analysis

Screen Shot 2024-06-01 at 7.11.25 PM.png
Screen Shot 2024-06-01 at 7.11.38 PM.png

BST Adding

Screen Shot 2024-06-01 at 7.12.53 PM.png
Screen Shot 2024-06-01 at 7.15.30 PM.png
Screen Shot 2024-06-01 at 7.16.00 PM.png
Screen Shot 2024-06-01 at 7.18.24 PM.png

Pointer Reinforcement Pseudocode

Screen Shot 2024-06-01 at 7.22.28 PM.png
Screen Shot 2024-06-01 at 7.19.44 PM.png
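A minimal sketch of how a pointer reinforcement add is commonly written (our own Node class and naming, not the course's reference code): every recursive call returns the node that belongs at that spot, and the caller assigns that return value back into the tree.

```java
public class BSTAdd<T extends Comparable<? super T>> {
    private static class Node<T> {
        T data;
        Node<T> left;
        Node<T> right;
        Node(T data) { this.data = data; }
    }

    private Node<T> root;

    public void add(T data) {
        root = add(root, data);                 // reinforce the root pointer too
    }

    /**
     * Pointer reinforcement: each call returns the node that should sit at this
     * position, and the caller wires that return value back into the tree.
     */
    private Node<T> add(Node<T> node, T data) {
        if (node == null) {
            return new Node<>(data);            // base case: hang the new node here
        }
        int cmp = data.compareTo(node.data);
        if (cmp < 0) {
            node.left = add(node.left, data);   // reinforce the left pointer
        } else if (cmp > 0) {
            node.right = add(node.right, data); // reinforce the right pointer
        }                                       // duplicates: do nothing
        return node;                            // unchanged nodes return themselves
    }
}
```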

BST Removing

Screen Shot 2024-06-02 at 3.55.50 PM.png
Screen Shot 2024-06-02 at 3.57.37 PM.png
Screen Shot 2024-06-02 at 3.58.23 PM.png
Screen Shot 2024-06-02 at 4.00.17 PM.png
Screen Shot 2024-06-02 at 4.01.11 PM.png
Screen Shot 2024-06-02 at 4.01.52 PM.png
Screen Shot 2024-06-02 at 4.03.06 PM.png
Screen Shot 2024-06-02 at 4.24.10 PM.png

Screen Shot 2024-06-02 at 4.23.53 PM.png

Pointer Reinforcement Remove Pseudocode

Screen Shot 2024-06-02 at 4.04.55 PM.png
Screen Shot 2024-06-02 at 4.06.30 PM.png
Screen Shot 2024-06-02 at 4.08.50 PM.png
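A hedged sketch of pointer reinforcement remove; the course pseudocode may differ in details (for example, by using a dedicated successor-removal helper), but the three cases are the same: no children, one child, and two children handled via the successor.

```java
public class BSTRemove<T extends Comparable<? super T>> {
    private static class Node<T> {
        T data;
        Node<T> left;
        Node<T> right;
        Node(T data) { this.data = data; }
    }

    private Node<T> root;

    public void remove(T data) {
        root = remove(root, data);
    }

    /** Pointer reinforcement remove: each call returns the subtree that replaces it. */
    private Node<T> remove(Node<T> node, T data) {
        if (node == null) {
            return null;                              // data not found; tree unchanged
        }
        int cmp = data.compareTo(node.data);
        if (cmp < 0) {
            node.left = remove(node.left, data);
        } else if (cmp > 0) {
            node.right = remove(node.right, data);
        } else {
            if (node.left == null) {
                return node.right;                    // 0 or 1 child: splice the node out
            } else if (node.right == null) {
                return node.left;
            }
            // 2 children: copy the successor's data up, then remove the successor.
            Node<T> successor = node.right;
            while (successor.left != null) {
                successor = successor.left;
            }
            node.data = successor.data;
            node.right = remove(node.right, successor.data);
        }
        return node;
    }
}
```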

END OF PROBLEM SET 1

M6 - Heaps

Let's look at the "cousin" to BSTs in the Binary Tree group, (binary) Heaps. Heaps are a binary tree-type data structure that prioritize access to either the smallest or largest item in the data structure depending on the heap type. In a sense, heaps are a specialist data structure most commonly coupled with the Priority Queue ADT. Whereas, BSTs are a generalist data structure useful for a lot of different operations.

An interesting caveat to heaps is that unlike BSTs, they are commonly implemented using backing arrays. There is an additional shape property constraint for heaps on top of the binary tree shape which lends itself to an array implementation. This allows us the opportunity to explore how an array implementation may work for a tree-type data structure. We'll explore this through three operations: adding, removing, and building a heap from a group of unordered data.

Learning Objectives

Students will:

Intro to (Binary) Heaps

Here we present an introduction to a new data structure, (binary) heaps. The term binary is in parentheses because there are multiple types of heaps, and we are discussing one particular kind. Binary heaps are the most common type of heap, so when a heap is mentioned without qualification, it is usually a binary heap that is being referred to.

Heaps are another binary tree type of data structure with the additional shape property of completeness, which we will study in this lesson. Completeness is a very nice property that lends itself to an array implementation, which is usually faster at a low level when compared to a node-based implementation. Heaps also have a different order property when compared to BSTs. The heap order property changes the way we need to look at basic operations such as add and remove.
Screen Shot 2024-06-02 at 4.46.37 PM.png

Order Property

In a heap, the parent node is always larger (in the case of a maxheap) or smaller (in the case of a minheap) than its children. There is no relationship between the children.

Shape Property

A heap is always complete. At any point, each level of the tree is completely filled, except for the last level, which is filled from left to right with no gaps.
Screen Shot 2024-06-02 at 4.47.17 PM.png
Screen Shot 2024-06-02 at 4.48.21 PM.png
Screen Shot 2024-06-02 at 4.48.09 PM.png
Screen Shot 2024-06-02 at 4.51.13 PM.png
Screen Shot 2024-06-02 at 4.52.01 PM.png

Heap Operations

Adding

Screen Shot 2024-06-02 at 5.02.57 PM.png
Screen Shot 2024-06-02 at 5.01.07 PM.png
Screen Shot 2024-06-02 at 5.02.04 PM.png
Screen Shot 2024-06-02 at 5.58.10 PM.png
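A hedged sketch of add for an array-backed min-heap, assuming the common 1-indexed convention where index 0 is unused, the children of index i are 2i and 2i + 1, and the parent of i is i / 2.

```java
import java.util.Arrays;

/** Sketch: add to an array-backed min-heap that leaves index 0 unused. */
public class MinHeapAdd {
    private int[] backing = new int[11];   // index 0 unused; children of i are 2i and 2i + 1
    private int size;

    public void add(int value) {
        if (size + 1 == backing.length) {
            backing = Arrays.copyOf(backing, backing.length * 2);
        }
        backing[++size] = value;           // place at the next open slot (keeps completeness)
        int child = size;
        while (child > 1 && backing[child] < backing[child / 2]) {
            int temp = backing[child / 2]; // up-heap: swap with the parent while smaller
            backing[child / 2] = backing[child];
            backing[child] = temp;
            child /= 2;
        }
    }
}
```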

Removing

Screen Shot 2024-06-02 at 5.07.27 PM.png
Screen Shot 2024-06-02 at 5.06.10 PM.png
Screen Shot 2024-06-02 at 5.05.57 PM.png
Screen Shot 2024-06-02 at 5.06.52 PM.png
Screen Shot 2024-06-02 at 5.07.12 PM.png
Screen Shot 2024-06-02 at 5.59.57 PM.png
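A hedged sketch of removing the root from an array-backed min-heap (same 1-indexed convention as above): move the last data to the root to preserve completeness, then down-heap it until the order property is restored.

```java
import java.util.NoSuchElementException;

/** Sketch: remove the root of an array-backed min-heap (1-based indexing, index 0 unused). */
public class MinHeapRemove {
    private int[] backing;
    private int size;

    public MinHeapRemove(int[] backing, int size) {
        this.backing = backing;
        this.size = size;
    }

    public int remove() {
        if (size == 0) {
            throw new NoSuchElementException("Heap is empty.");
        }
        int min = backing[1];              // the root is always the highest-priority data
        backing[1] = backing[size--];      // move the last data to the root to keep completeness
        downHeap(1);
        return min;
    }

    /** Down-heap: swap with the smaller child until the order property is restored. */
    private void downHeap(int index) {
        while (2 * index <= size) {
            int child = 2 * index;                                 // left child
            if (child + 1 <= size && backing[child + 1] < backing[child]) {
                child++;                                           // right child is smaller
            }
            if (backing[index] <= backing[child]) {
                break;                                             // order property holds
            }
            int temp = backing[index];
            backing[index] = backing[child];
            backing[child] = temp;
            index = child;
        }
    }
}
```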

Performance

Screen Shot 2024-06-02 at 6.00.36 PM.png

New Operations? New Heaps - Food for Thought

Screen Shot 2024-06-02 at 5.30.15 PM.png

Build Heap Algorithm

Screen Shot 2024-06-02 at 5.32.37 PM.png
Screen Shot 2024-06-02 at 5.33.07 PM.png
Screen Shot 2024-06-02 at 5.34.04 PM.png
Screen Shot 2024-06-02 at 5.34.25 PM.png
Screen Shot 2024-06-02 at 5.36.07 PM.png
Screen Shot 2024-06-02 at 5.36.56 PM.png
Screen Shot 2024-06-02 at 5.38.08 PM.png
Screen Shot 2024-06-02 at 5.38.56 PM.png
Screen Shot 2024-06-02 at 5.39.35 PM.png
Screen Shot 2024-06-02 at 5.40.09 PM.png
Screen Shot 2024-06-02 at 5.40.43 PM.png
Screen Shot 2024-06-02 at 5.41.39 PM.png
Screen Shot 2024-06-02 at 5.41.47 PM.png
Screen Shot 2024-06-02 at 5.42.08 PM.png
Screen Shot 2024-06-02 at 5.43.18 PM.png
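The build-heap algorithm can be sketched as follows: copy the unordered data into the backing array, then down-heap every non-leaf node from the last parent (index size / 2 with 1-based indexing) back to the root; this is what yields the 𝑂(𝑛) build cost. The names and indexing convention are our own assumptions.

```java
/** Sketch: build a min-heap from unordered data in O(n) (1-based indexing, index 0 unused). */
public class BuildHeap {
    public static int[] buildMinHeap(int[] data) {
        int size = data.length;
        int[] backing = new int[size + 1];
        System.arraycopy(data, 0, backing, 1, size);   // copy data into indices 1..size
        for (int i = size / 2; i >= 1; i--) {          // last non-leaf node down to the root
            downHeap(backing, size, i);
        }
        return backing;
    }

    private static void downHeap(int[] backing, int size, int index) {
        while (2 * index <= size) {
            int child = 2 * index;
            if (child + 1 <= size && backing[child + 1] < backing[child]) {
                child++;                               // pick the smaller child
            }
            if (backing[index] <= backing[child]) {
                break;
            }
            int temp = backing[index];
            backing[index] = backing[child];
            backing[child] = temp;
            index = child;
        }
    }
}
```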

M7 - Hashmaps

Now, we take a look at HashMaps (HashTables), one of the most commonly used data structures in computer science. HashMaps are very useful when we want to search for specific data in our data structure. This is helpful when we'd like to check for membership of data, either where the searched data acts as a unique identifier for some other data (in which case the searched data is called a key, and the implemented ADT is the Map ADT), or where the data itself is all we store (the Set ADT).

Looking back at our toolbox, the best time complexity we have for this problem so far is on average 𝑂(log⁡(𝑛)) via BSTs. Remarkably, we can do better, with 𝑂(1) search times on average! In this module, we'll be introducing the big idea of how we achieve this incredible efficiency, known as hashing. We won't be rigorously proving the average efficiency, but we hope to provide you with an understanding of the mechanisms of hashing, the flexibility such a technique brings, as well as some intuition for why the idea works.

Learning Objectives
Students will:

Introduction to HashMaps

Screen Shot 2024-06-11 at 9.43.22 PM.png
Screen Shot 2024-06-11 at 9.45.05 PM.png
Screen Shot 2024-06-11 at 9.46.01 PM.png
Screen Shot 2024-06-11 at 9.48.26 PM.png

The Map and Set ADTs

Let's first recap what we know about the Map ADT. Maps are defined by multiple key-value pairs (denoted <K,V>), satisfying the following properties:
Now, what operations do maps typically support? There are other operations such as returning the size or clearing the data structure that might be useful, but we omit them for brevity.
Now, what exactly is a set? The Set ADT can be thought of as exactly what a Map is, except we only have keys as our data and no values. If you're familiar with what a set is in math, then you can think of them as much the same thing here, except that these sets are typed to hold only one type of data. Similarly, you may sometimes hear people say the term dictionary to refer to a Map; in particular, Python uses the term dictionary to refer to this idea in the language.

If you think back to some of our earlier data structures such as BSTs, you can think of them as implementing the Set ADT or the Map ADT. If we search in a BST to check for membership of data, then it's a Set. If we're looking up some identifier tied to other data (such as an id number tied to a user record), then it's a Map. For most of the examples in this module, we will omit the values and only look at the keys. This is because there is very little restriction imposed on the values, so any search algorithm will depend solely on the keys.

Spec'ing our HashMap

In the video in the previous section, we touched briefly on the idea of collisions, and as you progress through the module, you will find that collisions are the source of most inefficiencies in HashMaps. Most of our time will be spent on how to resolve collisions (collision resolution strategies), but the truth of the matter is that much of the work of a good HashMap occurs even before any of these collision strategies come into play! The old adage "prevention is the best medicine" applies here very well, so let's take a quick look at some things to consider when designing a HashMap.

Picking Hash Functions

The first thing we're going to look at is hash functions. Formally, a hash function is a function ℎ:𝐾→𝑍, which means a mapping from keys to integers. This step is necessary because our keys might not even be numerical like the phone number example from the video. For example, if our key is a String, then we need a function to convert from a String to an Integer.

As for the requirements of this function, we technically only require that if two keys, 𝑘1 and 𝑘2, are equal, then their hash function evaluations, ℎ(𝑘1) and ℎ(𝑘2) are equal. So, technically, the zero-function, where we map every possible key to 0 is a valid hash function. However, it's not a very good hash function. We want the reverse to be as close to true as possible as well: if the hash function evaluations are equal, then the keys are also equal. In some cases, this may be possible, but in many cases, it may not be possible without infinite memory (or may not be possible at all). After all, what good is a perfect hash function if we can't store it in memory?

Let's take a look at our example of converting String keys into Integers. One reasonable way to do this is to assign each character a numerical value, then add those values up. For example, if we assigned the letter A the value 1 and the letter B the value 2, then the string ABAB would have the hash value 1+2+1+2=6. This is an okay hash function, but it has a weakness: it doesn't take the ordering of characters into account at all, so any string of length 4 with an equal number of A and B characters will collide into the same hash value. One way we can fix this is by taking a page out of our number system and multiplying each value by something based on its position before adding it up. For example, if we multiply by powers of 10 based on the position from right to left, then we get 1×10^3 + 2×10^2 + 1×10^1 + 2×10^0 = 1212.

This was just one example of a hash function from Strings to Integers, but there are plenty of other reasonable ones as well. In Java, hash functions are written in the hashCode() method, which is why Java always tells you to override it when you override the equals() method!
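A sketch of the positional idea above written as Java: weighting each character by a power of a base removes the anagram collisions that a plain character sum suffers from. The base 31 is a conventional choice (Java's own String.hashCode() uses a similar polynomial form).

```java
public class StringHash {
    /**
     * Positional hash: weight each character by a power of a base so that
     * "ABAB" and "AABB" no longer collide, unlike a plain sum of characters.
     */
    public static int positionalHash(String key) {
        int hash = 0;
        for (int i = 0; i < key.length(); i++) {
            hash = 31 * hash + key.charAt(i);   // Horner's rule; 31 is a conventional base
        }
        return hash;
    }

    public static void main(String[] args) {
        System.out.println(positionalHash("ABAB") == positionalHash("AABB")); // false
    }
}
```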

TLDR on .hashCode():
There are strategies for creating a hashCode() implementation that aims to minimize collisions. To see more, take a look at the class below (credit for the file to our colleagues in CS 1331). The Vehicle class includes an overridden .hashCode() method aimed at minimizing collisions.
Vehicle.java 

Other Considerations

Once we've picked our hash function, we have three other things to consider: the compression function, the table size, and the load factor. The compression function's job is to shrink and shift the output of the hash function so that it fits into the index range of the backing table (array). This is because, for most applications, it's not reasonable to have a table whose capacity is the largest possible hash value. The most common compression function you will see is to simply mod by the table length, though there are others.

Since the compression function often uses the table size in order to "spread out" our data over the specified range, it's also common for the table size to be a prime number in order to minimize collisions due to compression. There are, again, other choices for table sizes, but most often, HashMap implementations will "lazily attempt" to use a prime number for the table length without actually checking whether it's prime. Another popular choice is to use a power of two, since it makes the modulo operation more efficient (this is out of scope for the course, but prime numbers are the most expensive for divide-type operations, whereas powers of two are efficient thanks to bitmasks).

Finally, we have the load factor, which is defined as the size divided by the capacity. The related design parameter is the maximum load factor: how high we're willing to let the load factor become before we resize the table, which varies quite a bit between implementations. It's also very dependent on how we decide to handle collisions. The intuition is that as the load factor increases, the table becomes inundated with entries, and collisions become more likely. If we keep the threshold low, then we can almost guarantee 𝑂(1) search times, but we may have to resize more often, and we will use more memory than we may want to.
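A small sketch tying these pieces together: the hash function's output is compressed into a valid index with a mod, and a put would first check the load factor against a maximum threshold. The field names and the 0.67 threshold are assumptions for illustration.

```java
public class HashMapSpec<K, V> {
    private Object[] table = new Object[13];          // table length: "lazily" a prime
    private int size;
    private static final double MAX_LOAD_FACTOR = 0.67;

    /** Compression: map an arbitrary hashCode() into a valid table index. */
    private int compress(K key) {
        return Math.floorMod(key.hashCode(), table.length);
    }

    /** Check the load factor (size / capacity) before adding a new entry. */
    private void checkLoadFactor() {
        double nextLoadFactor = (double) (size + 1) / table.length;
        if (nextLoadFactor > MAX_LOAD_FACTOR) {
            // resize: roughly double the capacity and re-compress every entry
        }
    }
}
```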

As you can see, a lot goes into designing a good HashMap. There are countless papers and studies out there looking at what hash functions work best in practice and in theory, so it's a very interesting topic to look at! It also speaks to how important these ideas are; these ideas are quite flexible and general, so they can be applied in data structures other than HashMaps.

External Chaining

Now begins our exploration into various collision handling policies. These policies can be broadly categorized into two different types:
We will discuss one closed-addressing policy and three different open addressing policies. The first policy we discuss is the simplest one in this lesson, external chaining, a form of closed addressing.

What is External Chaining?


Screen Shot 2024-06-11 at 10.05.18 PM.png
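A minimal sketch of the external chaining idea (not the course's reference implementation): each index of the backing array holds the head of a singly linked chain, and colliding entries are simply added to that chain.

```java
public class ChainedHashMap<K, V> {
    private static class Entry<K, V> {
        K key;
        V value;
        Entry<K, V> next;
        Entry(K key, V value, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private Entry<K, V>[] table = (Entry<K, V>[]) new Entry[13];
    private int size;

    public void put(K key, V value) {
        int index = Math.floorMod(key.hashCode(), table.length);
        for (Entry<K, V> cur = table[index]; cur != null; cur = cur.next) {
            if (cur.key.equals(key)) {
                cur.value = value;                      // duplicate key: replace the value
                return;
            }
        }
        table[index] = new Entry<>(key, value, table[index]); // collision: add to the chain's front
        size++;
        // (a real implementation would also check the load factor and resize here)
    }
}
```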

Resizing with External Chaining

Screen Shot 2024-06-11 at 10.11.08 PM.png

Linear Probing, Example 1 (Adding)

What is probing?
What is Linear Probing?
Screen Shot 2024-06-11 at 10.21.12 PM.png

Screen Shot 2024-06-11 at 10.21.28 PM.png

Screen Shot 2024-06-11 at 10.21.49 PM.png

Screen Shot 2024-06-11 at 10.22.25 PM.png

Wraparound technique: 50 ends up at the first index ((idx + 6 probes) % 7).
Screen Shot 2024-06-11 at 10.23.15 PM.png
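A hedged sketch of a linear probing put, including the wraparound: probe index (h + i) % table.length for i = 0, 1, 2, … until an empty slot or a duplicate key is found. DEL-marker reuse and resizing are omitted for brevity.

```java
public class LinearProbingMap<K, V> {
    private Object[] keys = new Object[7];
    private Object[] values = new Object[7];
    private int size;

    public void put(K key, V value) {
        int h = Math.floorMod(key.hashCode(), keys.length);
        for (int i = 0; i < keys.length; i++) {
            int index = (h + i) % keys.length;      // linear probe with wraparound
            if (keys[index] == null) {
                keys[index] = key;                  // empty slot: place the new entry
                values[index] = value;
                size++;
                return;
            } else if (keys[index].equals(key)) {
                values[index] = value;              // duplicate key: replace the value
                return;
            }
        }
        // table full: a real implementation resizes before this can happen
    }
}
```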

Linear Probing, Examples 2 and 3 (Searching + Removing)

The next operations we take a look at are searching and removing from a HashMap. As it turns out, removing from a HashMap doesn't work if we remove items from the array normally. So, we perform what we call a soft removal. Soft removals are where we leave the entry in place, but mark it with a flag to denote that the entry is removed. Such entries are called DEL markers (some sources refer to them as tombstones). These entries exist in the array in order to assist in HashMap operations, but from the user's perspective, looking at the HashMap as a black box, these entries do not exist at all since the user has already removed them.

Removing from HashMap

Screen Shot 2024-06-11 at 10.28.43 PM.png

Putting into DEL Index

Probing Scenarios:

Example steps:
Screen Shot 2024-06-11 at 10.32.15 PM.png

Resizing Backing Array (for Linear Probing)

When backing array surpasses threshold:

The Efficiency of DEL Markers

One question to keep in mind, as you consider the use of DEL markers for removing, is how these DEL markers impact efficiency. The put/remove/search operations all have termination conditions for whether the key is found in the map or not. If the key is in the map, then all is well. However, the more complex problem is when the key is not in the map. There are two termination conditions for when the key is not there, whichever comes first:
The second case is especially pathological, but it can in fact happen even if the max load factor threshold is set reasonably. As an example, suppose we have a table of length 13, and one by one, we add and remove the keys 0, 1, …, 11, 12 from the table. This leaves us with a final table of size 0 with no null entries (every cell holds a DEL marker), all while never having the load factor be larger than 0.08. This configuration is terrible because it guarantees 𝑂(𝑛) performance for any put/remove/search; it just goes to show that DEL markers worsen the efficiency of our operations. This is also why, when we resize, we get rid of any DEL markers in the process.

There are ways to try to avoid this, such as including DEL Markers in the load factor calculation, or triggering an intentional resize or reconstruction of the table if we know beforehand we'll be removing a lot. We won't be delving into more detail than this, but it is worth keeping in mind the cost that DEL Markers have on performance.
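To make those termination conditions concrete, here is a sketch of a linear probing search that skips DEL markers: it stops early at a never-used (null) slot, and in the pathological case above it only gives up after probing the entire table. The DEL sentinel and field names are assumptions.

```java
public class ProbingSearch<K, V> {
    private static final Object DEL = new Object();   // tombstone sentinel
    private Object[] keys;
    private Object[] values;

    public ProbingSearch(Object[] keys, Object[] values) {
        this.keys = keys;
        this.values = values;
    }

    @SuppressWarnings("unchecked")
    public V get(K key) {
        int h = Math.floorMod(key.hashCode(), keys.length);
        for (int i = 0; i < keys.length; i++) {        // termination 2: probed every slot
            int index = (h + i) % keys.length;
            if (keys[index] == null) {
                return null;                           // termination 1: a never-used slot
            }
            if (keys[index] != DEL && keys[index].equals(key)) {
                return (V) values[index];
            }
            // DEL markers and non-matching keys are skipped, so the probe keeps going
        }
        return null;
    }
}
```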

Quadratic Probing

Previously, we saw the most basic of open addressing techniques, linear probing. If you do some examples with it, you will find that linear probing has the problem of developing contiguous blocks of data. These blocks of data can be problematic as they get larger, because any search for data in these indices will necessarily continue until we hit the end of that block.

image.png
For example, in the diagram above, indices 3 - 7 are occupied by either data or DEL markers. If a put/remove/search operation lands at any of these indices, then the operation continues until it reaches index 8, the green cell, giving degenerate linear performance. This problem is known as primary clustering (or turtling), and its effect can be mitigated by quadratic probing, which is seen in this lesson.

What is Quadratic Probing?

Example 1

Screen Shot 2024-06-11 at 11.00.19 PM.png

Screen Shot 2024-06-11 at 11.00.41 PM.png

Screen Shot 2024-06-11 at 11.01.09 PM.png
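For contrast with linear probing, a small sketch of the quadratic probe sequence: the i-th probe lands at (h + i^2) % table.length. Note in the output how probe indices can repeat, which is where the infinite probing problem discussed next comes from.

```java
public class QuadraticProbe {
    /** Returns the index probed on attempt i (i = 0, 1, 2, ...) for the given hash. */
    public static int probe(int hash, int i, int tableLength) {
        return Math.floorMod(hash + i * i, tableLength);
    }

    public static void main(String[] args) {
        int h = 5;
        for (int i = 0; i < 5; i++) {
            // For a table of length 7: 5, 6, 2, 0, 0 ... indices can repeat,
            // which is how the infinite probing problem can arise.
            System.out.println(probe(h, i, 7));
        }
    }
}
```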

Solutions to Infinite Probing

What happens when quadratic probing keeps finding occupied indices (infinite probing)?  (See value 22 in Example 1)

The Complexities of Quadratic Probing

As we saw in the video, while quadratic probing is a simple modification of the linear probing scheme, it brings with it some additional complexities and complications. For one, it was proposed to solve the problem of primary clustering, but quadratic probing suffers from something called secondary clustering, where keys that hash to the same index follow the same sequence of quadratic steps and collide at each of them. It isn't as obvious as in linear probing since there isn't a huge block of data, but it does still exist.

The most pervasive issue with quadratic probing is the infinite probing issue seen in the video. There are two approaches to this problem:
The first one is a natural solution to the problem, but it's not without its potential issues either. A resize operation will involve rehashing keys into a new table, but rehashing can again cause the infinite probing issue. There are some pathological cases where you can set up resizes within resizes, nested however many times you like, though this is quite rare.

The second option will have to be a bit mysterious since a number theory background is not a prerequisite for this course, but some hashing schemes can eliminate the infinite probing case entirely. For example, if we enforce that our table length is a prime number, and we maintain a max load factor of 0.5, then we can guarantee that it will not happen.

Double Hashing

What is Double Hashing?

Example 1

Screen Shot 2024-06-11 at 11.27.29 PM.png

Screen Shot 2024-06-11 at 11.27.51 PM.png

Screen Shot 2024-06-11 at 11.28.14 PM.png

Screen Shot 2024-06-11 at 11.28.43 PM.png

Screen Shot 2024-06-11 at 11.29.27 PM.png
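A hedged sketch of the double hashing probe sequence: the i-th probe lands at (h1 + i · h2(k)) % table.length, where the secondary hash h2 must never evaluate to 0. The particular choice h2(k) = q − (k mod q) for a prime q smaller than the table length is a common textbook scheme, used here as an assumption.

```java
public class DoubleHashProbe {
    private static final int PRIME = 7;   // a prime smaller than the table length

    /** Secondary hash; always in [1, PRIME], never 0. */
    public static int h2(int key) {
        return PRIME - Math.floorMod(key, PRIME);
    }

    /** Index probed on attempt i (i = 0, 1, 2, ...) for primary index h1. */
    public static int probe(int h1, int key, int i, int tableLength) {
        return Math.floorMod(h1 + i * h2(key), tableLength);
    }

    public static void main(String[] args) {
        int key = 25;
        int h1 = Math.floorMod(key, 13);   // table length 13
        for (int i = 0; i < 4; i++) {
            System.out.println(probe(h1, key, i, 13)); // 12, 2, 5, 8 for this key
        }
    }
}
```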

More thoughts on double hashing

As we saw in the example, double hashing can be a powerful framework, so long as it is implemented carefully and efficiently. With a good secondary hash function (one that spreads out the keys and is relatively independent of the primary hash function), even pathological cases where we have many collisions can be prevented unlike linear and quadratic probing.

We would like to make a few remarks that are a bit more subtle, some of which are out of the scope of this course, before moving on:
This brings us to the end of the collision strategies we'll be covering in this course. In the next lesson, we'll be doing a wrap-up to help compile all the things we've discussed in this module!

Collision Handling: A Summary

We've covered a lot of ground in this module, the basics of HashMaps, in addition to the four different collision handling strategies. We presented them in the following order: external chaining, linear probing, quadratic probing, and double hashing. This ordering was chosen for two reasons: (1) we went from simpler concepts to more complicated concepts and (2) we went in (rough) order of popularity.

In practice, you will mostly see external chaining and linear probing (or some variation of them). External chaining is very nice because it is an extremely simple idea. If there are multiple collisions, then just attach the new item to a chain at that index. The primary reason you would not want to use it is something we discussed a while back, locality of reference. Computers are designed to access adjacent/nearby memory locations quickly, which the open addressing schemes can take advantage of. On the other hand, external chaining necessarily involves dynamic allocation for the chains, so it cannot take advantage of this. Linear probing is the simplest scheme that does take advantage of it, so it's another popular choice.

So, what about quadratic probing and double hashing? We made a few arguments for why they might be useful in their respective lessons, but now it's time to discuss why they might not be very popular.
Hopefully this helps conceptualize the different collision strategies, giving you an understanding of what techniques are out there, and why they are (or are not) used in practice. If you're a theory-oriented student, you might find the theory of 𝑘-independent hashing interesting. This concept analyzes the contexts in which we can guarantee 𝑂(1) performance (at least in expectation). But enough theory-crafting, let's take a look at what some popular programming languages use in their standard implementations for HashMaps.

Food for Thought: What do actual programming languages use?

Let's look at three different languages that have some standard implementations: Java, Python, and C++.
Just looking at these three implementations gives us some interesting strategies and design choices made by the developers, which reminds us that these strategies are not set in stone. It's also good to note that these designs and standards can change over time, so it's possible these will change with more data and later version numbers.

Food for Thought: A Hashing Application: Bloom Filters

The only example of a hashing based data structure/algorithm that will be seen in this course sequence is what has been covered thus far for HashMaps in this module. However, hashing is a very popular and flexible idea which is used in many different applications. We would be remiss if we did not look at any of these applications. Let's take a brief glimpse at a data structure that uses hashing, bloom filters.

We want to set the scene first to motivate bloom filters. Suppose we are given a database that hosts a lot of tagging information. This database takes in some text input, then it returns, say the first 10 results with that tag sorted by some metric, back to the user. This database is hosted on some server that is relatively far away from you, and your internet speed is slow.

In this context, performing a search can be an expensive operation (the delay to send search request, search the database, and receive the reply may take a while). We would like to avoid sending the request if the search would yield 0 results. Is this possible without hosting the entire database closer to you? The answer is yes! (But you knew that already didn't you?)

Rather than answering the question of "Is this tag in the database?" correctly all the time, what if we allow incorrect answers rarely? More specifically, let's say we allow false positives, which is when the tag is not in the database, but we incorrectly output that it is. A bloom filter does precisely that.
image.png

In the flow chart above, we have three possible outcomes. If the bloom filter returns no, then we can be sure the answer is correct. If the bloom filter returns yes, then it could be wrong, so we will go ahead and send the search request. In the true positive case, the actual cost is high, but that cost was necessary, so the opportunity cost is low. In the false positive case, the actual cost is high when the search was not necessary, so the opportunity cost is also high. If we can make false positives rare, then on average, we'll be sending requests mostly when necessary, and not sending requests for most negative answers, which is good!

How does a bloom filter achieve this? Why, hashing of course! Since our database is very large, we don't want our backing table to actually store 𝑛 items for our bloom filter. Instead, we'll use a bit array of length 𝑚, where 𝑚 can be set independently of the number of items, and the array starts with all entries as 0. Additionally, rather than one hash function, we'll need 𝑘 hash functions (again, 𝑘 can be set as you want). To make things more precise, these hash functions are ℎ1,…,ℎ𝑘, where ℎ𝑖:𝐷→{0,1,…,𝑚−1} for all 𝑖=1,…,𝑘, and 𝐷 is our data space.
image.png

When adding a new tag to the database, we compute the 𝑘 hash functions for the tag, then we set all of those indices to 1. For querying to see if the tag exists, we compute the 𝑘 hash functions, and we check if all of those indices are 1. If any are 0, then we know for sure the tag isn't in the database. If they're all 1, then it's possible for a false positive since an index could've been flipped to a 1 bit due to a different data. Removing from a bloom filter is possible, but it requires other modifications.
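Here is a compact sketch of the add/query logic just described, using java.util.BitSet for the bit array and k seeded hash functions; the hashing scheme is purely illustrative and not tuned for quality.

```java
import java.util.BitSet;

/** Sketch of a bloom filter: k hash functions over a bit array of length m. */
public class BloomFilter {
    private final BitSet bits;
    private final int m;
    private final int k;

    public BloomFilter(int m, int k) {
        this.bits = new BitSet(m);
        this.m = m;
        this.k = k;
    }

    /** Illustrative i-th hash: mixes the element's hashCode with the seed i. */
    private int hash(String element, int i) {
        return Math.floorMod(element.hashCode() * 31 + i * 0x9E3779B9, m);
    }

    public void add(String element) {
        for (int i = 0; i < k; i++) {
            bits.set(hash(element, i));          // flip all k indices to 1
        }
    }

    /** false means "definitely not present"; true means "probably present". */
    public boolean mightContain(String element) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(hash(element, i))) {
                return false;                    // any 0 bit rules the element out
            }
        }
        return true;
    }
}
```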

We won't go into performance here, but with the proper setting of 𝑚 and 𝑘, along with well-chosen hash functions, bloom filters do a very good job with less than a 1% false positive error rate. Hopefully, this demonstrates to you just how powerful and flexible an idea hashing is!

CS1332 - Secure Computing Considerations

Deletion versus Purging & New Memory Acquisition
Associated with Module 1: Arrays and ArrayLists
The most basic operations for many of our data structures involve adding elements, accessing those elements, and (eventually) deleting those elements. Many of our students don't realize that complex systems often "delete" data elements simply by removing the user's ability to access those items through the normal interface, while the actual data still resides in memory. Since the actual data might be sensitive, it's important to understand the difference between deletion (i.e., simply removing access) and purging (i.e., deliberately overwriting and/or modifying the sensitive data so that it is effectively unreadable if accessed later by unauthorized parties). This can be demonstrated in many of our homework assignments where we require the students to overwrite the to-be-deleted values before modifying the access structures themselves. We could further expand on this by having the students measure the increased number of steps needed to effect purging versus deletion, and discuss how this impacts the overall efficiency of the data structure operations.

This can also be tied into the acquisition of new memory. In many cases, a request is made to the operating system when new memory is needed. One example is the Java Memory Model, where new() requests return a reference to a block of memory with the requested size, but only if such a portion of memory is free. From a security and functionality perspective, it's important to ensure that a new block has actually been assigned before attempting to dereference the results. Also, the Java Memory Model normally purges any potentially sensitive data by overwriting the contents of the portion of memory being returned with zeros.

Pointers and References
Associated with Module 2: Pointers, References & Singly Linked Lists
We make a general distinction between pointers and references. Pointers refer directly to locations in memory. In languages (e.g., C) that allow the use of "generic pointers", the flexibility of being able to dereference arbitrary memory locations also allows potential misuse by programs and processes that (attempt to) access memory locations for which they are not authorized. Some languages (e.g., Java) don't support this "generic pointer" approach, and instead provide references. With references, only three general types of operations are allowed: (1) an object reference can be assigned "null", which effectively means that there is not any reference being made to a valid object; (2) an object reference can be assigned to a new() object, where the operating system creates the new instance of the object in memory (barring any exceptions or other anomalous conditions); or, (3) an object reference can be assigned to another existing object reference (normally of the same type), such that the object will then have two or more references (i.e., "aliases"). The intent of references is to avoid some of the security- and memory management-related problems of allowing generic pointers.

Hashing
Associated with Module 6: Introduction to HashMaps and External Chaining (Closed Addressing)
We use the term "hashing" to define the general process of using some set of mathematical transformations (e.g., formulas) to convert the value of an element (i.e., "key") into the address where that key should be stored for later access/lookup. We will discuss how the term "hashing" is also used to refer to a different set of transformations for a given element. More specifically, users are often advised to "check the hash" of a file that they've downloaded to ensure that file's authenticity. In these cases, a hash function is used to generate a "signature" for the given element, which is often a file, and where the file is often significantly larger in size than a key. Also, the desired properties of the "security-based" hash result are different. For the "addressing-based" hash formula, the main concern is generating as much of a uniformly distributed set of addresses as possible, to ensure that elements are distributed as evenly as possible across the address space as possible to minimize collision effects as well as the average access/lookup times. For the "security-based" hash formula, the main concern is that the hash result should be extremely sensitive to any changes in the file, and yield a (clearly and distinctly) different result if the file's contents have been modified. Given that many Computer Science terms are "overloaded" in practice, it would be worthwhile for students to recognize this distinction in usage.

Randomness
Associated with Module 7: SkipLists & Probabilistic Data Structures
The algorithms that we study are mainly deterministic - each step of the algorithm leads to a single, distinct set of changes to the computation state. We will discuss how the properties of randomness can be leveraged in the implementation of various algorithms (e.g., SkipLists). From a performance standpoint, randomness can be used to ensure that the elements being stored are distributed across the data structure to support a relatively fast O(log n) access/lookup time.
And from a security standpoint, the use of randomness makes it significantly more difficult for an adversary to insert elements into the data structure in a way that will deliberately degrade the performance. For example, inserting elements into a Binary Search Tree (BST) structure in sequential (or reverse sequential) order will cause the access/lookup performance of the BST to degrade towards that of a list-like structure - from O(log n) to O(n). Similar "attacks" against a Skiplist, however, would be unsuccessful.

M8 - AVLs

Module Overview

The biggest drawback that we encountered for BST operations was that the height of a BST can degenerate to 𝑂(𝑛) due to imbalance in the tree. We left the idea of a "balanced tree" intentionally vague back then because there are different ways to define balancing.

In this module, we'll be covering a sub-classification of BSTs called AVL trees, which are self-balancing trees. The name of the tree comes from its creators, Adelson-Velsky and Landis. AVLs formally define what a "balanced tree" is, and they impose the additional restriction that by the end of any operation, the tree must remain balanced. For us, we'll be looking at how this applies to the add and remove operations specifically, and how the pointer reinforcement technique helps us implement the needed mechanisms (known as rotations) in a clean and intuitive way.

Learning Objectives

Students will:

Intro to AVLs

Now, we can begin to explore AVL trees by looking at how we define the structure of an AVL. Since AVLs are a sub-classification of BSTs, they share the same order and shape properties. However, AVLs have an additional shape property to restrict the height of the tree. Let's take a look at how AVLs define balancing through the concepts of heights and balance factors.
Screen Shot 2024-06-17 at 10.51.58 PM.png
Screen Shot 2024-06-17 at 10.53.51 PM.png
Screen Shot 2024-06-17 at 10.54.33 PM.png
Screen Shot 2024-06-17 at 11.00.26 PM.png
Tips:
Screen Shot 2024-06-17 at 11.01.22 PM.png
Screen Shot 2024-06-17 at 11.02.10 PM.png

The Definition of Balanced for AVL Trees

In the video, we saw that balanced trees for an AVL satisfy the following definition: |node.balanceFactor| ≤ 1 for every node. We also saw a brief justification as to why we chose 1 rather than 0. There are many configurations where it's just not possible to have a perfectly balanced tree. For example,
image.png

As we see in the diagram above, we have a perfectly balanced tree with 𝑛=3 (it's so beautiful!). However, this configuration can only happen when 𝑛 is of the form 𝑛 = 2^𝑘 − 1. Adding one more data necessarily leads to an unbalanced tree in some way. Thus, we must accept some imbalance in the world of AVLs. The goal is to minimize the sum of the depths of all nodes, which in turn minimizes the imbalance encountered in the tree.

In the diagram below, we add 2 to the original tree. The tree on the left is a BST. However, enforcing the balance found in AVLs leads us to the tree on the right.
image.png


There you go! Perfectly balanced, as all things should be. But wait. At what cost? We got a perfectly balanced tree by minimizing the sum of the node depths, but look at what happened. All the data noted in red are data that moved while maintaining the order property of a BST. The rebalancing turned a single add into an 𝑂(𝑛) operation just to update the tree, which is exactly what we were trying to avoid in the first place.

So, maybe minimizing the imbalance isn't the best thing to do. If we tolerate a little more imbalance, then we begin approaching the definition of balanced for an AVL. AVLs allow the depths of leaves to differ by at most 1 (which is what the balance factor definition says). If we "lazily update" by adding to the natural spot in a BST, then we can significantly save cost because updates will not occur as often. When an update does occur, it will be very efficient as is shown in the next lesson. AVL trees hit that sweet spot between allowing some imbalance while keeping operations efficient. It can in fact be shown that the height of an AVL tree is at most 1.44 log⁡(𝑛), which is a pretty tight tree!

AVL Single Rotations

Properties

Screen Shot 2024-06-17 at 11.06.42 PM.png
Screen Shot 2024-06-17 at 11.09.57 PM.png
Screen Shot 2024-06-17 at 11.06.56 PM.png
Screen Shot 2024-06-17 at 11.10.24 PM.png
Tips:

Rebalancing after removing a node

Tips:
Screen Shot 2024-06-17 at 11.14.08 PM.png
Step 1 - Remove node 1, update BF and Height of parent.
Screen Shot 2024-06-17 at 11.15.35 PM.png
Step 2 - BF of parent node is < -1 (it is -2), so apply a left rotation.
Screen Shot 2024-06-17 at 11.16.05 PM.png
Step 3 - Link parent node 2 to the left child (node 3) of its child node (node 4)


Screen Shot 2024-06-17 at 11.16.24 PM.png
Step 4 - Link child node 4's left child to be its former parent. Update heights and balance factors.
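Using pointer reinforcement, the left rotation walked through above can be sketched as follows; the method returns the new subtree root so the caller can wire it back in. The Node fields, the height-of-null-is -1 convention, and the update order (lower node first, then the new root) are our own assumptions about a typical implementation.

```java
public class AVLRotation {
    public static class Node<T> {
        T data;
        Node<T> left;
        Node<T> right;
        int height;
        int balanceFactor;
        Node(T data) { this.data = data; }
    }

    private static int height(Node<?> node) {
        return node == null ? -1 : node.height;
    }

    /** Recompute height and balance factor from the (already updated) children. */
    private static void update(Node<?> node) {
        node.height = Math.max(height(node.left), height(node.right)) + 1;
        node.balanceFactor = height(node.left) - height(node.right);
    }

    /** Left rotation for a right-heavy node (balance factor of -2); returns the new root. */
    public static <T> Node<T> rotateLeft(Node<T> node) {
        Node<T> newRoot = node.right;
        node.right = newRoot.left;   // the child's left subtree becomes the old root's right
        newRoot.left = node;         // the old root becomes the new root's left child
        update(node);                // update the lower node first...
        update(newRoot);             // ...then the new subtree root
        return newRoot;              // the caller reinforces its pointer with this
    }
}
```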

AVL Double Rotations

Properties

Screen Shot 2024-06-18 at 9.16.58 PM.png
Screen Shot 2024-06-18 at 9.17.57 PM.png
Right-Left Rotation:  If BF of the unbalanced node is < -1 (i.e., -2) and its right child's BF is > 0 (i.e., +1).
Left-Right Rotation:  BF of unbalanced node >1 and left child BF < 0.
Properties:
Time Complexity:

Example: add(54)

Screen Shot 2024-06-18 at 10.38.23 PM.png
Screen Shot 2024-06-18 at 10.38.42 PM.png

1) Add 54 to BST
Screen Shot 2024-06-18 at 9.20.13 PM.png
2) Update imbalance
3) Update balance
Screen Shot 2024-06-18 at 9.22.53 PM.png
Screen Shot 2024-06-18 at 9.23.18 PM.png
Screen Shot 2024-06-18 at 9.23.38 PM.png
We observe that node 50 is unbalanced.
4) Apply a double rotation if the magnitude of the node's balance factor is >= 2.
Screen Shot 2024-06-18 at 9.24.17 PM.png
Parent node is -2
Screen Shot 2024-06-18 at 9.32.31 PM.png
Left rotate on 50. 

Screen Shot 2024-06-18 at 9.26.33 PM.png
Update node 44’s BF + height
Screen Shot 2024-06-18 at 9.24.35 PM.png
Right rotate on 68
Update BF + Height
Screen Shot 2024-06-18 at 9.25.41 PM.png
Node 62 moves up, 50 becomes left child.
Screen Shot 2024-06-18 at 9.24.48 PM.png
Node 62 moves up, and 68 becomes right child.

Screen Shot 2024-06-18 at 9.26.04 PM.png
Update BF + height.  Return node 62 as node 44’s right child.
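Putting the example together, the rebalancing step performed while unwinding an add/remove can be sketched as below, assuming the balance factor convention height(left) − height(right): a single rotation when the tall child leans the same way, and a double rotation (rotate the child first) when it leans the opposite way, as summarized in the rotation table in the next section. All names are our own assumptions.

```java
public class AVLBalance {
    public static class Node<T> {
        T data;
        Node<T> left;
        Node<T> right;
        int height;
        int balanceFactor;
        Node(T data) { this.data = data; }
    }

    private static int height(Node<?> node) {
        return node == null ? -1 : node.height;
    }

    private static void update(Node<?> node) {
        node.height = Math.max(height(node.left), height(node.right)) + 1;
        node.balanceFactor = height(node.left) - height(node.right);
    }

    private static <T> Node<T> rotateLeft(Node<T> node) {
        Node<T> newRoot = node.right;
        node.right = newRoot.left;
        newRoot.left = node;
        update(node);
        update(newRoot);
        return newRoot;
    }

    private static <T> Node<T> rotateRight(Node<T> node) {
        Node<T> newRoot = node.left;
        node.left = newRoot.right;
        newRoot.right = node;
        update(node);
        update(newRoot);
        return newRoot;
    }

    /** Called on each node while unwinding the recursion of add/remove. */
    public static <T> Node<T> balance(Node<T> node) {
        update(node);
        if (node.balanceFactor < -1) {                // right-heavy
            if (node.right.balanceFactor > 0) {
                node.right = rotateRight(node.right); // right-left case
            }
            node = rotateLeft(node);
        } else if (node.balanceFactor > 1) {          // left-heavy
            if (node.left.balanceFactor < 0) {
                node.left = rotateLeft(node.left);    // left-right case
            }
            node = rotateRight(node);
        }
        return node;
    }
}
```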

Rotation Table

In the past, students have told us that the table presented in the video for rotation types is very useful as a study tool. We provide it here so that you can reference it as needed. Do keep in mind that the shape shown is the simplest case; there will likely be children nodes or entire subtrees in an actual implementation that need to be properly managed.
image.png
Q: What needs to be changed in a BST to become an AVL tree?

Logistics of an AVL

There are some final logistics behind the functionality of an AVL tree based on the rotations that need to be covered.

The Number of Rotations per Add/Remove

Let's get a better handle on how these rotations and updates occur in an AVL tree. Based on the algorithm, we recurse down the tree to find the location to add/remove. Once we've done the standard BST add/remove operation, on the way back up the tree (unwinding the recursion), we perform updates on the parent node(s). The height and balance factor are updated for every node on the return path up, but a rotation won't be needed at every node. Each rotation is 𝑂(1), and so is every update, but it might be a good idea to see how often these rotations actually occur. We know that the number of rotations can be at most 𝑂(log⁡(𝑛)) since the height is 𝑂(log⁡(𝑛)), but perhaps the number of rotations is far less than this.

That is precisely what ends up happening with the add operation. As it turns out, each add operation will trigger at most one rotation. This is because when adding data, any rotation will reset the unbalanced subtree back to its original height. On the other hand, a remove operation can trigger 𝑂(log⁡(𝑛)) rotations in the worst case; remove operations do not have the same nice property. When a rotation is performed due to a removal, the unbalanced subtree may not regain its original height, propagating height changes up the tree.

Let's try out an example with the add method. The steps involved here will be detail oriented, so take things one step at a time and make sure you understand each step of the reasoning before continuing to read! Consider the diagram below, where we are seeing the first rotation triggered by an add operation up the tree. We can do a similar reasoning for the other rotation types, so let's keep it simple and assume it was a right rotation. We will derive that each of the heights and balance factors shown must be as shown, assuming only that the height of node 𝐶 is ℎ+1 and that a right rotation is necessary (so 𝐶's balance factor is 2).

image.png

We immediately know that the height of node 𝐵 is ℎ and the height of 𝑇4 is ℎ−2 due to the balance factor and height of 𝐶. Additionally, this imbalance was caused by the data added; if any subtree's height changed, it was because the height increased. This also means that the new data must be in the subtree of 𝐵 rather than 𝑇4. The balance factor of 𝐵 could be 0 or 1, but we've written in the image that it must be 1. Why is this? Well, we only added a single data, which means that only the height of 𝐴 or the height of 𝑇3 could have increased, not both, so the balance factor of 𝐵 cannot be 0 and must be 1 (because it's a right rotation). This implies that the new data must be in the subtree of 𝐴 (if it was in the subtree of 𝑇3, then the tree would've been unbalanced before the add), giving heights of ℎ−1 for 𝐴 and ℎ−2 for 𝑇3.

After the rotation, 𝑇1, 𝑇2, 𝑇3, and 𝑇4 are completely unmodified, and, thus, their heights are unmodified. The heights of 𝑇3 and 𝑇4 determine the height and balance factor of node 𝐶, which are now ℎ−1 and 0 respectively. The height of 𝐴 is also unchanged. This allows us to compute the height and balance factor of the new root 𝐵, which are ℎ and 0 respectively.

This is exactly what we were looking for because the height of this unbalanced subtree was ℎ+1, and before we added the new data, the height was one less, which is ℎ. We've restored the original height, so the rest of the nodes up the tree will not change heights and balance factors!

The Efficiency of an AVL Tree

AVL trees guarantee that the height of the tree is 𝑂(log⁡(𝑛)), so operations like search, add, and remove now have a worst case of 𝑂(log⁡(𝑛)), since the balancing is all done in 𝑂(log⁡(𝑛)) time along the same path taken to get to the location. Contrast this with the worst case of 𝑂(𝑛) for those same operations in a plain BST. Additionally, since the height is stored in each AVL node in order to maintain balancing information, computing the height of an AVL tree is now 𝑂(1) as opposed to 𝑂(𝑛) like before. This all seems like a straight improvement compared to a BST, but there are some caveats and asterisks here.

We need to keep in mind that when analyzing the efficiency of some tree operation, we need to be cognizant of how it might differ between an AVL and a BST, in both efficiency and correctness. For example, traversals would be 𝑂(𝑛) for both since the balancing structure doesn't change the algorithm or efficiency. However, if an operation would modify the tree structure in some way, then it may be harder to do in an AVL since we must maintain proper balancing by the end of the operation. For example, removing all leaves in a BST can be done reasonably easily in 𝑂(𝑛) time using a single traversal. For an AVL, this may not be so obvious anymore since removing nodes may trigger rotations, which will change the leaves of the tree midway. In other words, correctness may be harder to achieve due to rebalancing, which may worsen the time complexity.

If the main things we're doing are adding, searching, removing, and traversals though, then we're golden!

Food for Thought - Self-Balancing BSTs

Self-Balancing BSTs: There's Two!?

When you look up self-balancing BSTs, you will most likely see two types of BSTs come up: AVL trees and Red Black trees. This course does not cover Red Black trees (perhaps in the future if there is sufficient demand, but there are currently no plans for it), primarily for two reasons:
We will talk a bit about how Red Black trees work, but we'll defer that discussion to the next module because Red Black trees are heavily related to (2, 4) trees. For this reason, if you want to look into Red Black trees, we highly suggest you wait until after the next module.

We mentioned earlier that AVL trees can be shown to have a worst case height of around 1.44 log⁡(𝑛). You may wonder how Red Black trees fare against this. Red Black trees can be shown to have a worst case height of around 2 log⁡(𝑛). In other words, the balancing of a Red Black tree is looser than that of an AVL tree. Because of this, if you know you'll just be performing a lot of lookups, you'd prefer AVL trees of the two.

AVL trees usually store balancing information like heights or balance factors, whereas Red Black trees only store one bit, which is the "color" of the node, red or black. This may give Red Black trees the edge in memory, but AVL trees can be optimized to just store the balance factor, which is 2 bits. Red Black trees tend to be better in terms of general purpose, which is why they are the more commonly implemented of the two, but the differences between the two are not that significant in terms of performance for most cases.

A performance analysis by Ben Pfaff back in 2003 studied 4 different BST variants in a variety of settings, so you might find that interesting. It's an experimentation paper, so it's fairly readable, and it's interesting to see in what contexts each BST type excels. We'll cite it in the module review.

You may have noted that we said 4 BST variants. These are the regular BST, AVL, Red Black, and Splay trees. We'll also briefly discuss splay trees in the next module, which are one of the more interesting BST variants because they yield efficient operations without maintaining strict balancing!

M99 - SkipLists

Screen Shot 2024-06-18 at 10.56.51 PM.png
Screen Shot 2024-06-18 at 10.58.15 PM.png
Screen Shot 2024-06-18 at 10.58.31 PM.png
Screen Shot 2024-06-18 at 11.01.07 PM.png
Screen Shot 2024-06-18 at 11.02.23 PM.png
Errata: Slide should say, “If data > right, then continue moving right” and “If data < right, move down a level”
Screen Shot 2024-06-18 at 11.03.40 PM.png
Screen Shot 2024-06-18 at 11.04.38 PM.png
Screen Shot 2024-06-18 at 11.08.05 PM.png
Efficiency:
Errata: Best case is O(1) for all, if data we're looking for is the leftmost entry of the SkipList that has been promoted O(log(n)) times.

The Purpose of a SkipList

If you are like most students, you may come out of this lesson thinking "Cool, but what is the point of SkipLists, and why are they placed here categorically?" This line of thinking is certainly valid and one we hope that you take as you try to piece together the big picture of this course. To answer these questions, we give a few different answers, which we hope will stimulate some interesting thought and discussion.

Food for Thought - Randomness In Computing

As we mentioned in the previous SkipList unit, we won't be able to spend much time on randomized computation due to insufficient mathematical background. We would like to take the time to question the use of randomness in computing at all. Randomness is a concept used to model real-world phenomena such as coin tosses or dice rolls. It makes sense to us that if we flip a coin that is equally weighted, we expect about half of the flips to be heads, which we call a "probability of 1/2."
But does it make sense to assume that computers should have access to randomness? To put it another way, is it fair to assume that computers can model random behavior like flipping a coin? The reason we're doubting this is because computers operate in a deterministic way, meaning that the code they execute will always proceed the same way (no randomness). So, can computers model true randomness?

Another related question is whether or not randomness "helps" computation. That is, if we assume that we have access to randomness, can we come up with solutions to problems that are "better" than purely deterministic ones (i.e. faster or more accurate)? We hope to address these two questions here in a limited capacity.

M9 - (2-4) Trees

In this module, we continue our exploration of trees by taking a look at a very different type of tree from what we've seen before, a non-binary tree type known as (2, 4) trees. These (2, 4) trees inherit an order property similar to the one found in BSTs, but (2, 4) trees have flexible nodes that allow for multiple data and multiple children per node. (2, 4) trees also have a stricter shape property, requiring that every leaf node resides at the same depth, which makes the depth of (2, 4) trees logarithmic.
(2,4) trees are the last type of tree we will be covering in the course sequence, and it also marks the end of the data structures half of the sequence! With the completion of this module, you'll have built up a formidable toolkit of data structures that you can draw from when problem solving in your programming work. Of course, we cannot cover every data structure that might be useful. But with the knowledge and maturity you have built from this course sequence, we know you have the skills to tackle other data structures on your own as you encounter them in your work. Nevertheless, this module also features some optional "Food for Thought" glimpses at splay trees and Red Black trees, which are other types of BSTs that maintain their operations through various rotations.

Learning Objectives

Students will:

Intro to 2-4 Trees

We begin this lesson by looking at the properties of (2, 4) trees (sometimes referred to as 2-3-4 trees). These trees are a generalization of BSTs because they allow multiple data to be stored in sorted order in a node, anywhere from 1 to 3 data. (2, 4) trees are a special case of multiway search trees, which allow multiple data to be stored in a node in general. They're called multiway search trees because if we have 𝑥1 < 𝑥2 < ⋯ < 𝑥𝑚 in a node, then this defines 𝑚+1 possible areas for data to be. So, a node with 𝑚 data can have 𝑚+1 children, each holding data in the specified range (the namesake of (2, 4) trees is that every node can have between 2 and 4 children).

Multiway search trees that enforce a strict balancing scheme are known as B-trees. As we will see, (2, 4) trees are always perfectly balanced in terms of tree height, so they are also a special case of B-trees.

Screen Shot 2024-06-19 at 7.56.46 PM.png
Screen Shot 2024-06-19 at 7.56.57 PM.png
Screen Shot 2024-06-19 at 7.57.23 PM.png
Strict shape property achieves O(log(n)) operations.

Balancing in a (2, 4) Tree

In the AVL module, we saw that we cannot always maintain perfect balancing of heights in a BST due to the problem we see below. Perfect balancing can only occur for certain values of n.
image.png
Well, (2, 4) trees solve this problem by allowing more than 1 data to be in a node. Naturally, the time complexity of searching now depends on both the height as well as the number of data allowed per node. Allowing up to 3 data is a nice compromise between the two costs while allowing us to perfectly balance the tree efficiently. Of course, there will be imbalances in the sense that some nodes will be more filled than others, but from the perspective of heights, the tree is perfectly balanced.

Searching in a (2, 4) Tree

In a BST node, there was only one data, x. The way we decided whether to go left or right was to check whether our new data y satisfied y < x or y > x. Searching in a (2, 4) tree follows the same principle. Suppose we have a 4-node as shown below. (A 4-node contains 3 data and has 4 children.)
image.png

The direction in which we search for the data is determined by the "area" in which the data may be located. The decision process is similar for 3-nodes and, of course, 2-nodes. Let's look at an example. Suppose we have the (2, 4) tree given below and want to look for the data 16. Then, the path followed in the diagram is precisely the search path we would take. The red numbers denote where we compared the data, and the green numbers denote where we found the data.
image.png

Food for Thought - B-Trees: Querying Large Databases

As we mentioned, (2, 4) trees are a special case of B-trees, which allow up to some fixed number of children per node. Another way of saying this is that a (2, 4) tree is a B-tree of order 4. For those curious about the name of B-trees, there is actually no official consensus for what the B stands for, so assume it to be whatever you like!

As we mentioned, B-trees have a tradeoff that they inherently work with. The higher the order of the tree, the shorter the tree. However, the higher the order of the tree, the more data per node, and therefore more operations to process each node. In short, we're just shifting the cost of searching from traversal of the tree to a sorted list-like structure in a node. That begs the question, what good is that? As it turns out, B-trees are very commonly used for large databases!

Suppose we have a very large database that we'd like to perform many queries on. This database is much larger than what we can expect to store in active memory, so we're going to have to store this somewhere else which has a much larger storage capacity. Historically, this is traditionally done with disk drives (though in more recent times, SSDs are becoming more and more popular/viable). Reading/writing to a disk is orders of magnitude slower than doing the same with RAM, so we want to minimize how many reads we need to query the database.

As with many things in computer science, hierarchical structure is what makes B-trees so useful. B-trees organize data hierarchically into blocks (the nodes at the different levels). As it turns out, while reading from a disk is on the order of milliseconds (yes, this is slow for a computer!), this is actually the time it takes to read an entire block of memory rather than a specific memory location. So, it will be much faster to query data in the same block than to query different blocks (such as in a BST) for each query. Which B-tree order is used will naturally depend on the system, so it has to be tuned for B-trees to be used optimally.

As time moves forward, technologies improve, and what works best before might not work as well now. As we mentioned, SSDs are becoming much more commonplace, and reading/writing to SSDs is indeed faster compared to disk drives. However, this is still much slower than working directly with RAM. Additionally, there are plenty of things that benefit from the hierarchical nature of B-trees, so it's unlikely that B-trees will be going away anytime soon. Not to mention that adoption of newer practices is surprisingly slow in a space like this where technology improves so quickly!

Adding and Overflow

In this lesson, we will leverage how to search in a (2, 4) tree in order to add data to a (2, 4) tree. Similarly to BSTs, we always add new data to a leaf node. Unlike BSTs, the (2, 4) add operation places the data within the leaf node, rather than adding a new child node to the leaf node. This is how (2, 4) trees preserve the tree shape and order properties. However, the node size property may be violated as the node "overflows" with data. If the node size property is violated, then a promotion (sometimes called a split) is performed to resolve the violation. Promotion propagates the overflow upwards into parent nodes until a valid (2, 4) tree is formed.
Screen Shot 2024-06-19 at 8.14.55 PM.png
Overflow occurs.  Either of the two middle data can be promoted.
Screen Shot 2024-06-19 at 8.15.32 PM.png
20 is promoted, rest is split left and right into child nodes
Screen Shot 2024-06-19 at 8.16.25 PM.png
Overflow occurs.  Third data is promoted, and node is split in two.
Screen Shot 2024-06-19 at 8.17.29 PM.png
12 is promoted. Note how each child sits less than, between, or greater than its parent's data.
Screen Shot 2024-06-19 at 8.18.08 PM.png
Overflow occurs.  Third data value (45) is promoted
Screen Shot 2024-06-19 at 8.18.36 PM.png
Overflow when 36 is added.
Screen Shot 2024-06-19 at 8.18.59 PM.png
36 is promoted.  Root now hits overflow.
Screen Shot 2024-06-19 at 8.19.29 PM.png
A second promotion occurs for 36, and the left and right sets are split.
Screen Shot 2024-06-19 at 8.20.01 PM.png
Add the data 1, 2, …, 10 in sorted order, using the 2nd data to promote. How much of each node is filled? What do you notice about the balance of data (in terms of fill percentage)? What will happen as you continue to add in sorted order?
Screen Shot 2024-06-19 at 8.22.18 PM.png

Use 3rd data promotion to add the final 10:
Screen Shot 2024-06-19 at 8.25.06 PM.png

Removing and Underflow, Example 1

The most complex remove of all data structures presented thus far is discussed in this lesson. We will examine how removing works in a (2, 4) tree and the complications that occur. Just as there was overflow when adding, we have underflow conditions when removing. Underflow is remedied by two operations, transfer and fusion. Transfer is the easier remedy of the two, and it is always preferred if possible. The remove procedure can appear very complicated, so take time to look at the process one step at a time. This is where you will see the value of the vistool. Spend lots of time playing with the vistool in the lab for this lesson to get a handle on what is happening.
Screen Shot 2024-06-19 at 8.30.16 PM.png
Screen Shot 2024-06-19 at 8.30.57 PM.png
No underflow, just remove value.

Screen Shot 2024-06-19 at 8.31.15 PM.png
Depending on the implementation, use the successor or predecessor to replace the removed data.
Screen Shot 2024-06-19 at 8.32.14 PM.png
Successor replaces removed data. This causes underflow (*).  In an underflow state, the empty node checks whether adjacent nodes have more than 1 data in them.
Screen Shot 2024-06-19 at 8.34.02 PM.png
To preserve order of values, data is rotated from adjacent node to parent, and parent data moves to empty node.
Screen Shot 2024-06-19 at 8.35.22 PM.png
Removing 31 triggers underflow, but the adjacent node only has one entry.  This triggers a fusion: the parent node's data and the adjacent node's data are fused into the empty child node.  Note: Fusion can potentially cause a chain of events.
Screen Shot 2024-06-19 at 8.37.41 PM.png
Removing 49 triggers underflow; the adjacent node only has one data and the parent only has one data.
Screen Shot 2024-06-19 at 8.38.50 PM.png
In this case, we demote the parent node data to empty child node.  Parent node is now empty.
Screen Shot 2024-06-19 at 8.39.24 PM.png
Child nodes are fused together, and empty parent node is collapsed.
Screen Shot 2024-06-19 at 8.41.04 PM.png
This triggers a chain of events where root node must handle underflow.  36 is demoted and adjacent nodes are fused.
Screen Shot 2024-06-19 at 8.41.54 PM.png
Root collapses, and [5,36] now becomes the root.  We end with a valid 2-4 tree.
Screen Shot 2024-06-19 at 8.42.30 PM.png

Removing and Underflow, Example 2

Screen Shot 2024-06-20 at 9.43.42 PM.png
Replace with successor, triggers underflow (empty former 32 node).
Screen Shot 2024-06-20 at 9.43.58 PM.png
Fusion: Bring down 36.  Fuse 36, 40 together. Parent node is now empty.
Screen Shot 2024-06-20 at 9.44.21 PM.png
Root node (32) moves down to empty node.
Screen Shot 2024-06-20 at 9.45.19 PM.png
Bring up sibling data that’s closest to the (former) parent (32), which is 20.
Screen Shot 2024-06-20 at 9.46.24 PM.png
Child leaf of 24, 28 transfers to new parent (32)
Screen Shot 2024-06-20 at 9.47.16 PM.png

2-4 Removing Flow Chart

Screen Shot 2024-06-20 at 9.46.56 PM.png
image.png

Food for Thought - Link between (2, 4) Trees and Red Black Trees

In an optional section of the previous module, we discussed the advantages and disadvantages between AVL trees and Red Black trees. We also promised a brief discussion on the workings of a Red Black tree, which we will resolve here. As it turns out, (2, 4) trees and Red Black trees are intimately linked (this is in fact by design by the creators of Red Black trees). So much so that often, when students learn Red Black trees, they will first learn (2, 4) trees as a primer to ease them into it. We will discuss briefly the relationship between the two here, and we'll leave the inner workings/rotations of a Red Black tree for you to learn on your own.

image.png

So, what exactly is a Red Black tree? A Red Black tree is a BST that keeps track of the color of a node, allowing it to be red or black. These colored nodes must satisfy certain properties to be a Red Black tree, which we state below. These trees are often taught using sentinel/phantom nodes, which are nodes that have null in them. With this in mind, all leaves are sentinel/phantom nodes.
Notice that there was nothing in the requirements about the tree being balanced! Unlike an AVL which derives its logarithmic height from a definition of balanced, Red Black trees derive their logarithmic height directly from (2, 4) trees. For every (2, 4) tree, there are multiple Red Black trees that are equivalent to it, and for every Red Black tree, there is a single (2, 4) tree that it corresponds to. Roughly, each black node and its red children form a node in a (2, 4) tree (there is some nuance to this, which we omit for brevity). This relationship is what guarantees that every Red Black tree has height at most 2 log⁡(𝑛). In a sense, the add/remove operations of a (2, 4) tree are also very much related to the operations of Red Black trees.

Food for Thought - Splay Trees

For many data structures, we study how they perform in the average case when the input distribution is uniformly random (each ordering is equally likely). However, this is almost never the case in the real world. A very good example of this is caches.

Here's an example of a cache, a web cache. When you want to go to a webpage using your computer, your computer must somehow get your request to see the content on that page to its destination. This is not done magically; your computer forwards your request to a router first. (Disclaimer: This is a simplification of how this works. Networking is a vast topic, and you could take several classes learning how computers communicate to solve tasks like this one!) Each router does the same, forwarding the request until we reach the final router, which has access to the destination.
image.png

This process is generally pretty short (to us), but it can actually take a while in computer time, long enough that we can notice the delay. Web searching is something that we expect to be instantaneous, so we want to minimize this as much as possible. This is where a web cache comes in.

A web cache keeps track of content from websites that have been requested previously. If a request is made for new content, then the web cache has no choice but to go through the standard forwarding process. However, once it's received this content, it can store it in case another request for the content comes by later. This is obviously a lot of data, so web caches will only keep track of recent data.

So, now if we request something, the first router will check the web cache to see if it has what we're looking for. If it does, then it can just immediately give us what we're looking for. This on average saves quite a lot of time!
image.png
So, now that we have an idea of what one type of caching is, let's look at one potential data structure. Splay trees are another type of BST that performs rotations to maintain its structure. However, unlike AVL trees and (2, 4) trees, it doesn't attempt to maintain a strict balancing scheme. Instead, it works under the following principle: "If data is queried in the BST, then we want to perform a series of rotations so that the data is now at the root." These rotations are referred to as splaying, and the different types are zig, zig-zig, and zig-zag.

So, what does this accomplish exactly? Over time, data that is queried often (popular data) will be kept closer to the root, while data that is queried very rarely will sink to the bottom of the tree. Data kept closer to the root have shorter search times, so popular requests will be answered quickly. In other words, if there is a bias in the input distribution (how often something is queried), then splay trees will do very well.

What about if the input distribution is drawn uniformly at random? Then splay trees won't do as well as their balanced counterparts since splay trees don't have strict balancing (the degenerate tree is still possible). However, while the worst case of a single search operation is O(n), we can show that any sequence of search operations will average O(log(n)) time per operation. So, it does have a worst case of amortized O(log(n))! We haven't seen amortization outside of the context of array resizing, so if you want to see amortization in action with a more complex procedure, then look into splay trees!

END OF PROBLEM SET 2

Module 10 - Sorting

Module Overview
This module marks the start of the second half of the course sequence, algorithms! An algorithm is a finite procedure/mechanism used to solve a problem or perform some computation. In a sense, much of this course sequence has already been about algorithms; however, we have only encountered algorithms with respect to data structures. Now we are flipping perspectives, where we will examine how to solve problems using all of the tools (data structures) that we have added to our toolbox.

Our first meeting with algorithms will be sorting algorithms. Sorting algorithms will span both modules 10 and 11. In this module, we cover the simpler, "naïve" sorts to get our feet wet. We will refer to these sorting algorithms as iterative sorts since they all inherently rely on making multiple iterations over the data, fixing unsorted pairs. Once you're familiar with basic sorting algorithms, we'll dive into more complex and efficient algorithms in the next module.

Learning Objectives
Students will:

Introduction to Sorting

Algorithmic Problems and Solutions

Now that we've begun our journey into the algorithm half of the course, it's worth the time to take a moment and consider what is the goal. For the first half of the course sequence, we built up a large toolkit of techniques and data structures that could be used in a variety of contexts.

In this second half of the course, we turn to solutions for specific problems rather than general data structures. As such, for each problem we try to solve, it's a good idea to formally state the format of the problem. For example, when we say "sorting," we're referring to the following goal:

Input: An array of 𝑛 data that have some predefined consistent ordering (i.e. the data is Comparable).
Output: The same array of 𝑛 data that has been permuted to be in ascending order as determined by the predefined ordering.

This specification may appear unnecessary, but it is essential to understand the nature of the problem. For example, we have specified that the output array is to be the same array in memory as the input array, and that the sorted order is to be ascending. Neither of these specifications makes the sorting problem more difficult than before, but we must be sure of what exactly the goal is and how the input is specified. For example, if we require that the input is an array of integers instead, then the problem becomes easier (in a sense), as we'll see in module 11.

Analyzing Sorting Algorithms

Screen Shot 2024-06-29 at 6.19.41 PM.png
Screen Shot 2024-06-29 at 6.19.59 PM.png
To become a problem solver, there are core skills one needs to develop. This development in skills comes from taking the time to understand

Bubble Sort

Screen Shot 2024-06-29 at 6.21.57 PM.png
Key Idea: Start from the front and compare adjacent pairs, beginning with the first two items. If a pair is in the wrong order, swap it; if not, move on. Keep incrementing the index until reaching the end of the array, then repeat passes until no swaps occur.

Optimizations

Optimization: If a swap is made, a boolean flag is set to true; if a full pass finishes with the flag still false, the array is sorted and we can stop.
Screen Shot 2024-06-29 at 6.23.00 PM.png
Optimization: Track where the last swap occurred; everything past that index is already sorted, so the next pass can stop there.
Screen Shot 2024-06-29 at 6.23.36 PM.png
From Saikrishna slides:
Screen Shot 2024-06-30 at 10.03.02 PM.png
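As a concrete reference, here is a minimal Java sketch combining both optimizations above (the boolean flag is folded into the last-swapped index; variable names are our own, not the course's reference implementation):

public static void bubbleSort(int[] arr) {
    // stopIndex marks the last position that could still be out of order.
    int stopIndex = arr.length - 1;
    while (stopIndex > 0) {
        int lastSwapped = 0;               // if no swap happens, this stays 0 and we stop
        for (int i = 0; i < stopIndex; i++) {
            if (arr[i] > arr[i + 1]) {
                int temp = arr[i];
                arr[i] = arr[i + 1];
                arr[i + 1] = temp;
                lastSwapped = i;           // everything past the last swap is already sorted
            }
        }
        stopIndex = lastSwapped;
    }
}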

Time Complexity of Bubble Sort

Screen Shot 2024-06-29 at 6.25.09 PM.png
Screen Shot 2024-06-30 at 10.03.55 PM.png

Bubble Sort Example

Example 1

First outer loop

Orange: Cells swapped
Blue: Sorted cells
Screen Shot 2024-06-29 at 6.28.09 PM.png
Screen Shot 2024-06-29 at 6.30.04 PM.png

Second outer loop

Screen Shot 2024-06-29 at 6.30.58 PM.png

Third + Fourth outer loop

Screen Shot 2024-06-29 at 6.32.14 PM.png

Example 2

Screen Shot 2024-06-29 at 6.33.38 PM.png
Screen Shot 2024-06-29 at 6.33.44 PM.png

Insertion Sort

Pseudocode

Screen Shot 2024-06-29 at 6.46.51 PM.png
From Saikrishna slides:
Screen Shot 2024-06-30 at 10.05.37 PM.png
Key Idea: Iterate from the front, growing a sorted portion on the left; each newly encountered item is swapped backward until it sits in its correct relative position within that sorted portion.
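A minimal Java sketch of that idea (our own naming, not the course's reference implementation):

public static void insertionSort(int[] arr) {
    for (int i = 1; i < arr.length; i++) {
        int j = i;
        // Swap arr[j] backward until it reaches its correct spot in the sorted prefix.
        while (j > 0 && arr[j - 1] > arr[j]) {
            int temp = arr[j];
            arr[j] = arr[j - 1];
            arr[j - 1] = temp;
            j--;
        }
    }
}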

Time Complexity

Screen Shot 2024-06-29 at 6.44.35 PM.png

Examples

Screen Shot 2024-06-29 at 6.48.31 PM.png
Screen Shot 2024-06-29 at 6.49.05 PM.png
Screen Shot 2024-06-29 at 6.50.08 PM.png
Screen Shot 2024-06-29 at 6.50.21 PM.png

Food for Thought - Beyond Traditional Computation: Streaming Algorithms

Insertion sort has a somewhat interesting property that we'd like to point out here. In the modern world, datasets are starting to become very large, to the point that it might not be realistic to expect a machine to be able to store the entire dataset in active memory to work with. So, what if, rather than being given all of our input at once, we instead receive the data in a streaming fashion? There are various proposed models, but we give one below.
image.png
It's worth taking some time digesting the streaming model and the nuances involved. To reiterate, the input size might be very large, so even storing it is prohibitively expensive. So, the best that we could hope for is to "summarize" or "sketch" the important characteristics of the data as we receive the data in the stream. Ideally, the computation time for each item (the update time) should be small too, hopefully not much more expensive than 𝑂(1). Then, at the end of the stream, we use our sketch to finish the computation and solve our problem. If you reflect on various problems you know how to solve, then you'll soon find that this model is very difficult to work in and is an active area of research!

For example, suppose the problem is "How many distinct items are in the dataset, assuming there can be duplicates?" In a traditional model of computation, we can solve this in O(n) on average (you have the tools to solve this problem if you've completed courses I and II, so try to think of a solution!). As it turns out, if we want to solve this problem exactly, then we need Ω(n) memory too, which is no good in the streaming model! We can do better, but only if we allow some error in our answer. Giving approximate answers and using randomization are common occurrences in the literature of streaming algorithms.

Let's circle back to insertion sort, which is what triggered our discussion of streaming algorithms in the first place. In this model, the problem "Sort the dataset." doesn't make too much sense anymore since we can't store the entire dataset anyway. Instead, let's relax our problem to answering the question "Is the dataset, in streaming order, sorted?" This is a significantly easier problem: we just store the maximum item seen so far, and if the next data is smaller, we know the dataset is not sorted. This is reminiscent of insertion sort, except that we don't bubble backwards to insert the data in the correct position. In fact, if we remove the space restriction and only look at the data stream portion of the model, then insertion sort is the only algorithm we'll see in the course that can sort the data as the stream progresses.

To the best of our knowledge, streaming algorithms were first discussed by Munro and Paterson in 1978, but the idea and model was popularized in 1996 by Alon, Matias, and Szegedy. We'll cite these papers in the module review section later if you're interested in taking a peek, though keep in mind that they are rather theoretically-oriented. Streaming algorithms have many applications, but they are often useful for networking applications, in monitoring network traffic for abnormalities or other properties of interest.

Selection Sort

Pseudocode

Screen Shot 2024-06-30 at 9.22.06 PM.png
Max approach
Screen Shot 2024-06-30 at 9.23.31 PM.png
Min approach
image.png

Saikrishna slides

(Uses minimum index approach)
image.png
Key Idea: On each pass, scan the unsorted portion for its maximum and swap it to the end of that portion (equivalently, find the minimum and swap it to the front).
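A minimal Java sketch of the max-index approach (our own naming):

public static void selectionSort(int[] arr) {
    for (int end = arr.length - 1; end > 0; end--) {
        int maxIndex = 0;
        // Find the maximum of the unsorted portion arr[0..end].
        for (int i = 1; i <= end; i++) {
            if (arr[i] > arr[maxIndex]) {
                maxIndex = i;
            }
        }
        // Swap it into its final position (a potentially "long" swap, hence the instability).
        int temp = arr[end];
        arr[end] = arr[maxIndex];
        arr[maxIndex] = temp;
    }
}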

Time complexity

Screen Shot 2024-06-30 at 9.24.14 PM.png
Screen Shot 2024-06-30 at 10.06.57 PM.png

Example

Screen Shot 2024-06-30 at 9.25.18 PM.png
Example of long-swaps - why this algo is unstable
Screen Shot 2024-06-30 at 9.29.08 PM.png

Food for Thought - Beyond Traditional Computation: Sublinear Algorithms

Selection sort is the first sorting algorithm we've seen where the best case is not O(n). In a sense, bubble sort and insertion sort have optimal best cases because it takes Ω(n) just to read the input! We've seen some operations that do better, but only if we assume strong things about our input (for example, binary search takes O(log(n)), but a sorted input is a rather strong assumption). Like in streaming algorithms, this lower bound based on reading the input is only true in the traditional model of computation. What if our dataset is large enough that even reading the entire dataset is too expensive? Then, we need to look at sublinear algorithms, which have time complexity smaller than n. In this note, we'll be looking at a specific class of sublinear algorithms, known as property testers.
image.png
A property tester is an algorithm that checks whether the input satisfies some property. For example, we may ask whether our array is sorted or not. Since we're not even reading the entire input, we cannot hope to answer correctly for every possible input. So, we need some parameter ε to quantify which inputs our property tester should work well for, as well as some metric for how "far" our input is from satisfying the property. For example, our distance function might be "How many swaps are needed at a minimum to sort the array?" and ε = 1/2 would be saying our algorithm should be able to distinguish a sorted array from an array that requires half of all possible swaps to sort, with high probability.

Similar to streaming algorithms, this is a very difficult model of computation to work with, and it is a relatively newer area of research. After all, we are only reading a small fraction of the input and trying to determine whether the input satisfies a certain property from that small amount of information. By the nature of things, many properties seem to require reading the entire input to test for them well, though research is actively working to better understand where this line in the sand is. This answer is rather unfortunate, but it is not entirely unexpected. There are many properties out there where there is no "locality" information; learning about data in one location may tell us very little about the rest of the data.

Cocktail Shaker Sort

Pseudocode

(With last swapped optimization)
Screen Shot 2024-06-30 at 9.34.30 PM.png
Screen Shot 2024-06-30 at 9.35.48 PM.png
Key Idea: A modification of bubble sort that alternates direction: a forward pass bubbles the maximum to the end, then a backward pass bubbles the minimum to the front (see the sketch below).
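A minimal Java sketch with the last-swapped optimization in both directions (our own naming; the course's reference implementation may differ in details):

public static void cocktailShakerSort(int[] arr) {
    int start = 0;
    int end = arr.length - 1;
    while (start < end) {
        int lastSwapped = start;
        for (int i = start; i < end; i++) {        // forward pass: bubble the max to the end
            if (arr[i] > arr[i + 1]) {
                swap(arr, i, i + 1);
                lastSwapped = i;
            }
        }
        end = lastSwapped;                         // everything past the last swap is sorted
        lastSwapped = end;
        for (int i = end; i > start; i--) {        // backward pass: bubble the min to the front
            if (arr[i - 1] > arr[i]) {
                swap(arr, i - 1, i);
                lastSwapped = i;
            }
        }
        start = lastSwapped;                       // everything before the last swap is sorted
    }
}

private static void swap(int[] arr, int a, int b) {
    int temp = arr[a];
    arr[a] = arr[b];
    arr[b] = temp;
}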

Time Complexity

Screen Shot 2024-06-30 at 9.37.39 PM.png
Best case: Already sorted array, O(n) - one iteration and done.
Worst case: Reverse sorted array, O(n^2).  Same as bubble sort.
Benefit over bubble sort: an array that is sorted except that the last index holds the minimum; the backward pass brings it to the front quickly.

Example

Screen Shot 2024-06-30 at 9.40.19 PM.png
Screen Shot 2024-06-30 at 9.41.09 PM.png
2nd nested loop starts from index 7 (where swap occurred)
Screen Shot 2024-06-30 at 9.42.21 PM.png
End of iteration 1 - last location of swap is index 4.  2nd iteration’s start index is 4.
Screen Shot 2024-06-30 at 9.43.47 PM.png
Last swap was index 6 so backward starts there.
Screen Shot 2024-06-30 at 9.44.29 PM.png
Can set a terminating condition, or just let it run to its end.

Iterative Sorts: Review

image.png
Bubble sort bubbles the maximum item to the end of the subarray. The subarray's search space decreases based on where the last swap occurred.
image.png
Insertion sort maintains two parts, the left part is relatively sorted, and the right part has not been looked at yet. The border between the two sections moves rightward in each iteration, and the item that has switched sides is inserted into its correct relative position in the sorted part.
image.png
Selection sort looks for the maximum item in the subarray and swaps it into its correct position. The search space decreases by 1 each time.
image.png
Cocktail shaker sort is a modified bubble sort, where each iteration is composed of two bubble sort iterations in opposite directions. The first iteration bubbles the maximum to the end, and the second does the opposite. The search space's bounds are modified based on where the last swap occurred in each half iteration.

Takeaways:


The Usage of the Iterative Sorts

The four iterative sorts that we've covered in this module are more demonstrative than anything else. They are rarely ever used in practice because quadratic performance on average is expensive for sorting. There are two cases where they might be used.
In most cases, one of the sorts in the next module (or a variation) is used instead since they are more efficient. However, it is worth taking the time to see why the iterative sorts don't perform very well. Each of them is simple and intuitive, but it's very common that the first ideas you come up with that are clean and simple will be inefficient and naïve. It is just as important to see algorithms with instability and quadratic efficiency in order to recognize these characteristics in other algorithms and avoid them.

Each of these algorithms has the problem that, in the worst case, the search space decreases by only 1 per pass. In other words, we sort everything together, meaning that each iteration must go through most of the data. So, what if instead we tried to split up the data and sort the smaller parts? This way, while we'll have to sort multiple parts, these parts will be smaller, so each sort will have to inspect less data. Then, we can combine the sorted parts together in an efficient way. This is precisely a divide and conquer approach, which we'll be exploring in the next module!

Module 11 - Divide and Conquer Sorts

Key Idea:
Learning Objectives
Students will:

Merge Sort

Key Idea: Recursively break the array into halves, then merge the sorted halves back together as the recursion unwinds.

Pseudocode

Part 1: Partitioning
Screen Shot 2024-06-30 at 10.16.43 PM.png
Part 2: Merging
Screen Shot 2024-06-30 at 10.18.31 PM.png
Part 3: Emptying one of the subarrays when the end of the other subarray is reached.
Screen Shot 2024-06-30 at 10.20.21 PM.png
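Putting the three parts together, a minimal Java sketch (assuming an int[] array; not the course's reference implementation):

import java.util.Arrays;

public static void mergeSort(int[] arr) {
    if (arr.length < 2) {
        return;                                       // base case: 0 or 1 element is already sorted
    }
    int mid = arr.length / 2;
    int[] left = Arrays.copyOfRange(arr, 0, mid);     // Part 1: split into copied halves
    int[] right = Arrays.copyOfRange(arr, mid, arr.length);
    mergeSort(left);
    mergeSort(right);

    int i = 0, j = 0, k = 0;                          // Part 2: merge the sorted halves
    while (i < left.length && j < right.length) {
        if (left[i] <= right[j]) {                    // <= keeps the sort stable
            arr[k++] = left[i++];
        } else {
            arr[k++] = right[j++];
        }
    }
    while (i < left.length) {                         // Part 3: empty whichever half remains
        arr[k++] = left[i++];
    }
    while (j < right.length) {
        arr[k++] = right[j++];
    }
}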

CSVizTool:

Screen Shot 2024-06-30 at 10.23.40 PM.png
Screen Shot 2024-06-30 at 10.24.05 PM.png

Saikrishna Slides

Screen Shot 2024-07-03 at 10.18.23 PM.png
Screen Shot 2024-07-03 at 10.18.41 PM.png
Screen Shot 2024-07-03 at 10.19.06 PM.png

Time Complexity

Screen Shot 2024-06-30 at 10.22.14 PM.png

Example

Left side:
Screen Shot 2024-06-30 at 10.26.03 PM.png
Right side:
Screen Shot 2024-06-30 at 10.29.13 PM.png
Merging example using i and j
Screen Shot 2024-06-30 at 10.31.03 PM.png
another example one level above
Screen Shot 2024-06-30 at 10.31.42 PM.png
Example from CSViztool (2,3,4,5,1)
Screen Shot 2024-06-30 at 10.38.09 PM.png

The Efficiency of Merge Sort

An O(𝑛log⁡(n)) time complexity is a bit difficult to visualize, so let's take a closer look at where this time complexity comes from.
image.png
Merge sort has two phases per recursive call, the splitting and merging phases. Splitting occurs before we make the recursive calls, while merging occurs after recursive calls end, which is the "unraveling" of the structure. When analyzing the time complexity, we are interested in the entire tree. Each "node" in the tree contributes a cost to the time complexity, and our job is to do the bookkeeping of summing it all up.

In the splitting phase, suppose we're working with a subarray of length 𝑚. Then, we take 𝑚/2 time to copy the left half of the subarray into a new array, and we take another 𝑚/2 time to copy the right half. In total, this is m time for a length 𝑚 subarray. However, if we sum the total lengths of subarrays on each level, this total is 𝑛. So, each level of this tree contributes 𝑂(𝑛) time to the total cost. There are 𝑂(log⁡(𝑛)) levels to the splitting side of the tree, so this becomes 𝑂(𝑛 log⁡(𝑛)) for our total time complexity. 

In the merging phase, it's very similar in analysis. For a length 𝑚 subarray, it takes 𝑚/2+𝑚/2=𝑚 time to merge the left and right halves together since they're already sorted. Once again, there are 𝑛 data on each level, and there are 𝑂(log⁡(𝑛)) levels, giving a time complexity of 𝑂(𝑛 log⁡(𝑛)). Summing these two phases together just increases the constant factor, giving an overall 𝑂(𝑛 log⁡(𝑛)).

What about the space complexity? We can actually get away with having one extra length 𝑛 array as a buffer to copy elements into/out of. This gives us a space complexity of 𝑂(𝑛). In fact, even if we had allocated two new length 𝑚/2 arrays at the splitting step of a length 𝑚 array, we'd still only have 𝑂(𝑛) space complexity. There is a bit of nuance to this, so we will defer this discussion to the quick sort lesson, where it will be more meaningful.

In-Place Quicksort

Note: Version taught in the course: randomized in-place quicksort using the Hoare partition scheme.
Screen Shot 2024-07-01 at 9.41.57 PM.png

Algorithm

Part 1: Pivot Selection
Screen Shot 2024-07-01 at 9.42.28 PM.png
Base case: length is 1 or less
Part 2: Partitioning
Screen Shot 2024-07-01 at 9.44.21 PM.png
Takeaway: Move items greater than pivot to right of array, smaller items go left.
Part 3: Placement of pivot and recursive calls
Screen Shot 2024-07-01 at 9.46.40 PM.png
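A minimal Java sketch of the randomized, in-place, Hoare-style scheme described above (our own naming; the course's reference implementation may differ in details):

import java.util.Random;

public static void quickSort(int[] arr, Random rand) {
    quickSort(arr, 0, arr.length - 1, rand);
}

private static void quickSort(int[] arr, int start, int end, Random rand) {
    if (end - start < 1) {
        return;                                        // base case: subarray of length 0 or 1
    }
    // Part 1: pick a random pivot and move it to the front of the subarray.
    int pivotIndex = start + rand.nextInt(end - start + 1);
    int pivotVal = arr[pivotIndex];
    swap(arr, start, pivotIndex);

    // Part 2: partition; i looks for items greater than the pivot, j for items smaller.
    int i = start + 1;
    int j = end;
    while (i <= j) {
        while (i <= j && arr[i] <= pivotVal) {
            i++;
        }
        while (i <= j && arr[j] >= pivotVal) {
            j--;
        }
        if (i <= j) {
            swap(arr, i++, j--);
        }
    }

    // Part 3: place the pivot at index j (its final position), then recurse on both sides.
    swap(arr, start, j);
    quickSort(arr, start, j - 1, rand);
    quickSort(arr, j + 1, end, rand);
}

private static void swap(int[] arr, int a, int b) {
    int temp = arr[a];
    arr[a] = arr[b];
    arr[b] = temp;
}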

CSViztool

Screen Shot 2024-07-01 at 9.49.59 PM.png
Screen Shot 2024-07-01 at 9.50.20 PM.png

Saikrishna Slides

Screen Shot 2024-07-03 at 10.21.10 PM.png
Screen Shot 2024-07-03 at 10.21.17 PM.png
Screen Shot 2024-07-03 at 10.21.28 PM.png

Time Complexity

Screen Shot 2024-07-01 at 9.49.06 PM.png
Best case: Array cut in half
Worst case: Pivot is always the min/max value of the subarray, O(n^2); the algorithm degenerates into something like selection sort. This is made unlikely by randomizing pivot selection.

Example

(Complicated, refer to CSViztool QuickSort to understand)

The Space Complexity of Algorithms

Space complexity is a bit more nuanced when we discuss algorithms rather than data structures. Generally, when people talk about the space complexity of an algorithm, they are talking about what is called the auxiliary space complexity. This is the space complexity of an algorithm if we don't count the space needed for the input and also don't count the space needed for the output. So, in other words, we're looking at the working memory for the algorithm to work beyond the necessary input/output.

This may seem strange at first, but there's a very good reason for this distinction, one we can see looking at the various sorting algorithms. If we included the input and output, then all of these sorting algorithms would have space complexity 𝑂(𝑛), which is quite a heavy price. This measure doesn't distinguish between sorts like bubble sort which have minimal memory footprint and merge sort, which uses a lot of memory due to the extra space needed for array copies. There would also be no way to talk about algorithms that use sublinear space if we didn't make this distinction.

Another nuance that might seem strange is that the space complexity is the maximum amount of space used at any given time during execution of the algorithm. The other option would be to consider all space used throughout an algorithm. We define space complexity this way because from the computer's perspective, if memory is not being used by the algorithm, then that memory is "freed" for the rest of the system to use. This means that the space complexity of merge sort is indeed 𝑂(𝑛) rather than 𝑂(𝑛log⁡(𝑛)) since the maximum memory used at any given time is
Screen Shot 2024-07-01 at 10.09.29 PM.png

Finally, there is one other nuance that has to do with lower level details. Recursion and method calls in general are not a free resource in terms of space. The program needs some way of remembering where execution stopped when a new method call was invoked, so it stores whatever state information is necessary to remember this on the call stack. So, the recursion depth of an algorithm contributes to the space complexity as well. In the case of merge sort, the max recursion depth is 𝑂(log⁡(𝑛)), which is much smaller than 𝑂(𝑛), so we didn't need to worry about it.

However, quick sort is in-place, so this recursive cost is non-trivial. On average, quick sort's recursion depth will be O(log(n)), and in the pathological worst case, it will be O(n). So, the space complexity of quick sort is logarithmic on average and linear in the worst case.

(New) HeapSort

Key Idea:
Screen Shot 2024-06-30 at 10.09.25 PM.png
Screen Shot 2024-06-30 at 10.09.41 PM.png
Class implementation uses PriorityQueue (MinHeap) to sort the data.
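A minimal Java sketch of that PriorityQueue-based approach (our own naming):

import java.util.PriorityQueue;

public static void heapSort(int[] arr) {
    // Build a MinHeap-backed PriorityQueue containing all of the data.
    PriorityQueue<Integer> minHeap = new PriorityQueue<>();
    for (int value : arr) {
        minHeap.add(value);                 // O(log n) per add -> O(n log n) total
    }
    // Repeatedly remove the minimum and write it back in sorted order.
    for (int i = 0; i < arr.length; i++) {
        arr[i] = minHeap.remove();          // O(log n) per remove
    }
}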

Time Complexity

Screen Shot 2024-06-30 at 10.10.48 PM.png
Screen Shot 2024-06-30 at 10.10.59 PM.png

Example

Screen Shot 2024-06-30 at 10.11.59 PM.png
Screen Shot 2024-06-30 at 10.12.06 PM.png
Screen Shot 2024-06-30 at 10.12.18 PM.png
Screen Shot 2024-06-30 at 10.12.31 PM.png
Screen Shot 2024-06-30 at 10.12.43 PM.png

Limits of Comparison-Based Sorting

Key Idea:
"For any deterministic, comparison-based sorting algorithm A that is always correct, there exists some input x such that A(x) takes Ω(nlog(n)) comparisons to execute, and therefore its worst case time complexity is Ω(nlog(n))."
High-Level Idea Summary:
Final takeaway: We can prove a stronger statement.  Highlighted part is important.
"For any (potentially randomized) comparison-based sorting algorithm that is always correct, the number of comparisons it must make on average is at least Ω(nlog(n)), meaning the average case is Ω(nlog(n))."

LSD Radix Sort

Screen Shot 2024-07-01 at 10.46.24 PM.png

Pseudocode

Screen Shot 2024-07-01 at 10.46.36 PM.png
(Designed for sorting numbers in base 10)
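A minimal Java sketch using 19 LinkedList buckets (digits -9 through 9) so negatives are handled too; this is our own sketch, not the course's reference implementation:

import java.util.LinkedList;

public static void lsdRadixSort(int[] arr) {
    @SuppressWarnings("unchecked")
    LinkedList<Integer>[] buckets = new LinkedList[19];
    for (int i = 0; i < 19; i++) {
        buckets[i] = new LinkedList<>();
    }
    // Number of passes = number of digits in the entry of largest magnitude.
    int iterations = 1;
    for (int value : arr) {
        int digits = 1;
        while (value / 10 != 0) {           // division works for negatives (and MIN_VALUE)
            value /= 10;
            digits++;
        }
        iterations = Math.max(iterations, digits);
    }
    int divisor = 1;
    for (int pass = 0; pass < iterations; pass++) {
        for (int value : arr) {
            int digit = (value / divisor) % 10;   // in the range -9..9
            buckets[digit + 9].add(value);
        }
        int idx = 0;                              // empty the buckets back into the array, in order
        for (LinkedList<Integer> bucket : buckets) {
            while (!bucket.isEmpty()) {
                arr[idx++] = bucket.poll();
            }
        }
        divisor *= 10;
    }
}

Note that this runs a fixed number of passes based on the longest magnitude rather than stopping when everything lands in the 0 bucket, in line with the tip below.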

CSViztool

image.png
image.png

Saikrishna Slides

Screen Shot 2024-07-03 at 10.23.07 PM.png
Screen Shot 2024-07-03 at 10.23.16 PM.png

Notes:
Tip: “One common mistake students make with radix sort is the termination condition when implementing base 10 LSD radix sort. Students notice that in many cases, if all items falls into the 0 bucket, then the result is sorted. However, this is not always the case!”

Time Complexity

Screen Shot 2024-07-01 at 10.50.10 PM.png
Takeaways:

Example

(Note: This example only uses positive integers, so buckets for digits 0-9 suffice; use 19 buckets, for digits -9 to 9, to handle negatives.)

Iteration 1

Step 1) Put each index item in buckets (LinkedList) according to 1st digit.
Screen Shot 2024-07-03 at 9.14.15 PM.png
Step 2) From 0, overwrite items in original array with items from buckets.
Screen Shot 2024-07-03 at 9.17.56 PM.png

Iteration 2

Step 3) Put each index item in buckets according to 2nd digit. 
Screen Shot 2024-07-03 at 9.19.26 PM.png
Step 4) From 0, overwrite items in original array with items from buckets.
Screen Shot 2024-07-03 at 9.20.40 PM.png

Iteration 3

Step 5) Put each index item in bucket according to 3rd digit.
Screen Shot 2024-07-03 at 9.21.16 PM.png
Step 6) Empty buckets.  Array is now sorted.
Screen Shot 2024-07-03 at 9.22.19 PM.png

Food for Thought - MSD Radix Sort

Quick Select (Kth selection problem)

Key idea: Use a derivative of an algorithm that solves one problem in order to solve a different problem.

K-th selection problem:

Pseudocode

Screen Shot 2024-07-03 at 9.48.22 PM.png
Base case: when the pivot lands at index k - 1 after partitioning, that pivot is the k-th smallest element and we return it; otherwise we only recurse into the side of the subarray that contains index k - 1.
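A minimal Java sketch that reuses the quicksort partition but only recurses into the side containing index k - 1 (our own naming):

import java.util.Random;

public static int kthSmallest(int[] arr, int k, Random rand) {
    return select(arr, 0, arr.length - 1, k - 1, rand);    // target index is k - 1
}

private static int select(int[] arr, int start, int end, int target, Random rand) {
    // Same pivot selection and partition scheme as quicksort.
    int pivotIndex = start + rand.nextInt(end - start + 1);
    int pivotVal = arr[pivotIndex];
    swap(arr, start, pivotIndex);
    int i = start + 1;
    int j = end;
    while (i <= j) {
        while (i <= j && arr[i] <= pivotVal) { i++; }
        while (i <= j && arr[j] >= pivotVal) { j--; }
        if (i <= j) { swap(arr, i++, j--); }
    }
    swap(arr, start, j);                     // pivot is now in its final position j
    if (j == target) {
        return arr[j];                       // found the k-th smallest element
    } else if (target < j) {
        return select(arr, start, j - 1, target, rand);    // recurse into one side only
    } else {
        return select(arr, j + 1, end, target, rand);
    }
}

private static void swap(int[] arr, int a, int b) {
    int temp = arr[a];
    arr[a] = arr[b];
    arr[b] = temp;
}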

Time Complexity

Screen Shot 2024-07-03 at 9.50.25 PM.png

Example

Goal: Find 3rd smallest item
Screen Shot 2024-07-03 at 9.53.23 PM.png
Iteration 1:  First do same process as QuickSort
Screen Shot 2024-07-03 at 9.55.54 PM.png
Once i and j cross, QuickSelect deviates:
Screen Shot 2024-07-03 at 9.58.25 PM.png
Once the pivot lands at index k - 1 (e.g. 3 - 1 = 2), we terminate.
Screen Shot 2024-07-03 at 9.58.47 PM.png

Food for Thought - Deterministic Pivot Selection

Q: Instead of random pivots, why not pick the pivot deterministically?

Module 12 - Pattern Matching Algorithms

Problem Statement

Input: A short string called the pattern and a longer string called the text. The pattern is of length m, and the text is of length n. In some special cases, we may also receive the alphabet, of size s, from which the characters in the pattern and text are drawn.

Output:  There are two primary variations of this problem. For the single occurrence  version, we output the location (text index) of any occurrence of the pattern as a substring of the text. In the all occurrences  version, we output a list of all locations (text indices) for occurrences of the pattern as substrings in the text. These occurrences can overlap, so if the pattern were 𝑎𝑏𝑎, then the text 𝑎𝑏𝑎𝑏𝑎 would have two occurrences to output.

Brute Force Algorithm

Pseudocode

Screen Shot 2024-07-07 at 2.02.31 PM.png
Screen Shot 2024-07-07 at 6.07.14 PM.png

Example

Screen Shot 2024-07-07 at 2.05.30 PM.png

Time Complexity of Brute Force

Note: Main cases covered in course:
image.png
(Course does not focus on average case)

Boyer-Moore Preprocessing

Last Occurrence Table

Screen Shot 2024-07-07 at 2.12.45 PM.png
Diagram Example:
Screen Shot 2024-07-07 at 2.16.50 PM.png

Pseudocode

Screen Shot 2024-07-07 at 2.13.32 PM.png
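A minimal Java sketch of the table construction (characters absent from the pattern default to -1 at lookup time):

import java.util.HashMap;
import java.util.Map;

public static Map<Character, Integer> buildLastTable(CharSequence pattern) {
    Map<Character, Integer> last = new HashMap<>();
    // Later occurrences overwrite earlier ones, so each character maps to its last index.
    for (int i = 0; i < pattern.length(); i++) {
        last.put(pattern.charAt(i), i);
    }
    return last;
}

// At search time, characters not in the table are treated as having last index -1, e.g.:
// int shiftIndex = last.getOrDefault(textChar, -1);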

CSVizTool

Building Last Occurrence Table:
Screen Shot 2024-07-07 at 2.15.21 PM.png

Boyer-Moore Tracing

Key Idea:

Pseudocode

Screen Shot 2024-07-07 at 2.22.14 PM.png
CSViztool
image.png

Pattern shift cases:
Screen Shot 2024-07-07 at 3.01.59 PM.png

Example 1

Screen Shot 2024-07-07 at 3.04.01 PM.png
Screen Shot 2024-07-07 at 3.04.53 PM.png

Example 2

Screen Shot 2024-07-07 at 3.08.16 PM.png

Time Complexity

image.png

KMP Preprocessing - Failure Table

Key idea: Preprocesses pattern AND text

Failure Table

Screen Shot 2024-07-07 at 4.26.13 PM.png

Creating Failure Table by Inspection

Screen Shot 2024-07-07 at 4.32.55 PM.png

Pseudocode - KMP Failure Table

Screen Shot 2024-07-07 at 4.33.23 PM.png
3 cases in while loop:  Compare pattern at index i to pattern at index j

Example

Complicated, watch video: https://gatech.instructure.com/courses/406950/pages/kmp-failure-table-algorithm?module_item_id=4026224

CSViztool
Screen Shot 2024-07-07 at 4.45.36 PM.png

Tip:
In the example presented in the video, the case where p[i] != p[j] and i != 0 may not be well illustrated. The pseudocode tells us that in this case, we set i ← f[i-1], but why is that? This is because there can be sub-prefixes that may be a potential match. To see this in action, try to compute the failure table of the pattern abaabab by the algorithm. Your final result should be [0, 0, 1, 1, 2, 3, 2], which you can confirm by inspection using the definition of what the failure table entries should represent.
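A minimal Java sketch of the three-case loop described above (our own naming):

public static int[] buildFailureTable(CharSequence pattern) {
    int[] table = new int[pattern.length()];
    if (pattern.length() == 0) {
        return table;
    }
    int i = 0;                    // end of the current candidate prefix
    int j = 1;                    // index whose entry we are computing
    table[0] = 0;
    while (j < pattern.length()) {
        if (pattern.charAt(i) == pattern.charAt(j)) {
            table[j] = i + 1;     // a prefix of length i + 1 matches the suffix ending at j
            i++;
            j++;
        } else if (i == 0) {
            table[j] = 0;         // no prefix matches; move on
            j++;
        } else {
            i = table[i - 1];     // fall back to the next shorter candidate prefix
        }
    }
    return table;
}

Running this on the pattern abaabab produces [0, 0, 1, 1, 2, 3, 2], matching the table given in the tip above.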

Time Complexity

Efficiency
Big-O: Failure table construction is O(m + m) → O(m)

KMP Search Algorithm

Comparing Boyer-Moore and KMP

Screen Shot 2024-07-07 at 4.47.54 PM.png
(Errata: “To clarify, the first bullet point on the right should read pattern index - 1”)

Pseudocode

Screen Shot 2024-07-07 at 4.48.22 PM.png
image.png

Handling multiple matches


CSVizTool
image.png

Screen Shot 2024-07-07 at 4.45.19 PM.png

Example

Complicated - watch video: https://gatech.instructure.com/courses/406950/pages/kmp-tracing-example?module_item_id=4026230
Screen Shot 2024-07-07 at 4.56.12 PM.png

Time Complexity

image.png
Screen Shot 2024-07-07 at 4.57.05 PM.png

Rabin-Karp

Key idea: 
Screen Shot 2024-07-07 at 5.39.19 PM.png

Pseudocode

Screen Shot 2024-07-07 at 5.39.46 PM.png
image.png

Rabin-Karp Rolling Hash

Screen Shot 2024-07-07 at 5.42.20 PM.png
Example:
Screen Shot 2024-07-07 at 5.45.08 PM.png
Updating the Hash:
Screen Shot 2024-07-07 at 5.46.22 PM.png
Example:
Screen Shot 2024-07-07 at 5.46.52 PM.png
Summary:
Screen Shot 2024-07-07 at 5.49.31 PM.png
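A minimal Java sketch of the rolling hash, assuming a base of 113 (a common choice in presentations of Rabin-Karp; the course's constant may differ). Integer overflow is simply allowed to wrap:

public final class RollingHash {
    private static final int BASE = 113;   // assumed base

    // Initial hash: h = c_0 * BASE^(m-1) + c_1 * BASE^(m-2) + ... + c_(m-1) * BASE^0.
    public static int initialHash(CharSequence s, int length) {
        int hash = 0;
        for (int i = 0; i < length; i++) {
            hash = hash * BASE + s.charAt(i);   // Horner's rule
        }
        return hash;
    }

    // Slide the window one character to the right in O(1):
    // remove oldChar's contribution (oldChar * BASE^(m-1)), shift, then add newChar.
    public static int roll(int oldHash, char oldChar, char newChar, int basePow) {
        return (oldHash - oldChar * basePow) * BASE + newChar;
    }

    // basePow should be BASE^(m-1), computed once up front.
    public static int pow(int base, int exponent) {
        int result = 1;
        for (int i = 0; i < exponent; i++) {
            result *= base;
        }
        return result;
    }
}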

Rabin-Karp Tracing

Key Idea: Rolling hash; whenever the hash of the current text window matches the pattern's hash, do an inner brute-force character comparison to confirm the match (different strings can collide to the same hash).
Screen Shot 2024-07-07 at 5.50.34 PM.png
Screen Shot 2024-07-07 at 5.52.47 PM.png
(To record all occurrences, just store the indexes where a match occurs while looping)

Efficiency

Screen Shot 2024-07-07 at 5.54.08 PM.png
image.png

Pattern Matching Summary

Boyer-Moore

KMP

Rabin-Karp

Module 12 Review

image.png

Big-O Summaries

image.png
image.png
image.png
image.png

M13 Graph Algorithms

Terminology

Screen Shot 2024-07-13 at 6.03.04 PM.png
Screen Shot 2024-07-13 at 6.05.14 PM.png
Screen Shot 2024-07-13 at 6.05.48 PM.png
Screen Shot 2024-07-13 at 6.06.54 PM.png
Screen Shot 2024-07-13 at 6.07.53 PM.png

Graph Terminology and Conventions

Undirected v. Directed Graphs

Paths, Trails, Cycles, Circuits, and Walks

Path: sequence of non-repeated adjacent vertices (aka “simple path”)

Trail: repeated vertices allowed, but edges can't repeat.

Walk: like a trail, but edges may also repeat (so both vertices and edges can repeat).

Cycle: Path where 1st and last vertices are adjacent (and can connect)

Circuit: Trail where 1st and last vertices are adjacent.

image.png

Simple Graphs, Self-Loops, and Parallel Edges

Now let's look at different types of edges and see how they affect a graph:
image.png

Graph Complexity

Connectedness in Graphs

image.png

Dense v. Sparse Graphs

To motivate these definitions, let's first consider a simple graph where the number of edges is maximized, the complete graph K_n (which has n(n-1)/2 edges). Putting this in the context of graph complexity:

Common Graph Families

image.png

Graph Representations

Graph ADT Procedures

Screen Shot 2024-07-13 at 8.09.17 PM.png
Screen Shot 2024-07-13 at 8.09.31 PM.png
Screen Shot 2024-07-13 at 8.09.47 PM.png

Graph Representation - Considerations

Screen Shot 2024-07-13 at 8.12.08 PM.png

Adjacency Matrix

Screen Shot 2024-07-13 at 8.12.32 PM.png

Adjacency List

Screen Shot 2024-07-13 at 8.14.11 PM.png

Edge List

Screen Shot 2024-07-13 at 8.14.39 PM.png

Depth-first Search (DFS)

Implementations:

Non-recursive DFS 

(algorithm not covered in course)
Screen Shot 2024-07-13 at 8.24.29 PM.png
CSVizTool
Screen Shot 2024-07-13 at 8.40.51 PM.png

Recursive DFS

Pseudocode

Screen Shot 2024-07-13 at 8.24.59 PM.png
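A minimal Java sketch of recursive DFS over an adjacency-list Map (our own representation; the course's Graph ADT differs in details):

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public static <V> List<V> dfs(V start, Map<V, List<V>> adjList) {
    List<V> visitOrder = new ArrayList<>();
    Set<V> visited = new HashSet<>();
    dfsHelper(start, adjList, visited, visitOrder);
    return visitOrder;
}

private static <V> void dfsHelper(V current, Map<V, List<V>> adjList,
                                  Set<V> visited, List<V> visitOrder) {
    visited.add(current);            // mark the vertex as visited when we arrive
    visitOrder.add(current);
    for (V neighbor : adjList.getOrDefault(current, List.of())) {
        if (!visited.contains(neighbor)) {
            dfsHelper(neighbor, adjList, visited, visitOrder);   // go deep before moving on
        }
    }
}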
CSVizTool
Screen Shot 2024-07-13 at 8.36.47 PM.png

Example

Screen Shot 2024-07-13 at 8.34.06 PM.png
CSVizTool
Screen Shot 2024-07-13 at 8.35.20 PM.png

Efficiency

Notes

Applications of DFS

Breadth-first Search (BFS)

Key Idea: Visit all vertices reachable through connections from some indicated start vertex.

Requirements:   1) Queue, 2) Visited set, 3) Starting vertex

Pseudocode

Screen Shot 2024-07-13 at 9.11.10 PM.png
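A minimal Java sketch matching the three requirements above (queue, visited set, starting vertex), again using an adjacency-list Map of our own choosing:

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

public static <V> List<V> bfs(V start, Map<V, List<V>> adjList) {
    List<V> visitOrder = new ArrayList<>();
    Set<V> visited = new HashSet<>();
    Queue<V> queue = new ArrayDeque<>();
    visited.add(start);                        // mark when enqueued to avoid duplicates
    queue.add(start);
    while (!queue.isEmpty()) {
        V current = queue.remove();
        visitOrder.add(current);
        for (V neighbor : adjList.getOrDefault(current, List.of())) {
            if (!visited.contains(neighbor)) {
                visited.add(neighbor);
                queue.add(neighbor);
            }
        }
    }
    return visitOrder;
}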
CSVizTool
Screen Shot 2024-07-13 at 9.13.38 PM.png

Efficiency

BFS vs DFS

BFS when to use:
DFS when to use:
Summary:

M14 - Dijkstra’s Shortest Path

Algorithm

Key Idea: Solve the single source shortest paths problem in a graph.
There are variations of the shortest paths problem, based on certain parameters:
Requirements:  1) Priority queue, 2) Visited set, 3) Map, 4) Source vertex

Pseudocode

Screen Shot 2024-07-14 at 11.25.08 AM.png
image.png
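A minimal Java sketch, assuming vertices are labeled 0..n-1 and each adjacency list holds int[]{neighbor, weight} pairs (our own representation; the course's Graph ADT is more general). Instead of a decrease-key operation, it enqueues duplicate entries and skips stale ones:

import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public static int[] dijkstra(int source, List<List<int[]>> adjList) {
    int n = adjList.size();
    int[] dist = new int[n];
    Arrays.fill(dist, Integer.MAX_VALUE);       // "infinity" for unreached vertices
    dist[source] = 0;
    boolean[] visited = new boolean[n];
    // Priority queue of {vertex, cumulative distance}, ordered by distance.
    PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> Integer.compare(a[1], b[1]));
    pq.add(new int[]{source, 0});
    while (!pq.isEmpty()) {
        int[] current = pq.remove();
        int u = current[0];
        if (visited[u]) {
            continue;                           // stale entry: u was already finalized
        }
        visited[u] = true;
        for (int[] edge : adjList.get(u)) {
            int v = edge[0];
            int w = edge[1];
            if (!visited[v] && dist[u] + w < dist[v]) {
                dist[v] = dist[u] + w;              // shorter path found: update the distance map
                pq.add(new int[]{v, dist[v]});      // enqueue a new entry rather than decrease-key
            }
        }
    }
    return dist;
}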

Dijkstra's Algorithm: Vertex Distance Updates

Once a vertex becomes visited, is that vertex's distance ever updated again? Why or why not?

Key Concepts
Why Distances to Visited Vertices Are Not Updated

Advantages of Updating Distance Map in Dijkstra's Algorithm

Dijkstra's algorithm will update the distance map whenever a shorter path is found, regardless of whether or not the vertex has been enqueued or not. What is the advantage of doing this?

Dynamic Path Optimization
Enhanced Accuracy
Improved Efficiency
Priority Queue Effectiveness
Algorithmic Flexibility

Application on Digraphs

Is there any reason that Dijkstra's algorithm would fail for digraphs compared to undirected graphs?

Motivating Dijkstra’s Algorithm


image.png

Efficiency of Dijkstra

Given assumptions:
Space complexity:
Screen Shot 2024-07-14 at 11.46.30 AM.png
Side Note: Some sources may list the time complexity as either 𝑂(|𝑉|2) or 𝑂(|𝐸|+|𝑉|log⁡|𝑉|). The first expression is what happens if an adjacency matrix is used; the cost of going through neighbors dominates the log factors. The second expression is what happens if a Fibonacci heap is used since the amortized cost of adding/updating in a Fibonacci heap is Θ(1). Please keep in mind that Fibonacci heaps are almost never used in practice and are largely for theoretical purposes of bettering the time complexity.

Optimization

Decoupling “visited” vertices from updates to distance map:
Updating priority queue entries with a different priority:

Negative Edge Weights and Cycles

Negative Edge Weights

Negative Cycles

image.png

Handling Negative Edge Weights

Summary

Food for Thought -  Heuristics and Search Algorithms

Foundations

AI Applications

Key Algorithms

Heuristics

A* Algorithm

Conclusion


Efficiency in Graphs - Review

Screen Shot 2024-07-13 at 9.35.47 PM.png
Screen Shot 2024-07-13 at 9.36.28 PM.png
Screen Shot 2024-07-13 at 9.37.02 PM.png
image.png

M15 Minimum Spanning Trees (MSTs)

MSTs

Prim’s Algorithm

Key Idea

Key Idea: Very similar to Dijkstra's, but the priority only considers the weight of the immediate edge (not the cumulative distance as in Dijkstra's).
Screen Shot 2024-07-14 at 2.57.52 PM.png
Errata: 

Observing Graphs

Graph cut - choose a subset of the vertices; the cut consists of the edges connecting that subset to the rest of the graph.
Screen Shot 2024-07-14 at 2.56.47 PM.png

Requirements for Prim’s Algorithm

Screen Shot 2024-07-14 at 2.58.22 PM.png

Pseudocode

Screen Shot 2024-07-14 at 2.58.44 PM.png
image.png

Efficiency

Time Complexity

Java's PriorityQueue

Cut Property

image.png

Key Points

Example

Watch video: 
Screen Shot 2024-07-14 at 3.08.05 PM.png
Screen Shot 2024-07-14 at 3.08.29 PM.png
Screen Shot 2024-07-14 at 6.24.08 PM.png
Screen Shot 2024-07-14 at 6.26.04 PM.png
The final MST connects all vertices with minimum total edge weight (note: a path through the MST between two vertices is not necessarily the shortest path between them).

Warning: We'd like to make it clear that the shortest path problem solved by Dijkstra's algorithm and the MST problem solved by Prim's algorithm are mostly unrelated to each other, even though the pseudocode of the two algorithms appears similar.

Greedy Paradigm

Definition
Examples
Characteristics
Limitations
Greedy Strategy in TSP
Advantages Despite Limitations
Conclusion

Kruskal's Algorithm

Overview

Screen Shot 2024-07-14 at 7.43.46 PM.png
Screen Shot 2024-07-14 at 7.45.04 PM.png
Screen Shot 2024-07-14 at 7.45.24 PM.png

Pseudocode

Screen Shot 2024-07-14 at 7.45.47 PM.png
CSVizTool
image.png

Example

Video: 
Screen Shot 2024-07-14 at 7.47.09 PM.png
Screen Shot 2024-07-14 at 7.58.26 PM.png

Algorithm Steps

Motivation
Motivation for Disjoint Sets
image.png

Example of where a visited set will fail (DF edge will create a cycle)


Disjoint Sets

Disjoint Set (Union-Find) Data Structure

Kruskal's Algorithm Context:

Tree-based Solution

Data Structure Requirements:
Operations:
How the operation works:
image.png
Efficiency

Path Compression and Rank

Union by Rank
New find and union
Screen Shot 2024-07-14 at 7.35.39 PM.png
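A minimal Java sketch of the structure with both optimizations, assuming vertices labeled 0..n-1 (our own naming):

public class DisjointSet {
    private final int[] parent;
    private final int[] rank;

    public DisjointSet(int n) {
        parent = new int[n];
        rank = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;              // every vertex starts in its own set
        }
    }

    public int find(int x) {
        if (parent[x] != x) {
            parent[x] = find(parent[x]);   // path compression: point directly at the root
        }
        return parent[x];
    }

    public boolean union(int a, int b) {
        int rootA = find(a);
        int rootB = find(b);
        if (rootA == rootB) {
            return false;               // same set already: this edge would form a cycle
        }
        if (rank[rootA] < rank[rootB]) {           // union by rank: attach shorter under taller
            parent[rootA] = rootB;
        } else if (rank[rootA] > rank[rootB]) {
            parent[rootB] = rootA;
        } else {
            parent[rootB] = rootA;
            rank[rootA]++;
        }
        return true;
    }
}

In Kruskal's algorithm, edges are dequeued in weight order and an edge (u, v) is kept exactly when union(u, v) returns true, i.e., when it does not create a cycle.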

Summary

Efficiency of Kruskal's Algorithm

Disjoint Set Optimizations: The find and union operations have an amortized cost of O(α(|V|)), where α is the inverse Ackermann function. This function grows extremely slowly, making these operations nearly constant time, O(1), for all practical purposes.

Kruskal's Algorithm Efficiency:
Overall Complexity: Since dequeuing and cycle detection dominate, the overall time complexity of Kruskal's algorithm is O(|E| log |E|).

Simplification for Simple Graphs: If the graph is simple (no self-loops or parallel edges), then |E| = O(|V|^2), so log|E| = O(log|V|) and the complexity simplifies to O(|E| log |V|).

Prim vs Kruskal’s Algorithm

Time Complexity

Dense Graphs

Sparse Graphs

Special Cases

Other Considerations

Conclusion


M16 - Dynamic Programming

Key Idea: Store the solutions to subproblems so that when they are needed for reuse, we do not recompute the subproblem
Screen Shot 2024-07-19 at 10.16.23 PM.png
Naive Fibonacci:
Screen Shot 2024-07-19 at 10.17.04 PM.png
DP Fibonacci:
Screen Shot 2024-07-19 at 10.18.00 PM.png
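A minimal bottom-up Java sketch of the tabulated version:

public static long fib(int n) {
    if (n <= 1) {
        return n;
    }
    long[] table = new long[n + 1];     // table[i] stores fib(i) so it is computed only once
    table[0] = 0;
    table[1] = 1;
    for (int i = 2; i <= n; i++) {
        table[i] = table[i - 1] + table[i - 2];   // O(n) total instead of the naive O(2^n)
    }
    return table[n];                    // fits in a long for n up to 92
}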

Dynamic Programming: More Formally

Dynamic Programming (DP) Overview:
Fibonacci Example:
Top-Down vs. Bottom-Up DP:
Nuances and Benefits of DP:
Underlying Implicit DAG in DP:
Key Takeaway: Dynamic Programming is a powerful technique for solving problems with overlapping subproblems and optimal substructure, using either a top-down (with memoization) or bottom-up approach to efficiently compute solutions while managing increased space complexity.

The P = NP Problem

Introduction
Definitions
Why do we care
image.png

Hard vs. Easy Problems
The P = NP Question
Consequences of P = NP
Perspectives
Conclusion

Exploratory Lab: Modeling Dynamic Programming

Problem 1: Largest Contiguous Sum

Suppose we have an array A of integers of length n. We'd like to compute the largest contiguous sum of the array, which is a block of numbers for which the sum is maximized. For example, suppose our array is [−1, 5, 2, −1, 4, 5, −3, 1, 3]; the largest contiguous sum is the value of the best such block. In the worst case, you can always choose the empty sum, which gives a value of 0. Come up with a dynamic programming algorithm to compute the value of the sum.
Solution:
Scratch Pad:
- Loop over sets of 2 to n sets where n is sum of entire array.
- Each smaller loop keeps the sum of 2 saved, which is then used by loop of 3 sets
- Keep an additional data structure storing the set of number with the greatest max 
  so far.
Screen Shot 2024-07-20 at 4.55.17 PM.png
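One possible way to code this up, as a point of comparison with the scratch pad above (a Kadane-style O(n) recurrence; this is our own sketch, not necessarily the course's intended solution):

public static int largestContiguousSum(int[] arr) {
    int bestEndingHere = 0;              // largest sum of a block ending at the current index
    int bestOverall = 0;                 // the empty sum (0) is always allowed
    for (int value : arr) {
        // Either extend the best block ending at the previous index, or start a new block here.
        bestEndingHere = Math.max(bestEndingHere + value, value);
        bestOverall = Math.max(bestOverall, bestEndingHere);
    }
    return bestOverall;
}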

More explanation of solution:

Problem 2: Longest Increasing Subsequence

Suppose that we have an array A of length n as input, consisting of integers. A subsequence of the array is a sequence of some of the array entries in the order that they appear. For example, suppose that we have the array [1, 7, 3, 5, 2, 8, 10, 24, −1, −5, 4]. Then 3, 2, −1, −5, 4 is a valid subsequence since the numbers appear in array order, while 1, 7, 10, 8 is not a valid subsequence. An increasing subsequence is a subsequence where the numbers go in (strictly) increasing order. Come up with a dynamic programming algorithm to compute the length of the longest increasing subsequence of the array.
Solution:
Scratch pad:
- Initial thought would be to approach the problem recursively, similar to merge sort.
- Loop i starts a A[0] and continues
- Loop j accumulates integers from i until array is non-increasing (n-1 > n)
    - Longest series from A[n] is saved.
- Continue doing this, while checking if some longest running series is longer than
  the current max.  Replace if it is longer.
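One possible O(n^2) DP sketch in Java (our own, not necessarily the course's intended solution), where lis[i] holds the length of the longest increasing subsequence ending at index i:

public static int longestIncreasingSubsequence(int[] arr) {
    if (arr.length == 0) {
        return 0;
    }
    int[] lis = new int[arr.length];
    int best = 1;
    for (int i = 0; i < arr.length; i++) {
        lis[i] = 1;                                    // the subsequence containing only arr[i]
        for (int j = 0; j < i; j++) {
            if (arr[j] < arr[i] && lis[j] + 1 > lis[i]) {
                lis[i] = lis[j] + 1;                   // extend the best subsequence ending at j
            }
        }
        best = Math.max(best, lis[i]);
    }
    return best;
}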

Problem 3: Rod Cutting

We have a rod of length n centimeters. We'd like to chop up this rod into smaller segments to sell at the market, but the market only accepts rods of lengths ℓ_1, …, ℓ_k as marketable items. All of these lengths are integral in nature. Rods of length ℓ_i sell for price c_i. Come up with a dynamic programming algorithm to maximize our potential profit.
Solution:
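One possible O(n·k) DP sketch in Java (our own, not necessarily the course's intended solution), where profit[r] holds the best profit achievable from a rod of length r:

public static int rodCutting(int n, int[] lengths, int[] prices) {
    int[] profit = new int[n + 1];                 // profit[0] = 0: nothing left to sell
    for (int r = 1; r <= n; r++) {
        for (int i = 0; i < lengths.length; i++) {
            if (lengths[i] <= r) {
                // Sell one piece of lengths[i], then do the best possible with what remains.
                profit[r] = Math.max(profit[r], prices[i] + profit[r - lengths[i]]);
            }
        }
    }
    return profit[n];
}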


Longest Common Subsequence

Definitions

Screen Shot 2024-07-20 at 5.17.47 PM.png

How do we find the LCS?

Screen Shot 2024-07-20 at 5.19.07 PM.png

Pseudocode

Screen Shot 2024-07-20 at 5.22.32 PM.png
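A minimal Java sketch of building the DP table (our own naming), where table[i][j] is the LCS length of the first i characters of s1 and the first j characters of s2:

public static int[][] lcsTable(CharSequence s1, CharSequence s2) {
    int[][] table = new int[s1.length() + 1][s2.length() + 1];   // row 0 and column 0 stay 0
    for (int i = 1; i <= s1.length(); i++) {
        for (int j = 1; j <= s2.length(); j++) {
            if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                table[i][j] = table[i - 1][j - 1] + 1;   // diagonal move: characters match
            } else {
                table[i][j] = Math.max(table[i - 1][j], table[i][j - 1]);  // take the larger neighbor
            }
        }
    }
    return table;
}

The LCS length is table[s1.length()][s2.length()]; tracing backward from that cell (diagonally on matches, toward the larger neighbor otherwise) recovers one LCS, which is why the backtrace, and hence the LCS itself, need not be unique.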
CSVizTool:
image.png
Uses:

Brief Example of LCS

Tracing Forward (orange arrow):
Screen Shot 2024-07-20 at 7.27.14 PM.png
Tracing Backward (blue arrow):  LCS is [A, D]
Screen Shot 2024-07-20 at 7.28.17 PM.png

More Complex Example

Case where there are multiple LCS

Tracing Forward:
Screen Shot 2024-07-20 at 7.31.20 PM.png
Tracing Backward:  LCS is [T, O, A]
Screen Shot 2024-07-20 at 7.32.29 PM.png

In-Depth Example of LCS

This example drives home the point that the longest common subsequence is NOT unique.

Multiple paths can be traced in this example:
Screen Shot 2024-07-20 at 7.34.12 PM.png
Blocking value changes makes it easier to visualize where the LCS backtrace can go.
Screen Shot 2024-07-20 at 7.34.47 PM.png
One path, [C A C E]
Screen Shot 2024-07-20 at 7.36.47 PM.png
Another path,  [C, A, D, E]
Screen Shot 2024-07-20 at 7.37.11 PM.png
Another path, [C A D E] - same LCS but different path.
Screen Shot 2024-07-20 at 7.37.31 PM.png
and more....

Food-for-thought: Knapsack Problem

What is Knapsack (0-1 Knapsack version)
Solution
image.png
How is this NP-Complete?

Bellman-Ford Algorithm

Pseudocode

The underlying principle of the Bellman-Ford algorithm is simple: every shortest path between two vertices will use at most ∣V∣−1 edges. With this in mind, here's the algorithm. 
def bellmanFord(source):
    Initialize dist, a mapping from vertices to distances.
    for each vertex v:
        dist[v] = ∞
    dist[source] = 0

    for i = 1, 2, ..., |V| - 1:
        for each directed edge (u, v, w):
            if dist[v] > dist[u] + w:
                dist[v] = dist[u] + w

    return dist
Negative Cycle Detection
Example

Below is an example worked out on a small graph. As you can see, after iteration i, we have found the shortest paths that use at most i edges, updating a distance whenever a shorter path is found.
image.png

Food-for-thought: Application: Routing Protocols

Routing protocols built on algorithms like Dijkstra's and Bellman-Ford play a crucial role in small internal networks. Networks are modeled as weighted graphs with routers as vertices and physical links as edges. The weights can represent distance, cost, etc. The main goal of routing algorithms is to guide routers on forwarding packets based on routing information.


The goal of routing algorithms is to help routers determine what link to send packets along based on the packet's routing information (source, destination, etc.). Each router has only a local view of the network, seeing only what routers it is physically linked to. This is usually done by forwarding tables, which tell the router where to send the packet based on that information. The problem is: How can each router make informed decisions on where to forward packets with only a local view of the network? Below are two approaches.

Centralized Approach (Link-State)

Decentralized Approach (Distance-Vector)

Both algorithms, despite their simplicity, are vital for efficient network routing, each fitting different network management strategies.

Floyd-Warshall Algorithm

What is it:

Defining the Algorithm

Key traits:
How it works:
def floydWarshall(adjMatrix):
    Initialize dist, a |V| x |V| matrix initialized to take the values of adjMatrix.

    for each vertex vj:
        for each vertex vi:
            for each vertex vk:
                if dist[vi][vk] > dist[vi][vj] + dist[vj][vk]:
                    dist[vi][vk] = dist[vi][vj] + dist[vj][vk]

    return dist

Complexity

Negative Cycle Detection

Example

Screen Shot 2024-07-20 at 8.18.11 PM.png

Screen Shot 2024-07-20 at 8.18.22 PM.png

Screen Shot 2024-07-20 at 8.18.31 PM.png

Hamilton and Euler: "Food for Thought"

Hamilton Cycles and Euler Circuits

Key Differences

Similar Formulations, Different Difficulties

Module 15 Review

Key Idea: One useful property of dynamic programming is that when applicable, it can turn simple exponential time algorithms into polynomial time ones

DP Algorithms we covered:

Big O - Complexity