Smart Algorithms Use Meta-Information to Solve a Problem
While preparing for the introductory lecture of Algorithms and Programming, a course I was about to teach, I was struggling to find the core idea that would bind the fascinating ideas of the syllabus together.
Then, on a cold March morning, in Berlin’s Staatsbibliothek, when turning a page from Tony Crilly’s 50 Mathematical Ideas, it dawned on me. It is by rediscovering the elegance of Euclid’s algorithm that I realised that smart algorithms leverage information about the problem (i.e., “meta-information”) to solve it.
Meta-information refers to information about the problem itself. This includes properties, patterns or structures which can be used to reduce the computational effort required. This notion will become clearer when illustrated in the two examples below.
Finding the Greatest Common Divisor
Euclid’s algorithm solves the problem of finding the Greatest Common Divisor (GCD) of two numbers. The GCD (also known as Greatest Common Factor, GCF) has application in many fields such as network design or operations. As an example, the GCD of 120 and 180 is 60, as 60 is the largest divisor of both 120 and 180.
A more formal definition of this problem, given two integers \(a\) and \(b\) with \(a > b\), the \(gcd(a,b)\) is the largest integer that divides both integers \(a\) and \(b\).
How Would You Do It?
Like most problems, this can be approached by brute force, by listing the factors of both numbers.
Factors of 120: 1, 2, 3, 4, 5, 6, 8, 10, 12, 15, 20, 24, 30, 40, 60, 120
Factors of 180: 1, 2, 3, 4, 5, 6, 9, 10, 12, 15, 18, 20, 30, 36, 45, 60, 90, 180
From this decomposition, we see that the largest common factor of these two numbers is 60. This can be done programmatically with complexity \(O(b)\), by iterating over every integer from 1 to \(b\), checking at each step if the integer divides both \(a\) and \(b\). The following Python code should do the trick (please do not use this at home).
Python Code
def brute_force_gcd(a, b):
= 1
gcd for i in range(1, b + 1):
if a % i == 0 and b % i == 0:
= i
gcd return gcd
print(brute_force_gcd(180, 120)) # Output: 60
Euclid’s algorithm leverages the fact that the GCD of two numbers is also the GCD of their difference. Iteratively getting the difference between the two numbers until the two numbers are the same. This number will be the GCD.
As an example with \(120\) and \(180\):
- \(180 - 120 = 60\), taking the difference between \(120\) and \(60\)
- \(120 - 60 = 60\), the two numbers are now \(60\) and \(60\), the GCD
Some Mathematical Refinements
This is a simplified version, Euclid’s algorithm uses the modulo operator instead of the difference to speed up the process. For instance, given two numbers \(180\) and \(120\), using modulo instead of difference allows us to skip a step:
\(180 - 60 = 120\)
\(120 - 60 = 60\)
\(60 - 60 = 0\), \(60\) is the GCD
\(180 \mod 120 = 60\)
\(180 \mod 60 = 0\), \(60\) is the GCD
Python Code
def euclid_gcd(a, b):
while b:
= b, a % b
a, b return a
print(euclid_gcd(180, 120)) # Output: 60
For more details on the complexity of Euclid’s algorithm, read this short article.
Searching Through a Sorted Array
Another example of this is Linear and Binary Search in a sorted array. The problem of search is to determine whether a certain value (the “target” value) is contained within an array, and if yes, to return its position.
Given the sorted array [1, 3, 5, 7, 9, 11, 13, 15, 17]
, how would you determine if 11 is contained in the array? The linear approach (Linear Search) will iterate over all elements in the array. In the worst-case scenario, this algorithm will have to iterate over the entire array.
Can You Think of a Smarter Way?
If you came up with the following solution:
This array contains all odd numbers between 1 and 17 included. Based on this premise, 11 is part of this array at position \(1 + 2 \times n = 11\), \(n = 6\).
Congratulations! You are already using information about the problem to solve it. This is the spirit. Let’s, however, look into a more general way of solving this problem, as we cannot assume that all sorted arrays we get are odd numbers within a given interval.
The Binary Search approach leverages the fact that the array is sorted, a piece of information that Linear Search ignores. Because the array is sorted, I can start with the middle of the array. If the value is greater than the value we are looking for (i.e., the “target” value), we know that the target value, if it is in the array, must be to the left of the middle. We can then discard the right or “upper” part of the array and apply the same process to the lower part of the array. We repeat this process recursively until we get a sub-array of size 1.
In the case of searching for 11, the Binary Search process would look like this:
- Checking the middle value of the array
[1, 3, 5, 7, 9, 11, 13, 15]
, 11 is greater than 7, applying binary search on sub-array[9, 11, 13, 15]
- Checking the middle value of the array
[9, 11, 13, 15]
, 11 is found, search successful
This process finds 11 in 3 steps instead of 6. The difference is even more dramatic when looking for a value which does not exist in the array. Where the Linear Search would have to iterate through the 8 elements of the array, Binary Search could do this in only 3 steps. As an example, searching for 18:
- Checking the middle value of the array
[1, 3, 5, 7, 9, 11, 13, 15]
, 18 is greater than 9, applying binary search on sub-array[11, 13, 15]
- Checking the middle value of the array
[11, 13, 15]
, 18 is higher than 13, applying binary search on sub-array[15]
- Checking the middle value of the sub-array
[15]
, 18 is not found
It is by using the information that the list is sorted that the Binary Search algorithm avoids exhaustive search.
Final Thoughts
Going back to the thread tying the course I was to teach a month after this realisation, smart algorithms leverage information about a problem to solve it. The examples studied above showed how the following pieces of meta-information were used to dramatically decrease computation time:
- Euclid’s algorithm: the GCD of two numbers is also the GCD of their difference
- Binary search: large portions of a sorted array can be skipped in the search process
The next time you are stuck on a problem, can you think of additional information you could leverage? Can you avoid exhaustive search?