Understanding the World with Occam’s Razor

Survival of the Simplest

Computers

Machine Learning

Author

Eliott Kalfon

Published

August 10, 2025

If you cannot find your keys, it could be because a wild gnome broke into your flat, stole them and brought them back to the underworld. It could also be because you left them in an unusual spot last evening.

Which of these two hypotheses is more likely to be true? This is where Occam’s Razor comes in.

Occam’s Razor is the idea that when two hypotheses explain reality equally well, the simpler one is more likely to be true.

Hypotheses and Reality

Before moving on, let’s start by defining some terms.

Hypothesis: a proposed explanation of reality

Some example hypotheses include:

The earth orbits around the sun
The sun revolves around the earth
We live in a simulation

An explanation gives the reason or cause of an event.

Seeing the sun rise and set every day, I can ask myself why. Two hypotheses explain this daily occurrence equally well:

The sun revolves around the earth
The earth orbits around the sun

How could we use Occam’s Razor to decide which of the two is more likely to be true? This example will be expanded later in the article. We will start with a simpler case.

Trees and Bikes

Imagine that as you walk around a city, you find what seems to be a bike behind a tree:

The following hypotheses explain this image equally well:

H1: There is a bike behind a tree
H2: There are two halves of a bike behind a tree, giving the illusion of a single bike

From experience, you may be able to tell that bikes are much more common than half-bikes. Also, half-bikes are rarely perfectly positioned behind trees, making the second hypothesis even less likely. This is Occam’s Razor in action.

Occam’s Razor and the Universe

Let’s return to the sun rising and setting. The following hypotheses explain this reality:

H1: The sun revolves around the earth, geocentrism
H2: The earth orbits around the sun, heliocentrism

At first, H1 seems to be the most plausible. I do not feel the earth moving under my feet, and it would be strange to imagine that our planet could move without me perceiving it.

Yet, when we start to think about it, this hypothesis would require all the stars to move around the earth as well. These stars, moving around the earth, would need to keep the same distance between them. This is starting to feel like a strong requirement.

Then come the planets. These planets move around the earth, but unlike the stars, their relative positions also change. Explaining these movements is very difficult. The mathematician and astronomer Ptolemy, working in Alexandria, Egypt, did come up with a sophisticated model involving epicycles (“circles upon circles”):

Even though geocentrism was eventually proven wrong, and needed frequent recalibration to keep its predictions accurate, it is still a useful model of reality. The epicycle method can also be seen as a precursor of Fourier analysis.

Summarising, Hypothesis 1 would require:

Stars to move around the earth and keep their relative position
Planets to move in epicycles around the earth

On the other hand, the heliocentric model (i.e., the sun at the centre) solved a lot of these problems with simple orbits around the sun.

Using Occam’s Razor, this second hypothesis seems much more likely to be true.

Scientists would only have to tackle two interesting problems:

Why don’t we feel the earth’s movements?
What are the forces at work in these orbits?

Good questions. It turns out that the earth’s motion is very nearly uniform, and that we only feel accelerations. This is why we do not feel a high-speed train travelling at a steady 300 km/h: we only feel acceleration, deceleration and slight bumps on the tracks.
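To put a rough number on this, here is a minimal sketch estimating the acceleration tied to the earth’s orbit. It assumes a perfectly circular orbit, and the orbital speed and radius are approximate textbook values rather than figures from this article.

```python
# Rough estimate of the acceleration due to the earth's orbit around the sun.
# Assumptions (not from the article): circular orbit, approximate textbook values.
ORBITAL_SPEED_M_S = 29_780    # mean orbital speed of the earth, about 29.78 km/s
ORBITAL_RADIUS_M = 1.496e11   # mean earth-sun distance, about 1 astronomical unit
GRAVITY_M_S2 = 9.81           # standard gravity at the earth's surface


def centripetal_acceleration(speed: float, radius: float) -> float:
    """Acceleration needed to stay on a circle: a = v^2 / r."""
    return speed ** 2 / radius


orbital_acceleration = centripetal_acceleration(ORBITAL_SPEED_M_S, ORBITAL_RADIUS_M)
fraction_of_gravity = orbital_acceleration / GRAVITY_M_S2

print(f"Orbital acceleration: {orbital_acceleration:.4f} m/s^2")
print(f"Fraction of surface gravity: {fraction_of_gravity:.5f}")
```

Under these assumptions the acceleration comes out at roughly 0.006 m/s², well under a thousandth of surface gravity, which is consistent with not feeling it.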

Roughly a century and a half after Copernicus proposed the heliocentric model, Isaac Newton formulated the universal law of gravitation: the fact that mass attracts mass.

\[ F = G \cdot \frac{m_1 \cdot m_2}{r^2} \]

Where:

\(F\) is the force between two masses
\(G\) is the gravitational constant
\(m_1\) and \(m_2\) are the masses of the two objects
\(r\) is the distance between the centres of the two masses
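As an illustration, the formula can be evaluated for the earth and the sun. This is only a sketch: the function name `gravitational_force` is made up for this example, and the masses and distance are approximate textbook values, not figures from this article.

```python
# Newton's law of universal gravitation: F = G * m1 * m2 / r^2
# All numerical values are approximate textbook figures (assumptions).
G = 6.674e-11             # gravitational constant, in N m^2 / kg^2
MASS_SUN_KG = 1.989e30    # approximate mass of the sun
MASS_EARTH_KG = 5.972e24  # approximate mass of the earth
DISTANCE_M = 1.496e11     # approximate mean earth-sun distance


def gravitational_force(m1: float, m2: float, r: float) -> float:
    """Force in newtons between two masses whose centres are r metres apart."""
    if r <= 0:
        raise ValueError("r must be a positive distance")
    return G * m1 * m2 / r ** 2


force = gravitational_force(MASS_SUN_KG, MASS_EARTH_KG, DISTANCE_M)
print(f"Force between the earth and the sun: {force:.2e} N")
```

With these inputs the force is on the order of \(10^{22}\) newtons. Doubling the distance would divide it by four, since \(r\) is squared.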

Occam’s Razor strikes again. Now let’s try to think about it in numbers.

Trees and Boxes

This example is inspired by David MacKay’s fantastic Information Theory, Inference, and Learning Algorithms. It will involve multiplying probabilities (and more cartoons). If this is not your thing, you may want to skip to the conclusion directly.

As you walk around a city, you stumble upon the following sight:

or more geometrically:

If asked to describe this scene, you would probably say: a tree in front of a box. Yet, there are two different hypotheses that explain this scene equally well:

H1: A box behind the tree
H2: Two boxes of the same colour behind a tree, giving the illusion of a single box

or:

Abstract representation of the two hypotheses

From experience, it seems like our perception system has a bias towards the simplest hypothesis. We would, by default, think that there is a single box.

Another way to reason about this scene is to think in terms of probability. What are the odds of seeing one or two separate boxes perfectly positioned behind a tree?

Let’s assume the following:

A box’s height can be any of 20 different heights (10-200cm)
A box’s width can be any of 20 different widths (10-200cm)
A box can have one of three colours with equal probability: cardboard brown, white or grey
A box’s position can be any of 20 positions (0-190cm); we stick to a one-dimensional space for now

Looking at the single-box hypothesis, the probability that one box drawn at random from these options matches what we see is:

\[ \frac{1}{20} \cdot \frac{1}{20} \cdot \frac{1}{3} \cdot \frac{1}{20} \]

Out of all the possible heights, widths, colours and positions, only one box explains the sense data perfectly.
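The same count can be checked by brute force. The sketch below lists every possible box and counts how many match the observation. It assumes the 20 heights, widths and positions are spaced 10cm apart, and the observed box is a hypothetical one chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

# Assumed discretisation: 20 evenly spaced values per dimension.
HEIGHTS_CM = range(10, 201, 10)    # 10, 20, ..., 200
WIDTHS_CM = range(10, 201, 10)     # 10, 20, ..., 200
COLOURS = ("cardboard brown", "white", "grey")
POSITIONS_CM = range(0, 191, 10)   # 0, 10, ..., 190

# Hypothetical observation: (height, width, colour, position).
observed_box = (100, 80, "cardboard brown", 50)

# Every box that could exist under the assumptions above.
all_boxes = list(product(HEIGHTS_CM, WIDTHS_CM, COLOURS, POSITIONS_CM))

# Only a box identical to the observation explains the scene under H1.
matching_boxes = [box for box in all_boxes if box == observed_box]

p_h1 = Fraction(len(matching_boxes), len(all_boxes))
print(f"{len(all_boxes)} possible boxes, {len(matching_boxes)} match: {p_h1}")
```

Whichever box is picked as the observation, exactly one of the 24,000 candidates matches it, so the result does not depend on the hypothetical values above.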

Now, hypothesis 2 requires two boxes of the same height and colour.

For two boxes of the same colour (cardboard brown) and height, we already need:

\[ \frac{1}{20} \cdot \frac{1}{20} \cdot \frac{1}{3} \cdot \frac{1}{3} \]

Hypothesis 2 also requires some agreement between starting positions and widths. The two boxes cannot overlap, and the first box must end behind the tree, with the second box starting behind the tree as well. The first box needs to start at the position seen in the data, which has probability \(\frac{1}{20}\).

This box needs to end behind the tree, which leaves a bit of freedom, as the tree covers two possible ending positions:

This gives us a probability of \(\frac{2}{20} = \frac{1}{10}\).

The second box now needs to start at the exact spot at which the first box ends, or on either of the two positions behind the tree if the first ends early. No overlap is tolerated.

\[ \begin{aligned} \text{Probability} &= \frac{1}{20} \cdot \frac{2}{20} && \text{(box 1 ends at first position)} \\ &\quad + \frac{1}{20} \cdot \frac{1}{20} && \text{(box 1 ends at the second position behind the tree)} \\ &= \frac{3}{400} \end{aligned} \]
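This case analysis can also be checked by enumeration. The sketch below tries every pair of (end of box 1, start of box 2) and keeps the pairs consistent with the scene. The two indices used for the positions hidden by the tree are hypothetical; only the fact that there are two of them matters.

```python
from fractions import Fraction
from itertools import product

POSITIONS = range(20)     # 20 possible positions, as in the assumptions
TREE_POSITIONS = (9, 10)  # hypothetical indices of the two spots hidden by the tree


def is_consistent(box1_end: int, box2_start: int) -> bool:
    """Both edges must hide behind the tree, and the boxes must not overlap."""
    return (
        box1_end in TREE_POSITIONS
        and box2_start in TREE_POSITIONS
        and box2_start >= box1_end
    )


pairs = list(product(POSITIONS, POSITIONS))
valid_pairs = [pair for pair in pairs if is_consistent(*pair)]

# Joint probability of a valid (box 1 end, box 2 start) pair.
p_joint = Fraction(len(valid_pairs), len(pairs))

# Split into: box 1 ends behind the tree, then box 2 starts correctly given that.
p_box1_end = Fraction(len(TREE_POSITIONS), len(POSITIONS))
p_box2_start_given_end = p_joint / p_box1_end

print(p_joint, p_box1_end, p_box2_start_given_end)
```

The joint probability of \(\frac{3}{400}\) splits into \(\frac{1}{10}\) for box 1 ending behind the tree and \(\frac{3}{40}\) for box 2 starting in a compatible spot.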

Different scenarios for the starting coordinates of box 2

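The width of box 2 is fully determined by its starting position. Given its starting position, it must have a width that explains the data. This is a probability of \(\frac{1}{20}\).

Considering the combined probability of H2, and noting that the \(\frac{3}{400}\) above already includes the \(\frac{1}{10}\) for box 1 ending behind the tree (\(\frac{3}{400} = \frac{1}{10} \cdot \frac{3}{40}\)), we get: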

\[ \frac{1}{20} \cdot \frac{1}{20} \cdot \frac{1}{3} \cdot \frac{1}{3} \cdot \frac{1}{20} \cdot \frac{1}{10} \cdot \frac{3}{40} \cdot \frac{1}{20} = \text{A very small number} \]
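For readers who want the actual number, the product can be evaluated exactly with fractions. The labels attached to each factor are my reading of the steps above and are only there to make the sketch easier to follow.

```python
from fractions import Fraction
from math import prod

# One entry per factor in the product for H2 (labels are an interpretation).
h2_factors = {
    "box 1 height": Fraction(1, 20),
    "box 2 height": Fraction(1, 20),
    "box 1 colour": Fraction(1, 3),
    "box 2 colour": Fraction(1, 3),
    "box 1 start": Fraction(1, 20),
    "box 1 ends behind the tree": Fraction(1, 10),
    "box 2 start, given where box 1 ends": Fraction(3, 40),
    "box 2 width": Fraction(1, 20),
}

p_h2 = prod(h2_factors.values())
print(f"P(H2) = {p_h2} (about {float(p_h2):.1e})")
```

The exact value is \(\frac{1}{192{,}000{,}000}\), or about five in a billion.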

The ratio \(\frac{P(H_1)}{P(H_2)}\) is:

\[\begin{aligned} &= \frac{\frac{1}{20} \cdot \frac{1}{20} \cdot \frac{1}{3} \cdot \frac{1}{20}}{\frac{1}{20} \cdot \frac{1}{20} \cdot \frac{1}{3} \cdot \frac{1}{3} \cdot \frac{1}{20} \cdot \frac{1}{10} \cdot \frac{3}{40} \cdot \frac{1}{20}} \\ &= \frac{1}{\frac{1}{3} \cdot \frac{1}{10} \cdot \frac{3}{40} \cdot \frac{1}{20}} \\ &= 3 \cdot 10 \cdot \frac{40}{3} \cdot 20 \\ &= 8000 \end{aligned} \]

In this example, H1, the simpler hypothesis, is 8,000 times more likely than H2. There is most likely only one box behind this tree.
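To turn this ratio into a probability for each hypothesis, we need a starting belief in each of them. The sketch below assumes both are equally plausible before looking at the scene (a 50/50 prior, which is my assumption and not something the example specifies) and applies Bayes’ rule.

```python
from fractions import Fraction

# Probability of the observed scene under each hypothesis (products from the text).
p_data_given_h1 = Fraction(1, 24_000)
p_data_given_h2 = Fraction(1, 192_000_000)

ratio = p_data_given_h1 / p_data_given_h2

# Assumption: both hypotheses are equally plausible before seeing the scene.
prior_h1 = Fraction(1, 2)
prior_h2 = Fraction(1, 2)

# Bayes' rule, restricted to these two hypotheses.
evidence = p_data_given_h1 * prior_h1 + p_data_given_h2 * prior_h2
posterior_h1 = p_data_given_h1 * prior_h1 / evidence
posterior_h2 = p_data_given_h2 * prior_h2 / evidence

print(f"Ratio: {ratio}")
print(f"P(H1 | scene) = {posterior_h1} (about {float(posterior_h1):.5f})")
print(f"P(H2 | scene) = {posterior_h2} (about {float(posterior_h2):.5f})")
```

With equal priors, the single-box hypothesis ends up with a probability of \(\frac{8000}{8001}\). A different prior would shift this number, but H2 would have to start out thousands of times more plausible than H1 to win.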

Final Thoughts

Occam’s Razor can be a fantastic tool to think about the world. It is an invitation to consider our assumptions and think in terms of probabilities. This article showed how it could be used to walk around a city and understand the universe.

This way of thinking has many implications for machine learning, the practice of building models of reality. Some models are simpler than others, and some explain the data better than others. If that sounds interesting, this topic will be further explored in my next article.

Footnotes

By James Ferguson (1710-1776), based on similar diagrams by Giovanni Cassini (1625-1712) and Dr Roger Long (1680-1770); engraved for the Encyclopaedia by Andrew Bell. Encyclopaedia Britannica (1st Edition, 1771; facsimile reprint 1971), Volume 1, Fig. 2 of Plate XL facing page 449. Public Domain.
By Nicolai Copernici. Created in vector format by Scewing. Public Domain.