Do shorter words have longer dictionary definitions?
It all started with books
Most of the books I read are on my Kindle. One of the Kindle features I love is its built-in dictionary. I use it frequently when I read books in other languages or in 19th-century (or older) English.
I also mostly read before bed. At that time of day, my movements are not always very precise, and I sometimes end up checking the definition of the wrong word. For example, while reading King Lear, I came across the line:
Fool: Let me hire him too.—Here’s my coxcomb.
King Lear, Act I Scene 4
By mistake, I would select “my” instead of “coxcomb”.
Having done so more times than I would like to admit on the internet, I realised that short words seemed to have unusually long dictionary definitions compared to longer words. For example, the definition of “my” is actually much longer than the definition of “coxcomb”.
From the Merriam-Webster dictionary, we get:
“coxcomb, noun:
- a jester’s cap adorned with a strip of red
- a conceited foolish person”
The more you know, and don’t forget to bring your coxcomb next time.
And:
“my, adjective:
- of or relating to me or myself especially as possessor, agent, object of an action, or familiar person
- used interjectionally to express surprise and sometimes reduplicated
- used also interjectionally with names of various parts of the body to express doubt or disapproval”
I know, this specific example is only a sample of one, but believe me, I have accumulated a larger sample over the years.
Validating this hypothesis
As a Data Scientist, I would not be satisfied with either intuition built over time or this single example. Can we validate this with data?
If I could get the dictionary definitions of all common words in the English language, I could simply compute the average definition length for each word length.
This sounds like a lot of pointless work. Luckily, we have GPT-4, which solved this in just a few seconds using WordNet. WordNet is a lexical database offering many useful features, such as semantic groupings (groups of related words). Here, only its short definitions will be used.
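Before scaling up, here is a minimal sketch of the measurement for a single word. It assumes nltk is installed and the WordNet corpus has been downloaded; the choice of “coxcomb” is just for illustration:

from nltk.corpus import wordnet as wn

# Average definition length (in words) across all WordNet senses of one word
defs = [s.definition() for s in wn.synsets('coxcomb')]
avg_len = sum(len(d.split()) for d in defs) / len(defs)
print(defs)
print(f"Average definition length: {avg_len:.1f} words")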
Some cool things you can do with WordNet.
The GPT-generated script below demonstrates a few of them.
import nltk
from nltk.corpus import wordnet as wn

# Make sure you have the required resources
# nltk.download('wordnet')
# nltk.download('omw-1.4')

# 1. Find Synonyms of a Word
synonyms = set()
for syn in wn.synsets('happy'):
    for lemma in syn.lemmas():
        synonyms.add(lemma.name())
print("1. Synonyms of 'happy':", synonyms)
# Output: {'felicitous', 'well-chosen', 'happy', 'glad'}

# 2. Find Antonyms of a Word
antonyms = set()
for syn in wn.synsets('happy'):
    for lemma in syn.lemmas():
        if lemma.antonyms():
            antonyms.add(lemma.antonyms()[0].name())
print("2. Antonyms of 'happy':", antonyms)
# Output: {'unhappy'}

# 3. Get Definitions and Examples
print("3. Definitions and examples for 'computer':")
for syn in wn.synsets('computer'):
    print(f"   Definition: {syn.definition()}")
    print(f"   Examples: {syn.examples()}")
    print()
# Output (truncated):
#   Definition: a machine for performing calculations automatically
#   Examples: ['...']
#   Definition: an expert at calculation (or at operating calculating machines)
#   Examples: []

# 4. Explore Hypernyms (More General Terms)
syn = wn.synsets('dog')[0]
hypernyms = [hypernym.name() for hypernym in syn.hypernyms()]
print("4. Hypernyms of 'dog':", hypernyms)
# Output: ['canine.n.02', 'domestic_animal.n.01', 'carnivore.n.01']

# 5. Explore Hyponyms (More Specific Terms)
hyponyms = [hyponym.name() for hyponym in syn.hyponyms()]
print("5. Hyponyms of 'dog':", hyponyms)
# Output (truncated): ['basenji.n.01', 'puppy.n.01', 'working_dog.n.01', ...]

# 6. Find the Path Similarity Between Two Words
dog = wn.synsets('dog')[0]
cat = wn.synsets('cat')[0]
similarity = dog.path_similarity(cat)
print("6. Path similarity between 'dog' and 'cat':", similarity)
# Output: 0.2 (value may vary)

# 7. Get All Lemmas for a Synset
syn = wn.synsets('run', 'v')[0]
lemmas = [lemma.name() for lemma in syn.lemmas()]
print("7. Lemmas for the first verb synset of 'run':", lemmas)
# Output: ['run']

# 8. Find Meronyms (Parts of a Whole)
syn = wn.synsets('car')[0]
meronyms = [meronym.name() for meronym in syn.part_meronyms()]
print("8. Meronyms (parts) of 'car':", meronyms)
# Output: ['car_door.n.01', 'accelerator.n.01', 'brake.n.01', ...]

# 9. Find Holonyms (Whole of Which This is a Part)
syn = wn.synsets('wheel')[0]
holonyms = [holonym.name() for holonym in syn.part_holonyms()]
print("9. Holonyms (wholes) of 'wheel':", holonyms)
# Output: ['car.n.01', 'bicycle.n.01', 'locomotive.n.01', ...]

# 10. Get All Synsets for a Word and Part of Speech
print("10. Verb synsets for 'play':")
for syn in wn.synsets('play', pos=wn.VERB):
    print(f"    {syn.name()}: {syn.definition()}")
# Output (truncated):
#   play.v.01: participate in games or sport
#   play.v.02: act or have an effect in a specified way or with a specific effect or outcome
#   ...
The following Python script gathers all the words in WordNet and their definitions. It then computes correlations between word length, number of senses, and definition length, and the average definition length (in words) for words of 1 to 15 characters.
import nltk
from nltk.corpus import wordnet as wn
import pandas as pd
import numpy as np
from scipy.stats import pearsonr


def collect_lemma_data():
    """Extract and process lemma data from WordNet."""
    records = []
    # Get all single-word lemmas (no underscores)
    lemmas = {l.lower() for syn in wn.all_synsets()
              for l in syn.lemma_names()
              if "_" not in l}
    for lemma in lemmas:
        syns = wn.synsets(lemma)
        if not syns:
            continue
        # Extract definitions and calculate statistics
        defs = [s.definition() for s in syns if s.definition().strip()]
        if not defs:
            continue
        def_lens = [len(d.split()) for d in defs]

        # Calculate POS distribution
        pos_counts = {}
        for s in syns:
            pos_counts[s.pos()] = pos_counts.get(s.pos(), 0) + 1

        records.append({'lemma': lemma,
                        'word_length': len(lemma),
                        'mean_def_len': np.mean(def_lens),
                        'med_def_len': np.median(def_lens),
                        'sense_count': len(syns),
                        'primary_pos': max(pos_counts, key=pos_counts.get)
                        })
    return pd.DataFrame(records)


def summarize_by_bin(df, key, target, agg_funcs):
    """Group data by key and calculate aggregate statistics.

    Args:
        df: DataFrame containing the data
        key: Column to group by
        target: Column to aggregate
        agg_funcs: Dictionary of {name: function} for aggregation

    Returns:
        DataFrame with aggregated statistics
    """
    # Group and convert to lists for custom aggregation
    grouped = df.groupby(key)[target].apply(list).to_frame('values')

    # Apply each aggregation function
    for name, func in agg_funcs.items():
        grouped[name] = grouped['values'].map(func)

    # Add count of items in each bin
    grouped['count'] = df.groupby(key).size()

    return grouped.drop(columns=['values'])


def main():
    print("Collecting data from WordNet…")
    df = collect_lemma_data()
    print(f"Total lemmas: {len(df)}")

    # Calculate correlation coefficients
    for vars, label in [
        ((df.word_length, df.mean_def_len), "word_length vs mean_def_len"),
        ((df.word_length, df.sense_count), "word_length vs sense_count"),
        ((df.sense_count, df.mean_def_len), "sense_count vs mean_def_len"),
    ]:
        r, p = pearsonr(*vars)
        print(f"Pearson r({label}) = {r:.3f}, p = {p:.2g}")
    print()

    # Analyze definition length by word length
    agg_funcs = {
        'mean': np.mean,
        'median': np.median,
        'q25': lambda x: np.percentile(x, 25),
        'q75': lambda x: np.percentile(x, 75)
    }
    by_len = summarize_by_bin(df, 'word_length', 'mean_def_len', agg_funcs)

    print("By word_length (def_len stats):")
    print(by_len.head(15).to_string(formatters={
        'mean': '{:.2f}'.format,
        'median': '{:.2f}'.format,
        'q25': '{:.2f}'.format,
        'q75': '{:.2f}'.format,
        'count': '{:d}'.format
    }))


if __name__ == "__main__":
    main()
I got the following results:
Total lemmas: 83118
Pearson r(word_length vs mean_def_len) = -0.097, p = 2.4e-174
Pearson r(word_length vs sense_count) = -0.219, p = 0
Pearson r(sense_count vs mean_def_len) = -0.075, p = 4.8e-105
Showing the definition length (in words) statistics by word length (in characters):
word_length | mean | median | q25 | q75 | count |
---|---|---|---|---|---|
1 | 12.94 | 12.63 | 9.75 | 16.50 | 36 |
2 | 12.67 | 11.00 | 6.94 | 16.34 | 376 |
3 | 12.40 | 10.00 | 7.00 | 15.00 | 1298 |
4 | 10.47 | 8.86 | 6.67 | 12.50 | 3209 |
5 | 10.23 | 9.00 | 6.29 | 12.67 | 5457 |
6 | 10.13 | 9.00 | 6.00 | 12.67 | 8448 |
7 | 9.90 | 8.50 | 6.00 | 12.25 | 10410 |
8 | 9.60 | 8.00 | 6.00 | 12.00 | 11482 |
9 | 9.41 | 8.00 | 5.40 | 12.00 | 11479 |
10 | 9.02 | 8.00 | 5.00 | 11.00 | 9687 |
11 | 8.76 | 7.50 | 5.00 | 11.00 | 7501 |
12 | 8.59 | 7.00 | 5.00 | 11.00 | 5370 |
13 | 8.37 | 7.00 | 5.00 | 10.00 | 3621 |
14 | 8.86 | 7.33 | 5.00 | 11.00 | 2151 |
15 | 9.04 | 8.00 | 5.00 | 11.00 | 1228 |
I was amazed by the result. Over more than 80,000 words, the average definition length decreases steadily with word length up to 13 characters, ticking back up slightly at 14 and 15. This is a much clearer outcome than I had expected.
I was also impressed by how quickly I moved from an initial intuition to the finished correlation analysis shown above.
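To see the trend at a glance rather than read it off the table, you can plot the binned statistics. A minimal sketch, assuming matplotlib is installed and reusing the by_len DataFrame computed inside main() in the script above:

import matplotlib.pyplot as plt

# Plot mean definition length (in words) against word length (in characters),
# with a band showing the 25th-75th percentile range per bin.
subset = by_len.head(15)
fig, ax = plt.subplots()
ax.plot(subset.index, subset['mean'], marker='o', label='mean')
ax.fill_between(subset.index, subset['q25'], subset['q75'], alpha=0.2, label='25th-75th percentile')
ax.set_xlabel('Word length (characters)')
ax.set_ylabel('Definition length (words)')
ax.legend()
plt.show()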
Wondering why
Laziness and intuition
But why is this the case?
As a Computer Scientist, the first explanation I reached for was human laziness. When we speak, we aim to pack as much information into as few words as possible. This does not seem to apply to WhatsApp voice notes, though.
Joking aside, we can see this in contractions and clippings such as “can’t”, “I’m”, “wanna”, or “Wassup?”.
My partner has a relatively long first name. I always call her by her nickname (and sometimes even a shorter version of this nickname).
If I can choose between two words that mean the same thing, I will probably choose the shorter one.
I also believe that the more we use a word, the more distinct meanings it develops. We are lazy, so we use shorter words. The more we use shorter words, the more meanings they take on.
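You can put a rough number on this intuition with corpus frequencies. A quick sketch, assuming the Brown corpus has been downloaded via nltk.download('brown'); the example words are picked arbitrarily for illustration:

from collections import Counter
from nltk.corpus import brown, wordnet as wn

# Compare how often each word appears in the Brown corpus with how many
# WordNet senses it has; the intuition predicts that frequent words like
# 'run' carry more senses than rare ones like 'coxcomb'.
freq = Counter(w.lower() for w in brown.words())
for word in ['set', 'run', 'up', 'coxcomb', 'aftermath']:
    print(f"{word:>10}: frequency={freq[word]:>6}, senses={len(wn.synsets(word))}")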
This was the intuition that got this research started. But was I missing something?
Digging deeper
I started with a Google search. To my great disappointment, I could not find anything. I quickly moved to ChatGPT to see if it could do better. Generally, it agreed with the theory above, adding some more structure:
Shorter words tend to be older and more frequently used, and so accumulate more meanings over time (polysemy)
Shorter words often play multiple grammatical roles. For example, the word “up” can be an adverb (“stand up”), a preposition (“go up the stairs”), an adjective (“the up escalator”) and a verb (“He upped his game”) (see the short WordNet check after this list)
Shorter words encompass abstract foundational concepts like “love”, “time”, “have” or “with”, which are harder to define
Longer words generally represent more complex concepts and can be built from one or more shorter words. As an example, you can define “aftermath” as the consequences of an event, without having to worry about defining “after”
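As a quick illustration of the point about grammatical roles, you can ask WordNet how many senses “up” has per part of speech. A minimal sketch; note that WordNet only covers nouns, verbs, adjectives, and adverbs, so the preposition use does not appear:

from collections import Counter
from nltk.corpus import wordnet as wn

# Count senses of 'up' per part of speech (n=noun, v=verb, a/s=adjective, r=adverb)
pos_counts = Counter(s.pos() for s in wn.synsets('up'))
print(pos_counts)
for s in wn.synsets('up'):
    print(s.name(), s.pos(), s.definition())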
Final Thoughts
Shorter words are used more widely and accumulate more meanings over time. They also serve various grammatical roles, each of which needs its own definition. On the tooling side, WordNet is a fantastic resource for lexical analysis. As language modelling progresses, we should be able to learn such representations of words directly from large corpora of text (which probably include WordNet…). This is the movement started by models like Word2Vec and everything that followed.
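If you want to poke at such learned representations directly, pretrained vectors are easy to query. A minimal sketch, assuming the gensim package is installed; the "word2vec-google-news-300" model is one of gensim's downloadable datasets (roughly 1.6 GB on first use), and the query word is just an example:

import gensim.downloader as api

# Load pretrained Word2Vec vectors and look at the nearest neighbours
# of a word in the learned vector space.
vectors = api.load("word2vec-google-news-300")
print(vectors.most_similar("happy", topn=5))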
This was such a fun article to write. We live in an era in which it is very easy to indulge one’s curiosity. It is too early to say what impact this will have on human knowledge. With all the existential risk posed by a possible AI-driven intelligence explosion, this is at least something to be excited about.