Do shorter words have longer dictionary definitions?
It all started with books
Most of the books I read are on my Kindle. One of the Kindle features I love is its built-in dictionary. I use it frequently when I read books in other languages or in 19th-century (or older) English.
I also mostly read before bed. At that time of day, my movements are not always very precise, and I sometimes end up checking the definition of the wrong word. For example, while reading King Lear, I came across the line:
Fool: Let me hire him too.—Here’s my coxcomb.
King Lear, Act I Scene 4
By mistake, I would select “my” instead of “coxcomb”.
Having done so more times than I would like to admit on the internet, I realised that short words seemed to have unusually long dictionary definitions compared to longer words. For example, the definition of “my” is actually much longer than the definition of “coxcomb”.
From the Merriam-Webster dictionary, we get:
“coxcomb, noun:
- a jester’s cap adorned with a strip of red
- a conceited foolish person”
The more you know, and don’t forget to bring your coxcomb next time.
And:
“my, adjective:
- of or relating to me or myself especially as possessor, agent, object of an action, or familiar person
- used interjectionally to express surprise and sometimes reduplicated
- used also interjectionally with names of various parts of the body to express doubt or disapproval”
I know, this specific example is only a sample of one, but believe me, I have accumulated a larger sample over the years.
Validating this hypothesis
As a Data Scientist, I would not be satisfied with either intuition built over time or this single example. Can we validate this with data?
If I could get the dictionary definitions of all common words in the English language, I could simply compute the average definition length for each word length.
This sounds like a lot of pointless work. Luckily, we have GPT-4, which solved this in just a few seconds using WordNet. WordNet is a lexical database offering many useful features, such as semantic groupings (groups of related words). Here, only its short definitions will be used.
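Before scaling up, here is a minimal sketch of the measurement for a single word. It assumes nltk is installed and the WordNet corpus has been downloaded; the choice of “coxcomb” is just for illustration:

from nltk.corpus import wordnet as wn

# Average definition length (in words) across all WordNet senses of one word
defs = [s.definition() for s in wn.synsets('coxcomb')]
avg_len = sum(len(d.split()) for d in defs) / len(defs)
print(defs)
print(f"Average definition length: {avg_len:.1f} words")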
Some cool things you can do with WordNet.
The GPT-generated script below demonstrates a few of them.
import nltk
from nltk.corpus import wordnet as wn

# Make sure you have the required resources
# nltk.download('wordnet')
# nltk.download('omw-1.4')

# 1. Find Synonyms of a Word
synonyms = set()
for syn in wn.synsets('happy'):
    for lemma in syn.lemmas():
        synonyms.add(lemma.name())
print("1. Synonyms of 'happy':", synonyms)
# Output: {'felicitous', 'well-chosen', 'happy', 'glad'}

# 2. Find Antonyms of a Word
antonyms = set()
for syn in wn.synsets('happy'):
    for lemma in syn.lemmas():
        if lemma.antonyms():
            antonyms.add(lemma.antonyms()[0].name())
print("2. Antonyms of 'happy':", antonyms)
# Output: {'unhappy'}

# 3. Get Definitions and Examples
print("3. Definitions and examples for 'computer':")
for syn in wn.synsets('computer'):
    print(f"   Definition: {syn.definition()}")
    print(f"   Examples: {syn.examples()}")
    print()
# Output (truncated):
#   Definition: a machine for performing calculations automatically
#   Examples: ['...']
#   Definition: an expert at calculation (or at operating calculating machines)
#   Examples: []

# 4. Explore Hypernyms (More General Terms)
syn = wn.synsets('dog')[0]
hypernyms = [hypernym.name() for hypernym in syn.hypernyms()]
print("4. Hypernyms of 'dog':", hypernyms)
# Output: ['canine.n.02', 'domestic_animal.n.01', 'carnivore.n.01']

# 5. Explore Hyponyms (More Specific Terms)
hyponyms = [hyponym.name() for hyponym in syn.hyponyms()]
print("5. Hyponyms of 'dog':", hyponyms)
# Output (truncated): ['basenji.n.01', 'puppy.n.01', 'working_dog.n.01', ...]

# 6. Find the Path Similarity Between Two Words
dog = wn.synsets('dog')[0]
cat = wn.synsets('cat')[0]
similarity = dog.path_similarity(cat)
print("6. Path similarity between 'dog' and 'cat':", similarity)
# Output: 0.2 (value may vary)

# 7. Get All Lemmas for a Synset
syn = wn.synsets('run', 'v')[0]
lemmas = [lemma.name() for lemma in syn.lemmas()]
print("7. Lemmas for the first verb synset of 'run':", lemmas)
# Output: ['run']

# 8. Find Meronyms (Parts of a Whole)
syn = wn.synsets('car')[0]
meronyms = [meronym.name() for meronym in syn.part_meronyms()]
print("8. Meronyms (parts) of 'car':", meronyms)
# Output: ['car_door.n.01', 'accelerator.n.01', 'brake.n.01', ...]

# 9. Find Holonyms (Whole of Which This is a Part)
syn = wn.synsets('wheel')[0]
holonyms = [holonym.name() for holonym in syn.part_holonyms()]
print("9. Holonyms (wholes) of 'wheel':", holonyms)
# Output: ['car.n.01', 'bicycle.n.01', 'locomotive.n.01', ...]

# 10. Get All Synsets for a Word and Part of Speech
print("10. Verb synsets for 'play':")
for syn in wn.synsets('play', pos=wn.VERB):
    print(f"    {syn.name()}: {syn.definition()}")
# Output (truncated):
#   play.v.01: participate in games or sport
#   play.v.02: act or have an effect in a specified way or with a specific effect or outcome
#   ...
The following Python script gathers all the words in WordNet and their definitions. It then computes correlations between word length, number of senses, and definition length, and the average definition length (in words) for words of 1 to 15 characters.
import nltk
from nltk.corpus import wordnet as wn
import pandas as pd
import numpy as np
from scipy.stats import pearsonr


def collect_lemma_data():
    """Extract and process lemma data from WordNet."""
    records = []
    # Get all single-word lemmas (no underscores)
    lemmas = {l.lower() for syn in wn.all_synsets()
              for l in syn.lemma_names()
              if "_" not in l}
    for lemma in lemmas:
        syns = wn.synsets(lemma)
        if not syns:
            continue
        # Extract definitions and calculate statistics
        defs = [s.definition() for s in syns if s.definition().strip()]
        if not defs:
            continue
        def_lens = [len(d.split()) for d in defs]

        # Calculate POS distribution
        pos_counts = {}
        for s in syns:
            pos_counts[s.pos()] = pos_counts.get(s.pos(), 0) + 1

        records.append({'lemma': lemma,
                        'word_length': len(lemma),
                        'mean_def_len': np.mean(def_lens),
                        'med_def_len': np.median(def_lens),
                        'sense_count': len(syns),
                        'primary_pos': max(pos_counts, key=pos_counts.get)
                        })
    return pd.DataFrame(records)


def summarize_by_bin(df, key, target, agg_funcs):
    """Group data by key and calculate aggregate statistics.

    Args:
        df: DataFrame containing the data
        key: Column to group by
        target: Column to aggregate
        agg_funcs: Dictionary of {name: function} for aggregation

    Returns:
        DataFrame with aggregated statistics
    """
    # Group and convert to lists for custom aggregation
    grouped = df.groupby(key)[target].apply(list).to_frame('values')

    # Apply each aggregation function
    for name, func in agg_funcs.items():
        grouped[name] = grouped['values'].map(func)

    # Add count of items in each bin
    grouped['count'] = df.groupby(key).size()

    return grouped.drop(columns=['values'])


def main():
    print("Collecting data from WordNet…")
    df = collect_lemma_data()
    print(f"Total lemmas: {len(df)}")

    # Calculate correlation coefficients
    for vars, label in [
        ((df.word_length, df.mean_def_len), "word_length vs mean_def_len"),
        ((df.word_length, df.sense_count), "word_length vs sense_count"),
        ((df.sense_count, df.mean_def_len), "sense_count vs mean_def_len"),
    ]:
        r, p = pearsonr(*vars)
        print(f"Pearson r({label}) = {r:.3f}, p = {p:.2g}")
    print()

    # Analyze definition length by word length
    agg_funcs = {
        'mean': np.mean,
        'median': np.median,
        'q25': lambda x: np.percentile(x, 25),
        'q75': lambda x: np.percentile(x, 75)
    }
    by_len = summarize_by_bin(df, 'word_length', 'mean_def_len', agg_funcs)

    print("By word_length (def_len stats):")
    print(by_len.head(15).to_string(formatters={
        'mean': '{:.2f}'.format,
        'median': '{:.2f}'.format,
        'q25': '{:.2f}'.format,
        'q75': '{:.2f}'.format,
        'count': '{:d}'.format
    }))


if __name__ == "__main__":
    main()
I got the following results:
Total lemmas: 83118
Pearson r(word_length vs mean_def_len) = -0.097, p = 2.4e-174
Pearson r(word_length vs sense_count) = -0.219, p = 0
Pearson r(sense_count vs mean_def_len) = -0.075, p = 4.8e-105
Showing the definition length (in words) statistics by word length (in characters):
word_length | mean | median | q25 | q75 | count |
---|---|---|---|---|---|
1 | 12.94 | 12.63 | 9.75 | 16.50 | 36 |
2 | 12.67 | 11.00 | 6.94 | 16.34 | 376 |
3 | 12.40 | 10.00 | 7.00 | 15.00 | 1298 |
4 | 10.47 | 8.86 | 6.67 | 12.50 | 3209 |
5 | 10.23 | 9.00 | 6.29 | 12.67 | 5457 |
6 | 10.13 | 9.00 | 6.00 | 12.67 | 8448 |
7 | 9.90 | 8.50 | 6.00 | 12.25 | 10410 |
8 | 9.60 | 8.00 | 6.00 | 12.00 | 11482 |
9 | 9.41 | 8.00 | 5.40 | 12.00 | 11479 |
10 | 9.02 | 8.00 | 5.00 | 11.00 | 9687 |
11 | 8.76 | 7.50 | 5.00 | 11.00 | 7501 |
12 | 8.59 | 7.00 | 5.00 | 11.00 | 5370 |
13 | 8.37 | 7.00 | 5.00 | 10.00 | 3621 |
14 | 8.86 | 7.33 | 5.00 | 11.00 | 2151 |
15 | 9.04 | 8.00 | 5.00 | 11.00 | 1228 |
I was amazed by the result. Over more than 80,000 words, the average definition length decreases steadily with word length up to 13 characters, ticking back up slightly at 14 and 15. This is a much clearer outcome than I had expected.
I was also impressed by how quickly I moved from an initial intuition to the finished correlation analysis shown above.
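To see the trend at a glance rather than read it off the table, you can plot the binned statistics. A minimal sketch, assuming matplotlib is installed and reusing the by_len DataFrame computed inside main() in the script above:

import matplotlib.pyplot as plt

# Plot mean definition length (in words) against word length (in characters),
# with a band showing the 25th-75th percentile range per bin.
subset = by_len.head(15)
fig, ax = plt.subplots()
ax.plot(subset.index, subset['mean'], marker='o', label='mean')
ax.fill_between(subset.index, subset['q25'], subset['q75'], alpha=0.2, label='25th-75th percentile')
ax.set_xlabel('Word length (characters)')
ax.set_ylabel('Definition length (words)')
ax.legend()
plt.show()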
Wondering why
Laziness and intuition
But why is this the case?
As a Computer Scientist, the first explanation I reached for was human laziness. When we speak, we aim to pack as much information into as few words as possible. This does not seem to apply to WhatsApp voice notes, though.
Joking aside, we can see this in contractions and clippings such as “can’t”, “I’m”, “wanna”, or “Wassup?”.
My partner has a relatively long first name. I always call her by her nickname (and sometimes even a shorter version of this nickname).
If I can choose between two words that mean the same thing, I will probably choose the shorter one.
I also believe that the more we use a word, the more distinct meanings it develops. We are lazy, so we use shorter words. The more we use shorter words, the more meanings they take on.
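You can put a rough number on this intuition with corpus frequencies. A quick sketch, assuming the Brown corpus has been downloaded via nltk.download('brown'); the example words are picked arbitrarily for illustration:

from collections import Counter
from nltk.corpus import brown, wordnet as wn

# Compare how often each word appears in the Brown corpus with how many
# WordNet senses it has; the intuition predicts that frequent words like
# 'run' carry more senses than rare ones like 'coxcomb'.
freq = Counter(w.lower() for w in brown.words())
for word in ['set', 'run', 'up', 'coxcomb', 'aftermath']:
    print(f"{word:>10}: frequency={freq[word]:>6}, senses={len(wn.synsets(word))}")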
This was the intuition that got this research started. But was I missing something?
Digging deeper
I started with a Google search. To my great disappointment, I could not find anything. I quickly moved to ChatGPT to see if it could do better. Generally, it agreed with the theory above, adding some more structure:
Shorter words tend to be older and more frequently used, and so accumulate more meanings over time (polysemy)
Shorter words often play multiple grammatical roles. For example, the word “up” can be an adverb (“stand up”), a preposition (“go up the stairs”), an adjective (“the up escalator”) and a verb (“He upped his game”) (see the short WordNet check after this list)
Shorter words encompass abstract foundational concepts like “love”, “time”, “have” or “with”, which are harder to define
Longer words generally represent more complex concepts and can be built from one or more shorter words. As an example, you can define “aftermath” as the consequences of an event, without having to worry about defining “after”
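As a quick illustration of the point about grammatical roles, you can ask WordNet how many senses “up” has per part of speech. A minimal sketch; note that WordNet only covers nouns, verbs, adjectives, and adverbs, so the preposition use does not appear:

from collections import Counter
from nltk.corpus import wordnet as wn

# Count senses of 'up' per part of speech (n=noun, v=verb, a/s=adjective, r=adverb)
pos_counts = Counter(s.pos() for s in wn.synsets('up'))
print(pos_counts)
for s in wn.synsets('up'):
    print(s.name(), s.pos(), s.definition())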
Final Thoughts
Shorter words are used more widely and accumulate more meanings over time. They also serve various grammatical roles, each of which needs its own definition. On the tooling side, WordNet is a fantastic resource for lexical analysis. As language modelling progresses, we should be able to learn such representations of words directly from large corpora of text (which probably include WordNet…). This is the movement started by models like Word2Vec and everything that followed.
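If you want to poke at such learned representations directly, pretrained vectors are easy to query. A minimal sketch, assuming the gensim package is installed; the "word2vec-google-news-300" model is one of gensim's downloadable datasets (roughly 1.6 GB on first use), and the query word is just an example:

import gensim.downloader as api

# Load pretrained Word2Vec vectors and look at the nearest neighbours
# of a word in the learned vector space.
vectors = api.load("word2vec-google-news-300")
print(vectors.most_similar("happy", topn=5))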
This was such a fun article to write. We live in an era in which it is very easy to indulge one’s curiosity. It is too early to say what impact this will have on human knowledge. With all the existential risk posed by a possible AI-driven intelligence explosion, this is at least something to be excited about.