The LLM Inflation Paradox
We expand messages for professionalism; recipients compress them for speed. The waste accumulates in between.
I used Gemini in Gmail to turn bullet points into a professional email. My colleague got the email, saw the Gemini summary at the top, and read that instead. We both saved time. The information transferred perfectly. Something still feels off.
The Pattern
It's not just emails. Look around:
You draft a document in Notion - AI expands your outline into full paragraphs. Your teammate opens it, clicks "summarize," reads three bullet points. The meeting notes you spent five minutes generating with Claude get fed into another LLM by the next reader. Code comments you let Copilot write get compressed by the reviewer's AI assistant.
Slack messages. Google Docs. Confluence pages. Pull request descriptions. Everywhere you look, the same cycle: expand before sending, compress on arrival.
We've built this loop into everything we write.
Overhead Isn't New
We've always added overhead to communication. The question is: what does it buy us?
Bureaucracy adds paperwork for accountability. Every decision gets documented. Every approval needs a signature. A $50 purchase request needs five forms. The overhead serves a purpose: prevent fraud, enable audits, create paper trails. The cost is justified.
TCP/IP adds headers for reliability. Every packet carries roughly 40 bytes of routing and control information. For a 1-byte payload, that's 4,000% overhead. But without those headers, packets get lost, arrive out of order, or arrive corrupted. The overhead enables the system to function.
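The header arithmetic is easy to check. A minimal sketch, assuming the classic 20-byte IPv4 header plus 20-byte TCP header with no options:

```python
# Overhead of a 40-byte TCP/IP header relative to payload size.
TCP_IP_HEADER_BYTES = 40  # 20-byte IPv4 header + 20-byte TCP header, no options

def overhead_percent(payload_bytes: int) -> float:
    """Header bytes as a percentage of payload bytes."""
    return 100 * TCP_IP_HEADER_BYTES / payload_bytes

print(overhead_percent(1))     # 1-byte payload  -> 4000.0 (%)
print(overhead_percent(1460))  # full-size payload -> ~2.7 (%)
```

At a typical full payload the overhead is negligible; it only looks absurd for tiny payloads, which is exactly the point the comparison makes.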
Professional emails grew verbose for signaling. "Approved" became "I hope this email finds you well. After careful review with stakeholders, I'm pleased to confirm we can proceed. Please let me know if you have concerns. Best regards." Same information. Different signal. The verbosity shows effort, politeness, care.
Then LLMs arrived and made signaling free. Everyone can sound professional instantly. So what happens?
The Inversion
Traditional compression minimizes what travels over the wire. The LLM loop inverts it: the sender expands, the recipient compresses, and the most verbose version is what actually travels. Same data in. Same data out. Maximum bytes in between.
The individual time savings are real. Your cognitive effort drops. LLM processing happens in parallel - you're not waiting. The recipient gets exactly what they need. Every link in the chain gets faster.
But the system? The system's burning energy on both ends for the same result.
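That system-level cost can be put in rough numbers. A back-of-envelope sketch using the per-token figures from the assumptions table further down (both constants are estimates, not measurements):

```python
# Back-of-envelope energy for one expand-then-compress round trip.
KWH_PER_TOKEN = 1.11e-6   # LLM inference energy per token (rough benchmark figure)
COMPRESSION_RATIO = 0.5   # summarizing assumed to cost 50% of generating

def round_trip_kwh(expanded_tokens: int) -> float:
    """Energy to generate an expanded message, then summarize it on arrival."""
    expand = expanded_tokens * KWH_PER_TOKEN
    compress = expanded_tokens * KWH_PER_TOKEN * COMPRESSION_RATIO
    return expand + compress

# A 130-token expanded email burns energy on both ends; sending the
# terse 20-token original directly would have burned none of it.
print(f"{round_trip_kwh(130):.6f} kWh per message")
```

Tiny per message. The next section is about what happens when you multiply it by billions.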
The Pollution Problem
Here's the thing about expanded content: it doesn't vanish after the recipient compresses it. It stays. Search indices. Vector databases. Email archives. Git history. Documentation sites. Every expanded message becomes permanent pollution.
Imagine if every time you spoke, someone recorded it, transcribed it, expanded the transcription to 10× length, archived all versions, and made them all searchable. Now imagine trying to find anything in that archive.
That's what's happening. By one measure, Google's search result quality dropped 10% in the past year. AI-generated content now makes up roughly 20% of results. When an AI summary appears, click-through rates fall 47% - the summary is good enough, but finding the actual source takes longer. Zero-click searches went from 56% to 69% in twelve months.
We built tools to save time. Then filled every database with verbose copies of the same information, making search slower for everyone.
The Scale
One message? Negligible. Billions daily? The numbers get interesting. The calculator below uses rough estimates - not rigorous calculations, just enough to get a feel for the magnitude:
Global Impact Calculator
Adjust the sliders to explore different scenarios. These are rough estimates to illustrate scale.
Calculation assumptions:
| Parameter | Value | Source |
|---|---|---|
| Average message length | 130 tokens | Business email avg: 75-100 words |
| Energy per token | 1.11 × 10⁻⁶ kWh (≈1.1 Wh per 1,000 tokens) | arXiv 2505.09598v1 (LLM inference benchmark) |
| Compression energy ratio | 50% of generation | Estimate (summarization uses less compute) |
| CO₂ per kWh | 0.4 kg | US grid average |
| Token size | 4 bytes | UTF-8 approximate |
| Electricity cost | $0.15/kWh | Global average |
| Cloud storage cost | $0.023/GB/year | AWS S3 pricing |
| Bandwidth cost | $0.08/GB | CDN average |
| Gzip ratio (terse text) | 40% of original size | Estimate: short messages compress well |
| Gzip ratio (verbose text) | 60% of original size | Estimate: expanded AI prose compresses less |
| Daily messages globally | ~330B/day | Email volume alone; chat adds ~20B+ on top |
Note: These are rough estimates, not rigorous calculations. They're meant to give a sense of scale. Actual impact varies significantly by model efficiency, hardware, grid carbon intensity, adoption rates, and usage patterns.
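The whole calculator fits in a few lines. A sketch implementing the assumptions table above, where every constant is the table's estimate rather than a measured value:

```python
# Rough annual-impact calculator built from the assumptions table.
MSGS_PER_DAY = 330e9        # emails dominate; order-of-magnitude only
TOKENS_PER_MSG = 130
KWH_PER_TOKEN = 1.11e-6
COMPRESSION_RATIO = 0.5     # compressing costs 50% of generating
CO2_KG_PER_KWH = 0.4
BYTES_PER_TOKEN = 4
USD_PER_KWH = 0.15
USD_PER_GB_YEAR = 0.023     # cloud storage
DAYS_PER_YEAR = 365

def annual_impact() -> dict:
    tokens = MSGS_PER_DAY * TOKENS_PER_MSG * DAYS_PER_YEAR
    kwh = tokens * KWH_PER_TOKEN * (1 + COMPRESSION_RATIO)  # expand + compress
    return {
        "energy_twh": kwh / 1e9,
        "co2_megatonnes": kwh * CO2_KG_PER_KWH / 1e9,
        "electricity_usd_bn": kwh * USD_PER_KWH / 1e9,
        "storage_usd_mn": tokens * BYTES_PER_TOKEN / 1e9 * USD_PER_GB_YEAR / 1e6,
    }

for name, value in annual_impact().items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions the expand/compress loop lands in the tens of terawatt-hours per year. Shift any slider by 2× and the conclusion barely moves; the magnitude is the point, not the decimals.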
What Now?
We've optimized each node while degrading the network. Every individual saves time. Every database fills with noise. Search gets slower. Storage gets more expensive. The energy meters spin faster. And we keep expanding because stopping means falling behind.
The technical fixes exist. Transmit intent, not verbosity. Let the recipient's AI render politeness client-side, matched to their culture and preferences. Store the compressed version. Tag AI-generated content so search engines can filter it. Build compression-first workflows instead of expansion-first ones. None of this is hard to imagine.
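"Transmit intent, render politeness client-side" is concrete enough to sketch. A hypothetical wire format, where the schema, function names, and template rendering are all invented for illustration (a real client would use a local model instead of templates):

```python
# Hypothetical intent-first messaging: the wire carries only the semantic
# payload; verbosity is generated on the recipient's side, to their taste.
import json

def encode(intent: str, payload: dict) -> bytes:
    """What actually crosses the wire: intent + facts, no filler."""
    return json.dumps({"intent": intent, "payload": payload}).encode()

def render(wire: bytes, style: str = "terse") -> str:
    """Recipient-side rendering. Templates stand in for a local LLM."""
    msg = json.loads(wire)
    if msg["intent"] == "approve" and style == "formal":
        return (f"I'm pleased to confirm that {msg['payload']['what']} "
                "can proceed. Best regards.")
    return f"{msg['intent']}: {msg['payload']['what']}"

wire = encode("approve", {"what": "the Q3 budget"})
print(len(wire))                    # tens of bytes cross the wire
print(render(wire, style="formal")) # politeness added only on arrival
```

The archive stores the compact form; the padding never exists anywhere except, briefly, on a screen.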
But here's the trap: those solutions require coordination. Someone has to build the protocol. Everyone has to adopt it. Search engines have to respect the tags. Email clients have to render intent identically. And right now, every individual's incentives point the other direction - expand your message, let someone else deal with compression.
Which raises harder questions. When professional tone became free, did the signal collapse? The pollution's already in the indices - can we clean it out, or is search permanently degraded? The energy costs compound with scale - at what point does the math force change? And most importantly, who profits from this?
Maybe we're watching a market failure play out in real time. Individually rational. Collectively wasteful. Familiar pattern.
(Irony: this article was written with LLM assistance. I expanded my notes, you're reading the result. Whether you compress it back is up to you.)
References
1. Frontiers in Communication (2025). "Energy costs of communicating with AI".
2. arXiv 2505.09598v1 (2025). "How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference".
3. Originality.ai (2025). "Amount of AI Content in Google Search Results - Ongoing Study".
4. Webis (2024). "Is Google Getting Worse? A Longitudinal Investigation of SEO Spam in Search Engines".
5. Pew Research Center (2025). Study tracking 68,000 search queries and AI summary click-through impact.
6. arXiv 2409.17383v1 (2024). "VectorSearch: Enhancing Document Retrieval with Semantic Embeddings and Optimized Search".
7. Circuit.ai (2024). "The Generative AI Productivity Paradox: Overcoming Information Overload at Work".
8. Atlassian. "State of Teams Report".