Shannon Entropy: Measuring Information Content

simulation intermediate ~10 min

Formula

H = -\sum_{i=1}^{N} p_i \cdot \log_2(p_i)
H_{\max} = \log_2(N)
\text{Redundancy} = 1 - \frac{H}{H_{\max}}
H \leq \bar{L} < H + 1 \quad \text{(Huffman bound)}
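The formulas above can be evaluated directly. The sketch below computes entropy, maximum entropy, and redundancy for an illustrative four-symbol distribution (the distribution is an assumption for the example, not one taken from the simulator):

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum p_i * log2(p_i), in bits (0 * log 0 := 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Illustrative distribution: dyadic probabilities give exact results
probs = [0.5, 0.25, 0.125, 0.125]
H = entropy(probs)
H_max = math.log2(len(probs))   # maximum entropy for N equally likely symbols
redundancy = 1 - H / H_max

print(H)           # 1.75 bits
print(H_max)       # 2.0 bits
print(redundancy)  # 0.125
```

Because every probability here is a power of 1/2, the entropy comes out exact; for general distributions the values are irrational.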
In 1948, Claude Shannon asked a deceptively simple question: how do you measure information? His answer, entropy, became the foundation of the digital age. Shannon entropy H = -Σ p_i·log₂(p_i) measures the average surprise in a message. A fair coin flip carries exactly 1 bit of entropy: each outcome is equally surprising. A loaded coin with 90% heads carries only 0.47 bits per flip; the outcome is mostly predictable, so each flip conveys less information.

The key insight is that information is fundamentally about uncertainty. A message that tells you something you already knew carries no information. A message that resolves genuine uncertainty carries maximum information. Entropy quantifies this precisely. For a source with N equally likely symbols, entropy reaches its maximum of log₂(N) bits. Any deviation from uniformity reduces entropy. English text, with its uneven letter frequencies (E appears about 13% of the time, Z only 0.07%), has entropy well below the theoretical maximum, which is exactly why English text can be compressed.

This simulator lets you build custom probability distributions and observe how entropy responds. Watch how concentrating probability on fewer symbols reduces entropy, increases redundancy, and changes the optimal code lengths assigned by Huffman coding. The symbol stream at the bottom makes the abstract concrete: high-entropy sources look random, while low-entropy sources show visible patterns.
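The coin examples above can be checked with the binary entropy function (a minimal sketch, independent of the simulator's own code):

```python
import math

def binary_entropy(p):
    """Entropy of a coin with P(heads) = p, in bits."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

print(round(binary_entropy(0.5), 2))  # 1.0  -> fair coin: exactly 1 bit
print(round(binary_entropy(0.9), 2))  # 0.47 -> the loaded coin from the text
```

Note that the function is symmetric, binary_entropy(p) == binary_entropy(1 - p), and peaks at p = 0.5, matching the claim that uniform distributions maximize entropy.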

FAQ

What is Shannon entropy?

Shannon entropy is a mathematical measure of the average information content (or uncertainty) in a message source. Defined as H = -Σ p_i·log₂(p_i), where p_i is the probability of each symbol, it quantifies the minimum number of bits needed per symbol to encode messages from that source. Claude Shannon introduced it in his 1948 paper 'A Mathematical Theory of Communication,' founding the field of information theory.

Why is entropy measured in bits?

The use of logarithm base 2 gives entropy in bits because each bit represents a binary choice. One bit resolves the uncertainty of a fair coin flip. The log₂ formulation directly tells you the minimum number of binary digits needed to encode each symbol on average. Using natural logarithms gives entropy in 'nats,' used in physics and machine learning.
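The bits/nats relationship is just a change of logarithm base; one bit equals ln(2) ≈ 0.693 nats. A small sketch of the conversion:

```python
import math

def entropy_bits(probs):
    """Entropy in bits (log base 2)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def entropy_nats(probs):
    """Entropy in nats (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

fair_coin = [0.5, 0.5]
print(entropy_bits(fair_coin))  # 1.0 bit
print(entropy_nats(fair_coin))  # ~0.693 nats, i.e. ln(2)

# Conversion: H_nats = H_bits * ln(2)
print(math.isclose(entropy_nats(fair_coin),
                   entropy_bits(fair_coin) * math.log(2)))  # True
```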

What is the relationship between entropy and data compression?

Shannon's source coding theorem proves that no lossless compression algorithm can compress data below H bits per symbol on average. Huffman coding and arithmetic coding approach this theoretical limit. The difference between the uncompressed size and H·L bits (where L is the message length in symbols) represents the maximum achievable compression.
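The Huffman bound from the formula section, H ≤ L̄ < H + 1, can be verified with a standard textbook implementation of Huffman's merge procedure (a sketch, not the simulator's own implementation; it tracks only code lengths, not the codes themselves):

```python
import heapq
import math

def huffman_lengths(probs):
    """Optimal prefix-code lengths via Huffman's algorithm."""
    # Heap entries: (total probability, unique tie-breaker, symbol indices)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)  # two least-probable subtrees
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:                # each merge adds one bit to these codes
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

probs = [0.4, 0.3, 0.2, 0.1]
lengths = huffman_lengths(probs)         # [1, 2, 3, 3]
H = -sum(p * math.log2(p) for p in probs)
avg_len = sum(p * l for p, l in zip(probs, lengths))
print(H, avg_len)  # H ~1.846 bits; average code length 1.9 bits
assert H <= avg_len < H + 1              # the Huffman bound
```

The average code length (1.9 bits) sits between the entropy (≈1.846 bits) and entropy plus one, as the bound guarantees; arithmetic coding can get closer to H by coding whole messages rather than individual symbols.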

How much entropy does English text have?

English text has approximately 4.1 bits per character when considering single-letter frequencies alone, but drops to about 1.0-1.5 bits per character when accounting for word structure, grammar, and context. Shannon estimated this through experiments where humans predicted the next character. The maximum for 26 equally likely letters would be log₂(26) ≈ 4.7 bits, so contextual English uses roughly a quarter of that theoretical capacity.
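Single-character entropy of any text can be estimated directly from its observed symbol frequencies. A minimal sketch (the sample strings are illustrative assumptions, not Shannon's corpus, and this first-order estimate ignores the context effects discussed above):

```python
import math
from collections import Counter

def empirical_entropy(text):
    """First-order entropy estimate from observed character frequencies, in bits."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(empirical_entropy("abababab"))  # 1.0 -> two symbols, uniform frequencies
print(empirical_entropy("aaaaaaab"))  # lower: skewed, mostly predictable
```

Running the estimator over real English prose and comparing against log₂(26) reproduces the redundancy gap the answer describes.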

Sources

View source on GitHub