Shannon Entropy: Measuring Information Content
Formula
H = -\sum_{i=1}^{N} p_i \cdot \log_2(p_i)

H_{\max} = \log_2(N)

\text{Redundancy} = 1 - \frac{H}{H_{\max}}

H \leq \bar{L} < H + 1 \quad \text{(Huffman bound)}

FAQ
What is Shannon entropy?
Shannon entropy is a mathematical measure of the average information content (or uncertainty) in a message source. Defined as H = -Σ p_i·log₂(p_i), where p_i is the probability of each symbol, it quantifies the minimum number of bits needed per symbol to encode messages from that source. Claude Shannon introduced it in his 1948 paper 'A Mathematical Theory of Communication,' founding the field of information theory.
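The definition above can be sketched directly in code. This is a minimal illustration, not a library implementation; the function and variable names are chosen for this example.

```python
import math

def shannon_entropy(probs):
    """H = -sum(p * log2(p)), skipping zero-probability symbols
    (their contribution tends to 0 in the limit)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin: two equally likely symbols -> maximum uncertainty, 1 bit.
H_fair = shannon_entropy([0.5, 0.5])

# A biased coin: the outcome is more predictable, so entropy is lower.
H_biased = shannon_entropy([0.9, 0.1])

# Maximum entropy and redundancy for the two-symbol source,
# matching the formulas above.
H_max = math.log2(2)
redundancy = 1 - H_biased / H_max
```

For the biased coin, H ≈ 0.47 bits: even though each flip produces one of two symbols, on average it carries less than half a bit of information.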
Why is entropy measured in bits?
The logarithm base 2 gives entropy in bits because each bit represents one binary choice: one bit resolves the uncertainty of a fair coin flip. The log₂ formulation directly tells you the minimum average number of binary digits needed to encode each symbol. Using natural logarithms instead gives entropy in 'nats' (1 bit = ln 2 ≈ 0.693 nats), a unit common in physics and machine learning.
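The choice of base is just a scaling factor, which a short sketch makes concrete (the `entropy` helper here is illustrative, not from any particular library):

```python
import math

def entropy(probs, base=2):
    """Entropy in the chosen log base: 2 -> bits, e -> nats."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

p = [0.5, 0.5]
bits = entropy(p, base=2)       # fair coin: 1 bit
nats = entropy(p, base=math.e)  # same source in nats: ln(2)

# Conversion between units: 1 bit = ln(2) nats.
assert abs(nats - bits * math.log(2)) < 1e-12
```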
What is the relationship between entropy and data compression?
Shannon's source coding theorem proves that no lossless compression scheme can represent data from a source in fewer than H bits per symbol on average. Huffman coding and arithmetic coding approach this theoretical limit. For a message of N symbols, the gap between the uncompressed size and H·N bits is the maximum achievable lossless saving.
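The Huffman bound H ≤ L̄ < H + 1 from the formula section can be checked empirically. Below is a compact sketch of the standard greedy Huffman construction (tracking only code lengths, which is all the bound needs); the function names are chosen for this example.

```python
import heapq
import math
from collections import Counter

def huffman_lengths(freqs):
    """Return {symbol: code_length} for a Huffman code over freqs.

    Repeatedly merges the two lightest subtrees; each merge adds one
    bit to the depth of every symbol in the merged subtrees.
    """
    # Heap entries: (weight, unique tiebreak, {symbol: depth_so_far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

text = "abracadabra"
freqs = Counter(text)
total = sum(freqs.values())

# Source entropy from the empirical symbol distribution.
H = -sum((n / total) * math.log2(n / total) for n in freqs.values())

# Average Huffman code length, weighted by symbol frequency.
lengths = huffman_lengths(freqs)
avg_len = sum(freqs[s] * lengths[s] for s in freqs) / total

# Source coding theorem: H <= avg_len < H + 1.
```

For "abracadabra", H ≈ 2.04 bits per symbol, and the Huffman average length lands between H and H + 1 as the bound requires.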
How much entropy does English text have?
The maximum for 26 equiprobable letters is log₂(26) ≈ 4.7 bits per character. Accounting for single-letter frequencies lowers this to about 4.1 bits, and accounting for word structure, grammar, and longer-range context drops it to roughly 1.0-1.5 bits per character. Shannon estimated this through experiments in which humans predicted the next character of a text. At 1.0-1.5 bits, English uses only about 25-30% of its theoretical capacity, making it roughly 70-75% redundant.
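The single-letter-frequency estimate is easy to reproduce on a text sample. A minimal sketch (the sample string and function name are illustrative; a longer corpus would give a more stable figure):

```python
import math
from collections import Counter

def letter_entropy(text):
    """Entropy of the single-letter frequency distribution, in bits/char.

    Ignores case, spaces, and punctuation, keeping only a-z.
    """
    letters = [c for c in text.lower() if "a" <= c <= "z"]
    counts = Counter(letters)
    total = len(letters)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

sample = ("information theory studies the quantification storage "
          "and communication of information")
H1 = letter_entropy(sample)  # single-letter estimate for this sample
H_max = math.log2(26)        # ~4.70 bits/char, the equiprobable maximum
```

On real English corpora this single-letter estimate comes out near 4.1 bits per character, well below H_max; the further drop to 1.0-1.5 bits only appears once multi-character context is modeled.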