Shannon Entropy: Measuring Information Content

simulation intermediate ~10 min

Formula

H = -\sum_{i=1}^{N} p_i \cdot \log_2(p_i)
H_{\max} = \log_2(N)
\text{Redundancy} = 1 - \frac{H}{H_{\max}}
H \leq \bar{L} < H + 1 \quad \text{(Huffman bound)}
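The formulas above can be evaluated directly. The sketch below computes entropy, maximum entropy, and redundancy for an illustrative four-symbol distribution (the distribution is an assumption for the example, not one taken from the simulator):

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum p_i * log2(p_i), in bits (0 * log 0 := 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Illustrative distribution: dyadic probabilities give exact results
probs = [0.5, 0.25, 0.125, 0.125]
H = entropy(probs)
H_max = math.log2(len(probs))   # maximum entropy for N equally likely symbols
redundancy = 1 - H / H_max

print(H)           # 1.75 bits
print(H_max)       # 2.0 bits
print(redundancy)  # 0.125
```

Because every probability here is a power of 1/2, the entropy comes out exact; for general distributions the values are irrational.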
In 1948, Claude Shannon asked a deceptively simple question: how do you measure information? His answer, entropy, became the foundation of the digital age. Shannon entropy H = -Σ p_i·log₂(p_i) measures the average surprise in a message. A fair coin flip carries exactly 1 bit of entropy: each outcome is equally surprising. A loaded coin with 90% heads carries only 0.47 bits per flip; the outcome is mostly predictable, so each flip conveys less information.

The key insight is that information is fundamentally about uncertainty. A message that tells you something you already knew carries no information. A message that resolves genuine uncertainty carries maximum information. Entropy quantifies this precisely. For a source with N equally likely symbols, entropy reaches its maximum of log₂(N) bits. Any deviation from uniformity reduces entropy. English text, with its uneven letter frequencies (E appears about 13% of the time, Z only 0.07%), has entropy well below the theoretical maximum, which is exactly why English text can be compressed.

This simulator lets you build custom probability distributions and observe how entropy responds. Watch how concentrating probability on fewer symbols reduces entropy, increases redundancy, and changes the optimal code lengths assigned by Huffman coding. The symbol stream at the bottom makes the abstract concrete: high-entropy sources look random, while low-entropy sources show visible patterns.
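The coin examples above can be checked with the binary entropy function (a minimal sketch, independent of the simulator's own code):

```python
import math

def binary_entropy(p):
    """Entropy of a coin with P(heads) = p, in bits."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

print(round(binary_entropy(0.5), 2))  # 1.0  -> fair coin: exactly 1 bit
print(round(binary_entropy(0.9), 2))  # 0.47 -> the loaded coin from the text
```

Note that the function is symmetric, binary_entropy(p) == binary_entropy(1 - p), and peaks at p = 0.5, matching the claim that uniform distributions maximize entropy.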

FAQ

What is Shannon entropy?

Shannon entropy is a mathematical measure of the average information content (or uncertainty) in a message source. Defined as H = -Σ p_i·log₂(p_i), where p_i is the probability of each symbol, it quantifies the minimum number of bits needed per symbol to encode messages from that source. Claude Shannon introduced it in his 1948 paper 'A Mathematical Theory of Communication,' founding the field of information theory.

Why is entropy measured in bits?

The use of logarithm base 2 gives entropy in bits because each bit represents a binary choice. One bit resolves the uncertainty of a fair coin flip. The log₂ formulation directly tells you the minimum number of binary digits needed to encode each symbol on average. Using natural logarithms gives entropy in 'nats,' used in physics and machine learning.
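The bits/nats relationship is just a change of logarithm base; one bit equals ln(2) ≈ 0.693 nats. A small sketch of the conversion:

```python
import math

def entropy_bits(probs):
    """Entropy in bits (log base 2)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def entropy_nats(probs):
    """Entropy in nats (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

fair_coin = [0.5, 0.5]
print(entropy_bits(fair_coin))  # 1.0 bit
print(entropy_nats(fair_coin))  # ~0.693 nats, i.e. ln(2)

# Conversion: H_nats = H_bits * ln(2)
print(math.isclose(entropy_nats(fair_coin),
                   entropy_bits(fair_coin) * math.log(2)))  # True
```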

What is the relationship between entropy and data compression?

Shannon's source coding theorem proves that no lossless compression algorithm can compress data below H bits per symbol on average. Huffman coding and arithmetic coding approach this theoretical limit. The difference between the uncompressed size and H·L bits (where L is the message length in symbols) represents the maximum achievable compression.
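The Huffman bound from the formula section, H ≤ L̄ < H + 1, can be verified with a standard textbook implementation of Huffman's merge procedure (a sketch, not the simulator's own implementation; it tracks only code lengths, not the codes themselves):

```python
import heapq
import math

def huffman_lengths(probs):
    """Optimal prefix-code lengths via Huffman's algorithm."""
    # Heap entries: (total probability, unique tie-breaker, symbol indices)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)  # two least-probable subtrees
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:                # each merge adds one bit to these codes
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

probs = [0.4, 0.3, 0.2, 0.1]
lengths = huffman_lengths(probs)         # [1, 2, 3, 3]
H = -sum(p * math.log2(p) for p in probs)
avg_len = sum(p * l for p, l in zip(probs, lengths))
print(H, avg_len)  # H ~1.846 bits; average code length 1.9 bits
assert H <= avg_len < H + 1              # the Huffman bound
```

The average code length (1.9 bits) sits between the entropy (≈1.846 bits) and entropy plus one, as the bound guarantees; arithmetic coding can get closer to H by coding whole messages rather than individual symbols.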

How much entropy does English text have?

English text has approximately 4.1 bits per character when considering single-letter frequencies alone, but drops to about 1.0-1.5 bits per character when accounting for word structure, grammar, and context. Shannon estimated this through experiments where humans predicted the next character. The maximum for 26 equally likely letters would be log₂(26) ≈ 4.7 bits, so contextual English uses roughly a quarter of that theoretical capacity.
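Single-character entropy of any text can be estimated directly from its observed symbol frequencies. A minimal sketch (the sample strings are illustrative assumptions, not Shannon's corpus, and this first-order estimate ignores the context effects discussed above):

```python
import math
from collections import Counter

def empirical_entropy(text):
    """First-order entropy estimate from observed character frequencies, in bits."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(empirical_entropy("abababab"))  # 1.0 -> two symbols, uniform frequencies
print(empirical_entropy("aaaaaaab"))  # lower: skewed, mostly predictable
```

Running the estimator over real English prose and comparing against log₂(26) reproduces the redundancy gap the answer describes.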

Sources

View source on GitHub