Data Compression: From Huffman Codes to the Shannon Limit

simulation intermediate ~12 min

Formula

\text{Compressed size} \geq N \cdot H \text{ bits}
\text{Compression ratio} = \frac{\text{Original size}}{\text{Compressed size}}
H \leq \bar{L} < H + 1 \quad \text{(single-symbol Huffman)}
\frac{H}{k} \leq \bar{L}_k < \frac{H}{k} + \frac{1}{k} \quad \text{(block-}k\text{ Huffman)}
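The bounds above can be checked numerically. A minimal sketch, assuming a hypothetical four-symbol source with probabilities 1/2, 1/4, 1/8, 1/8 (chosen so the entropy comes out exact):

```python
import math

def entropy(probs):
    """Shannon entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical skewed source: four symbols with probabilities 1/2, 1/4, 1/8, 1/8.
probs = [0.5, 0.25, 0.125, 0.125]
H = entropy(probs)  # 1.75 bits/symbol
N = 1000            # message length in symbols

print(H)      # → 1.75
print(N * H)  # lower bound on compressed size: 1750.0 bits
```

For this distribution the probabilities are exact powers of two, so a Huffman code meets the lower bound H exactly; for general distributions the average length falls somewhere in [H, H + 1).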
Every time you send a photo, stream a video, or download a file, compression algorithms are silently at work — reducing data to its essential information content. The theoretical foundation for all of this is Shannon's source coding theorem. The theorem states a remarkable fact: for any source producing symbols with entropy H bits per symbol, it is impossible to losslessly compress the data below N·H total bits (where N is the message length), and it is possible to get arbitrarily close to this limit. Entropy is therefore the ultimate compression benchmark.

Huffman coding, invented by David Huffman in 1952 while a graduate student at MIT, provides a practical algorithm that approaches this limit. The key idea is variable-length coding: assign short binary codewords to common symbols and long codewords to rare symbols. The algorithm constructs an optimal prefix-free code by building a binary tree from the bottom up, merging the two least-probable symbols at each step.

This simulator generates messages from four different source types with vastly different entropy profiles. English-like text has skewed letter frequencies (E is 13%, Z is 0.07%), giving entropy around 4.0 bits — well below the 4.7-bit maximum for 26 letters. Random text is nearly incompressible. Repetitive text has very low entropy and compresses dramatically. DNA sequences, with four roughly equal bases, sit at about 2 bits per base.

Block coding extends the principle by treating groups of symbols as single units. Since real sources have inter-symbol dependencies (in English, 'q' is almost always followed by 'u'), coding blocks of symbols together captures these correlations and pushes compression closer to the true entropy rate.
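The block-coding effect can be seen empirically. A sketch, assuming a toy correlated source where 'b' almost always follows 'a': the single-symbol entropy stays near 1 bit, while coding non-overlapping pairs exposes the predictability.

```python
import math
from collections import Counter

def entropy_bits(counts):
    """Empirical entropy in bits, from a Counter of occurrences."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical correlated source: 'b' almost always follows 'a'.
text = "ab" * 1000 + "ba" * 10

H1 = entropy_bits(Counter(text))  # single-symbol entropy: 1.0 bit
H2 = entropy_bits(Counter(text[i:i + 2] for i in range(0, len(text), 2)))

print(H1, H2 / 2)  # per-symbol cost drops once pairs capture the correlation
```

Here H2/2 is far below H1: the pair alphabet is dominated by "ab", so a block-2 code can spend far less than one bit per symbol on average.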

FAQ

How does Huffman coding work?

Huffman coding assigns shorter binary codes to more frequent symbols and longer codes to rare symbols. The algorithm builds a binary tree bottom-up: repeatedly merge the two lowest-frequency nodes until one tree remains. Symbols at shallow leaves get short codes; deep leaves get long codes. The result is a prefix-free code — no codeword is a prefix of another — that minimizes the expected code length for the given frequency distribution.
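The merge loop described above can be sketched with a min-heap. This is an illustrative implementation, not the simulator's own code; the symbol frequencies at the bottom are hypothetical.

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Build a prefix-free code by repeatedly merging the two lightest nodes."""
    tiebreak = count()  # keeps heap comparisons well-defined for equal weights
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Prepend a bit: left subtree gets '0', right subtree gets '1'.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]

# Hypothetical frequencies: the most common symbol gets the shortest codeword.
code = huffman_code({"e": 0.40, "t": 0.25, "a": 0.20, "z": 0.15})
```

For these weights, 'e' receives a 1-bit codeword and 'z' a 3-bit one, and the codeword lengths satisfy the Kraft equality Σ 2^(−len) = 1, as they must for a full binary code tree.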

What is the Shannon source coding theorem?

Shannon's source coding theorem (noiseless coding theorem) states that a source with entropy H bits per symbol cannot be losslessly compressed below H bits per symbol on average. Conversely, it can be compressed to a rate arbitrarily close to H using sufficiently long block codes. This establishes entropy as the fundamental limit of lossless compression.

Why can't you compress random data?

Truly random data has maximum entropy — every symbol is equally likely and there are no patterns. Since entropy equals log₂(N) for N equally likely symbols, the data is already at its entropy limit, and the average code length cannot drop below log₂(N) bits per symbol. By the pigeonhole principle, any compression scheme that shortens some inputs must lengthen others. For uniform random data, no input is more likely than another, so no systematic compression is possible.
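The pigeonhole argument is just a counting fact, which a one-liner can verify: there are 2ⁿ bit strings of length n, but only 2ⁿ − 1 strings of any length strictly shorter, so no injective scheme can shorten every input.

```python
# 2**n strings of length n vs. all strings of length 0..n-1 combined.
n = 16
print(2**n, sum(2**k for k in range(n)))  # → 65536 65535
```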

What is the difference between lossless and lossy compression?

Lossless compression (Huffman, LZ77, DEFLATE) preserves every bit of the original data — the decompressed output is identical to the input. Lossy compression (JPEG, MP3, H.264) discards information deemed perceptually unimportant, achieving much higher compression ratios at the cost of fidelity. Shannon's source coding theorem governs lossless compression; rate-distortion theory governs lossy compression.

Sources

View source on GitHub