Huffman encoding is computer algorithm used for data compression.
History:
This algorithm is devised by David A. Huffman in 1951.
Working:
This algorithm is based on creating a binary tree of nodes.
- First of all we count the frequency of the letters appear in the string.
- We build the tree as the letters coming with higher frequency are at the bottom.
- Then with higher frequency letters are come.
- This process ends with a single node “root”.
- We mark from root to leaf as the left child as “zero” and right child as “one”.
- So we can calculate Huffman code of each character.
- Each character has a unique code.
Example:
Consider the following phrase:
Welcome to virtual university
Letter | Frequency |
NL | 1 |
SP | 3 |
A | 1 |
C | 1 |
E | 3 |
I | 3 |
L | 2 |
M | 1 |
N | 1 |
O | 2 |
R | 2 |
S | 1 |
T | 3 |
U | 2 |
V | 2 |
W | 1 |
Y | 1 |
Total | 30 |
NL used for null character and SP for space.
· Draw Huffman tree
· Huffman code for each character
Letter | code |
NL | 01010 |
SP | 1010 |
A | 01011 |
C | 01110 |
E | 1000 |
I | 0110 |
L | 0100 |
M | 01111 |
N | 10010 |
O | 1111 |
R | 1110 |
S | 10101 |
T | 00 |
U | 1100 |
V | 1101 |
W | 10110 |
Y | 10111 |
· Encode the above phrase using Huffman codes calculated above
welcome to virtual university
10110 1000 0100 01110 1111 01111 1000 1010 00 1111 1010 1101 0110 1110 00 1100 01011 0100 1010 1100 10010 0110 1101 1000 1110 10101 0110 00 10111 01010