Copyright | Copyright (C) 2021 Yoo Chung |
---|---|
License | GPL-3.0-or-later |
Maintainer | dev@chungyc.org |
Safe Haskell | Safe-Inferred |
Language | GHC2021 |
Part of Ninety-Nine Haskell Problems. Some solutions are in Solutions.P50.
Documentation
huffman :: [(Char, Int)] -> [(Char, String)] Source #
Given a list of characters and their number of occurrences, construct a list of the characters and their Huffman encoding.
In the encoding, '0'
and '1'
will denote the 0 and 1 bits.
Examples
>>>
huffman [('a',45),('b',13),('c',12),('d',16),('e',9),('f',5)]
[('a',"0"),('b',"101"),('c',"100"),('d',"111"),('e',"1101"),('f',"1100")]
The encoding table computed by huffman
can be used to compress data:
>>>
length $ encodeHuffman (huffman $ countCharacters text) text
3552
Compare this to the length of a fixed-length 5-bit encoding:
>>>
length $ encodeHuffman loweralpha text
4375
or the length of the more standard ASCII encoding with 8 bits:
>>>
length $ encodeHuffman ascii text
7000
Huffman coding is unambiguous, so we can get back the original text:
>>>
let table = huffman $ countCharacters text
>>>
let encodedText = encodeHuffman table text
>>>
let decodedText = decodeHuffman table encodedText
>>>
decodedText == text
True
Supporting functions
The functions below are not part of the problem. Instead, they are used to illustrate the use of Huffman coding.
countCharacters :: String -> [(Char, Int)] Source #
Count the number of occurrences of a character in a string.
encodeHuffman :: [(Char, String)] -> String -> String Source #
Given an encoding table and a string, encode the string.
While this is intended for use in illustrating Huffman coding, it is not limited to such. In particular, it can encode with a fixed-length encoding table.
decodeHuffman :: [(Char, String)] -> String -> String Source #
Given an encoding table and a string, decode the string.
While this is intended for use in illustrating Huffman coding, it is not limited to such. In particular, it can decode with a fixed-length encoding table.
loweralpha :: [(Char, String)] Source #
Fixed-length encoding of lower case letters and a space using 5 bits.