CS 173: Homework 6
Huffman Coding
Due Monday, April 21
You will write a program that compresses and decompresses files using a Huffman compression scheme.
1. Complete the code for a binary tree (see Lab 14)
In addition, you should add a method
public T getElement()
to your BTNode class. In the lab description, there is a test you can use with the toStringPreorder
method you can use to verify that your tree is being built properly.
2. Complete the code for a heap (see Lab 15)
You should have all the methods from the lab implemented. In addition, you may find the following method useful, but not
required.
- int size():
Return the number of elements stored in the heap.
To test your heap, the following code fills it with 100 pseudorandom numbers, and if your heap works, the output should
be in order:
Heap<Integer, Integer> h = new Heap<Integer, Integer>(100);
for (int i = 0; i < 100; i++) {
int x = (int)(Math.random() * 100 + 1);
h.insert(x, x);
}
try {
while (!h.isEmpty())
System.out.print(h.removeMin().getValue().toString() + " ");
System.out.println();
}
catch (HeapEmptyException e) {
System.out.println("Error: heap unexpectedly empty");
}
3. Complete the code that reads the bits from a file (Lab 16)
You should add the following method to the BitReader class:
- public void close() throws IOException
Closes the InputStream.
4. Complete the code that writes bits to a file (Lab 17)
Here is some code: TestBitWriter.java,
that can verify whether your BitReader and BitWriter are working. Each of the
test routines takes an input file and copies it to an output file. If your code is working correctly, the copy should be
identical to the original. For the testInts method, you need to use a file whose input size is a multiple of 4.
5. Write a HuffmanCode class.
The class should have three public methods. The descriptions of the first two are given later.
- static void encode(String inFileName, String outFileName)
- static void decode(String inFileName, String outFileName)
- static void main(String[] args):
Write a main method so that if you enter java HuffmanCode encode inFile outFile,
your program will compress inFile and produce outFile. If you enter
java HuffmanCode decode inFile outFile, your program will decompress inFile
and produce outFile.
6. Add a method to the HuffmanCode class that counts the number occurances of each byte value in
a file
Create a method
static int[] byteCounts(FileInputStream inStream)
That counts the number of times each byte value occurs in the inStream, and it returns these counts in an array.
The code should create an array of 256 ints (because there are 256 different values for a byte)
and initialize each element of the array to a 0. Then it should
read in all the bytes of inStream one at a time (for example, using the read method of inStream),
and for each byte value, increment the array element at that index. At the end, it should return the array.
Hint: a byte is signed. If you use the InputStream read method, it returns an int
so you can use this value as an index to the array. If you choose to use your BitReader to read the next byte,
you must deal with negative byte values. If the byte value is less than 0, add 256 to it to get a unique positive value.
7. Add a method to the HuffmanCode class that creates a Huffman tree.
Create a method
static BinaryTree<Byte> getHuffmanTree(int[] byteCounts)
that takes the array that contains the number of each byte value and forms a Huffman tree.
For each entry of byteCounts that is greater than 0, create a new BTNode containing that
index as its element and place it into a heap using the count for that value as its key. Hint: if you
made BTNode an inner class of BinaryTree, then you must first create the BinaryTree tree
and then use tree.new BTNode(...) to create the BTNode.
Then, until the heap has one element left, pull of the two minimum Entrys, create a new BTNode with each
of the Entry's values as its children (the element for this node is irrelevant), and place it back into the heap using the sum of the Entry's keys
as its key.
When there is only 1 item in the heap, remove it and make the BTNode stored in that Entry the
root of the BinaryTree, and return the tree.
8. Add a method to the HuffmanCode class that gets the code for each byte value from the Huffman tree.
Create a method
static String[] getByteCodes(BinaryTree<Byte> tree)
that places the code for each byte value into an array and returns the array.
The method should create a array of 256 Strings, and initialize each to null. Then write a routine to
traverses the binary tree and places the code for each leaf into the array. I suggest using a recursive method:
static void byteCodeTraversal(BinaryTree<Byte>.BTNode node, String code, String[] codes)
If the node is a leaf, place code into codes at the index corresponding to the node's element.
(Hint: the byte value stored is signed. If the value is negative, add 256 to it to the correct positive index.)
If the node is not a leaf, recurse appending "0" to code for the left child and appending "1"
to code for the right child.
If you have the recursive method, the getByteCodes method can just call it on the root of the tree using
the empty string as the initial code.
9. Write the encode method of HuffmanCode
The encode method should do the following.
- Verify that the input file exists and is readable and the output file does not exist.
- Create a FileInputStream or BitReader for the input file (we only need to read bytes).
- Call the byteCounts method to get the count for each byte in the file and then close the
FileInputStream.
- Call the getHuffmanTree method to get a Huffman tree for these byte values and byte counts.
- Call the getByteCodes method to get the bit codes for each byte value.
- Create a FileInputStream for the input file and open a BitWriter for the output file.
- Count the total number of bytes in the input file (you can get this from the array of byte counts and
write this value to the output file using the writeInt method of the BitWriter.
- Count the total number of entries in the byte count array that are positive. Write this value to the output file
using the writeByte method.
- For each element of the byte count array that is positive, write the index to the output file using the
writeByte method of BitWriter and then write its count using the writeInt method
of BitWriter.
- Read in each byte of the input file using the read method of FileInputStream (or your
readByte method of BitReader), and look up that byte's code from the byte code array, and then
send that code to the BitWriter's writeBits method.
- Close both the input stream and output BitWriter.
10. Write the decode method of HuffmanCode.
The decode method should do the following.
- Verify that the input file exists and is readable and the output file does not exist.
- Create a BitReader to read the input file and a FileOutputStream or BitWriter for
the output file (we only need to write bytes).
- Call the nextInt method of BitReader to get the number of bytes in the original file.
- Create an array of 256 ints, each element is initialized to 0. This array will count the
number of occurances of each byte in the original file.
- Call the nextByte method of BitReader to get the number of positive entries for the byte count
array.
- For each positive entry (use the number from the previous step to bound the loop), read in the byte value using
nextByte and read in the byte count using nextInt and then set the byte count array element at the
index of the byte to the value of the int. (Hint: the bytes are signed so if you have a byte value less than 0,
add 256 to it. Do not change the int values.)
- Call the getHuffmanTree method to get the Huffman tree for these byte counts.
- Loop as long as there is more data in the BitReader and you have not written out the number of
bytes for the output file, do the following:
- Start at the root of the binary tree.
- Until you are at a leaf, read in a bit from the BitReader. If the bit is a 1, go to the right child,
otherwise go to the left child.
- When you are at a leaf, get the byte value stored in the leaf and write it to the output file using the write
method of FileOutputStream or your writeByte method of BitWriter, and then go back to
step 1.
- Close the output FileOutputStream and your BitReader.
Hints
Use the debugger to help you. Create a small test file to help you test the code. Test after each step. For example,
make certain the byteCounts method has the correct counts for each byte. Use the tree.toString method to see if the
Huffman tree looks reasonable. Print the codes of the byte codes array to see if they match the Huffman tree.
Style reminders:
- All classes should have comments directly above the class definition.
- All methods should have comments above the definition.
- All class and instance variables should have comments next to the variable.
- All classes should have comments above the class.
- All loops and complicated code should have comments.
- All comments should be in JavaDoc style.
Submit your assignment
Email all your class files to me by the assignment deadline.