GitHub - tuncbkose/LempelZivWelch: For CSCI 421 Project 1

My Lempel-Ziv-Welch implementation, with a dynamic code table and increasing code-length.

Usage:

make # to generate executables
./encoder FILE_1 FILE_2 ... # creates FILE_1.comp, FILE_2.comp ...
./decoder FILE_1.comp FILE_2.comp ... # creates FILE_1.comp.decomp FILE_2.comp.decomp ...

I have a sneaky suspicion that forcing a smaller alphabet (by changing STARTING_CODE_SIZE in lzw.hh) will result in a bug with bitio logic (trailing 0's at the end.) But that seems outside the scope of this, since I have to have the code size sufficiently big to accomodate arbitrary files.

I'd like to note a few things about a design choice, at least as a reminder to my future self. C++ seems to deal with characters primarily as integers, so I decided to read everything as an integer, hoping to avoid the trouble of figuring whether things are ASCII, UNICODE or something else. This choice meant that during encoding, I had to create a table that maps arbitrary sequences of numbers into codes. My initial idea was to just convert the ints back to chars and save it in a string, but that didn't work well outside of ASCII. A better idea is to map vectors of numbers into codes, but that would require me to write a hash function as well, because for some reason unordered_map is not predefined for vector keys. Since string is defined, I decided to store the numbers as strings, with appending a comma in between so the sequence 1,2 will not be confused with 12.

This inelegance does cause me some pain, but I just happen to have more homework than usual, so I don't want to spend too much time on this.

The encoder output for the Canterbury Corpus can be found here, but here is a screenshot as well:

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
Makefile		Makefile
README.md		README.md
bitio.cc		bitio.cc
bitio.hh		bitio.hh
cantrbry_results.png		cantrbry_results.png
decoder.cc		decoder.cc
encoder.cc		encoder.cc
lzw.cc		lzw.cc
lzw.hh		lzw.hh
output.txt		output.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

tuncbkose/LempelZivWelch

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages