LZB is a data compression/decompression utility written in pure Bash, using no external programs. It also includes additional utilities:
- CRC-32
- hexadecimal encoding/decoding
- base64 encoding/decoding
- "head" and "cut" commands
Key points:
- Small: 11KB (including comments)
- Efficient: e.g. a run of 10 "1" characters (1111111111) is reduced to 6 bytes, 20 x "1" to 7 bytes, 1280 x "1" to 8 bytes, etc. (a quick check is shown after this list)
- Platform-independent: it works on any platform where Bash 4 or later is available
- Complete: it supports unlimited file size for compression and decompression
- Vintage: compress and decompress at 1981 speeds!
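A quick way to check the efficiency claim ('wc' is external to lzb, used here only for counting bytes):
printf '1111111111' | lzb | wc -c            # 10 x "1": expected 6, per the claim above
printf '11111111111111111111' | lzb | wc -c  # 20 x "1": expected 7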
Bash 4 or later is required because of the use of associative arrays. In practice, this means it should work on most Linux and Unix distributions released after 2009 (I've tested it with the default 'bash' command in Ubuntu 18.04).
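To check whether the installed Bash is recent enough, the standard BASH_VERSINFO array can be queried:
bash -c '(( BASH_VERSINFO[0] >= 4 )) && echo "Bash is new enough" || echo "Bash 4 or later required"'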
How to install and uninstall:
sudo cp lzb /usr/local/bin # Installation
sudo rm /usr/local/bin/lzb # Uninstallation
Syntax:
lzb [-d|-crc32|-hex|-dhex|-b64|-db64|-head N|-cut N M|-h]
Help:
lzb -h
Compression example:
lzb <input >input.lzb
Decompression example:
lzb -d <input.lzb >output
Compression and decompression cycle example:
lzb <input | lzb -d >output
diff input output
CRC-32:
lzb -crc32 <input
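As a sanity check, a well-known test vector can be used: the standard (zlib/Ethernet) CRC-32 of the ASCII string "123456789" is cbf43926 in hexadecimal (lzb's exact output format is not assumed here):
printf '123456789' | lzb -crc32   # expected value: cbf43926 (textual format may differ)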
Hexadecimal encoding:
lzb -hex <input >output
Hexadecimal decoding:
lzb -dhex <input >output
Base64 encoding:
lzb -b64 <input >output
Base64 decoding:
lzb -db64 <input >output
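As with compression, the encoders can be verified with a round trip, in the same way as the diff check above:
lzb -hex <input | lzb -dhex >output
diff input output
lzb -b64 <input | lzb -db64 >output
diff input output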
'head' command (get first N bytes):
lzb -head N <input >output
'cut' command (skip first N bytes, and take next M bytes):
lzb -cut N M <input >output
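For example, given the documented semantics, taking the first 3 bytes of "ABCDEFGH" should yield "ABC", and skipping 2 bytes then taking the next 3 should yield "CDE":
printf 'ABCDEFGH' | lzb -head 3    # expected output: ABC
printf 'ABCDEFGH' | lzb -cut 2 3   # expected output: CDE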
LZB is a Bash rewrite of the 'enc' example from the libsrt library. It is not compatible with that format, for the sake of simplicity (the Bash version is hundreds of thousands of times slower than the C version, and probably millions of times slower as the buffer size is increased). E.g. libsrt's 'enc' example has an unlimited-size dictionary ("-ezh" flag) which is not supported by this program, nor the LUT tuning for speed, etc.
By editing the 'lzb' file you can change the value of the LZ_BUF_SIZE variable, which sets the data compression buffer size. The buffer size determines the compression window: the default 16384-byte buffer can locate patterns within that context. Increasing the value lets the compressor reach more distant patterns, at the cost of execution speed. You can also reduce it for a speed-up, e.g. to 4096 (or to 3333; it is not restricted to powers of two). I'll probably add a parameter for selecting the compression ratio, too.
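A possible way to change it from the command line, assuming the script assigns the variable literally as LZ_BUF_SIZE=16384 (adjust the pattern if the assignment differs; '-i' as used here is GNU sed syntax):
sed -i 's/LZ_BUF_SIZE=16384/LZ_BUF_SIZE=4096/' /usr/local/bin/lzb   # smaller window: faster, usually worse ratio
lzb <input | wc -c   # compare compressed sizes before and after the change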
Bash is not good at handling binary data, although it can read and write it. For convenience, the data is converted on the fly from raw to hexadecimal, all the compression/decompression is done in hexadecimal, and a post-process step converts from hexadecimal back to raw when writing the output. Both the pre-process and the post-process are done using pipes.
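As an illustration of that shape only (not lzb's actual internals), the flow resembles a three-stage pipeline in which the middle stage only ever sees hexadecimal text; lzb's own converters can mimic it, with 'process_hex' as a hypothetical placeholder for the core stage:
lzb -hex <input | process_hex | lzb -dhex >output   # pre-process | core (hex in, hex out) | post-process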
Copyright (c) 2019, F. Aragon. All rights reserved. Released under the BSD 3-Clause License (see the LICENSE file included).
email: faragon.github (GMail account, add @gmail.com)