Most easy-to-use JSON utilities assume your payload can fit into memory. If it's too large, you'll have to fall back on some kind of partial, streaming parser/tokenizer. This tool uses serde_json to stream the JSON primitives into a flat key-value store. Once the file has been indexed, you can view parts of the data deep within the JSON tree.
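To give a feel for how that works, here is a minimal sketch of streaming a document into flat `(path, value)` pairs with serde_json's `DeserializeSeed`/`Visitor` machinery. It is not hjq's actual code: the `/`-separated key layout, the `emit` callback (which just prints), and the file name are stand-ins for what hjq really writes into its store.

```rust
// Sketch only; assumes serde = "1" and serde_json = "1" as dependencies.
use serde::de::{DeserializeSeed, Deserializer, Error, MapAccess, SeqAccess, Visitor};
use std::fmt;

/// Walks one JSON value as it is parsed, emitting (path, primitive) pairs
/// through a callback instead of building the whole value in memory.
struct Flatten<'a, F> {
    path: String,
    emit: &'a mut F,
}

impl<'de, 'a, F: FnMut(&str, &str)> DeserializeSeed<'de> for Flatten<'a, F> {
    type Value = ();

    fn deserialize<D: Deserializer<'de>>(self, de: D) -> Result<(), D::Error> {
        de.deserialize_any(self)
    }
}

impl<'de, 'a, F: FnMut(&str, &str)> Visitor<'de> for Flatten<'a, F> {
    type Value = ();

    fn expecting(&self, f: &mut fmt::Formatter) -> fmt::Result {
        f.write_str("any JSON value")
    }

    // Primitives become one key-value pair at the current path.
    fn visit_bool<E: Error>(mut self, v: bool) -> Result<(), E> {
        Ok((self.emit)(&self.path, &v.to_string()))
    }
    fn visit_i64<E: Error>(mut self, v: i64) -> Result<(), E> {
        Ok((self.emit)(&self.path, &v.to_string()))
    }
    fn visit_u64<E: Error>(mut self, v: u64) -> Result<(), E> {
        Ok((self.emit)(&self.path, &v.to_string()))
    }
    fn visit_f64<E: Error>(mut self, v: f64) -> Result<(), E> {
        Ok((self.emit)(&self.path, &v.to_string()))
    }
    fn visit_str<E: Error>(mut self, v: &str) -> Result<(), E> {
        Ok((self.emit)(&self.path, v))
    }
    fn visit_unit<E: Error>(mut self) -> Result<(), E> {
        Ok((self.emit)(&self.path, "null"))
    }

    // Arrays recurse with the element index appended to the path.
    fn visit_seq<A: SeqAccess<'de>>(self, mut seq: A) -> Result<(), A::Error> {
        let mut i = 0usize;
        loop {
            let seed = Flatten { path: format!("{}/{}", self.path, i), emit: &mut *self.emit };
            if seq.next_element_seed(seed)?.is_none() {
                return Ok(());
            }
            i += 1;
        }
    }

    // Objects recurse with the key appended to the path.
    fn visit_map<A: MapAccess<'de>>(self, mut map: A) -> Result<(), A::Error> {
        while let Some(key) = map.next_key::<String>()? {
            let seed = Flatten { path: format!("{}/{}", self.path, key), emit: &mut *self.emit };
            map.next_value_seed(seed)?;
        }
        Ok(())
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = std::io::BufReader::new(std::fs::File::open("my-giant-file.json")?);
    let mut de = serde_json::Deserializer::from_reader(file);
    // Stand-in sink: the real tool would put each pair into RocksDB instead.
    let mut emit = |key: &str, value: &str| println!("{key} = {value}");
    let root = Flatten { path: String::new(), emit: &mut emit };
    root.deserialize(&mut de)?;
    Ok(())
}
```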
The key-value store is currently RocksDB, which supports a variety of compression types. I indexed a medium-sized JSON file, citylots, with each compression type to see how they play out. Zstd comes out ahead:
```
192M citylots.json
307M citylots-none/
160M citylots-bz2/
135M citylots-lz4/
132M citylots-snappy/
94M citylots-zlib/
92M citylots-zstd/
```
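For what it's worth, choosing the compression type in the rocksdb crate is a one-line option on the database. A sketch, not necessarily the exact options hjq sets (and it assumes the crate is built with its compression features):

```rust
use rocksdb::{DBCompressionType, Options, DB};

fn main() -> Result<(), rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    // One of: None, Snappy, Zlib, Bz2, Lz4, Zstd. Zstd gave the smallest index above.
    opts.set_compression_type(DBCompressionType::Zstd);

    let db = DB::open(&opts, "citylots-zstd/")?;
    db.put(b"some/path/into/the/tree", b"example value")?;
    Ok(())
}
```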
Index your giant file (this will take a while; you can monitor progress via `du -s index-data/`):
```
hjq index --data=my-giant-file.json --data-dir=index-data/
```
Once it's indexed you can explore it.
View all the top-level keys:
```
hjq keys --data-dir=index-data/
```
You can also list keys deeper in the JSON tree:
```
hjq keys --data-dir=index-data/ --prefix=some/path/into/the/tree/
```
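One way a query like that can be answered from a flat store is a prefix scan that keeps only the path segment right after the prefix. A sketch, assuming the hypothetical `/`-separated key layout from the earlier example; hjq's real key encoding and iteration may differ:

```rust
// Sketch only; assumes a recent version of the rocksdb crate.
use std::collections::BTreeSet;

use rocksdb::{Direction, IteratorMode, DB};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let db = DB::open_default("index-data/")?;
    let prefix: &[u8] = b"some/path/into/the/tree/";

    // Walk keys starting at the prefix, stop once they no longer match it,
    // and collect the distinct path segment that follows the prefix.
    let mut children = BTreeSet::new();
    for item in db.iterator(IteratorMode::From(prefix, Direction::Forward)) {
        let (key, _value) = item?;
        if !key.starts_with(prefix) {
            break;
        }
        let rest = &key[prefix.len()..];
        let segment = rest.split(|&b| b == b'/').next().unwrap_or(rest);
        children.insert(String::from_utf8_lossy(segment).into_owned());
    }

    for child in children {
        println!("{child}");
    }
    Ok(())
}
```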
View the full data at some location inside your JSON:
```
hjq view --data-dir=index-data/ --prefix=some/path/into/the/tree/
```
Note that this scales with the size of the subtree being printed, so asking for the full data at the root of your giant JSON tree will take a long, long time.