This v1.0 release reflects an application that meets our MVP requirement of comparing the outputs of two different code versions of Transmogrifier.
Some notable functionality in this v1.0 release:
- option to use a local S3 server via a MinIO instance, which reduces data fetches from AWS S3
- CLI command that crawls the TIMDEX S3 bucket and produces a CSV of input files for Transmogrifier that match given criteria
- performance and memory management that support very large runs (e.g. 1-2k input files yielding 100+ GB of transformed files)
- web app for viewing job results, with the ability to drill into particular sources or fields that have differences
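
To illustrate what a field-level difference between an A and a B record means, here is a minimal sketch using only the Python standard library. It is not the application's implementation (the changelog notes that record diffing uses the DeepDiff library); the function name `modified_fields` and the record fields shown are hypothetical and chosen only for illustration.

```python
from typing import Any


def modified_fields(record_a: dict[str, Any], record_b: dict[str, Any]) -> list[str]:
    """Return the sorted top-level field names whose values differ between A and B."""
    all_fields = set(record_a) | set(record_b)
    # Sentinel distinguishes "field missing" from "field present with value None".
    missing = object()
    return sorted(
        field
        for field in all_fields
        if record_a.get(field, missing) != record_b.get(field, missing)
    )


# Hypothetical records produced by version A and version B for the same input.
record_a = {"timdex_record_id": "abc:123", "title": "A Title", "languages": ["en"]}
record_b = {"timdex_record_id": "abc:123", "title": "A title", "publishers": ["MIT"]}

print(modified_fields(record_a, record_b))  # ['languages', 'publishers', 'title']
```

A field counts as different when its value changed or when it is present in only one of the two records, which is the kind of per-field signal a results view can aggregate across sources.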
What's Changed
- TIMX 339 - scaffold project by @ghukill in #1
- TIMX 338 - Add init-job functionality by @ghukill in #2
- TIMX 344 - update job and run structures by @ghukill in #4
- TIMX-340-build-ab-images by @ehanson8 in #9
- TIMX 345 - Add CLI command init-job by @ghukill in #13
- Timx 341 run ab transforms by @jonavellecuerdo in #6
- Add 'collate_ab_transforms' command by @jonavellecuerdo in #22
- TIMX 348 - Scaffold Flask app by @ghukill in #17
- TIMX 350, 351, 358, 346 - full app functionality via CLI by @ghukill in #26
- TIMX 367 - template JSON as direct javascript objects by @ghukill in #29
- TIMX 372- support runs with zero diffs by @ghukill in #31
- TIMX 354 - remove job name from docker image by @ghukill in #32
- Timx 369 core status checks by @jonavellecuerdo in #35
- Limit parallel containers of Transmogrifier by @ghukill in #37
- Add utility function to parse details from S3 URIs and filenames by @jonavellecuerdo in #42
- TIMX 371 - dedupe records by @ghukill in #43
- TIMX 379 - CSV for input files support and helpers by @ghukill in #45
- Use DeepDiff library for a/b record diffing by @ghukill in #47
- Add option to download input files using a local MinIO server by @jonavellecuerdo in #49
- TIMX 383 - pipeline tweaks for large runs by @ghukill in #48
- TIMX 385 - browsable, filterable Records table by @ghukill in #53
- TIMX 387 - Fix bug in Transmogrifier output filename by @ghukill in #62
- TIMX 388, 389 - handle missing A or B records and construction of final records dataset by @ghukill in #65
New Contributors
- @ghukill made their first contribution in #1
- @ehanson8 made their first contribution in #9
- @jonavellecuerdo made their first contribution in #6
Full Changelog: https://github.com/MITLibraries/transmogrifier-ab-diff/commits/v1.0