Releases: MITLibraries/transmogrifier-ab-diff

v1.0 - Initial Release

13 Nov 19:22
d150081

This v1.0 release delivers an application that meets our MVP requirement: comparing the outputs of two different code versions of Transmogrifier.

Some notable functionality in this v1.0 release:

  • option to use a local S3 server via a MinIO instance, which reduces data fetches from AWS S3
  • CLI command that crawls the TIMDEX S3 bucket and produces a CSV of input files for Transmogrifier that match given criteria
  • performance and memory management that support very large runs (e.g. 1-2k input files, 100+ GB of transformed files)
  • web app for viewing job results, with drill-down into particular sources or fields that have differences
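
To make the core idea concrete, here is a minimal sketch of a field-level A/B comparison, including the case where a record exists in only one version. It is an illustration only and not the application's code (the changelog notes that the application uses the DeepDiff library for record diffing). The function names `diff_fields` and `summarize`, the record IDs, and the field names are all hypothetical.

```python
from typing import Any, Dict, List, Optional

# Sentinel so that "field absent" is distinguishable from "field is None".
_MISSING = object()


def diff_fields(
    record_a: Optional[Dict[str, Any]],
    record_b: Optional[Dict[str, Any]],
) -> List[str]:
    """Return the sorted top-level field names whose values differ.

    A record that is missing entirely (None) is treated as an empty
    record, so every field of the other version is reported.
    """
    a = record_a or {}
    b = record_b or {}
    return sorted(
        key
        for key in a.keys() | b.keys()
        if a.get(key, _MISSING) != b.get(key, _MISSING)
    )


def summarize(
    records_a: Dict[str, Dict[str, Any]],
    records_b: Dict[str, Dict[str, Any]],
) -> Dict[str, List[str]]:
    """Map each record ID to its differing fields, skipping identical records."""
    summary: Dict[str, List[str]] = {}
    for record_id in sorted(records_a.keys() | records_b.keys()):
        changed = diff_fields(records_a.get(record_id), records_b.get(record_id))
        if changed:
            summary[record_id] = changed
    return summary


# Hypothetical outputs of version A and version B, keyed by record ID.
version_a = {
    "src:1": {"title": "Atlas", "dates": ["2020"]},
    "src:2": {"title": "Maps"},
}
version_b = {
    "src:1": {"title": "Atlas", "dates": ["2021"]},
    "src:3": {"title": "Charts"},
}

result = summarize(version_a, version_b)
print(result)
```

In this sketch, a record present in only one version shows all of its fields as differences, which is one simple way to surface missing A or B records alongside ordinary field changes.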

What's Changed

  • TIMX 339 - scaffold project by @ghukill in #1
  • TIMX 338 - Add init-job functionality by @ghukill in #2
  • TIMX 344 - update job and run structures by @ghukill in #4
  • TIMX-340-build-ab-images by @ehanson8 in #9
  • TIMX 345 - Add CLI command init-job by @ghukill in #13
  • Timx 341 run ab transforms by @jonavellecuerdo in #6
  • Add 'collate_ab_transforms' command by @jonavellecuerdo in #22
  • TIMX 348 - Scaffold Flask app by @ghukill in #17
  • TIMX 350, 351, 358, 346 - full app functionality via CLI by @ghukill in #26
  • TIMX 367 - template JSON as direct javascript objects by @ghukill in #29
  • TIMX 372- support runs with zero diffs by @ghukill in #31
  • TIMX 354 - remove job name from docker image by @ghukill in #32
  • Timx 369 core status checks by @jonavellecuerdo in #35
  • Limit parallel containers of Transmogrifier by @ghukill in #37
  • Add utility function to parse details from S3 URIs and filenames by @jonavellecuerdo in #42
  • TIMX 371 - dedupe records by @ghukill in #43
  • TIMX 379 - CSV for input files support and helpers by @ghukill in #45
  • Use DeepDiff library for a/b record diffing by @ghukill in #47
  • Add option to download input files using a local MinIO server by @jonavellecuerdo in #49
  • TIMX 383 - pipeline tweaks for large runs by @ghukill in #48
  • TIMX 385 - browsable, filterable Records table by @ghukill in #53
  • TIMX 387 - Fix bug in Transmogrifier output filename by @ghukill in #62
  • TIMX 388, 389 - handle missing A or B records and construction of final records dataset by @ghukill in #65

Full Changelog: https://github.com/MITLibraries/transmogrifier-ab-diff/commits/v1.0