ACH file stream #1119

droidkfx · 2022-10-27T18:43:42Z

ACH Version

1.18

What were you trying to do?

I am trying to adapt the reader implementation to allow for a streamed approach that could keep from having a whole file in memory at once. The use case is that in large-volume applications with a service-based architecture reading a whole file into memory at once may not be possible. This would make some validations impossible of course and that is a trade-off consumers would need to make.

What did you expect to see?

// new signature proposed
func (r *Reader) Stream(ctx context.Context) (chan interface{}, <-chan error) { ... }
// Usage sample:

achReader := moov_ach.NewReader(reader)

opts := &moov_ach.ValidateOpts{
	// ...
}
achReader.SetValidation(opts)

fileRecords, errs := Stream(ctx)

var moreRecord, moreErr bool = true, true
var record interface{}
var err error
select {
case record, moreRecord = <-fileRecords:
	if header, ok := record.(moov_ach.FileHeader); ok {
		// handle header
	} else if batchHeader, ok := record.(moov_ach.BatchHeader); ok {
		// handle batch header
	} else if entry, ok := record.(moov_ach.EntryDetail); ok {
		// handle entry record
	} else if record, ok := record.(moov_ach.RecordType); ok {
		// handle other records
	}
case err, moreErr = <-errs:
	// process err
case <-ctx.Done():
	return ctx.Err()
}

adamdecaf · 2022-10-27T19:07:22Z

How large of files are we talking about? With Nacha's limit of 10k lines in each file that should fit into memory pretty easily. We've heard of people parsing files and emitting an event for each EntryDetail record (with the BatchHeader on the event), which can produce a few thousand events.

adamdecaf · 2022-10-27T19:08:06Z

I'm open to adding a Stream method on Reader, but am curious what sort of scale it needs to handle. Maybe it should be NextEntry() that returns a BatchHeader, EntryDetail and metadata of where in the file it came from.

droidkfx · 2022-10-27T19:23:55Z

How large of files are we talking about?

I don't have a precise file size in mind. I am most motivated to minimize the memory requirement for processing and safe guard against OOM crashes.

With Nacha's limit of 10k lines in each file that should fit into memory pretty easily

Where by chance does this information come from? I have not found a good source for this. The moov library itself states:

// github.com/moov-io/[email protected]/reader.go:32
var (
	// maxLines is the maximum number of lines a file can have. It is limited by the
	// EntryAddendaCount field which has 8 digits, and the BatchCount field which has
	// 6 digits in the File Control Record. So we can have at most the 2 file records,
	// 2 records for each of 10^6 batches, 10^8 entry and addenda records, and 8 lines
	// of 9's to round up to the nearest multiple of 10.
	maxLines = 2 + 2000000 + 100000000 + 8
)

at 102_000_010 lines of 94 UTF-8 single-byte chars, the file size would be around 76Gb if I didn't mess up my math. I don't see any applications with anywhere near that many. It does make the memory usage unbounded if that assumption is right though. The application I have in mind cannot utilize an arbitrary memory footprint.

Maybe it should be NextEntry()

I think this is fine too, I would want it to be more general though than just entries since I do need the file header and batch headers. As another alternative NextRecord() which would just get the next valid record and would have to be an interface{}, NextEntry(), and NextBatch() could read till the end of a batch and provide that whole set. That kind of builds up to the existing Read() method that is the whole file. Any consumer could choose the level of granularity they want and the limit to what can be read in one go from the input stream.

adamdecaf · 2022-10-27T19:42:57Z

I believe the 10k line limit comes from some ODFIs and vendors we were talking with when designing this library. I'm sure that varies and I wasn't able to find a limit in our Nacha rules handbook.

You bring up a good point in that 76GB is probably too much to consume inside of a reader. We can make that configurable and lower the default.

adamdecaf · 2022-10-27T19:44:27Z

Next Record/Batch/Entry make sense, but I'd like to look at what those entail a bit more before adding them.

adamdecaf · 2024-03-04T18:20:58Z

We've had an Iterator released for a while. Does that help solve your issue? There have been many performance (memory usage) improvements since 2022/2023 as well.

droidkfx · 2024-03-04T19:52:23Z

Off the cuff, it looks like it would. I want to see the implementation details, but the documentation suggests it does what I hope.

The only issue I see is that the docs state: "IAT entries are not currently supported." on the NextEntry() method. Our application would need to process IAT entries.

In practice, we just scaled the memory sizing available to the application that utilizes the file read. I think doing that worked really well. We see spikes near 500 MB during peak file processing. So that is manageable by just scaling.

We could save that by just using the iterator. Is there a workaround for IATs? What is the behavior? Will the iterator skip or error?

adamdecaf added the enhancement label Oct 27, 2022

adamdecaf mentioned this issue Oct 27, 2022

reader: lower hard limit of lines to consume and make it configurable #1120

Closed

adamdecaf added the question label Nov 1, 2022

adamdecaf added this to the Unplanned milestone Jan 12, 2023

adamdecaf modified the milestones: Unplanned, v2 Aug 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ACH file stream #1119

ACH file stream #1119

droidkfx commented Oct 27, 2022

adamdecaf commented Oct 27, 2022

adamdecaf commented Oct 27, 2022 •

edited

Loading

droidkfx commented Oct 27, 2022

adamdecaf commented Oct 27, 2022

adamdecaf commented Oct 27, 2022

adamdecaf commented Mar 4, 2024

droidkfx commented Mar 4, 2024

ACH file stream #1119

ACH file stream #1119

Comments

droidkfx commented Oct 27, 2022

ACH Version

What were you trying to do?

What did you expect to see?

adamdecaf commented Oct 27, 2022

adamdecaf commented Oct 27, 2022 • edited Loading

droidkfx commented Oct 27, 2022

adamdecaf commented Oct 27, 2022

adamdecaf commented Oct 27, 2022

adamdecaf commented Mar 4, 2024

droidkfx commented Mar 4, 2024

adamdecaf commented Oct 27, 2022 •

edited

Loading