ACH file stream #1119

Open
droidkfx opened this issue Oct 27, 2022 · 7 comments


@droidkfx

ACH Version

1.18

What were you trying to do?

I am trying to adapt the reader implementation to support a streamed approach that avoids holding a whole file in memory at once. The use case is large-volume applications with a service-based architecture, where reading a whole file into memory at once may not be possible. Streaming would make some validations impossible, of course, and that is a trade-off consumers would need to make.

What did you expect to see?

// new signature proposed
func (r *Reader) Stream(ctx context.Context) (chan interface{}, <-chan error) { ... }
// Usage sample:

achReader := moov_ach.NewReader(reader)

opts := &moov_ach.ValidateOpts{
	// ...
}
achReader.SetValidation(opts)

fileRecords, errs := achReader.Stream(ctx)

// Loop until both channels are closed; a nil channel is never selected.
for fileRecords != nil || errs != nil {
	select {
	case record, ok := <-fileRecords:
		if !ok {
			fileRecords = nil
			continue
		}
		switch record.(type) {
		case moov_ach.FileHeader:
			// handle header
		case moov_ach.BatchHeader:
			// handle batch header
		case moov_ach.EntryDetail:
			// handle entry record
		default:
			// handle other records
		}
	case err, ok := <-errs:
		if !ok {
			errs = nil
			continue
		}
		_ = err // process err
	case <-ctx.Done():
		return ctx.Err()
	}
}
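
For illustration, here is a minimal sketch of what a channel-based stream over an ACH file could look like. It does not use moov-io/ach at all: `Record`, `kindOf`, `Stream`, and `collectKinds` are hypothetical names invented for this example, and the only parsing done is reading the Nacha record type code from the first character of each line. Note that padding lines of 9's are reported as `FileControl` here, since they share the same leading character.

```go
package main

import (
	"bufio"
	"context"
	"fmt"
	"io"
	"strings"
)

// Record is a hypothetical stand-in for a parsed ACH record. It keeps only
// the raw line and the record kind derived from the first character.
type Record struct {
	Kind string
	Line string
}

// kindOf maps the record type code (first character of a line) to a name.
func kindOf(line string) (string, error) {
	if line == "" {
		return "", fmt.Errorf("empty line")
	}
	switch line[0] {
	case '1':
		return "FileHeader", nil
	case '5':
		return "BatchHeader", nil
	case '6':
		return "EntryDetail", nil
	case '7':
		return "Addenda", nil
	case '8':
		return "BatchControl", nil
	case '9':
		return "FileControl", nil
	}
	return "", fmt.Errorf("unknown record type %q", line[0])
}

// Stream reads r line by line and sends one Record per line. Both channels
// are closed when the input ends, an error occurs, or ctx is cancelled.
func Stream(ctx context.Context, r io.Reader) (<-chan Record, <-chan error) {
	records := make(chan Record)
	errs := make(chan error, 1) // buffered so the producer never blocks on an error
	go func() {
		defer close(records)
		defer close(errs)
		scanner := bufio.NewScanner(r)
		for scanner.Scan() {
			line := scanner.Text()
			kind, err := kindOf(line)
			if err != nil {
				errs <- err
				return
			}
			select {
			case records <- Record{Kind: kind, Line: line}:
			case <-ctx.Done():
				errs <- ctx.Err()
				return
			}
		}
		if err := scanner.Err(); err != nil {
			errs <- err
		}
	}()
	return records, errs
}

// collectKinds drains a stream built from input and returns the kinds in order.
func collectKinds(input string) ([]string, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	records, errs := Stream(ctx, strings.NewReader(input))
	var kinds []string
	for rec := range records {
		kinds = append(kinds, rec.Kind)
	}
	// errs is buffered and closed by the producer, so this receive never blocks.
	if err := <-errs; err != nil {
		return kinds, err
	}
	return kinds, nil
}

func main() {
	kinds, err := collectKinds("1 header\n5 batch\n6 entry\n8 control\n9 file control\n")
	fmt.Println(kinds, err)
}
```

Only one line is held at a time, so memory use stays flat regardless of file size. The cost is the one already mentioned: checks that need the whole file, such as control totals, cannot run before records are handed to the consumer.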
@adamdecaf
Member

How large of files are we talking about? With Nacha's limit of 10k lines in each file that should fit into memory pretty easily. We've heard of people parsing files and emitting an event for each EntryDetail record (with the BatchHeader on the event), which can produce a few thousand events.

@adamdecaf
Member

adamdecaf commented Oct 27, 2022

I'm open to adding a Stream method on Reader, but am curious what sort of scale it needs to handle. Maybe it should be NextEntry() that returns a BatchHeader, EntryDetail and metadata of where in the file it came from.

@droidkfx
Author

How large of files are we talking about?

I don't have a precise file size in mind. I am mostly motivated to minimize the memory required for processing and to safeguard against OOM crashes.

With Nacha's limit of 10k lines in each file that should fit into memory pretty easily

Where does this information come from, by chance? I have not found a good source for it. The moov library itself states:

// github.com/moov-io/[email protected]/reader.go:32
var (
	// maxLines is the maximum number of lines a file can have. It is limited by the
	// EntryAddendaCount field which has 8 digits, and the BatchCount field which has
	// 6 digits in the File Control Record. So we can have at most the 2 file records,
	// 2 records for each of 10^6 batches, 10^8 entry and addenda records, and 8 lines
	// of 9's to round up to the nearest multiple of 10.
	maxLines = 2 + 2000000 + 100000000 + 8
)

At 102_000_010 lines of 94 single-byte UTF-8 characters, the file size would be around 76 Gb (gigabits), which is roughly 9.6 GB, if I didn't mess up my math. I don't see any applications with anywhere near that many lines. It does make the memory usage effectively unbounded if that assumption is right, though. The application I have in mind cannot tolerate an arbitrary memory footprint.
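
The estimate above can be checked with a few lines of Go. This sketch copies the `maxLines` expression from the quoted snippet and assumes 94 single-byte characters per line with no line endings; `lineBytes` and `maxFileBytes` are names made up for the example. The total comes to 9,588,000,940 bytes, which is about 9.6 GB, or about 76.7 gigabits, so the 76 figure matches only when read as bits rather than bytes.

```go
package main

import "fmt"

const (
	// maxLines mirrors the expression quoted from reader.go above.
	maxLines = 2 + 2000000 + 100000000 + 8
	// lineBytes assumes 94 single-byte characters per record, without line endings.
	lineBytes = 94
)

// maxFileBytes returns the size in bytes of a file with the given number of lines.
func maxFileBytes(lines, bytesPerLine int64) int64 {
	return lines * bytesPerLine
}

func main() {
	total := maxFileBytes(maxLines, lineBytes)
	fmt.Printf("lines: %d\n", maxLines)
	fmt.Printf("bytes: %d\n", total)
	fmt.Printf("gigabytes (10^9 bytes): %.2f\n", float64(total)/1e9)
	fmt.Printf("gibibytes (2^30 bytes): %.2f\n", float64(total)/(1<<30))
	fmt.Printf("gigabits (10^9 bits): %.2f\n", float64(total)*8/1e9)
}
```

Counting a one-byte newline per line would add about 102 MB, which does not change the conclusion.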

Maybe it should be NextEntry()

I think this is fine too, though I would want it to be more general than just entries, since I also need the file header and batch headers. As an alternative, there could be three methods: NextRecord(), which returns the next valid record and would have to return an interface{}; NextEntry(); and NextBatch(), which reads to the end of a batch and returns that whole set. That builds up to the existing Read() method, which reads the whole file. Consumers could then choose the level of granularity they want and limit how much is read from the input stream in one go.
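
A rough sketch of that layering, assuming a pull-based reader where `NextBatch()` is built on top of `NextRecord()`. `RecordReader`, `NewRecordReader`, and `batchSizes` are hypothetical and not part of moov-io/ach; records are left as raw lines, and batches are delimited only by the record type codes `5` (batch header) and `8` (batch control).

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// RecordReader is a hypothetical pull-based reader for this sketch only.
type RecordReader struct {
	scanner *bufio.Scanner
}

// NewRecordReader wraps r in a line scanner.
func NewRecordReader(r io.Reader) *RecordReader {
	return &RecordReader{scanner: bufio.NewScanner(r)}
}

// NextRecord returns the next non-empty line, or io.EOF when input is exhausted.
func (r *RecordReader) NextRecord() (string, error) {
	for r.scanner.Scan() {
		if line := r.scanner.Text(); line != "" {
			return line, nil
		}
	}
	if err := r.scanner.Err(); err != nil {
		return "", err
	}
	return "", io.EOF
}

// NextBatch skips forward to the next batch header ('5') and returns every
// line up to and including the following batch control ('8').
func (r *RecordReader) NextBatch() ([]string, error) {
	var batch []string
	for {
		line, err := r.NextRecord()
		if err == io.EOF && len(batch) > 0 {
			return nil, io.ErrUnexpectedEOF // input ended inside a batch
		}
		if err != nil {
			return nil, err
		}
		if len(batch) == 0 && line[0] != '5' {
			continue // file header, file control, or padding outside a batch
		}
		batch = append(batch, line)
		if line[0] == '8' {
			return batch, nil
		}
	}
}

// batchSizes reports how many lines each batch in input holds.
func batchSizes(input string) ([]int, error) {
	r := NewRecordReader(strings.NewReader(input))
	var sizes []int
	for {
		batch, err := r.NextBatch()
		if err == io.EOF {
			return sizes, nil
		}
		if err != nil {
			return sizes, err
		}
		sizes = append(sizes, len(batch))
	}
}

func main() {
	sizes, err := batchSizes("1hdr\n5b1\n6e1\n6e2\n8c1\n5b2\n6e3\n8c2\n9ctl\n")
	fmt.Println(sizes, err) // prints "[4 3] <nil>"
}
```

With this shape the peak memory is one batch rather than one file. A caller that also needs the file header would call `NextRecord()` once before switching to `NextBatch()`, since `NextBatch()` as written discards lines outside a batch.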

@adamdecaf
Member

I believe the 10k line limit comes from some ODFIs and vendors we were talking with when designing this library. I'm sure that varies and I wasn't able to find a limit in our Nacha rules handbook.

You bring up a good point in that a file of that size (76 Gb, roughly 9.6 GB) is probably too much to consume inside of a reader. We can make that configurable and lower the default.
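
As a sketch of what a configurable limit could look like, the snippet below stops scanning as soon as the input exceeds a caller-supplied line count. It is independent of the library: `ErrTooManyLines`, `countLines`, and `checkLimit` are hypothetical names, and the real reader may enforce its limit differently.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strings"
)

// ErrTooManyLines is a hypothetical sentinel error for this sketch.
var ErrTooManyLines = errors.New("input exceeds configured line limit")

// countLines scans r and returns ErrTooManyLines as soon as more than
// maxLines lines have been seen, without reading the rest of the input.
func countLines(r io.Reader, maxLines int) (int, error) {
	scanner := bufio.NewScanner(r)
	n := 0
	for scanner.Scan() {
		n++
		if n > maxLines {
			return n, ErrTooManyLines
		}
	}
	return n, scanner.Err()
}

// checkLimit applies countLines to an in-memory string.
func checkLimit(input string, maxLines int) error {
	_, err := countLines(strings.NewReader(input), maxLines)
	return err
}

func main() {
	fmt.Println(checkLimit("a\nb\nc\n", 3))    // within the limit
	fmt.Println(checkLimit("a\nb\nc\nd\n", 3)) // one line over the limit
}
```

Failing early like this bounds the work done on an oversized file, but it does not by itself bound memory if the accepted lines are still accumulated into one in-memory structure.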

@adamdecaf
Member

Next Record/Batch/Entry make sense, but I'd like to look at what those entail a bit more before adding them.

@adamdecaf
Member

We've had an Iterator released for a while. Does that help solve your issue? There have been many performance (memory usage) improvements since 2022/2023 as well.

@droidkfx
Author

droidkfx commented Mar 4, 2024

Off the cuff, it looks like it would. I want to see the implementation details, but the documentation suggests it does what I hope.

The only issue I see is that the docs state: "IAT entries are not currently supported." on the NextEntry() method. Our application would need to process IAT entries.

In practice, we just scaled up the memory available to the application that reads the files. That worked really well. We see spikes near 500 MB during peak file processing, which is manageable by scaling alone.

We could save that memory by switching to the iterator. Is there a workaround for IATs? What is the behavior when one is encountered? Will the iterator skip it or return an error?

@adamdecaf adamdecaf modified the milestones: Unplanned, v2 Aug 20, 2024