You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Salento expects as an input a sequence of packages.
The problem is that the file format that contains the sequence of packages is a JSON objects, which means that all packages must fit into memory to read them. We currently have some use cases where the datasets do not fit memory, so this architecture is a bottleneck for scalability.
We need to:
change the file format to something amenable to streaming packages
change the internals (say, train.py) such that data is loaded lazily and use as much as possible generators (versus creating lists upfront)
The text was updated successfully, but these errors were encountered:
Salento expects as an input a sequence of packages.
The problem is that the file format that contains the sequence of packages is a JSON objects, which means that all packages must fit into memory to read them. We currently have some use cases where the datasets do not fit memory, so this architecture is a bottleneck for scalability.
We need to:
train.py
) such that data is loaded lazily and use as much as possible generators (versus creating lists upfront)The text was updated successfully, but these errors were encountered: