
Tengal-Teemo/lm_dataformat

 
 


LM_Dataformat

Utilities for storing data for language model (LM) training.

Basic Usage

To write:

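from lm_dataformat import Archive

# output files are written into this directory
ar = Archive('output_dir')

# something() is a placeholder for whatever yields your documents
for somedocument in something():
  # do other stuff
  # meta holds per-document metadata (placeholder values shown)
  ar.add_data(somedocument, meta={
    'example': stuff,
    'someothermetadata': [othermetadata, otherrandomstuff],
    'otherotherstuff': True
  })

# remember to commit at the end!
ar.commit()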
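
The reminder to commit suggests that added documents are not finalized until commit() is called. The sketch below illustrates that add-then-commit pattern using only the standard library. TinyArchive, its chunk_N.jsonl file names, and the one-JSON-object-per-line layout are assumptions made for illustration; this is not lm_dataformat's implementation or on-disk format.

```python
import json
import os
import tempfile


class TinyArchive:
    """Hypothetical stand-in for an archive writer; not lm_dataformat's code."""

    def __init__(self, out_dir):
        self.out_dir = out_dir
        os.makedirs(out_dir, exist_ok=True)
        self.buffer = []      # records waiting to be written
        self.chunk_index = 0  # number of chunk files written so far

    def add_data(self, text, meta=None):
        # Nothing touches the disk here; the record is only buffered.
        self.buffer.append({"text": text, "meta": meta or {}})

    def commit(self):
        # Flush buffered records to a new chunk file, one JSON object per line.
        path = os.path.join(self.out_dir, f"chunk_{self.chunk_index}.jsonl")
        with open(path, "w", encoding="utf-8") as f:
            for record in self.buffer:
                f.write(json.dumps(record) + "\n")
        self.buffer = []
        self.chunk_index += 1
        return path


def demo_write():
    out_dir = tempfile.mkdtemp()
    ar = TinyArchive(out_dir)
    ar.add_data("first document", meta={"source": "demo"})
    ar.add_data("second document")
    before = os.listdir(out_dir)  # still empty: nothing committed yet
    path = ar.commit()
    with open(path, encoding="utf-8") as f:
        lines = [json.loads(line) for line in f]
    return before, lines
```

In this sketch, forgetting commit() leaves the output directory empty, which is the failure mode the comment above warns about.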

To read:

from lm_dataformat import Reader

rdr = Reader('input_dir_or_file')

for doc in rdr.stream_data():
  # do something with the document
  print(doc)
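
The reader accepts either a directory or a single file and yields documents one at a time. A minimal sketch of that streaming idea, using only the standard library, might look like the following. The stream_texts helper, the .jsonl extension, and the "text" field are assumptions made for illustration; they are not taken from lm_dataformat.

```python
import json
import os
import tempfile


def stream_texts(path):
    """Yield the "text" field of every record under path (hypothetical helper)."""
    if os.path.isdir(path):
        # Directory: visit every .jsonl file in a stable, sorted order.
        files = sorted(
            os.path.join(path, name)
            for name in os.listdir(path)
            if name.endswith(".jsonl")
        )
    else:
        # Single file: read just that one.
        files = [path]
    for file_path in files:
        with open(file_path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    # Records are yielded lazily, so large inputs never
                    # need to fit in memory all at once.
                    yield json.loads(line)["text"]


def demo_read():
    in_dir = tempfile.mkdtemp()
    # Write two small chunk files so there is something to read back.
    for i, docs in enumerate([["a", "b"], ["c"]]):
        chunk = os.path.join(in_dir, f"chunk_{i}.jsonl")
        with open(chunk, "w", encoding="utf-8") as f:
            for text in docs:
                f.write(json.dumps({"text": text, "meta": {}}) + "\n")
    from_dir = list(stream_texts(in_dir))
    from_file = list(stream_texts(os.path.join(in_dir, "chunk_1.jsonl")))
    return from_dir, from_file
```

Because stream_texts is a generator, the caller can stop early or process documents as they arrive, the same way the loop over rdr.stream_data() does above.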

About

Fork of lm_dataformat
