VaryingSizeDataloader

Practical course "Data Management and Data Analysis" at Karlsruhe Institute of Technology during the Winter Semester 2020/21.

Supervisor: James Kahn

The goal was to implement two samplers that are able to deal with data of varying sizes, arranging them in batches so that the overhead introduced by padding the data to a uniform length is minimized.

Two different data loaders were implemented: SimilarSizeSampler and BucketingSampler. Both are implemented for drawing with and without replacement as well as with and without dropping the last batch (if its size is smaller than the batch size)

SimilarSizeSampler

The SimilarSizeSampler sorts the data samples by length first and then forms the batches, so that similar sized samples are always in the same batch

BucketingSampler

The BucketingSampler draws a mega batch of data samplers, then sorts this mega batch by length and yields those indices.

TestDataset

I provided a simple TestDataset that creates tensors with random numbers that can be used to test the samplers.

Unittest

Prvoided in this repository are also a couple of short unit tests to ensure the samplers are working correctly.

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
.gitignore		.gitignore
BucketingSampler.py		BucketingSampler.py
README.md		README.md
SimilarSizeSampler.py		SimilarSizeSampler.py
TestDatasetGenerator.py		TestDatasetGenerator.py
test_var_size_dataloader.py		test_var_size_dataloader.py
varying_size_dataloader.py		varying_size_dataloader.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

VaryingSizeDataloader

SimilarSizeSampler

BucketingSampler

TestDataset

Unittest

About

Releases

Packages

Languages

johannathiemich/VaryingSizeDataloader

Folders and files

Latest commit

History

Repository files navigation

VaryingSizeDataloader

SimilarSizeSampler

BucketingSampler

TestDataset

Unittest

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages