Implements reservoir sampler randomly sampling stream of features #33

Open. Wants to merge 1 commit into base: master
72 changes: 72 additions & 0 deletions robosat/osm/sampler.py
@@ -0,0 +1,72 @@
import random


class ReservoirSampler:
    '''Randomly samples k items from a stream of unknown n items.
    '''

    def __init__(self, capacity):
Contributor: The reservoir is generally a list or array of predefined size.

        '''Creates a new `ReservoirSampler` instance.

        Args:
            capacity: the number of items to randomly sample from a stream of unknown size.
        '''

        assert capacity > 0

        self.capacity = capacity
        self.reservoir = []
        self.pushed = 0

    def push(self, v):
Contributor: Now we can begin adding data.

        '''Adds an item to the reservoir.

        Args:
            v: the item from the stream to add to the reservoir.
        '''

        size = len(self.reservoir)

        if size < self.capacity:
            self.reservoir.append(v)
Contributor: Until the reservoir holds `size` elements, incoming elements are added directly to the reservoir.

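        else:
            assert size == self.capacity
            assert size <= self.pushed

            # `pushed` counts the items seen before this one, so the current item
            # is number `pushed + 1` and is kept with probability capacity / (pushed + 1).
            p = self.capacity / (self.pushed + 1)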
Contributor: Once the reservoir is full, each incoming data point has a `size / counter` chance of replacing an existing sample point.


            if random.random() < p:
                i = random.randint(0, size - 1)
                self.reservoir[i] = v

        self.pushed += 1
Contributor: First we designate a counter, which is incremented for every data point seen.
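
The steps in the review comments (counter, fill phase, replacement with chance `size / counter`) can be sketched as a standalone function. This is only an illustration under assumptions, not the code in this diff: the name `reservoir_sample`, the `rng` parameter, and the trial count are hypothetical, and the counter here includes the item currently being processed.

```python
import random
from collections import Counter


def reservoir_sample(stream, k, rng=random):
    """Return up to k items drawn uniformly from an iterable of unknown length.

    Hypothetical helper for illustration; not part of robosat.
    """
    reservoir = []
    seen = 0  # counter: number of items consumed so far, including the current one

    for item in stream:
        seen += 1
        if len(reservoir) < k:
            # Fill phase: the first k items go straight into the reservoir.
            reservoir.append(item)
        elif rng.random() < k / seen:
            # Replacement phase: keep the new item with chance k / seen
            # and evict a uniformly chosen existing sample.
            reservoir[rng.randrange(k)] = item

    return reservoir


# Rough empirical check: each of 10 items should land in a sample of 3
# about 30% of the time.
rng = random.Random(0)
counts = Counter()
trials = 20000
for _ in range(trials):
    counts.update(reservoir_sample(range(10), 3, rng))

frequencies = {item: counts[item] / trials for item in range(10)}
```

If the counter excluded the current item, the first item after the fill phase would be kept with chance `k / k`, which would bias the sample toward it.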


    def __len__(self):
        '''Returns the number of randomly sampled items.

        Returns:
            The number of randomly sampled items in the reservoir.
        '''

        return len(self.reservoir)

    def __getitem__(self, k):
        '''Returns a randomly sampled item in the reservoir.

        Args:
            k: the index for the kth item from the reservoir to return.

        Returns:
            The kth item in the reservoir of randomly sampled items.
        '''

        return self.reservoir[k]

    def __repr__(self):
        '''Returns the representation for this class.

        Returns:
            The string representation for this class.
        '''

        return '<{}: {}>'.format(self.__class__.__name__, list(self))
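
A short usage sketch may help show how the container methods fit together. `TinySampler` is a hypothetical, condensed stand-in for the class in this diff (docstrings and asserts trimmed), and it assumes a keep probability of `capacity / (pushed + 1)` for the item being pushed. Note that `list(self)` in `__repr__` works without `__iter__` because Python falls back to calling `__getitem__` with 0, 1, 2, ... until `IndexError` is raised.

```python
import random


class TinySampler:
    """Condensed stand-in for the sampler in this diff (hypothetical name)."""

    def __init__(self, capacity):
        assert capacity > 0
        self.capacity = capacity
        self.reservoir = []
        self.pushed = 0

    def push(self, v):
        if len(self.reservoir) < self.capacity:
            self.reservoir.append(v)
        elif random.random() < self.capacity / (self.pushed + 1):
            self.reservoir[random.randrange(self.capacity)] = v
        self.pushed += 1

    def __len__(self):
        return len(self.reservoir)

    def __getitem__(self, k):
        return self.reservoir[k]

    def __repr__(self):
        # list(self) iterates through __getitem__ until IndexError.
        return '<{}: {}>'.format(self.__class__.__name__, list(self))


# Stream 1000 made-up feature ids through a sampler that keeps 3 of them.
sampler = TinySampler(capacity=3)
for feature_id in range(1000):
    sampler.push(feature_id)

sampled = list(sampler)
text = repr(sampler)
```

Memory use stays bounded by `capacity` no matter how long the stream is, which is the reason to sample this way instead of collecting every feature first.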