Implements reservoir sampler randomly sampling stream of features #33
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,72 @@ | ||
import random | ||
|
||
|
||
class ReservoirSampler: | ||
'''Randomly samples k items from a stream of n items, where n is not known in advance. | ||
''' | ||
|
||
def __init__(self, capacity): | ||
'''Creates a new `ReservoirSampler` instance. | ||
|
||
Args: | ||
capacity: the number of items to randomly sample from a stream of unknown size. | ||
''' | ||
|
||
assert capacity > 0 | ||
|
||
self.capacity = capacity | ||
self.reservoir = [] | ||
self.pushed = 0 | ||
|
||
def push(self, v): | ||
|
||
'''Adds an item to the reservoir. | ||
|
||
Args: | ||
v: the item from the stream to add to the reservoir. | ||
''' | ||
|
||
size = len(self.reservoir) | ||
|
||
if size < self.capacity: | ||
self.reservoir.append(v) | ||
|
||
else: | ||
assert size == self.capacity | ||
assert size <= self.pushed | ||
|
||
# `pushed` counts earlier items only, so the incoming item is number `pushed + 1`. | ||
p = self.capacity / (self.pushed + 1) | ||
|
||
|
||
if random.random() < p: | ||
i = random.randint(0, size - 1) | ||
self.reservoir[i] = v | ||
|
||
self.pushed += 1 | ||
|
||
|
||
def __len__(self): | ||
'''Returns the number of randomly sampled items. | ||
|
||
Returns: | ||
The number of randomly sampled items in the reservoir. | ||
''' | ||
|
||
return len(self.reservoir) | ||
|
||
def __getitem__(self, k): | ||
'''Returns a randomly sampled item in the reservoir. | ||
|
||
Args: | ||
k: the index for the kth item from the reservoir to return. | ||
|
||
Returns: | ||
The kth item in the reservoir of randomly sampled items. | ||
''' | ||
|
||
return self.reservoir[k] | ||
|
||
def __repr__(self): | ||
'''Returns the representation for this class. | ||
|
||
Returns: | ||
The string representation for this class. | ||
''' | ||
|
||
return '<{}: {}>'.format(self.__class__.__name__, list(self)) |
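For reference, the sampler in this diff can be condensed into a standalone sketch. This is not the PR's code: the class name `ReservoirSamplerSketch` and the `rng` parameter are hypothetical additions so the example can be seeded, and the counter is incremented before the acceptance test so that the item at position `t` is accepted with probability `capacity / t`.

```python
import random


class ReservoirSamplerSketch:
    """Keeps a uniform random sample of at most `capacity` items from a stream."""

    def __init__(self, capacity, rng=None):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.reservoir = []
        self.pushed = 0  # number of items seen so far
        # `rng` is a hypothetical hook (not in the PR) for passing a seeded generator.
        self.rng = rng if rng is not None else random.Random()

    def push(self, v):
        self.pushed += 1  # count the incoming item first
        if len(self.reservoir) < self.capacity:
            # Fill phase: the first `capacity` items are always kept.
            self.reservoir.append(v)
            return
        # Replacement phase: item number `pushed` is accepted with
        # probability capacity / pushed and overwrites a uniformly chosen slot.
        if self.rng.random() < self.capacity / self.pushed:
            i = self.rng.randrange(self.capacity)
            self.reservoir[i] = v


# Usage: sample 3 features from a stream of 1000.
sampler = ReservoirSamplerSketch(capacity=3, rng=random.Random(0))
for feature in range(1000):
    sampler.push(feature)
print(len(sampler.reservoir), sampler.pushed)
```

Raising `ValueError` instead of asserting is a design choice of this sketch, since assertions are skipped when Python runs with `-O`.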
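A short argument for the acceptance probability in `push`, written with k for `capacity` and t for the 1-based position of the incoming item (so t = `pushed` + 1 when `pushed` counts only earlier items). The symbols R_t (the reservoir after t items) and i (any earlier item) are notation introduced here, not taken from the PR.

```latex
% Claim: after t >= k items, every item seen so far is in the reservoir
% with probability k / t.
\begin{align*}
\Pr[\text{item } t \text{ is accepted}] &= \frac{k}{t}, \\
\Pr[\text{a stored item survives step } t]
  &= 1 - \frac{k}{t} \cdot \frac{1}{k} = \frac{t-1}{t}, \\
\Pr[i \in R_t]
  &= \Pr[i \in R_{t-1}] \cdot \frac{t-1}{t}
   = \frac{k}{t-1} \cdot \frac{t-1}{t}
   = \frac{k}{t}.
\end{align*}
```

The base case is t = k, where all k items are stored with probability 1. Dividing by the count of earlier items instead would accept the (k+1)-th item with probability 1, which skews the sample toward it.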