-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update legislative session automatically #26
Conversation
@antidipyramid I could use your help on testing this. I included a test to make sure this worked for the case that we're past the legislative session. It does work, however it's only because of the date that we're currently in, not because of the frozen date in the test. If you were to change the date on I imagined it's because the class is setting its
Ideally I would love to be able to test both the case where it should and shouldn't work. |
tests/test_jurisdiction.py
Outdated
''' | ||
fake_now = datetime.now() | ||
mocker.patch( | ||
"lametro.Lametro.today", return_value=fake_now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if you mocked datetime.datetime.now
, which is how the value of today
is determined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, @xmedr, no need to mock anything! Freezegun should inject the configured date universally. Does it work as expected if you remove the mock call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P.s., if you wanted to parametrize your test to cover the different cases, you can use freeze_time
as a context manager in order to pass the date as a variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So whether I kept the mock or got rid of it, the datetime within the test gets changed which is great, but the datetime within the Lametro class is still the actual time. This is how it looks when checking things out with a breakpoint:
-> breakpoint()
(Pdb) datetime.now()
FakeDatetime(2024, 6, 23, 0, 0)
(Pdb) fake_now
FakeDatetime(2024, 6, 23, 0, 0)
(Pdb) Lametro.today
datetime.datetime(2024, 9, 11, 18, 48, 31, 188266)
And nice, that context manager bit is cool! I was actually hoping to parameterize the test once things start working
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be because legislative_sessions
is a static class variable that we're accessing without instantiating an Lametro
object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was it! It must've been difficult to recalculate the sessions as a static variable because it was already set by the time we mocked things.
So I've changed the legislative_sessions
static var into the get_legislative_sessions()
static method instead. This way it can still get called without instantiating an Lametro
object, while also encapsulating the calculation in a method as opposed to a free standing loop. So with a small adjustment to bills.py it still works fine, and can be affected by the frozen time in tests! Thank yall
This is set for review 🙌 Though a note: when running my test bill scrape, it errored out on a 404 HTTP response. This also happens with the same scrape on the main branch. Is this expected behavior? If not, I have a change that I can push up to continue after encountering 404s. |
@xmedr The 404 comes from private bills! You need to decrypt your |
@hancush Oooh I see. It looks like I might need to be added to blackbox here, would you be able to help me out with that?
|
@xmedr Just pushed a commit adding you as an admin! Pull from your branch, then try to decrypt your secrets file again. |
sweet, a bill scrape completed successfully for me so that did it! now this should be ready for review from either of you |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great, @xmedr! A few things inline.
lametro/__init__.py
Outdated
@staticmethod | ||
def get_legislative_sessions(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want our subclass to have the same signature as the base Jurisdiction
class in pupa
. How about defining this as a @property
instead of a @staticmethod
?
lametro/__init__.py
Outdated
if (today.month == 6 and today.day >= 23) or today.month >= 7: | ||
allowed_years.append(this_year) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love this. Can you add a short comment explaining the logic here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
''' | ||
Yield each year that we'd like to scrape today. | ||
Allow for the next fiscal year to be scraped during | ||
and after the last week of the current fiscal year. | ||
''' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hell yeah, love a docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Literally keeps me sane haha
lametro/__init__.py
Outdated
for year in allowed_years: | ||
session = { | ||
"identifier": "{}".format(year), | ||
"start_date": "{}-07-01".format(year), | ||
"end_date": "{}-06-30".format(year + 1), | ||
} | ||
yield session |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
legislative_sessions
should be a list rather than a generator. Since we control its contents and know it will only grow by one each year, we can be sure that the list will be quite small, so we don't need to worry about the performance penalty of creating a list in memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, I'll get the change in! So generators seem to be more useful for medium to larger datasets and lists are fine when it's a small to medium set of values, does that sound right? Or is this more because we want to preserve the return value of the base class's legislative_sessions
property?
lametro/bills.py
Outdated
@@ -76,7 +76,7 @@ def session(self, action_date) : | |||
localize = pytz.timezone(self.TIMEZONE).localize | |||
fmt = '%Y-%m-%d' | |||
|
|||
for session in Lametro.legislative_sessions: | |||
for session in Lametro.legislative_sessions.fget(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is fget()
doing for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I switched to the property decorator, what's returned from this method is a property object, which seems like it can't be operated on as is. I thought it'd work similarly to a property decorator on a model where it just gives you the value, but this fget()
is the only way I've found so far to get the value when it's just a regular class.
(Pdb) Lametro.legislative_sessions
<property object at 0xffff99656de0>
(Pdb) Lametro.legislative_sessions.fget()
[{'identifier': '2014', 'start_date': '2014-07-01', 'end_date': '2015-06-30'}, ..., {'identifier': '2024', 'start_date': '2024-07-01', 'end_date': '2025-06-30'}]
(Pdb) type(Lametro.legislative_sessions)
<class 'property'>
(Pdb) type(Lametro.legislative_sessions.fget())
<class 'list'>
lametro/__init__.py
Outdated
} | ||
legislative_sessions.append(session) | ||
@property | ||
def legislative_sessions(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def legislative_sessions(): | |
def legislative_sessions(self): |
Does the property behave correctly (i.e., doesn't need a call to fget()
) with this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does if we instantiate the class beforehand! So that would mean that line 79 that's using this in bills.py
would need to change from
for session in Lametro.legislative_sessions.fget():
start_datetime = datetime.datetime.strptime(session['start_date'], fmt)
end_datetime = datetime.datetime.strptime(session['end_date'], fmt)
to something like
lametro_obj = Lametro()
for session in lametro_obj.legislative_sessions:
start_datetime = datetime.datetime.strptime(session['start_date'], fmt)
...
# OR just
for session in Lametro().legislative_sessions:
start_datetime = datetime.datetime.strptime(session['start_date'], fmt)
...
Is that change okay or were we hoping to not have to instantiate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great question. What happens when we instantiate a jurisdiction? Are there side effects we wish to avoid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the base Jurisdiction class, I don't believe anything of major consequence would happen outside of maybe taking up a bit more space in memory whenever that session
method in bills.py
is called. Since every other method there would also need an instance in order to function, this is looking like this would be okay.
And aside from the test, there aren't any other spots in the scrapers that try to access legislative_sessions
so the above snippet would be the only place where a change would be needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Then instantiate away, then this should be able to come in.
Overview
This branch does two things. It determines the current fiscal year programmatically as opposed to having it hardcoded, and it allows for the next fiscal year to be scraped during and after the last week of the current fiscal year.
This means that on June 10th, 2024 (before the last week), the latest fiscal year that would be scraped would end on June 30th, 2024. However once June 23rd, 2024 hits (the beginning of the last week), the latest year scraped would end on June 30, 2025 and would remain that way up until June 23 of 2025, wherein it would scrape up to '26, etc.
Testing Instructions
docker-compose run --rm scrapers pupa update lametro bills window=1 --rpm=0