
Update legislative session automatically #26

Merged

@xmedr merged 13 commits into main from patch/update-legislative-session on Sep 24, 2024


@xmedr (Collaborator) commented Sep 10, 2024

Overview

This branch does two things. It determines the current fiscal year programmatically as opposed to having it hardcoded, and it allows for the next fiscal year to be scraped during and after the last week of the current fiscal year.

This means that on June 10, 2024 (before the last week), the latest fiscal year scraped would end on June 30, 2024. Once June 23, 2024 arrives (the start of the last week), the latest fiscal year scraped would end on June 30, 2025, and it would stay that way until June 23, 2025, when the scraper would start including the fiscal year ending June 30, 2026, and so on.
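A minimal sketch of the cutoff rule described above, assuming each session is identified by the calendar year in which it starts (July 1 through June 30 of the following year) and that the earliest session is 2014, as the debug output later in this thread suggests. The function name `fiscal_year_sessions`, the `FIRST_FISCAL_YEAR` constant, and the explicit `today` parameter are hypothetical; in the PR the logic lives on the `Lametro` jurisdiction class and reads the current date itself.

```python
import datetime

# Assumption: earliest fiscal year scraped, based on the debug output in this thread.
FIRST_FISCAL_YEAR = 2014


def fiscal_year_sessions(today: datetime.date) -> list:
    """Return one session dict per fiscal year that should be scraped on `today`.

    A session identified by year N runs from July 1 of N through June 30 of N + 1.
    """
    this_year = today.year
    # Include every fiscal year that started in an earlier calendar year.
    allowed_years = list(range(FIRST_FISCAL_YEAR, this_year))
    # From June 23 (start of the last week of the current fiscal year) onward,
    # also include the fiscal year that begins on July 1 of this year.
    if (today.month == 6 and today.day >= 23) or today.month >= 7:
        allowed_years.append(this_year)
    return [
        {
            "identifier": str(year),
            "start_date": f"{year}-07-01",
            "end_date": f"{year + 1}-06-30",
        }
        for year in allowed_years
    ]


print(fiscal_year_sessions(datetime.date(2024, 6, 10))[-1]["end_date"])  # 2024-06-30
print(fiscal_year_sessions(datetime.date(2024, 6, 23))[-1]["end_date"])  # 2025-06-30
```

Passing the date in explicitly is one way to make both sides of the cutoff easy to test; the thread below takes the other route and freezes the clock instead.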

Testing Instructions

  • Run a scrape
    • docker-compose run --rm scrapers pupa update lametro bills window=1 --rpm=0
  • Confirm that the scrape completes as expected
  • Confirm that automated tests pass and that they are appropriately testing behavior

@xmedr (Collaborator, Author) commented Sep 10, 2024

@antidipyramid I could use your help testing this. I included a test to make sure this works for the case where we're past the cutoff. The test passes, but only because of today's actual date, not because of the frozen date in the test: if you change the freeze_time date to June 10, it still passes. I'm stumped on how to get past this.

I suspect it's because the class sets its allowed_years before I've had a chance to change the date, so mocking today has no effect. I've tried a bunch of things, like reloading the whole lametro package after freezing the time, and even deleting and re-importing the package from within the test, but no dice.

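# Reload the package after freezing time (importlib replaces the removed imp module)
import importlib
import lametro
importlib.reload(lametro)

# and

# Evict the cached module, then re-import it
import sys
sys.modules.pop('lametro')
from lametro import Lametro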

Ideally, I would love to be able to test both the case where it should work and the case where it shouldn't.

'''
fake_now = datetime.now()
mocker.patch(
"lametro.Lametro.today", return_value=fake_now
@hancush (Collaborator), Sep 11, 2024

What if you mocked datetime.datetime.now, which is how the value of today is determined?

Collaborator

Actually, @xmedr, no need to mock anything! Freezegun should inject the configured date universally. Does it work as expected if you remove the mock call?

@hancush (Collaborator), Sep 11, 2024

P.s., if you wanted to parametrize your test to cover the different cases, you can use freeze_time as a context manager in order to pass the date as a variable.

@xmedr (Collaborator, Author)

Whether I keep the mock or remove it, the datetime within the test gets changed, which is great, but the datetime within the Lametro class is still the actual time. Here's how it looks when checking things out with a breakpoint:

-> breakpoint()
(Pdb) datetime.now()
FakeDatetime(2024, 6, 23, 0, 0)
(Pdb) fake_now
FakeDatetime(2024, 6, 23, 0, 0)
(Pdb) Lametro.today
datetime.datetime(2024, 9, 11, 18, 48, 31, 188266)

And nice, that context manager bit is cool! I was actually hoping to parametrize the test once things start working.

Collaborator

Could this be because legislative_sessions is a static class variable that we're accessing without instantiating an Lametro object?

@xmedr (Collaborator, Author)

I think this was it! Because the sessions were calculated in a class variable, they were computed once when the class was defined, so they were already set by the time we mocked things.

So I've changed the legislative_sessions static variable into a get_legislative_sessions() static method. This way it can still be called without instantiating an Lametro object, while also encapsulating the calculation in a method rather than a free-standing loop. With a small adjustment to bills.py, it still works fine, and it now respects the frozen time in tests! Thank y'all ☺️
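The root cause can be reproduced without freezegun or the scraper. In this minimal sketch, `current_date()` and `frozen_clock()` are hypothetical stand-ins for the real clock and for freezegun's `freeze_time`, and the two classes are toy examples rather than the real `Lametro` class. A value assigned in a class body is computed once, when the class is defined at import time, while a static method reads the clock every time it is called.

```python
import contextlib
import datetime


def current_date() -> datetime.date:
    """Hypothetical clock helper standing in for datetime.now()."""
    return datetime.date.today()


@contextlib.contextmanager
def frozen_clock(fixed: datetime.date):
    """Hypothetical stand-in for freezegun: temporarily swap out the clock."""
    global current_date
    original = current_date
    current_date = lambda: fixed
    try:
        yield
    finally:
        current_date = original


class EagerJurisdiction:
    # Runs once, when the class body executes (i.e., at import time).
    today = current_date()


class LazyJurisdiction:
    @staticmethod
    def get_today() -> datetime.date:
        # Runs on every call, so it sees whichever clock is active right now.
        return current_date()


with frozen_clock(datetime.date(2024, 6, 23)):
    print(EagerJurisdiction.today)       # still the real date
    print(LazyJurisdiction.get_today())  # 2024-06-23
```

Freezing time after import cannot change `EagerJurisdiction.today`, which matches the pdb output above where `Lametro.today` kept the real date.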

@xmedr marked this pull request as ready for review September 11, 2024 20:53

@xmedr (Collaborator, Author) commented Sep 11, 2024

This is set for review 🙌

One note, though: my test bill scrape errored out on a 404 HTTP response. The same scrape fails the same way on the main branch. Is this expected behavior? If not, I have a change I can push that lets the scrape continue after encountering 404s.

@hancush (Collaborator) commented Sep 11, 2024

@xmedr The 404 comes from private bills! You need to decrypt your lametro/secrets.py file so the scrape will pass our API key with the requests.

@xmedr (Collaborator, Author) commented Sep 12, 2024

@hancush Oooh, I see. It looks like I might need to be added to blackbox here. Would you be able to help me out with that?

>>> blackbox_postdeploy
========== Importing keychain: START
gpg: Total number processed: 3
gpg:              unchanged: 3
========== Importing keychain: DONE
========== Decrypting new/changed files: START
gpg: public key decryption failed: No secret key
gpg: decryption failed: No secret key

@hancush (Collaborator) commented Sep 12, 2024

@xmedr Just pushed a commit adding you as an admin! Pull from your branch, then try to decrypt your secrets file again.

@xmedr (Collaborator, Author) commented Sep 12, 2024

Sweet, a bill scrape completed successfully for me, so that did it! Now this should be ready for review from either of you.

@hancush (Collaborator) left a comment

Looking great, @xmedr! A few things inline.

Comment on lines 20 to 21
@staticmethod
def get_legislative_sessions():
Collaborator

We want our subclass to have the same signature as the base Jurisdiction class in pupa. How about defining this as a @property instead of a @staticmethod?

Comment on lines 26 to 27
if (today.month == 6 and today.day >= 23) or today.month >= 7:
    allowed_years.append(this_year)
Collaborator

Love this. Can you add a short comment explaining the logic here?

tests/test_jurisdiction.py (resolved)
@hancush (Collaborator) left a comment

Notes on your @property revision inline! Nearly there, @xmedr.

Comment on lines +22 to +26
'''
Yield each year that we'd like to scrape today.
Allow for the next fiscal year to be scraped during
and after the last week of the current fiscal year.
'''
Collaborator

Hell yeah, love a docstring.

@xmedr (Collaborator, Author)

Literally keeps me sane haha

Comment on lines 35 to 41
for year in allowed_years:
    session = {
        "identifier": "{}".format(year),
        "start_date": "{}-07-01".format(year),
        "end_date": "{}-06-30".format(year + 1),
    }
    yield session
Collaborator

legislative_sessions should be a list rather than a generator. Since we control its contents and know it will only grow by one each year, we can be sure that the list will be quite small, so we don't need to worry about the performance penalty of creating a list in memory.
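One concrete behavioral difference worth keeping in mind: a generator can be consumed only once, while a list can be iterated repeatedly, indexed, and passed to `len()`. The sketch below uses a hypothetical `make_sessions` helper rather than pupa or scraper code, and it does not claim that pupa iterates the property more than once.

```python
def make_sessions(years):
    """Hypothetical generator version: yields one session dict per year."""
    for year in years:
        yield {"identifier": str(year)}


sessions_gen = make_sessions([2023, 2024])
first_pass = [s["identifier"] for s in sessions_gen]
second_pass = [s["identifier"] for s in sessions_gen]  # the generator is already exhausted
print(first_pass, second_pass)  # ['2023', '2024'] []

# The list version can be reused, indexed, and passed to len().
sessions_list = list(make_sessions([2023, 2024]))
print(len(sessions_list), sessions_list[-1])  # 2 {'identifier': '2024'}
```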

@xmedr (Collaborator, Author)

Got it, I'll get the change in! So generators seem more useful for medium-to-large datasets, and lists are fine for small-to-medium sets of values. Does that sound right? Or is this more about preserving the return type of the base class's legislative_sessions property?

lametro/bills.py (outdated)
@@ -76,7 +76,7 @@ def session(self, action_date) :
         localize = pytz.timezone(self.TIMEZONE).localize
         fmt = '%Y-%m-%d'
 
-        for session in Lametro.legislative_sessions:
+        for session in Lametro.legislative_sessions.fget():
Collaborator

What is fget() doing for you?

@xmedr (Collaborator, Author)

When I switched to the property decorator, accessing legislative_sessions on the class returns a property object, which apparently can't be iterated over as is. I expected it to behave like a property on a model, where you just get the value, but fget() is the only way I've found so far to get the value on a regular class like this one.

(Pdb) Lametro.legislative_sessions
<property object at 0xffff99656de0>
(Pdb) Lametro.legislative_sessions.fget()
[{'identifier': '2014', 'start_date': '2014-07-01', 'end_date': '2015-06-30'},  ..., {'identifier': '2024', 'start_date': '2024-07-01', 'end_date': '2025-06-30'}]

(Pdb) type(Lametro.legislative_sessions)
<class 'property'>
(Pdb) type(Lametro.legislative_sessions.fget())
<class 'list'>
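This matches standard Python descriptor behavior: reading a property from the class returns the `property` object itself, and the getter runs only when the attribute is read from an instance. Calling `fget()` with no arguments worked here only because the getter was defined without `self`. A minimal sketch with a hypothetical `ExampleJurisdiction` class standing in for `Lametro`:

```python
class ExampleJurisdiction:
    @property
    def legislative_sessions(self):
        # Hypothetical stand-in for the real session calculation.
        return [{"identifier": "2024", "start_date": "2024-07-01", "end_date": "2025-06-30"}]


# On the class, you get the descriptor object, not the value.
print(type(ExampleJurisdiction.legislative_sessions))  # <class 'property'>

# On an instance, the getter runs and returns the list.
print(type(ExampleJurisdiction().legislative_sessions))  # <class 'list'>
```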

tests/test_jurisdiction.py (resolved)
}
legislative_sessions.append(session)
@property
def legislative_sessions():
Collaborator

Suggested change
- def legislative_sessions():
+ def legislative_sessions(self):

Does the property behave correctly (i.e., doesn't need a call to fget()) with this change?

@xmedr (Collaborator, Author), Sep 20, 2024

It does if we instantiate the class beforehand! That would mean the loop on line 79 of bills.py, which uses this, would need to change from

for session in Lametro.legislative_sessions.fget():
    start_datetime = datetime.datetime.strptime(session['start_date'], fmt)
    end_datetime = datetime.datetime.strptime(session['end_date'], fmt)

to something like

lametro_obj = Lametro()

for session in lametro_obj.legislative_sessions:
    start_datetime = datetime.datetime.strptime(session['start_date'], fmt)
    ...

# OR just

for session in Lametro().legislative_sessions:
    start_datetime = datetime.datetime.strptime(session['start_date'], fmt)
    ...

Is that change okay or were we hoping to not have to instantiate?

Collaborator

Great question. What happens when we instantiate a jurisdiction? Are there side effects we wish to avoid?

@xmedr (Collaborator, Author)

Looking at the base Jurisdiction class, I don't believe anything of major consequence would happen, other than maybe using a bit more memory whenever the session method in bills.py is called. Since every other method there also needs an instance in order to function, this looks like it would be okay.

And aside from the test, there aren't any other spots in the scrapers that access legislative_sessions, so the snippet above is the only place where a change would be needed.

Collaborator

Nice! Then instantiate away, and this should be able to come in.
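For reference, a simplified sketch of the lookup pattern agreed on above: read the sessions from an instance, then return the identifier of the session whose date range contains the action date. `ToyJurisdiction` and `session_for` are hypothetical names, and this is not the merged bills.py code; the real `session()` method also sets up timezone localization with pytz, which this sketch omits by comparing naive dates.

```python
import datetime

FMT = "%Y-%m-%d"


class ToyJurisdiction:
    """Hypothetical jurisdiction exposing sessions as an instance property."""

    @property
    def legislative_sessions(self):
        return [
            {"identifier": str(year), "start_date": f"{year}-07-01", "end_date": f"{year + 1}-06-30"}
            for year in (2023, 2024)
        ]


def session_for(action_date: datetime.datetime) -> str:
    # Instantiate, then read the property directly; no fget() needed.
    for session in ToyJurisdiction().legislative_sessions:
        start_datetime = datetime.datetime.strptime(session["start_date"], FMT)
        end_datetime = datetime.datetime.strptime(session["end_date"], FMT)
        # Compare calendar dates so actions late on June 30 still fall in the ending session.
        if start_datetime.date() <= action_date.date() <= end_datetime.date():
            return session["identifier"]
    raise ValueError(f"No session covers {action_date:%Y-%m-%d}")


print(session_for(datetime.datetime(2024, 9, 11)))  # 2024
print(session_for(datetime.datetime(2024, 3, 1)))   # 2023
```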

@xmedr merged commit 08615e4 into main Sep 24, 2024
1 check passed
@xmedr deleted the patch/update-legislative-session branch September 24, 2024 14:43
Development

Successfully merging this pull request may close these issues.

Update legislative session automatically
3 participants