New Scottish Parliament Scraper #172
Really good start, but I think we need to do a bit more before switching to it. I've made a start on some of this at the download end. To sum up the ones I think are important:
- Remembering the Presiding Officer / Deputy from their first use
- And/or using the SP person IDs
- Spotting topical question subheadings both in parse and convert
- Handling unspoken names using `or-bill-section-bold`
- Spotting timestamps (which include suspend/resume as meta text to include, I think)
- Spotting `or-italic`
- Fixing speech after division
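As a rough illustration of the first point, here is a minimal sketch of remembering a role holder from the first named use. It is not the scraper's code: the attribution format ("The Presiding Officer (Name)"), the `ROLE_WITH_NAME` pattern and the `resolve_speakers` helper are all assumptions made for the example, and the names are placeholders.

```python
import re

# Hypothetical pattern: assumes the first use looks like
# "The Presiding Officer (Some Name)" and later uses omit the name.
ROLE_WITH_NAME = re.compile(r"^(The (?:Deputy )?Presiding Officer) \((.+?)\)$")


def resolve_speakers(attributions):
    """Resolve bare role attributions to the most recently named holder."""
    remembered = {}  # role text -> last name seen alongside that role
    resolved = []
    for raw in attributions:
        text = raw.strip().rstrip(":")
        match = ROLE_WITH_NAME.match(text)
        if match:
            role, name = match.groups()
            remembered[role] = name  # overwrite if the chair changes
            resolved.append(name)
        elif text in remembered:
            resolved.append(remembered[text])
        else:
            # Not a known role (or a role never seen with a name): keep as-is
            resolved.append(text)
    return resolved


example = resolve_speakers([
    "The Presiding Officer (A. Example):",
    "B. Member:",
    "The Presiding Officer:",
    "The Deputy Presiding Officer:",
])
```

Overwriting the remembered name on every named use means a change of chair mid-sitting would be picked up, provided the new holder is named when they first speak. Matching on SP person IDs instead, as the second point suggests, would avoid relying on the display text at all.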
ae430fc to 3c19e8e
762a000 to f87d77a
That should be all the major issues tidied up. Does this also need an adjustment in TWFY to pull from the sp_2024 directory where it puts the finished files?
Yep, once the parser is updated and has pulled in some data, we can update it so it starts loading in from there.
This adds a new scraper for the Scottish Parliament's new site.
I've made a new sp_2024 folder and pulled across some of the elements needed for the ID parser.
There are three main steps:
(common and resolvenames are lightly re-formatted versions of the modules from the old scraper).
This seems to work as I'd expect for some recent ones - haven't tested actually loading the data.
There's some more special-case stuff that could be brought over from the old scraper, but it probably makes more sense to bring it across as things break?