How could we relate the historichansard_id IDs to historic Hansard slugs? #91
https://code.google.com/archive/p/hansard/downloads contains database dumps under reference_data. TWFY has its old import code for this in
For reference, the Historic Hansard code is also available on GitHub ( https://github.com/millbanksystems/hansard ), and the running site is no longer a Rails app: it's (effectively) a flat-file backup plus a Sinatra app to replicate the original search functionality (um, https://github.com/lizconlan/hh-search-app I think; I should probably transfer that to the correct ownership). I have a full database backup somewhere...
"n.b. some people in parlparse have the ID scheme historichansard_person_id and some have historichansard_id - I'm assuming they're the same ID space, but maybe not" - no, as with us, one is a person ID, one is a membership ID.
Thanks, @dracos and @lizconlan - that's brilliant.
I'm frustrated by this, because I think one of the first things I did when working for mySociety was working with @frabcus on importing people from the historic Hansard data but I can't remember enough of the detail to be able to answer my own question!
The Wikidata project has imported all the historic MPs from the historic Hansard records from http://hansard.millbanksystems.com/ using the slugs on people pages as IDs - this is Wikidata property P2015. parlparse, however, uses IDs for historic MPs with the scheme `historichansard_id`, which is numeric. If we could find the mapping between these two ID spaces, that would enable us to straightforwardly associate everyone in parlparse with the right Wikidata items, which would be brilliant.
The problem is that I can't find any use of the `historichansard_id` values on http://hansard.millbanksystems.com/ at all now. It's not in the source of people pages or debate pages on that site. The credits page links to the XML data that site is based on: http://www.hansard-archive.parliament.uk/ but those files don't appear to have IDs associated with members at all - the `<member> ... </member>` tags have no attributes, and I can't see any other element that has them. (This is all worth double-checking, I should say!)
Can anyone help with figuring this out? Is it possible that we used a different structured data source from those XML files when importing the historic MPs into parlparse, and I'm just not finding it now? (Looking through the history of this repository, I can't even see what script might have been used for the import now, though I imagine we did commit it.)
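
The double-check on `<member>` tags could be scripted. The following is only a rough sketch: the sample fragment is made up, the function name is hypothetical, and the real archive files may use namespaces or a different structure, in which case the tag lookup would need adjusting.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def member_attribute_counts(xml_text):
    """Count the attribute names found on <member> elements in one XML document.

    Assumes un-namespaced <member> tags; a namespaced file would need the
    qualified tag name instead.
    """
    root = ET.fromstring(xml_text)
    counts = Counter()
    for member in root.iter("member"):
        counts.update(member.attrib.keys())
    return counts


# Hypothetical fragment, NOT taken from the real archive.
sample = "<debate><p><member>Example Member</member> spoke.</p></debate>"
print(member_attribute_counts(sample))
```

An empty counter across all the files would support the observation that members carry no IDs in that data; any attribute name that does turn up would be worth a closer look.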
If the `historichansard_id`s were the database primary keys for the Rails site hosted here: http://hansard.millbanksystems.com/ (source code here: https://code.google.com/archive/p/hansard/downloads ) then perhaps we could get a dump of that mapping from the maintainers?
To help with checking this kind of thing, an example: `mrs-margaret-thatcher` (page here: http://hansard.millbanksystems.com/people/mrs-margaret-thatcher/ ) has a `historichansard_person_id` of 5962.
n.b. some people in parlparse have the ID scheme `historichansard_person_id` and some have `historichansard_id` - I'm assuming they're the same ID space, but maybe not.
Cc: @dracos @crowbot
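
If a dump of numeric IDs and slugs does turn up, the join itself should be simple. Here is a minimal sketch, assuming a CSV export with columns named `id` and `slug` and Popolo-style identifier records with `scheme` and `identifier` keys. The column names, function names and sample rows are all hypothetical. It matches one scheme at a time, so the two schemes are never treated as a single ID space.

```python
import csv
import io


def load_slug_map(csv_file):
    """Read a dump with (assumed) columns 'id' and 'slug' into {id: slug}."""
    reader = csv.DictReader(csv_file)
    return {int(row["id"]): row["slug"] for row in reader}


def match_identifiers(identifiers, slug_map, scheme):
    """Pair identifiers of one scheme with slugs; also return the IDs with no match."""
    matched, unmatched = {}, []
    for ident in identifiers:
        if ident["scheme"] != scheme:
            continue  # keep the two schemes separate
        slug = slug_map.get(int(ident["identifier"]))
        if slug is None:
            unmatched.append(ident["identifier"])
        else:
            matched[ident["identifier"]] = slug
    return matched, unmatched


# Made-up rows for illustration only; not real Historic Hansard data.
dump = io.StringIO("id,slug\n1,example-person-a\n2,example-person-b\n")
identifiers = [
    {"scheme": "historichansard_person_id", "identifier": "1"},
    {"scheme": "historichansard_id", "identifier": "2"},
    {"scheme": "historichansard_person_id", "identifier": "99"},
]
slug_map = load_slug_map(dump)
matched, unmatched = match_identifiers(identifiers, slug_map, "historichansard_person_id")
print(matched, unmatched)
```

Against a real people dump, the example pair above (`mrs-margaret-thatcher` and 5962) would be a natural first sanity check.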