Implement memento api #408
base: master
…e negotiation through timegate
Code compiles, but 3 unit tests still fail because they expect "localhost" as the return value from the embedded Solr, but it is the hostname instead.
happens to be a solrwaybackweb.properties in home. added line to initialize webproperties.
Minor unittest improvement+refactoring
does not try to access warc-files for payload by setting playback allowed.
that expected localhost:8080 but on my maven build machine it was <hostname>:8080 instead.
The hard part of this task is to figure out what the API really is.
Good job with all the unit tests.
The "timegate" (direct playback linking) part of the API works and requires no further work.
But only for the 2.1 spec with redirect 302. I have removed the payload, since
payload is never read from a 302. So this is a simple redirect to the playback-page.
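For reference, the redirect described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class name, the method names and the `https://archive.example.org/playback/` base URL are made up, and the playback URL layout (base + 14-digit timestamp + original URL) is an assumption. The header names (`Location`, `Vary: accept-datetime`, `Link` with `rel="original"`) follow RFC 7089 pattern 2.1.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

class TimegateRedirectSketch {
    // 14-digit UTC timestamp, as commonly used in wayback-style playback URLs.
    private static final DateTimeFormatter WAYBACK_TS =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    /** Converts an Accept-Datetime value (an RFC 1123 date) to a 14-digit UTC timestamp. */
    static String toWaybackTimestamp(String acceptDatetime) {
        ZonedDateTime requested =
                ZonedDateTime.parse(acceptDatetime, DateTimeFormatter.RFC_1123_DATE_TIME);
        return requested.withZoneSameInstant(ZoneOffset.UTC).format(WAYBACK_TS);
    }

    /**
     * Headers for a body-less 302 response that points at the playback page.
     * playbackBase is a hypothetical placeholder, not the real SolrWayback path.
     */
    static Map<String, String> redirectHeaders(String playbackBase,
                                               String mementoTimestamp,
                                               String originalUrl) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Location", playbackBase + mementoTimestamp + "/" + originalUrl);
        headers.put("Vary", "accept-datetime");
        headers.put("Link", "<" + originalUrl + ">; rel=\"original\"");
        return headers;
    }

    public static void main(String[] args) {
        // Simplification: pretend the closest capture sits exactly at the requested time.
        // A real timegate would first look up the nearest capture in the index.
        String ts = toWaybackTimestamp("Tue, 11 Sep 2001 20:35:00 GMT");
        redirectHeaders("https://archive.example.org/playback/", ts, "http://news.dk/")
                .forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```

The response status would be 302 with no body, so nothing from the playback payload has to be mixed into the Memento response.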
There is no need to also support the 2.2 spec. The current solution of returning the
playback payload directly does not even work, even though it is the correct payload.
The reason is that the playback API must run under solrwayback/sercvicesweb/ and not solrwayback/services/memento. All link handling in the root servlet, service worker,
referrer fixing, etc. requires this specific URL.
If we want to support 2.2 (which there is no need for), the only solution is to do as PyWB does
by returning a minimal HTML page with a single frame that points to the playback URL (as was constructed in 2.1). This will also remove the mixing of Memento header fields with important playback header fields. (But no need to implement!)
But for the timemap API I think a few fixes to the output are required.
Timemap, link
Compared the two responses from PyWB and SW:
https://solrwb-test.kb.dk:4000/solrwayback/services/memento/timemap/link/http://prak10k.dk/?page_id=13
https://pywb-test.kb.dk/myindex/timemap/link/http://prak10k.dk/?page_id=13
Besides the order of the lines, there is one difference:
rel="memento" vs rel="first memento"
Also, the collection name is missing.
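To make the `rel` difference concrete, here is a rough sketch of how the memento lines of a link-format timemap could be generated so that the first and last captures get `first memento` and `last memento`, as in RFC 7089. The class and method names and the example URLs are hypothetical, and the sketch does not cover the collection name or the `original`, `self` and `timegate` lines.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class TimemapLinkSketch {
    /** rel value for the memento at position index in a date-sorted list of size total. */
    static String relFor(int index, int total) {
        boolean first = index == 0;
        boolean last = index == total - 1;
        if (first && last) return "first last memento";
        if (first) return "first memento";
        if (last) return "last memento";
        return "memento";
    }

    /** One link-format line per capture; the map is sorted by capture time. */
    static List<String> mementoLines(SortedMap<ZonedDateTime, String> capturesByTime) {
        List<String> lines = new ArrayList<>();
        int index = 0;
        for (Map.Entry<ZonedDateTime, String> capture : capturesByTime.entrySet()) {
            String datetime = capture.getKey()
                    .withZoneSameInstant(ZoneOffset.UTC)
                    .format(DateTimeFormatter.RFC_1123_DATE_TIME);
            lines.add("<" + capture.getValue() + ">; rel=\""
                    + relFor(index, capturesByTime.size())
                    + "\"; datetime=\"" + datetime + "\"");
            index++;
        }
        return lines;
    }

    public static void main(String[] args) {
        SortedMap<ZonedDateTime, String> captures = new TreeMap<>();
        // Hypothetical playback URLs, for illustration only.
        captures.put(ZonedDateTime.of(2000, 6, 20, 18, 2, 59, 0, ZoneOffset.UTC),
                "https://archive.example.org/playback/20000620180259/http://news.dk/");
        captures.put(ZonedDateTime.of(2001, 9, 11, 20, 35, 0, 0, ZoneOffset.UTC),
                "https://archive.example.org/playback/20010911203500/http://news.dk/");
        System.out.println(String.join(",\n", mementoLines(captures)));
    }
}
```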
But why so few results in SolrWayback when I compared?
https://solrwb-test.kb.dk:4000/solrwayback/services/memento/timemap/link/http://news.dk/
https://pywb-test.kb.dk/myindex/timemap/link/http://news.dk/
Timemap, json
These two responses are very different:
https://solrwb-test.kb.dk:4000/solrwayback/services/memento/timemap/json/http://news.dk/
https://pywb-test.kb.dk/myindex/timemap/json/http://news.dk/
I am looking at this this evening. I don't have access to the test servers at KB. Maybe you can help me gain access tomorrow. For once I found a good description of the API: here
I will look at the timemap json and link collections when I get access to the test server, as this makes comparing way easier.
Looking into this today. The timemap link format implemented in SolrWayback uses paging of results, while pywb doesn't; that's why the results look different there. I'll change the page size from 2, which it seems to be as of now, to 20 or something like that. I'll do a deeper dive into the JSON format, as the two outputs are completely different.
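The effect of the page size can be illustrated with a small sketch. The helper names are hypothetical and this is not the paging code in the PR; it only shows how a page size of 2 versus 20 changes what a single timemap page contains.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class TimemapPagingSketch {
    /** Returns the 1-based page pageNumber of all, or an empty list past the end. */
    static <T> List<T> page(List<T> all, int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        long from = (long) (pageNumber - 1) * pageSize;
        if (from >= all.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(all.size(), from + pageSize);
        return all.subList((int) from, to);
    }

    /** Number of pages needed for total results. */
    static int pageCount(int total, int pageSize) {
        return (total + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        List<Integer> mementos = new ArrayList<>();
        for (int i = 0; i < 45; i++) {
            mementos.add(i);
        }
        // With a page size of 2, a single response shows only 2 of the 45 captures.
        System.out.println(page(mementos, 1, 2).size() + " of " + mementos.size());
        // With a page size of 20, the same captures fit in 3 pages.
        System.out.println(pageCount(mementos.size(), 20) + " pages");
    }
}
```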
Implements the memento framework at the endpoint services/memento.
Two features are central for memento:
DatetimeNegotiation and Timemaps.
DatetimeNegotiation can be implemented in multiple ways. I've implemented the 2.2 pattern, which is recommended for web archives. See https://www.rfc-editor.org/rfc/rfc7089.html#page-24
Pattern 2.1 can also be chosen through the property: memento.redirect
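As a rough illustration of the switch, the sketch below reads the property and picks the response style. It assumes that `memento.redirect` is a boolean and that 2.2 is the default when it is unset; both are assumptions for this example, and the class and method names are hypothetical. The status codes and header names follow RFC 7089: pattern 2.1 answers with 302 and `Location`, pattern 2.2 with 200 and `Content-Location`.

```java
import java.util.Properties;

class NegotiationPatternSketch {
    /** Assumption: memento.redirect=true selects pattern 2.1, anything else pattern 2.2. */
    static boolean useRedirect(Properties props) {
        return Boolean.parseBoolean(props.getProperty("memento.redirect", "false"));
    }

    /** Pattern 2.1 redirects (302); pattern 2.2 answers directly (200). */
    static int statusCode(boolean redirect) {
        return redirect ? 302 : 200;
    }

    /** Header that carries the memento URL in each pattern. */
    static String mementoUrlHeader(boolean redirect) {
        return redirect ? "Location" : "Content-Location";
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("memento.redirect", "true");
        boolean redirect = useRedirect(props);
        System.out.println(statusCode(redirect) + " with " + mementoUrlHeader(redirect));
    }
}
```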
Timemaps can be delivered in two different formats, link-type and JSON, as specified in the memento specification: http://mementoweb.org/guide/rfc/#Pattern6
Closes #42