human readable date formatter #79
Comments
Cole and Rebecca tried out few-shot prompting for converting human-readable dates to EDTF. We provided the EDTF spec and a list of example string-EDTF pairs. Cole tested GPT-4o. It did fairly well and responded well to corrections: when it missed a kind of formatting and was corrected, it parsed similar examples correctly afterwards. Rebecca tested Claude 3.5 Sonnet; it did similarly well with formatting, but struggled with calendar conversions. We're going to stick with parsing human-readable dates to EDTF for now. The core prompt:
|
We are going to try to standardize a prompt that includes the needed EDTF spec and examples as context with each query. We will create some test data and a script that we can use with multiple LLMs (maybe using |
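A minimal sketch of what such a shared script could look like, assuming each LLM is wrapped as a plain `prompt -> str` callable. The names `build_prompt`, `evaluate`, and `echo_stub`, the instruction wording, and the example pairs are all hypothetical and are not the prompt or data used in the experiments above.

```python
from typing import Callable, Iterable

# Hypothetical example pairs; a real run would load curated project data.
EXAMPLES = [
    ("July 1936", "1936-07"),
    ("circa 1850", "1850~"),
    ("1920s", "192X"),
]


def build_prompt(spec_text: str, examples: Iterable[tuple[str, str]], query: str) -> str:
    """Assemble the spec, the few-shot pairs, and the query into one prompt."""
    shots = "\n".join(f"Input: {text}\nEDTF: {edtf}" for text, edtf in examples)
    return (
        "Convert the human-readable date to EDTF. Reply with the EDTF string only.\n\n"
        f"Specification:\n{spec_text}\n\n"
        f"Examples:\n{shots}\n\n"
        f"Input: {query}\nEDTF:"
    )


def evaluate(model: Callable[[str], str], spec_text, examples, test_cases):
    """Return (accuracy, failures) for one model over (text, expected) pairs."""
    failures = []
    for text, expected in test_cases:
        answer = model(build_prompt(spec_text, examples, text)).strip()
        if answer != expected:
            failures.append((text, expected, answer))
    total = len(test_cases)
    accuracy = (total - len(failures)) / total if total else 0.0
    return accuracy, failures


def echo_stub(prompt: str) -> str:
    """Stand-in for a real LLM client: answers from a small lookup table."""
    known = {"July 1936": "1936-07", "spring 1901": "1901-21"}
    query = prompt.rsplit("Input: ", 1)[1].split("\n", 1)[0]
    return known.get(query, "")
```

Swapping `echo_stub` for a thin wrapper around each provider's client would let the same test cases run against several models, with the failures list showing which kinds of dates each one misses.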
I was using the version of Claude integrated with the Zed editor, which takes advantage of Anthropic's prompt caching. Here's the prompt I started with:
I was curious whether and how it would handle calendar conversion, so I tried a few examples from the Princeton Geniza Project; it did OK on some of them, but others had different results (or different precision) from what we have in PGP. Here's the full transcript of my experiments, minus the contents of the LOC EDTF spec which the |
@ColeDCrawford I got the chance to chat with @statsmaths about the kinds of dates he had to wrangle to use the FSA/OWI Library of Congress collection. One interesting thing that wasn't on my radar previously is that LLMs should do better at taking advantage of context, and we will want to test that; e.g. one of the photos in the examples Taylor shared was listed as July of a particular year in the date field, but it's evident from the title that it is July 4th. Other general context would probably be relevant too, like hemisphere if you're using seasons, or US / Britain context to disambiguate month/day vs day/month. I've started a Google Sheets spreadsheet in the shared drive for this project, with a provisional structure for collecting example dates from different projects for whenever we're able to circle back to this. I've set it up so we can keep track of which project or source each example date comes from. |
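To illustrate why this kind of context matters, here is a small deterministic sketch (not an LLM call). It shows the same input yielding different EDTF values depending on regional date order or hemisphere. The function names and the `"US"` / `"GB"` / `"north"` / `"south"` labels are made up for illustration; the season codes are the hemisphere-specific ones from EDTF level 2.

```python
from datetime import datetime

# strptime patterns for two regional conventions (labels are illustrative).
DATE_ORDER = {"US": "%m/%d/%Y", "GB": "%d/%m/%Y"}

# EDTF level 2 season codes that encode the hemisphere explicitly.
SEASON_CODES = {
    "north": {"spring": 25, "summer": 26, "autumn": 27, "winter": 28},
    "south": {"spring": 29, "summer": 30, "autumn": 31, "winter": 32},
}


def numeric_to_edtf(text: str, region: str) -> str:
    """Parse an all-numeric date using the region's day/month order."""
    return datetime.strptime(text, DATE_ORDER[region]).date().isoformat()


def season_to_edtf(season: str, year: int, hemisphere: str) -> str:
    """Render a season and year as a hemisphere-specific EDTF season."""
    return f"{year:04d}-{SEASON_CODES[hemisphere][season.lower()]}"


print(numeric_to_edtf("07/04/1936", "US"))  # 1936-07-04
print(numeric_to_edtf("07/04/1936", "GB"))  # 1936-04-07
print(season_to_edtf("Summer", 1936, "south"))  # 1936-30
```

When testing LLMs, the same context (region, hemisphere, or a title mentioning a holiday) could be passed along with the date string so the model has what it needs to pick the right reading.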
Ideally should support conversion in both directions; implementation should support localization so it works in multiple languages; this either overlaps with or can leverage #78
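As a rough sketch of what two-way, localized conversion might involve, the example below handles only plain `YYYY`, `YYYY-MM`, and `YYYY-MM-DD` values with positive years, rendered in day-month-year order. The `MONTHS` table and both function names are hypothetical; a real implementation would likely rely on a locale library and would need to cover seasons, uncertainty markers, and intervals.

```python
# Hypothetical month-name tables for two languages.
MONTHS = {
    "en": ["January", "February", "March", "April", "May", "June", "July",
           "August", "September", "October", "November", "December"],
    "fr": ["janvier", "février", "mars", "avril", "mai", "juin", "juillet",
           "août", "septembre", "octobre", "novembre", "décembre"],
}


def edtf_to_text(edtf: str, lang: str = "en") -> str:
    """Render YYYY, YYYY-MM, or YYYY-MM-DD as day-month-year text."""
    parts = edtf.split("-")
    year = parts[0]
    if len(parts) == 1:
        return year
    month = MONTHS[lang][int(parts[1]) - 1]
    if len(parts) == 2:
        return f"{month} {year}"
    return f"{int(parts[2])} {month} {year}"


def text_to_edtf(text: str, lang: str = "en") -> str:
    """Inverse of edtf_to_text for the same three simple forms."""
    tokens = text.split()
    year = tokens[-1]
    if len(tokens) == 1:
        return year
    month = MONTHS[lang].index(tokens[-2]) + 1
    if len(tokens) == 2:
        return f"{year}-{month:02d}"
    return f"{year}-{month:02d}-{int(tokens[0]):02d}"


print(edtf_to_text("1936-07-04"))  # 4 July 1936
print(edtf_to_text("1936-07", "fr"))  # juillet 1936
print(text_to_edtf("4 juillet 1936", "fr"))  # 1936-07-04
```

Keeping the language tables separate from the formatting logic is one way to let new languages be added without touching the conversion code.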