Releases: Future-House/paper-qa
v5.2.0: OpenAlex
Highlights
Added a new metadata provider, OpenAlex, for scholarly works, researchers, institutions, journals, and research topics.
- Responses can include open access information and raw PDF locations.
- Doesn't require authentication, but does prioritize requests with an email in the mailto URL parameter, exposed as the environment variable OPENALEX_MAILTO.
Implemented an opt-in bypass around the litellm.Router for LLM completions (see #563).
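As a rough illustration of the mailto behavior noted above, the sketch below builds an OpenAlex request URL and attaches the mailto parameter only when OPENALEX_MAILTO is set. The helper name build_openalex_url and its env argument are hypothetical, and this is not paper-qa's own client code; it only assumes the public https://api.openalex.org/works endpoint with its search and mailto query parameters.

```python
import os
from urllib.parse import urlencode

# Public OpenAlex works endpoint (not a paper-qa constant).
OPENALEX_WORKS_URL = "https://api.openalex.org/works"


def build_openalex_url(search, env=None):
    """Hypothetical helper: build a works-search URL, adding mailto if configured."""
    env = os.environ if env is None else env
    params = {"search": search}
    mailto = env.get("OPENALEX_MAILTO")
    if mailto:
        # No authentication is needed; the email only opts into prioritized handling.
        params["mailto"] = mailto
    return f"{OPENALEX_WORKS_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_openalex_url("protein folding"))
```

Passing the environment as an argument keeps the helper easy to exercise without touching the real process environment.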
What's Changed
- Fixed pickle-ability of LiteLLMModel by @jamesbraza in #560
- Refactoring LiteLLMModel before removing Router by @jamesbraza in #561
- Adds openalex client as a default client by @nadolskit in #555
- Moving to setup-uv and hynek/build-and-inspect-python-package in CI by @jamesbraza in #564
- Ability to bypass usage of litellm.Router by @jamesbraza in #563
- Propagating hynek/build-and-inspect-python-package's output location to pypa/gh-action-pypi-publish by @jamesbraza in #565
- Downloading Package artifact for pypa/gh-action-pypi-publish by @jamesbraza in #566
Full Changelog: v5.1.1...v5.2.0
v5.1.1
What's Changed
- Lock file maintenance by @renovate in #545
- Validating for broken index by @jamesbraza in #544
- Added example how to use ollama hosted models by @grg-ffb in #536
- Making parsing resistant to failed inference of citations by @whitead in #551
- Exposed log verbosity configuration function by @jamesbraza in #552
- Cleaned up log verbosity code by @jamesbraza in #554
New Contributors
Full Changelog: v5.1.0...v5.1.1
v5.1.0: rate limits, refactored settings
Highlights
In-housed rate limits management
- Centers on a moving window algorithm with either a Redis or in-memory state
- Supports dynamically defined rates for different models or providers.
- New bundled configurations for different OpenAI rate limit tiers
- Accomplished using new third party dependencies coredis and limits
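To make the moving window idea above concrete, here is a minimal in-memory sketch. It is a stand-in written with the standard library only, not the coredis/limits-based implementation the release uses; the class name MovingWindowLimiter, its parameters, and the injectable clock are all assumptions made for illustration.

```python
import time
from collections import deque


class MovingWindowLimiter:
    """Hypothetical in-memory limiter: at most `limit` hits per `period_s` seconds."""

    def __init__(self, limit, period_s, clock=time.monotonic):
        self.limit = limit
        self.period_s = period_s
        self.clock = clock  # injectable so the window can be tested deterministically
        self._hits = deque()  # timestamps of accepted hits, oldest first

    def try_acquire(self):
        now = self.clock()
        # Drop hits that have slid out of the window ending at `now`.
        while self._hits and now - self._hits[0] >= self.period_s:
            self._hits.popleft()
        if len(self._hits) < self.limit:
            self._hits.append(now)
            return True
        return False


if __name__ == "__main__":
    limiter = MovingWindowLimiter(limit=2, period_s=1.0)
    print([limiter.try_acquire() for _ in range(3)])
```

Unlike a fixed window that resets on a schedule, the window here always covers the most recent period_s seconds, so a burst at a boundary cannot exceed the limit. A Redis-backed state would store the same timestamps outside the process so several workers share one budget.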
Refactored Settings to allow for increased flexibility
- Indexing
  - Indexes can use relative paths, enabling sharing across machines
  - Paper search no longer rebuilds the index on every invocation
  - Index parameters are now grouped in IndexSettings
  - This release begins a deprecation cycle for the original hyperparameters
  - Index builds now have a rich.Progress bar
- Parsing
  - Chunking and embedding can now be deferred to inference time
- Agents
  - Agents now have a max_timesteps parameter to upper-bound trajectory length
  - Default agent is now a simple tool calling agent (ToolSelector), instead of a deterministic sequence of tool calls ("fake" agent)
Several bug fixes centered on retry-able errors:
- Flaky Semantic Scholar and Crossref SSL errors and connection reset errors
- LLM completions and text embeddings
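The general pattern behind these fixes, retrying a call when it raises a known transient error, can be sketched as follows. This is an illustrative stand-in rather than the retry logic paper-qa ships; the function name retry, its parameters, and the choice of exponential backoff are assumptions.

```python
import time


def retry(func, retryable=(ConnectionError,), attempts=3, base_delay_s=0.5, sleep=time.sleep):
    """Hypothetical helper: call func, retrying only on the given exception types."""
    for attempt in range(attempts):
        try:
            return func()
        except retryable:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the last error
            # Exponential backoff: base, 2x base, 4x base, ...
            sleep(base_delay_s * 2**attempt)


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky():
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionResetError("connection reset by peer")
        return "ok"

    print(retry(flaky, base_delay_s=0.01))
```

Restricting retries to an explicit tuple of exception types keeps genuine bugs from being masked; only errors believed to be transient, such as connection resets, are attempted again.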
What's Changed
- Cleaning up #489's implementation by @jamesbraza in #503
- chore(deps): lock file maintenance by @renovate in #504
- chore(deps): lock file maintenance by @renovate in #506
- chore(deps): update all non-major dependencies by @renovate in #505
- Filtering two more DeprecationWarnings by @jamesbraza in #509
- Refactor to create settings.agent.index grouping by @jamesbraza in #510
- Removed extra save_index calls, and added missing changed by @jamesbraza in #513
- Not rebuilding SearchIndex every paper_search by @jamesbraza in #512
- Updated citation to arxiv preprint by @whitead in #514
- Aviary agent max_timesteps and fixed test_gather_evidence_rejects_empty_docs by @jamesbraza in #515
- Moved reset_log_levels to usefixtures by @jamesbraza in #517
- Decomposing Answer.could_not_answer by @jamesbraza in #516
- Fixing IndexSettings.use_absolute_paper_directory leading to relative index file paths by @jamesbraza in #518
- Moving run_ldp_agent to center on RolloutManager by @jamesbraza in #519
- Retrying on known Semantic Scholar flaky SSL error in get_s2_doc_details_from_doi by @jamesbraza in #522
- Converted PyMuPDF message to warning logs by @jamesbraza in #523
- rich.Progress bar for monitoring index builds by @jamesbraza in #521
- Better descriptions and log messages by @jamesbraza in #524
- Made it possible to skip chunking by @whitead in #526
- Retrying on aiohttp.ClientConnectionResetError by @jamesbraza in #529
- Add rate limits for LLMs and Embedding Models by @mskarlin in #520
- Disallowing confusing None from IndexSettings.index_directory, and IndexSettings.get_named_index_directory by @jamesbraza in #531
- Add router_kwargs in separate control flow step by @mskarlin in #532
- Propagating AgentSettings.agent_type default for synchrony by @jamesbraza in #533
- Adding retrying of aembedding if it fails by @jamesbraza in #535
- Add limits+coredis to mypy by @mskarlin in #537
- Lock file maintenance by @renovate in #534
- Controlling for pymupdf version in test_pdf_reader_match_doc_details VCR by @jamesbraza in #538
- Lock file maintenance by @renovate in #539
- Fixed yet another api.semanticscholar.org:443 ssl:default error via retrying by @jamesbraza in #540
Full Changelog: v5.0.10...v5.1.0
v5.0.10
What's Changed
- Discovered Renovate :automergeMinor and preventing openai version bumps by @jamesbraza in #493
- Fixing LitQATaskDataset deserialization from config by @jamesbraza in #494
- chore(deps): update all non-major dependencies by @renovate in #498
- Broken reader ut by @nadolskit in #497
- Fixing LitQATaskDataset.compute_trajectory_metrics crash with bad status extraction by @jamesbraza in #500
- For autogenerated Router kwargs, specifying timeout of 60-sec by @jamesbraza in #501
Full Changelog: v5.0.9...v5.0.10
v5.0.9
What's Changed
- Fixing tests/tests/cassettes issue by using absolute path by @jamesbraza in #482
- Retrying on known Crossref flaky SSL error in doi_to_bibtex by @jamesbraza in #479
- Cleaning up and testing get_directory_index by @jamesbraza in #483
- Modernizing Renovate config by @jamesbraza in #487
- Allowing parse_text to be given a str path by @jamesbraza in #491
- Refactor to expose agents.RichHandler by @jamesbraza in #489
Full Changelog: v5.0.8...v5.0.9
v5.0.8
What's Changed
- Documenting and cleaning up manifest file logic by @jamesbraza in #448
- Latest dependencies for pylint 3.3 by @jamesbraza in #463
- Down-pinning openai 1.47 since it breaks CI by @jamesbraza in #466
- Lock file maintenance by @renovate in #462
- chore: add .gitattributes for cassettes file by @devstein in #468
- Documenting Python 3.11+ in README by @jamesbraza in #467
- Fixing flaky test_tool_failure by @jamesbraza in #465
- Documenting manifest CSV pathing a bit more by @jamesbraza in #469
- Handling S2 KeyError crash during indexing by @jamesbraza in #472
- Fixing pymupdf.mupdf.FzErrorFormat crash by recasting as an ImpossibleParsingError by @jamesbraza in #474
- Updating test_tool_failure cassette by @jamesbraza in #476
- Simplifying the indexing of action tokens by @jamesbraza in #477
- Truncating failing test_evaluation via max_rollout_steps by @jamesbraza in #475
New Contributors
Full Changelog: v5.0.7...v5.0.8
v5.0.7
What's Changed
- Fixing flaky test test_pdf_reader_match_doc_details by @jamesbraza in #447
- Retrying Crossref's aiohttp.ClientConnectorError by @jamesbraza in #444
- Handling AttributeError on structured citation prompt failure by @jamesbraza in #445
Full Changelog: v5.0.6...v5.0.7
v5.0.6
What's Changed
- Avoiding div0 crash in LitQATaskDataset.compute_trajectory_metrics by @jamesbraza in #439
- Added some documentation and adjusted field names for ease of use in DocDetails by @whitead in #440
Full Changelog: v5.0.5...v5.0.6
v5.0.5
What's Changed
- build_index with defaults by @jamesbraza in #430
- Less common CROSSREF_XYZ warnings by @jamesbraza in #431
- DRY'd up default indexes default location by @jamesbraza in #432
- Including paper and evidence counts in metrics by @jamesbraza in #435
- Allowing case insensitive "fake" agent type by @jamesbraza in #437
- Many documentation improvements by @jamesbraza in #438
Full Changelog: v5.0.4...v5.0.5
v5.0.4
What's Changed
- DocMetadataClient can now take instantiated providers and processors by @geemi725 in #414
- Promoting agent factories to Settings by @jamesbraza in #407
- Add stub_data_dir fixture to retraction test by @geemi725 in #420
- pytest-recording docs in CONTRIBUTING.md by @jamesbraza in #410
- Fixing test_crossref_journalquality_fields_filtering incorrect cassette set up by @jamesbraza in #423
- Broken title search ut by @nadolskit in #411
- Update pre-commit-ci/lite-action action to v1.0.3 by @renovate in #422
- Pinning min pymupdf version by @jamesbraza in #424
- Preventing Environments from sharing one Docs by @jamesbraza in #425
- Dropped PyCryptodome and build dependencies by @jamesbraza in #426
New Contributors
- @nadolskit made their first contribution in #411
Full Changelog: v5.0.3...v5.0.4