Think of diverse and powerful test cases for the current system that can validate functioning of isolated individual components.
Explicitly parse equations from LaTeX source to explicitly store along with a parsed paper.
Extract comments from Reddit and Twitter about a paper along with the engagments of upvotes/likes to include in search ranking and also store along with parsed research content.
Move projects such as arxiv-table-miner
,arxiv-table-miner-ml-models
, and semantic-scholar-data-pipeline
into one project under arxiv-miner
Explore integration of vector search engines like vlad