ArXiv-Miner

ArXiv Miner is a toolkit for mining research papers on CS ArXiv.

What is ArXiv-Miner

arxiv-miner is a small, handy library that powered Sci-Genie (the project is no longer hosted; parts of it will be open-sourced in the future). Sci-Genie was a search engine for quickly searching the full text of papers on CS ArXiv.

arxiv-miner helps extract and parse LaTeX documents from CS ArXiv. It also supports storing and searching the parsed documents with Elasticsearch. The library can also be applied to other ArXiv domains such as Math, Physics, and Biology.

Why was ArXiv-Miner created?

ArXiv Miner was created to make it easy to scrape, parse, and search research content on ArXiv. The library stitches together solutions from the code of various tools such as arxiv-sanity, arxiv-vanity/engrafo, arxivscraper, tex2py, cso-classifier and axcell. The parsed structure of the content can serve as a heuristic baseline for search or for any scientific research mining/AI application.

Core Components of ArXiv-Miner

  • Scraping
  • Parsing
  • Indexing/Storage
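
To give a feel for how the parsing and indexing stages fit together, here is a minimal sketch in plain Python. It is not arxiv-miner's API: the names `split_sections`, `build_index` and `search` are hypothetical, the scraping stage is skipped (the LaTeX source is assumed to be already in hand), and an in-memory dictionary stands in for Elasticsearch.

```python
import re
from collections import defaultdict

# Matches \section{Title} and \section*{Title}; nested braces are not handled.
SECTION_RE = re.compile(r"\\section\*?\{([^}]*)\}")


def split_sections(latex_source):
    """Parsing stage (hypothetical): return (title, body) pairs, one per section."""
    matches = list(SECTION_RE.finditer(latex_source))
    sections = []
    for i, match in enumerate(matches):
        # A section body runs until the next section command or the end of the file.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(latex_source)
        sections.append((match.group(1).strip(), latex_source[match.end():end].strip()))
    return sections


def build_index(papers):
    """Indexing stage (hypothetical): map each word to the (paper_id, section) pairs containing it."""
    index = defaultdict(set)
    for paper_id, source in papers.items():
        for title, body in split_sections(source):
            for word in re.findall(r"[a-z]+", body.lower()):
                index[word].add((paper_id, title))
    return index


def search(index, word):
    """Return the sorted (paper_id, section) pairs whose body contains the word."""
    return sorted(index.get(word.lower(), set()))


# Made-up sample input; "paper-1" is a placeholder, not a real ArXiv identifier.
papers = {
    "paper-1": r"\section{Introduction} We study transformers. "
               r"\section{Method} Attention layers are used.",
}
index = build_index(papers)
print(search(index, "attention"))
```

The sketch splits on section commands with a regular expression, which is only a rough heuristic: real LaTeX sources contain macros, included files and nested braces that need a proper parser, and a real deployment would send the parsed sections to a search backend rather than a dictionary.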

Family Of Projects With ArXiv-Miner

Credits and Appreciation

This project, like all others, has been built on the shoulders of giants. A big thanks to the creators of the following libraries/open source projects that aided the development of arxiv-miner and its family of projects:

Licence

MIT