Extracting PDFs of Authors

📝 Table of Contents

About
Built Using
Getting Started
Usage
TODO
Authors

🧐 About

A tool that obtains the list of papers by professors of interest (read from a CSV file) and parses the PDFs into text, which is then used to create embeddings and query GPT.

Built using

SerpAPI
PyPDF2
OpenAI GPT-3
Tiktoken

🏁 Getting Started

Prerequisites

One needs an account with SerpAPI. SerpAPI is used to query Google Scholar, and it allows up to 100 free queries per month.
Additionally, one needs access to the OpenAI GPT APIs.

Create a config.yaml file with the following keys:

csv: <CSV FILE NAME>
serpapi_key: <SerpAPI API_KEY>
openai:
  api_key: <OpenAI API_KEY>
  organization: <Org name registered with OpenAI>
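A missing key in config.yaml tends to surface only later as an authentication error, so it can help to check the file up front. The following is a minimal sketch, not the repository's own loader: the function names `validate_config` and `load_config` are hypothetical, and `load_config` assumes the third-party PyYAML package is installed.

```python
from collections.abc import Mapping
from typing import Any

# Key names are taken from the config.yaml layout shown above.
REQUIRED_TOP_LEVEL = ("csv", "serpapi_key")
REQUIRED_OPENAI = ("api_key", "organization")


def validate_config(config: Mapping[str, Any]) -> None:
    """Raise ValueError naming every required key that is missing or empty."""
    missing = [key for key in REQUIRED_TOP_LEVEL if not config.get(key)]
    openai_section = config.get("openai")
    if not isinstance(openai_section, Mapping):
        missing.append("openai")
    else:
        missing += [
            f"openai.{key}" for key in REQUIRED_OPENAI if not openai_section.get(key)
        ]
    if missing:
        raise ValueError("config.yaml is missing: " + ", ".join(missing))


def load_config(path: str = "config.yaml") -> dict:
    """Read and validate the YAML file (hypothetical helper, needs PyYAML)."""
    import yaml  # imported lazily so validate_config works without PyYAML

    with open(path, encoding="utf-8") as handle:
        config = yaml.safe_load(handle) or {}
    validate_config(config)
    return config
```

Calling `load_config()` before any request is made gives a single error message that lists all missing keys at once.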

Installing

Create the environment

conda env create -f environment.yml

🎈 Usage

To fetch all papers from 2022 onwards by the professors of interest, run

python fetch.py

This should create a folder papers that contains the PDFs
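For orientation, here is a rough sketch of what a Google Scholar query through SerpAPI can look like. It is an illustration under assumptions, not a copy of fetch.py: the helper names are hypothetical, the `author:` search operator and the `as_ylo` (earliest year) parameter are one way to express "papers by this professor from 2022 onwards", and the response shape read by `pdf_links` (an `organic_results` list whose items may carry `resources` entries with `file_format` and `link`) is assumed from SerpAPI's Google Scholar results.

```python
from urllib.parse import urlencode

SERPAPI_ENDPOINT = "https://serpapi.com/search.json"


def build_scholar_url(author: str, api_key: str, since_year: int = 2022) -> str:
    """Build a SerpAPI Google Scholar request URL for one author."""
    params = {
        "engine": "google_scholar",
        "q": f'author:"{author}"',  # Google Scholar author operator
        "as_ylo": since_year,       # only results from this year onwards
        "api_key": api_key,
    }
    return f"{SERPAPI_ENDPOINT}?{urlencode(params)}"


def pdf_links(response: dict) -> list[str]:
    """Collect PDF links from a decoded response (assumed shape, see above)."""
    links = []
    for result in response.get("organic_results", []):
        for resource in result.get("resources", []):
            if resource.get("file_format") == "PDF" and resource.get("link"):
                links.append(resource["link"])
    return links
```

Each call to the URL built here counts against the monthly query quota, so it is worth caching responses while experimenting.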

Then, to extract data from the PDFs, run

python extract.py

This should create a folder papers_parse that contains the parsed data from each PDF
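The extraction step can be pictured roughly as below. This is a minimal sketch rather than the contents of extract.py: the function names are hypothetical, writing one `.txt` file per PDF is an assumption about the output format, and `PdfReader` requires PyPDF2 2.0 or newer.

```python
from pathlib import Path


def parsed_path(pdf_path: Path, out_dir: Path) -> Path:
    """Map papers/foo.pdf to papers_parse/foo.txt (assumed naming scheme)."""
    return out_dir / (pdf_path.stem + ".txt")


def extract_text(pdf_path: Path) -> str:
    """Concatenate the text of every page in one PDF."""
    from PyPDF2 import PdfReader  # third-party, imported lazily

    reader = PdfReader(str(pdf_path))
    # extract_text() can yield nothing for scanned or image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_all(pdf_dir: str = "papers", out_dir: str = "papers_parse") -> list[Path]:
    """Parse every PDF in pdf_dir and write the text files into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        target = parsed_path(pdf, out)
        target.write_text(extract_text(pdf), encoding="utf-8")
        written.append(target)
    return written
```

Plain page-by-page extraction loses layout such as columns and equations, which is the limitation the first TODO item refers to.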

Finally, to ask GPT a question, run

python gpt.py -question <QUESTION> -new <True/False>

Set the -new flag to True to create new embeddings; otherwise set it to False.
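A flag that takes the literal text True or False needs explicit parsing, because in Python `bool("False")` evaluates to `True`. The sketch below shows one way to handle the two flags with `argparse`. It is an illustration, not the parser in gpt.py: the helper names and the default of False for -new are assumptions.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Turn the text 'True' or 'False' (any case) into a real boolean."""
    lowered = value.strip().lower()
    if lowered in {"true", "t", "yes", "1"}:
        return True
    if lowered in {"false", "f", "no", "0"}:
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Parser for the two flags shown in the usage line above."""
    parser = argparse.ArgumentParser(description="Ask GPT a question about the papers")
    parser.add_argument("-question", required=True, help="question to ask")
    parser.add_argument(
        "-new",
        type=str_to_bool,
        default=False,  # assumed default: reuse existing embeddings
        help="True to create new embeddings, False to reuse existing ones",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.question, args.new)
```

With this converter, `-new False` is parsed as the boolean `False`, and any other spelling is rejected with a clear error instead of being silently treated as true.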

TODO

Use GROBID instead of PyPDF2 for better PDF parsing
Fine-tune the GPT model

✍️ Authors

@saksham36
