Add image features. Add repeat detection to save tokens #88

Open: ccetaw wants to merge 1 commit into base: main

@ccetaw ccetaw commented Mar 22, 2023

This PR implements the following features:

  1. Images that appear before the Experiments/Evaluation/Results section are saved locally and appended to the end of the generated summary (the markdown file).
  2. Non-exact keyword matching is supported; use --coarse to enable it.
  3. Papers with the same name as one already downloaded are no longer downloaded and summarized again, which saves tokens; use --repeat to force downloading and summarizing.
  4. An existing .md file with the same name is now always overwritten.

The changes are concentrated in get_image_path(self, image_path='') in get_paper_from_pdf.py and in download_pdf() in chat_paper.py.
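
To make items 2 and 3 concrete, here is a minimal sketch of how coarse matching and repeat detection could work. It is not the code from this PR: the helper names (`normalize`, `matches_keyword`, `safe_stem`, `should_process`), the file-naming scheme, and the reading of "coarse" as "every keyword word appears somewhere in the title" are all assumptions made for illustration.

```python
import re
from pathlib import Path


def normalize(text: str) -> str:
    """Lowercase and collapse every run of non-alphanumerics into one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def matches_keyword(title: str, keyword: str, coarse: bool = False) -> bool:
    """Hypothetical matcher.

    Exact mode: the keyword must appear as a contiguous word sequence.
    Coarse mode (assumed semantics): every keyword word appears somewhere.
    """
    title_words = normalize(title).split()
    key_words = normalize(keyword).split()
    if not key_words:
        return False
    if coarse:
        return all(word in title_words for word in key_words)
    n = len(key_words)
    return any(
        title_words[i:i + n] == key_words
        for i in range(len(title_words) - n + 1)
    )


def safe_stem(title: str) -> str:
    """Hypothetical naming scheme: turn a paper title into a file stem."""
    return normalize(title).replace(" ", "_")


def should_process(title: str, out_dir: Path, repeat: bool = False) -> bool:
    """Skip a paper whose PDF already exists in out_dir unless repeat is set."""
    already_downloaded = (out_dir / f"{safe_stem(title)}.pdf").exists()
    return repeat or not already_downloaded
```

In this sketch, passing `repeat=True` plays the role of --repeat and `coarse=True` the role of --coarse. Checking for the file on disk before calling the API is what saves the tokens.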

… Could do coarse search. Avoid repeat summarizing papers to save tokens. Always overwrite existing files.
@ccetaw
ccetaw (Author) commented Mar 22, 2023

Known bugs:

  1. A PDF may contain images that are never displayed, or a single figure may be composed of several image layers, so the extracted images can be very messy.
  2. The generated .md files may contain many extra indents, for reasons not yet known.
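
The cause of bug 2 is unknown, but stray indents matter because Markdown renders lines indented by four or more spaces as code blocks. One possible workaround, sketched below with a hypothetical helper `strip_stray_indent`, is to remove leading whitespace from every line outside fenced code blocks before writing the file. It assumes the summaries do not rely on indentation for nested lists, which this approach would flatten.

```python
def strip_stray_indent(markdown: str) -> str:
    """Remove leading whitespace from lines outside fenced code blocks."""
    cleaned = []
    in_fence = False
    for line in markdown.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("```"):
            # A fence marker toggles code-block state; keep it unindented.
            in_fence = not in_fence
            cleaned.append(stripped)
        elif in_fence:
            # Preserve indentation inside code blocks verbatim.
            cleaned.append(line)
        else:
            cleaned.append(stripped)
    return "\n".join(cleaned)
```

This treats the symptom only; finding where the indents come from would be the proper fix.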

@thusjr
thusjr commented Mar 26, 2023

Keep it up!
