-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathTODO
38 lines (37 loc) · 3.45 KB
/
TODO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
[Done] Add download folder specification in config file
[Done] Add category and page number specification in config file
[Done] (use FastAPI and Uvicorn to run Python API server, relatively high performance) Decide whether to have a seperate program to serve public api or make the crawler itself to serve public api (might not have a good performance), or just no public api at all
[Done] Add whether to download all pages of a illustration or just the first page, or just ignore the illustration if it has multiple pages
[Done] (always use original) Add image quality specification in config file (original, large, medium)
[Done] Add store mode that only stores image info (id, title, author, tags, url) in database (sqlite, or other sql/nosql serverless database)
[Done] Port ConfigParser to ConfigUpdater to avoid config file/comment corruption (need to add .value to all getters)
[Done] (always in background and crawl periodically) Decide whether to make the crawler runs freshly and periodically (i.e. using a crontab to update images) or make the crawler stays in background and runs periodically (specify fetch time interval in config)
[Done] Format all prints to use proper logging module
[Done] Build API server that contains: config that could use local image cache/remote urls, reload config, specify reverse proxy of pixiv image; allowing users to randomize image according to given tags/r18/ai_type/author/id; allowing users to specify certain db id to lookup certain picture; having two serve modes that could either return json to user or directly return image file to user (might be unnecessary)
[Done] Add limit to the number of images/tags/authors to crawl/return
[Done] Add ability to crawl from more than one ranking mode
[Done] Fix linux cannot find chrome binary issue
[Done] Fix the last_update_time in auth is always -1 bug
[Done] Add download_reverse_proxy in config file to specify reverse proxy for downloading images
[Done] Fix log of default stream output cannot be seen on supervisor (access stream is visible)
[Done] Add ability to exclude certain tags (e.g. exclude Gay/boy/BL/muscle/furry/futa tags) when crawling images
[Done] Add ability to crawl from a specific date or a specific period of time in the past
[Done] Add force_update option in config file to force update/insert crawled images in database
[Done] Add image quality specification in config file (original, large, medium)
[Done] Fix crawling 0 images bug
[Done] Add param explanation in docs
[Done] Add ability to return 302 redirect to image url in light mode or use param to specify whether to return 302 redirect or image file
[Done] Remove local_filename field from /api/v1 response
[Done] Add HTML return that includes image url and file name as page title
[Done] Fix that some images cannot be added into database
[Done] Add config to minify database file
[Done] Add crawler status when requesting crawl manually
[Done] Add image compression (e.g. jpeg) to reduce image size (done manually via API calls)
[Done] Add query cache to reduce database query time
[Done] Fix that crawling will download compressed images again
[Done] Add that when encountering a non-valid image file, the api will keep randomizing until a valid image is found and also remove the non-valid images from database and disk
[Done] Fix that urls in database gets replaced by reverse proxy
[Done] Add ID in html response title
[Done] Convert tinyDB to sqlite3
[Done] Add proxy support for crawling and authentication
Write a proper home page for the project