=======
- All crawling modules completed
- Revisions to error handling completed
- Need to output the crawled posts in JSON format (a minimal sketch follows this list)
- Changed the target URL so the crawler only fetches actual Weibo posts, excluding ads and sponsored pages (re-enabling those can easily be implemented later, depending on the circumstances)
- Still need to get past the login wall to view the full list of posts, and to throttle the crawling frequency (throttling is shown in the paging sketch after this list)
- Can log in to Weibo by entering an ID and password (see the login sketch after this list)
- Can crawl the page after logging in and continue on to the next pages
- How to stop this crawler? When to stop this crawler?
- Comparing the newest posts one by one against the previous run before writing the JSON file doesn't take long
- What happens after page 50? (It repeats)
- The results repeat from page 1
- So the crawl is capped at a maximum of 50 pages
- Need a duplicate-prevention method so the crawler skips posts already collected in a previous crawling session (see the dedup sketch after this list)
- Exception handling for malformed posts without names & URLs (Resolved)
- Possibly add exception handling for IndexError (see the exception-handling sketch after this list)
- Or is there a better way to handle this without losing posts?
- How to stop this crawler? When to stop this crawler? (Resolved; see the paging sketch after this list)
- Stops when the next-page button to click doesn't exist
- Consequently, the crawler moves on to a different time period
- Perhaps this needs better naming?
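
A minimal sketch of the JSON output step, assuming each crawled post is a dict with hypothetical name/url/text keys:

```python
import json

def save_posts_to_json(posts, path="weibo_posts.json"):
    """Write the list of crawled post dicts to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Chinese text readable in the output file
        json.dump(posts, f, ensure_ascii=False, indent=2)

# usage (hypothetical fields):
# save_posts_to_json([{"name": "user1", "url": "https://weibo.com/...", "text": "..."}])
```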
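
A minimal Selenium login sketch for the ID/PASS step; the login URL and every element locator below are placeholders, not Weibo's real ones, and must be taken from the actual login page:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

def login_weibo(user_id, password):
    """Log in to Weibo with an ID and password, then return the driver."""
    driver = webdriver.Chrome()
    driver.get("https://weibo.com/login.php")  # placeholder login URL
    driver.find_element(By.ID, "loginname").send_keys(user_id)    # placeholder locator
    driver.find_element(By.NAME, "password").send_keys(password)  # placeholder locator
    driver.find_element(By.CSS_SELECTOR, "a[action-type='btn_submit']").click()  # placeholder locator
    return driver
```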
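
A minimal paging sketch that combines the stop conditions noted above (50-page cap, wrap-around back to page 1, next-page button no longer present) and throttles the crawl frequency with a fixed delay; `parse_page` and the next-page locator are hypothetical:

```python
import time

from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

MAX_PAGES = 50  # search results repeat after page 50

def crawl_pages(driver, parse_page):
    """Collect posts page by page until a stop condition is hit."""
    first_seen = None
    all_posts = []
    for _page in range(MAX_PAGES):
        posts = parse_page(driver)  # hypothetical callback: posts on the current page
        if posts:
            if first_seen is None:
                first_seen = posts[0]
            elif posts[0] == first_seen:
                break  # results wrapped around to page 1
            all_posts.extend(posts)
        try:
            driver.find_element(By.CSS_SELECTOR, "a.next").click()  # placeholder locator
        except NoSuchElementException:
            break  # no next-page button left: last page reached
        time.sleep(2)  # throttle request frequency
    return all_posts
```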
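
A minimal duplicate-prevention sketch for skipping posts already collected in a previous session, assuming the post URL works as a unique ID and persisting the seen set to a hypothetical seen_post_ids.json:

```python
import json
import os

SEEN_PATH = "seen_post_ids.json"  # hypothetical file name

def load_seen_ids(path=SEEN_PATH):
    """Load the set of post IDs collected in earlier sessions."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return set(json.load(f))
    return set()

def filter_new_posts(posts, seen_ids):
    """Keep only unseen posts and record them as seen."""
    new_posts = [p for p in posts if p["url"] not in seen_ids]
    seen_ids.update(p["url"] for p in new_posts)
    return new_posts

def save_seen_ids(seen_ids, path=SEEN_PATH):
    """Persist the seen set for the next crawling session."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sorted(seen_ids), f)
```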
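
A minimal exception-handling sketch that keeps a post even when its name or URL is missing, instead of losing it to an IndexError; it assumes BeautifulSoup-style elements, and the fallback strings are placeholders:

```python
def extract_post(element):
    """Extract one post, tolerating missing name/URL fields."""
    try:
        name = element.find_all("a")[0].get_text(strip=True)
    except IndexError:
        name = "(no name)"  # post without an author name
    link = element.find("a", href=True)
    url = link["href"] if link else "(no url)"  # post without a URL
    return {"name": name, "url": url, "text": element.get_text(strip=True)}
```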