=======
- All crawling modules completed
- Revisions to error handling completed
- Need to output the crawled posts in JSON format (a minimal sketch follows this list)
- Changed the target URL so the crawler only fetches actual Weibo posts, excluding ads and sponsored pages (re-enabling those can easily be implemented later, depending on the circumstances)
- Still need to get past the login wall to view the full list of posts, and to throttle the crawling frequency (throttling is shown in the paging sketch after this list)
- Can log in to Weibo by entering an ID and password (see the login sketch after this list)
- Can crawl the page after logging in and continue on to the next pages
- How to stop this crawler? When to stop this crawler?
- Comparing the newest posts one by one against the previous run before writing the JSON file doesn't take long
- What happens after page 50? (It repeats)
- The results repeat from page 1
- So the crawl is capped at a maximum of 50 pages
- Need a duplicate-prevention method so the crawler skips posts already collected in a previous crawling session (see the dedup sketch after this list)
- Exception handling for malformed posts without names & URLs (Resolved)
- Possibly add exception handling for IndexError (see the exception-handling sketch after this list)
- Or is there a better way to handle this without losing posts?
- How to stop this crawler? When to stop this crawler? (Resolved; see the paging sketch after this list)
- Stops when the next-page button to click doesn't exist
- Consequently, the crawler moves on to a different time period
- Perhaps this needs better naming?
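
A minimal sketch of the JSON output step, assuming each crawled post is a dict with hypothetical name/url/text keys:

```python
import json

def save_posts_to_json(posts, path="weibo_posts.json"):
    """Write the list of crawled post dicts to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Chinese text readable in the output file
        json.dump(posts, f, ensure_ascii=False, indent=2)

# usage (hypothetical fields):
# save_posts_to_json([{"name": "user1", "url": "https://weibo.com/...", "text": "..."}])
```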
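
A minimal Selenium login sketch for the ID/PASS step; the login URL and every element locator below are placeholders, not Weibo's real ones, and must be taken from the actual login page:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

def login_weibo(user_id, password):
    """Log in to Weibo with an ID and password, then return the driver."""
    driver = webdriver.Chrome()
    driver.get("https://weibo.com/login.php")  # placeholder login URL
    driver.find_element(By.ID, "loginname").send_keys(user_id)    # placeholder locator
    driver.find_element(By.NAME, "password").send_keys(password)  # placeholder locator
    driver.find_element(By.CSS_SELECTOR, "a[action-type='btn_submit']").click()  # placeholder locator
    return driver
```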
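
A minimal paging sketch that combines the stop conditions noted above (50-page cap, wrap-around back to page 1, next-page button no longer present) and throttles the crawl frequency with a fixed delay; `parse_page` and the next-page locator are hypothetical:

```python
import time

from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

MAX_PAGES = 50  # search results repeat after page 50

def crawl_pages(driver, parse_page):
    """Collect posts page by page until a stop condition is hit."""
    first_seen = None
    all_posts = []
    for _page in range(MAX_PAGES):
        posts = parse_page(driver)  # hypothetical callback: posts on the current page
        if posts:
            if first_seen is None:
                first_seen = posts[0]
            elif posts[0] == first_seen:
                break  # results wrapped around to page 1
            all_posts.extend(posts)
        try:
            driver.find_element(By.CSS_SELECTOR, "a.next").click()  # placeholder locator
        except NoSuchElementException:
            break  # no next-page button left: last page reached
        time.sleep(2)  # throttle request frequency
    return all_posts
```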
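
A minimal duplicate-prevention sketch for skipping posts already collected in a previous session, assuming the post URL works as a unique ID and persisting the seen set to a hypothetical seen_post_ids.json:

```python
import json
import os

SEEN_PATH = "seen_post_ids.json"  # hypothetical file name

def load_seen_ids(path=SEEN_PATH):
    """Load the set of post IDs collected in earlier sessions."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return set(json.load(f))
    return set()

def filter_new_posts(posts, seen_ids):
    """Keep only unseen posts and record them as seen."""
    new_posts = [p for p in posts if p["url"] not in seen_ids]
    seen_ids.update(p["url"] for p in new_posts)
    return new_posts

def save_seen_ids(seen_ids, path=SEEN_PATH):
    """Persist the seen set for the next crawling session."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sorted(seen_ids), f)
```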
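
A minimal exception-handling sketch that keeps a post even when its name or URL is missing, instead of losing it to an IndexError; it assumes BeautifulSoup-style elements, and the fallback strings are placeholders:

```python
def extract_post(element):
    """Extract one post, tolerating missing name/URL fields."""
    try:
        name = element.find_all("a")[0].get_text(strip=True)
    except IndexError:
        name = "(no name)"  # post without an author name
    link = element.find("a", href=True)
    url = link["href"] if link else "(no url)"  # post without a URL
    return {"name": name, "url": url, "text": element.get_text(strip=True)}
```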