Crawl succeeds, but the db file is empty #3

Open
wangbiao92 opened this issue Apr 9, 2017 · 8 comments

@wangbiao92

Login succeeded too, and I crawled Beijing. No errors were reported, but the db file is empty. Where did it go wrong?
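A quick way to check whether anything was actually written, assuming the spider saves to a SQLite file (the filename below is a guess; point it at whatever .db file the run produced):

import sqlite3

conn = sqlite3.connect("lianjia.db")  # hypothetical path; use the actual db file
# Print every table and its row count to see whether any rows were written.
for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type='table'"):
    count = conn.execute('SELECT COUNT(*) FROM "%s"' % name).fetchone()[0]
    print("%s: %d" % (name, count))
conn.close()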

@wangbiao92
Author

I tried crawling just Dongcheng, and the database is still empty:
E:\Git\LianJiaSpider-master>python LianJiaSpider.py
d:\Anaconda2\lib\site-packages\bs4\__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

To get rid of this warning, change this:

BeautifulSoup([your markup])

to this:

BeautifulSoup([your markup], "lxml")

markup_type=markup_type))
Crawled all neighborhood (小区) info for Dongcheng district
done
all done ^_^
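For reference, the fix the warning suggests is to name the parser wherever the spider builds its soup (the URL below is only illustrative; the actual call site is inside LianJiaSpider.py). This silences the warning but, as the warning itself says, it is unrelated to the empty db:

import requests
from bs4 import BeautifulSoup

resp = requests.get("http://bj.lianjia.com/xiaoqu/dongcheng/")  # illustrative URL
# Naming the parser pins bs4 to lxml on every machine and silences the warning.
soup = BeautifulSoup(resp.text, "lxml")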

@lanbing510
Owner

Lianjia recently added strict restrictions (captchas and rate limiting), and the code has not been updated for that yet.
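Until the code is updated, one common mitigation is simply to slow the crawl down. A minimal sketch, assuming requests can be funneled through one helper (not the repo's actual structure):

import random
import time
import requests

session = requests.Session()
session.headers.update({"User-Agent": "Mozilla/5.0"})  # browser-like header

def fetch(url):
    # Random 2-5 s pause between requests to stay under the rate limit.
    time.sleep(random.uniform(2, 5))
    return session.get(url, timeout=10)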

@wangbiao92
Author

I modified the code and got the neighborhood info crawled, but as soon as it reaches the transaction records, the IP is flagged as abnormal. I'll go look for a workaround. Thanks.

@pfsun

pfsun commented Apr 11, 2017

@lanbing510 @wangbiao92 The first time I crawled, my db was empty too. How did you fix the empty database? On my second run I kept hitting captchas and rate limits and couldn't log in at all. Could you share the relevant code or solution? Thanks.

@wangbiao92
Author

@pfsun Lianjia changed its pages, so the code has to change accordingly, but the traffic-abnormality problem is still unsolved; using an IP proxy didn't help.
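For anyone who wants to experiment anyway, a minimal sketch of rotating proxies with requests; the proxy addresses are placeholders, and as noted above, this alone did not get past Lianjia's checks:

import random
import requests

# Placeholder proxy pool; fill in real, working proxies.
PROXIES = [
    "http://10.0.0.1:8080",
    "http://10.0.0.2:8080",
]

def fetch_via_proxy(url):
    proxy = random.choice(PROXIES)
    # Route both http and https traffic through the chosen proxy.
    return requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=10)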

@XuefengHuang

You can try my scraper; it stores the data in MySQL. https://github.com/XuefengHuang/lianjia-scrawler

@pfsun

pfsun commented Apr 14, 2017

@wangbiao92 OK, I'll try again. Thanks.

@pfsun

pfsun commented Apr 14, 2017

@XuefengHuang Thanks, I'll give it a try.
