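On Eastmoney (东方财富网), the stock data is displayed in a "table" tag on the Quote Center page. Since the site was redesigned, the table contents no longer appear in the page source, so the stock data we want cannot be scraped directly from the HTML. Instead, use the browser's developer tools to find the network request that actually supplies the data.
Task: fetch the names and trading data of the current day's Shanghai and Shenzhen A-share stocks from Eastmoney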
original url: http://quote.eastmoney.com/center/gridlist.html#hs_a_board
idea: urllib.request -> response -> bs4 parser -> get and transform json data -> save csv file
__
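import urllib.request as r
from bs4 import BeautifulSoup
# Send a browser-like User-Agent header with the request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'}
# url_shsz: the data request URL found with the developer tools (defined in the full notebook)
request = r.Request(url_shsz, headers=headers)
response = r.urlopen(request)
html = response.read().decode('utf-8', 'ignore')  # skip bytes that are not valid UTF-8
soup = BeautifulSoup(html, 'html.parser')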
__
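The last two steps of the idea above (get and transform json data, then save csv file) might look roughly like the sketch below. It is not taken from the linked notebook. The helper names `extract_json` and `rows_to_csv`, the `callback(...)` wrapper, and the `code` / `name` / `price` fields are all hypothetical placeholders; the real response layout has to be checked in the developer tools. The sample string stands in for the decoded response text, so the sketch runs without a network connection.

```python
import csv
import io
import json
import re


def extract_json(text):
    """Parse the JSON object in text, ignoring an optional JSONP-style wrapper."""
    # Greedy match from the first "{" to the last "}" in the response text.
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in response text")
    return json.loads(match.group(0))


def rows_to_csv(rows, fieldnames, out):
    """Write a list of dicts to the file-like object out as CSV with a header row."""
    writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical payload: the wrapper name, nesting, and field names are assumptions.
sample = (
    'callback({"data": {"rows": ['
    '{"code": "000001", "name": "Stock A", "price": 10.5}, '
    '{"code": "000002", "name": "Stock B", "price": 20.0}]}});'
)

payload = extract_json(sample)
rows = payload["data"]["rows"]

# Write to an in-memory buffer here; to save a file instead, use
# open("stocks.csv", "w", newline="", encoding="utf-8-sig") as the target.
buffer = io.StringIO()
rows_to_csv(rows, ["code", "name", "price"], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing `extrasaction="ignore"` lets the writer skip any extra keys in the response rows, so only the columns listed in `fieldnames` end up in the CSV.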
complete code in link: https://github.com/Ariannahs/Web-Crawling-Stock-Data-/blob/master/Python%20Web%20Crawling%20-%20%E8%82%A1%E7%A5%A8%E6%95%B0%E6%8D%AE%E7%88%AC%E5%8F%96.ipynb