Cha 2 -编写你的第一个网络爬虫.ipynb #2

Github-Minghui · 2019-01-31T02:16:09Z

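When I try to run the code below, I get an error. I suspect it is because I am on an English-language system, which cannot encode the Chinese text. Any advice?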
import requests
from bs4 import BeautifulSoup  # import BeautifulSoup from the bs4 library

link = "http://www.santostang.com/"
headers = {'User-Agent' : 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}
r = requests.get(link, headers=headers)

soup = BeautifulSoup(r.text, "html.parser")  # parse the page HTML with BeautifulSoup
title = soup.find("h1", class_="post-title").a.text.strip()
print(title)

with open('title_test.txt', "a+") as f:
    f.write(title)
    f.close()

===================================================================
4.3 Scraping by simulating a browser with selenium

UnicodeEncodeError Traceback (most recent call last)
in
11
12 with open('title_test.txt', "a+") as f:
---> 13 f.write(title)
14 f.close()

~\Anaconda3\lib\encodings\cp1252.py in encode(self, input, final)
17 class IncrementalEncoder(codecs.IncrementalEncoder):
18 def encode(self, input, final=False):
---> 19 return codecs.charmap_encode(input,self.errors,encoding_table)[0]
20
21 class IncrementalDecoder(codecs.IncrementalDecoder):

UnicodeEncodeError: 'charmap' codec can't encode characters in position 4-5: character maps to <undefined>

shimmer07 · 2019-04-02T03:03:10Z

It runs without any problem on my machine. Does it work if you try scraping English text instead?

ffflora · 2019-06-26T08:37:39Z

Try this; it should solve the problem:

import codecs
with codecs.open('title_test.txt','a+','utf-8') as f:
    f.write(title)
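
For what it's worth, the traceback goes through cp1252.py, which suggests open() fell back to the Windows locale encoding because no encoding was given. In Python 3 the built-in open() also accepts an encoding argument, so codecs.open may not be strictly needed. Below is a minimal sketch of that idea, not a tested fix for the original notebook: the title string is a hypothetical stand-in for the scraped text, and the file is written to the temp directory in "w" mode (rather than "a+") only so the example is repeatable.

```python
import os
import tempfile

# Hypothetical stand-in for the scraped title (any Chinese text will do).
title = "第四章"

# cp1252 has no mapping for Chinese characters, so encoding fails.
# This is the same failure the traceback shows inside cp1252.py.
try:
    title.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# Passing encoding explicitly makes the write independent of the system locale.
path = os.path.join(tempfile.gettempdir(), "title_test.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(title)  # no f.close() needed; the with block closes the file

# Read the file back with the same encoding to confirm the round trip.
with open(path, encoding="utf-8") as f:
    read_back = f.read()

print(cp1252_ok, read_back == title)
```

When reading the file later, the same encoding="utf-8" has to be passed again, otherwise the locale default is used for decoding as well.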
