Mission/toujuan/m1.2 #135

Open
wants to merge 2 commits into
base: master
51 changes: 51 additions & 0 deletions works/toujuan/M1.1/main.py
@@ -0,0 +1,51 @@
import sys
import requests
import json


def solve(username):
    url = 'https://codeforces.com/api/user.info'

    params = {
        'handles': username
    }

    response = requests.get(url, params=params)

    if response.status_code == 200:  # check the HTTP status code
        data = response.json()  # parse the response body as JSON
        if data['status'] == 'OK':  # check the status reported by the API
            result = data['result'][0]

            if 'rating' in result:
                output_data = {
                    "handle": username,
                    "rating": result['rating'],
                    "rank": result['rank'],
                }
            else:
                output_data = {
                    "handle": username
                }

            data_json = json.dumps(output_data)
            sys.stdout.write(data_json + "\n")
            sys.exit(0)
    else:
        sys.stderr.write("no such handle\n")
        sys.exit(1)


def main():
    username = sys.argv[1]
    solve(username)


if __name__ == '__main__':
    main()
4 changes: 4 additions & 0 deletions works/toujuan/M1.1/summarize.md
@@ -0,0 +1,4 @@
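Summary:
1 Learned how to handle error cases on the command line, such as what to do after a 403 error.
2 Learned that APIs usually return structured data (such as JSON). The data is organized in a fixed format, so it is easy to parse and process. Scraping HTML instead requires a parsing library to extract the information, which is more cumbersome.
3 Learned how to handle command-line input and output.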
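
The command-line conventions behind points 1 and 3 can be sketched as follows: results go to stdout, errors go to stderr, and the exit code tells the caller which one happened. This is only an illustration, not the code under review. The function `report`, its parameters, and the sample handle `alice` are hypothetical, and the input is a hand-written dictionary shaped like one entry of the `result` list used in main.py.

```python
import io
import json
import sys


def report(user, out=sys.stdout, err=sys.stderr):
    """Write one JSON line to `out` and return 0, or write an error to `err` and return 1.

    `user` is a dict shaped like one entry of the `result` list, or None when
    the handle was not found (a convention invented for this sketch).
    """
    if user is None:
        err.write("no such handle\n")
        return 1
    output = {"handle": user["handle"]}
    # Mirror main.py: copy rating and rank only when a rating is present.
    if "rating" in user:
        output["rating"] = user["rating"]
        output["rank"] = user["rank"]
    out.write(json.dumps(output) + "\n")
    return 0


if __name__ == "__main__":
    # Hand-written sample data, not a real API response.
    sample = {"handle": "alice", "rating": 1500, "rank": "specialist"}
    sys.exit(report(sample))
```

Returning the exit code instead of calling `sys.exit` inside the function keeps the logic testable, since the streams can be swapped for `io.StringIO` objects.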
59 changes: 59 additions & 0 deletions works/toujuan/M1.2/main.py
@@ -0,0 +1,59 @@
import sys
import requests
import json


def solve(username):
    url = 'https://codeforces.com/api/user.info'

    params = {
        'handles': username
    }

    response = requests.get(url, params=params)
    if response.status_code == 403:
        sys.stderr.write("Access Forbidden: {}".format(response.status_code) + '\n')
        sys.exit(1)
    elif response.status_code == 404:
        sys.stderr.write("Not Found: {}".format(response.status_code) + '\n')
        sys.exit(1)
    elif response.status_code == 503:
        sys.stderr.write("Service Unavailable: {}".format(response.status_code) + '\n')
        sys.exit(1)
    elif response.status_code == 200:  # check the HTTP status code
        data = response.json()  # parse the response body as JSON
        if data['status'] == 'OK':  # check the status reported by the API
            result = data['result'][0]

            if 'rating' in result:
                output_data = {
                    "handle": username,
                    "rating": result['rating'],
                    "rank": result['rank'],
                }
            else:
                output_data = {
                    "handle": username
                }

            data_json = json.dumps(output_data)
            sys.stdout.write(data_json + "\n")
            sys.exit(0)
    else:
        sys.stderr.write("no such handle\n")
        sys.exit(1)


def main():
    username = sys.argv[1]
    solve(username)


if __name__ == '__main__':
    main()
19 changes: 19 additions & 0 deletions works/toujuan/M1.2/summarize.md
@@ -0,0 +1,19 @@
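Summary:
Fetching from the API
Advantages
Structured data: APIs usually return structured data in JSON or XML, which is easy to parse and process.
Data reliability: APIs are designed to be stable, change less often, and are usually versioned.
Efficiency: APIs usually return only the requested data, which reduces the amount of data transferred and speeds up collection.
Freshness: APIs usually serve up-to-date data, which suits cases that need real-time or frequently updated data.
Disadvantages
Limited data: the data an API exposes is usually limited and may not cover everything visible on the website.
Dependency: the scraper depends on the API provider's service; if the API is shut down or changed, the scraper has to be redesigned.
Scraping HTML
Advantages
Broad coverage: anything visible on the page can be scraped, so the approach applies widely.
Flexibility: not constrained by an API, so you can choose what to scrape and how often.
Disadvantages
Poor stability: page structure can change frequently, so the scraper code needs regular updates.
Low efficiency: HTML pages contain a lot of irrelevant content (ads, navigation bars, and so on), so more data is transferred.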
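
The difference between the two approaches can be seen in a small sketch. It extracts the same rating from a JSON body and from an HTML page using only the standard library. Both payloads are hand-written samples, not real Codeforces responses, and the names `rating_from_api`, `rating_from_html`, and `RatingParser`, as well as the `rating` class attribute, are assumptions made for illustration.

```python
import json
from html.parser import HTMLParser

# Hand-written sample payloads for illustration only.
API_BODY = '{"status": "OK", "result": [{"handle": "alice", "rating": 1500}]}'
HTML_BODY = (
    '<html><body><div class="nav">Home</div>'
    '<span class="rating">1500</span></body></html>'
)


def rating_from_api(body):
    """Structured data: one parse call, then plain dictionary access."""
    return json.loads(body)["result"][0]["rating"]


class RatingParser(HTMLParser):
    """Collect the text of the first <span class="rating"> element."""

    def __init__(self):
        super().__init__()
        self._in_rating = False
        self.rating = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "span" and ("class", "rating") in attrs:
            self._in_rating = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_rating = False

    def handle_data(self, data):
        if self._in_rating and self.rating is None:
            self.rating = int(data)


def rating_from_html(body):
    """Markup: the result depends on tag names and class attributes staying the same."""
    parser = RatingParser()
    parser.feed(body)
    return parser.rating
```

The HTML version needs a stateful parser and breaks as soon as the page renames the element, which is the stability cost listed above.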

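Learned about error status codes such as 503 (Service Unavailable), 403 (Forbidden), and 404 (Not Found); a 503 can result from sending requests too quickly.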
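
If a 503 does come from sending requests too quickly, one possible response is to wait and try again instead of exiting immediately. The sketch below is a suggestion under that assumption, not part of this pull request: `describe`, `fetch_with_retry`, and their parameters are hypothetical names, and `fetch` stands for any zero-argument callable that returns an HTTP status code.

```python
import time

# Messages for the status codes handled in M1.2/main.py.
STATUS_MESSAGES = {
    403: "Access Forbidden",
    404: "Not Found",
    503: "Service Unavailable",
}


def describe(status_code):
    """Return a message in the format main.py writes to stderr, with a generic fallback."""
    name = STATUS_MESSAGES.get(status_code, "Unexpected status")
    return "{}: {}".format(name, status_code)


def fetch_with_retry(fetch, retries=3, delay=1.0, sleep=time.sleep):
    """Call `fetch()` again while it returns 503, up to `retries` extra attempts.

    The wait doubles after each failed attempt (delay, 2 * delay, 4 * delay, ...).
    `sleep` is injectable so the function can be tested without real waiting.
    """
    status = fetch()
    for attempt in range(retries):
        if status != 503:
            break
        sleep(delay * (2 ** attempt))
        status = fetch()
    return status
```

The lookup table also replaces the repeated `elif` branches with a single dictionary access, so adding another status code is a one-line change.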