Merge pull request #3 from deedy5/dev
v0.2.1
1) Fix memory overhead: do not pass resp object to struct Response, only...
2) pytest: remove pytest-retry, use decorator
3) Update tests
4) Client:request(): remove Python::with_gil
5) update Readme
deedy5 authored Apr 18, 2024
2 parents 4844354 + cd0db94 commit fb4bb98
Showing 10 changed files with 300 additions and 309 deletions.
2 changes: 1 addition & 1 deletion Cargo.lock

Generated file; diff not rendered by default.

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "pyreqwest_impersonate"
-version = "0.2.0"
+version = "0.2.1"
 edition = "2021"
 description = "HTTP client that can impersonate web browsers, mimicking their headers and `TLS/JA3/JA4/HTTP2` fingerprints"
 authors = ["deedy5"]
38 changes: 23 additions & 15 deletions README.md
@@ -3,7 +3,7 @@

 The fastest python HTTP client that can impersonate web browsers by mimicking their headers and `TLS/JA3/JA4/HTTP2` fingerprints.</br>
 Binding to the Rust `reqwest_impersonate` library.</br>
-🏁 Check the benchmarks for more details.
+🏁 Check the [benchmark](https://github.com/deedy5/pyreqwest_impersonate/tree/main/benchmark) for more details.


 Provides precompiled wheels:
@@ -29,7 +29,7 @@ pip install -U pyreqwest_impersonate
 ## Usage
 ### I. Client

-A blocking HTTP client that can impersonate web browsers.
+A blocking HTTP client that can impersonate web browsers. Not thread-safe!
 ```python3
 class Client:
     """Initializes a blocking HTTP client that can impersonate web browsers.
@@ -53,6 +53,10 @@ class Client:
         verify (bool, optional): Verify SSL certificates. Default is True.
         http1 (bool, optional): Use only HTTP/1.1. Default is None.
         http2 (bool, optional): Use only HTTP/2. Default is None.
+    Note:
+        The Client instance is not thread-safe, meaning it should be initialized once and reused across a multi-threaded environment.
     """
 ```

@@ -92,7 +96,8 @@ Performs a POST request to the specified URL.
 ```python
 from pyreqwest_impersonate import Client

-client = Client(impersonate="chrome_123")
+# Not thread-safe! Initialize the Client instance once and reuse it across threads
+client = Client(impersonate="chrome_123")

 # get request
 resp = client.get("https://tls.peet.ws/api/all")
@@ -118,28 +123,31 @@ TODO

 #### Response attributes and methods

-- `cookies`: Fetches the cookies from the response as a dictionary.
-- `headers`: Retrieves the headers from the response as a dictionary.
-- `status_code`: Gets the status code of the response as an integer.
-- `url`: Returns the URL of the response as a string.
-- `content`: Provides the content of the response as bytes.
-- `text`: Decodes the response body into text, automatically detecting the character encoding.
-- `json()`: Parses the response body as JSON, converting it into a Python object for easy manipulation.
+- `content` (bytes): Provides the content of the response as bytes.
+- `cookies` (dict): Fetches the cookies from the response as a dictionary.
+- `headers` (dict): Retrieves the headers from the response as a dictionary.
+- `json()` (function): Parses the response body as JSON, converting it into a Python object for easy manipulation.
+- `raw` (list[int]): Contains the raw byte representation of the HTTP response body.
+- `status_code` (int): Gets the status code of the response as an integer.
+- `text` (str): Decodes the response body into text, automatically detecting the character encoding.
+- `url` (str): Returns the URL of the response as a string.

 #### Example

 ```python
 from pyreqwest_impersonate import Client

+# Not thread-safe! Initialize the Client instance once and reuse it across threads
 client = Client()

 response = client.get("https://example.com")

-print(response.status_code) # Access the status code
-print(response.url) # Access the URL
-print(response.headers) # Access headers
-print(response.cookies) # Access cookies
 print(response.content) # Get the content as bytes
-print(response.text) # Decode the content as text
+print(response.cookies) # Access cookies
+print(response.headers) # Access headers
 print(response.json()) # Parse the content as JSON
+print(response.raw) # Raw response
+print(response.status_code) # Access the status code
+print(response.text) # Decode the content as text
+print(response.url) # Access the URL
 ```
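
Reviewer note: the README text in this diff flags the client as not thread-safe while also suggesting it be reused across threads. One conservative reading, sketched below under assumptions, is to keep one client per worker thread with `threading.local`, which is close to what `benchmark.py` does by constructing the session inside `session_get_test`. `StandInClient` is a hypothetical placeholder so the snippet runs without the library installed; in real use it would presumably be swapped for `pyreqwest_impersonate.Client`. Whether sharing a single instance across threads is actually safe is not established by this page.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor

_serial = itertools.count()
_serial_lock = threading.Lock()


class StandInClient:
    """Hypothetical placeholder for pyreqwest_impersonate.Client (assumption)."""

    def __init__(self):
        # Give every instance a unique number so we can tell them apart.
        with _serial_lock:
            self.serial = next(_serial)

    def get(self, url):
        return f"client {self.serial} fetched {url}"


_local = threading.local()


def get_client():
    """Return the calling thread's own client, creating it on first use."""
    client = getattr(_local, "client", None)
    if client is None:
        client = StandInClient()
        _local.client = client
    return client


def worker(url):
    client = get_client()
    client.get(url)
    return threading.get_ident(), client.serial


def run(urls, threads=4):
    """Fetch every URL on a thread pool; return (thread id, client serial) pairs."""
    with ThreadPoolExecutor(threads) as executor:
        return list(executor.map(worker, urls))


if __name__ == "__main__":
    pairs = run([f"http://127.0.0.1:8000/5k?i={i}" for i in range(20)])
    print(sorted(set(pairs)))
```

Each pool thread creates its client lazily and then reuses it for every later request on that thread, so no instance is ever touched by two threads.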
12 changes: 6 additions & 6 deletions benchmark/1_threads.csv
@@ -1,6 +1,6 @@
-name,threads,cpu_time 50k,cpu_time 5k,duration 50k,duration 5k
-curl_cffi,1,5.735,1.521,7.957,3.25
-httpx,1,3.801,2.116,6.117,3.987
-pyreqwest_impersonate,1,0.855,0.297,1.977,1.238
-requests,1,5.787,2.814,8.355,4.73
-tls_client,1,6.414,1.96,6.941,3.153
+name,threads,cpu_time 50k,cpu_time 5k,time 50k,time 5k
+curl_cffi 0.6.2,1,5.617,1.618,7.681,3.367
+httpx 0.27.0,1,2.58,1.934,4.206,3.605
+pyreqwest_impersonate 0.2.1,1,1.706,0.38,3.486,1.133
+requests 2.31.0,1,4.852,3.121,6.993,4.743
+tls_client 1.0.1,1,5.608,1.87,6.333,2.71
12 changes: 6 additions & 6 deletions benchmark/4_threads.csv
@@ -1,6 +1,6 @@
-name,threads,cpu_time 50k,cpu_time 5k,duration 50k,duration 5k
-curl_cffi,4,4.014,1.255,1.567,0.866
-httpx,4,2.105,1.461,1.505,1.307
-pyreqwest_impersonate,4,1.15,0.399,0.875,0.751
-requests,4,4.14,3.006,3.356,2.802
-tls_client,4,3.803,1.357,1.382,0.832
+name,threads,cpu_time 50k,cpu_time 5k,time 50k,time 5k
+curl_cffi 0.6.2,4,3.859,1.124,1.415,0.703
+httpx 0.27.0,4,2.172,1.422,1.485,1.228
+pyreqwest_impersonate 0.2.1,4,1.168,0.477,2.025,1.617
+requests 2.31.0,4,4.036,3.237,3.221,3.08
+tls_client 1.0.1,4,3.52,1.185,1.252,0.723
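
Reviewer note: the CPU-time and wall-clock columns in the new 4-thread rows rank the packages differently, which is easy to miss when reading the raw CSV. The sketch below uses only the standard library and embeds the new rows copied from this diff; the helper name `rank_by` is made up for illustration.

```python
import csv
import io

# New rows of benchmark/4_threads.csv, copied from the diff.
CSV_TEXT = """\
name,threads,cpu_time 50k,cpu_time 5k,time 50k,time 5k
curl_cffi 0.6.2,4,3.859,1.124,1.415,0.703
httpx 0.27.0,4,2.172,1.422,1.485,1.228
pyreqwest_impersonate 0.2.1,4,1.168,0.477,2.025,1.617
requests 2.31.0,4,4.036,3.237,3.221,3.08
tls_client 1.0.1,4,3.52,1.185,1.252,0.723
"""


def rank_by(column, text=CSV_TEXT):
    """Return package names sorted by the given numeric column, lowest first."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return [row["name"] for row in sorted(rows, key=lambda row: float(row[column]))]


if __name__ == "__main__":
    for column in ("cpu_time 50k", "time 50k"):
        print(f"{column}: {rank_by(column)}")
```

In these rows `pyreqwest_impersonate 0.2.1` has the lowest `cpu_time 50k` but only the fourth-lowest `time 50k`, so the two metrics are worth reading separately.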
85 changes: 56 additions & 29 deletions benchmark/benchmark.py
@@ -1,5 +1,6 @@
 import time
 from concurrent.futures import ThreadPoolExecutor, as_completed
+from importlib.metadata import version
 import pandas as pd
 import requests
 import httpx
@@ -8,32 +9,48 @@
 import curl_cffi.requests

 results = []
+PACKAGES = [
+    ("requests", requests.Session),
+    ("httpx", httpx.Client),
+    ("tls_client", tls_client.Session),
+    ("curl_cffi", curl_cffi.requests.Session),
+    ("pyreqwest_impersonate", pyreqwest_impersonate.Client),
+]
+
+
+def add_package_version(packages):
+    return [(f"{name} {version(name)}", classname) for name, classname in packages]


 def session_get_test(session_class, requests_number):
     s = session_class()
     for _ in range(requests_number):
         s.get(url).text

+
+PACKAGES = add_package_version(PACKAGES)
+
 # one thread
 requests_number = 2000
 for response_size in ["5k", "50k"]:
     url = f"http://127.0.0.1:8000/{response_size}"
     print(f"\nOne worker, {response_size=}, {requests_number=}")
-    for name, session_class in [
-        ("requests", requests.Session),
-        ("httpx", httpx.Client),
-        ("tls_client", tls_client.Session),
-        ("curl_cffi", curl_cffi.requests.Session),
-        ("pyreqwest_impersonate", pyreqwest_impersonate.Client),
-    ]:
+    for name, session_class in PACKAGES:
         start = time.perf_counter()
         cpu_start = time.process_time()
         session_get_test(session_class, requests_number)
         dur = round(time.perf_counter() - start, 3)
         cpu_dur = round(time.process_time() - cpu_start, 3)
-        results.append({"name": name, "threads": 1, "response_size": response_size, "duration": dur, "cpu_time": cpu_dur})
-        print(f" name: {name:<22} {response_size=} {dur=} {cpu_dur=}")
+        results.append(
+            {
+                "name": name,
+                "threads": 1,
+                "size": response_size,
+                "time": dur,
+                "cpu_time": cpu_dur,
+            }
+        )
+        print(f" name: {name:<30} time: {dur} cpu_time: {cpu_dur}")


 # multiple threads
@@ -42,36 +59,46 @@ def session_get_test(session_class, requests_number):
 for response_size in ["5k", "50k"]:
     url = f"http://127.0.0.1:8000/{response_size}"
     print(f"\n{threads_number} workers, {response_size=}, {requests_number=}")
-    for name, session_class in [
-        ("requests", requests.Session),
-        ("httpx", httpx.Client),
-        ("tls_client", tls_client.Session),
-        ("curl_cffi", curl_cffi.requests.Session),
-        ("pyreqwest_impersonate", pyreqwest_impersonate.Client),
-    ]:
+    for name, session_class in PACKAGES:
         start = time.perf_counter()
         cpu_start = time.process_time()
         with ThreadPoolExecutor(threads_number) as executor:
-            futures = [executor.submit(session_get_test, session_class, requests_number) for _ in range(threads_number)]
+            futures = [
+                executor.submit(session_get_test, session_class, requests_number)
+                for _ in range(threads_number)
+            ]
             for f in as_completed(futures):
                 f.result()
         dur = round(time.perf_counter() - start, 3)
         cpu_dur = round(time.process_time() - cpu_start, 3)
-        results.append({"name": name, "threads": threads_number, "response_size": response_size, "duration": dur, "cpu_time": cpu_dur})
-        print(f" name: {name:<22} {response_size=} {dur=} {cpu_dur=}")
-
+        results.append(
+            {
+                "name": name,
+                "threads": threads_number,
+                "size": response_size,
+                "time": dur,
+                "cpu_time": cpu_dur,
+            }
+        )
+        print(f" name: {name:<30} time: {dur} cpu_time: {cpu_dur}")
+

 df = pd.DataFrame(results)
-pivot_df = df.pivot_table(index=['name', 'threads'], columns='response_size', values=['duration', 'cpu_time'], aggfunc='mean')
+pivot_df = df.pivot_table(
+    index=["name", "threads"],
+    columns="size",
+    values=["time", "cpu_time"],
+    aggfunc="mean",
+)
 pivot_df.reset_index(inplace=True)
-pivot_df.columns = [' '.join(col).strip() for col in pivot_df.columns.values]
-pivot_df = pivot_df[['name', 'threads'] + [col for col in pivot_df.columns if col not in ['name', 'threads']]]
-unique_threads = pivot_df['threads'].unique()
+pivot_df.columns = [" ".join(col).strip() for col in pivot_df.columns.values]
+pivot_df = pivot_df[
+    ["name", "threads"]
+    + [col for col in pivot_df.columns if col not in ["name", "threads"]]
+]
+unique_threads = pivot_df["threads"].unique()
 for thread in unique_threads:
-    thread_df = pivot_df[pivot_df['threads'] == thread]
+    thread_df = pivot_df[pivot_df["threads"] == thread]
     print(f"\nTable for {thread} threads:")
     print(thread_df.to_string(index=False))
-    thread_df.to_csv(f'{thread}_threads.csv', index=False)
-
-
-
+    thread_df.to_csv(f"{thread}_threads.csv", index=False)
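
Reviewer note: the benchmark records two clocks per run, `time.perf_counter()` for elapsed wall time and `time.process_time()` for CPU time, and labels each package with its installed version through `importlib.metadata.version`. The standalone sketch below isolates those two ideas with the standard library only. The helpers `label_with_version`, `measure`, and `sleepy_workload` are made up for illustration, and the fallback for a missing package is an addition of this sketch, not something the script does.

```python
import time
from importlib.metadata import PackageNotFoundError, version


def label_with_version(name):
    """Return "name x.y.z", or the bare name if the package is not installed."""
    try:
        return f"{name} {version(name)}"
    except PackageNotFoundError:
        return name


def measure(func, *args):
    """Run func(*args) once and return (wall_seconds, cpu_seconds), rounded to 3 places."""
    start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args)
    wall = round(time.perf_counter() - start, 3)
    cpu = round(time.process_time() - cpu_start, 3)
    return wall, cpu


def sleepy_workload(seconds):
    # Stand-in for I/O-bound work: sleeping adds wall time but almost no CPU time.
    time.sleep(seconds)


if __name__ == "__main__":
    wall, cpu = measure(sleepy_workload, 0.2)
    print(f" name: {label_with_version('requests'):<30} time: {wall} cpu_time: {cpu}")
```

Because `process_time()` counts CPU time for the whole process, in the multi-threaded runs it sums work done on all worker threads, which is why `cpu_time` can exceed `time` in the 4-thread table.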
6 changes: 0 additions & 6 deletions pyproject.toml
@@ -32,13 +32,7 @@ dependencies = []
 [project.optional-dependencies]
 dev = [
     "pytest>=8.1.1",
-    "pytest-retry>=1.6.2",
 ]

 [tool.maturin]
 features = ["pyo3/extension-module"]
-
-[tool.pytest.ini_options]
-retries = 3
-retry_delay = 0.5
-cumulative_timing = false
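
Reviewer note: the commit message says `pytest-retry` was replaced by a decorator, and the removed `[tool.pytest.ini_options]` block set `retries = 3` and `retry_delay = 0.5`. The test changes are not rendered on this page, so the following is only a hypothetical sketch of what such a decorator could look like. The name `retry`, its parameters, and the choice that `retries` counts extra attempts after the first failure are all assumptions, not the project's actual code.

```python
import functools
import time


def retry(retries=3, delay=0.5):
    """Re-run the wrapped function up to `retries` extra times if it raises."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(retries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    # Out of attempts: let the last exception propagate.
                    if attempt == retries:
                        raise
                    time.sleep(delay)

        return wrapper

    return decorator


@retry(retries=3, delay=0.5)
def test_example():
    # Placeholder body; a real test would issue a request and assert on the response.
    assert True
```

Since `functools.wraps` preserves the function name, pytest would still collect a decorated `test_*` function as usual.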