[FEATURE] add authorization header to httpstep (#170)
## Description
This change updates the `HttpStep` class and adds comprehensive unit tests. The changes to the `HttpStep` class include:
- Addition of `decode_sensitive_headers` method to decode `SecretStr` values in headers.
- Modification of `get_headers` method to dump headers into JSON without `SecretStr` masking.
- Addition of methods (`get`, `post`, `put`, `delete`) to handle different HTTP methods.
- Addition of the `auth_header` field to handle authorization headers, replacing the previous implementation.
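
As an illustration of the masking behavior described in the list above, here is a minimal sketch. It is not the koheesio implementation: `MaskedStr` is a hypothetical, stdlib-only stand-in for pydantic's `SecretStr`, and the two helper functions only borrow the method names mentioned above, with assumed signatures.

```python
from typing import Dict, Optional, Union


class MaskedStr:
    """Hypothetical stand-in for pydantic's SecretStr (assumption, stdlib only)."""

    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        # Explicit opt-in to read the real value
        return self._value

    def __str__(self) -> str:
        return "**********"

    def __repr__(self) -> str:
        return "MaskedStr('**********')"


Headers = Dict[str, Union[str, MaskedStr]]


def encode_sensitive_headers(headers: Headers, auth_header: Optional[str] = None) -> Headers:
    """Wrap the Authorization value so it is masked when printed or logged."""
    encoded = dict(headers)
    if auth_header is not None:
        encoded["Authorization"] = auth_header
    auth = encoded.get("Authorization")
    if auth is not None and not isinstance(auth, MaskedStr):
        encoded["Authorization"] = MaskedStr(auth)
    return encoded


def decode_sensitive_headers(headers: Headers) -> Dict[str, str]:
    """Unwrap masked values right before the request is sent."""
    return {k: v.get_secret_value() if isinstance(v, MaskedStr) else v for k, v in headers.items()}


headers = encode_sensitive_headers({"Content-Type": "application/json"}, auth_header="Bearer <token>")
log_line = f"Making GET request with headers {headers}"  # safe to log
wire_headers = decode_sensitive_headers(headers)  # only used for the actual request
```

The idea is that the masked form is what reaches log statements, while the decoded form exists only at the point where the request is built.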

The unit tests cover:
- Successful and failed HTTP requests for GET, POST, PUT, and DELETE methods.
- Handling of authorization headers, including Bearer tokens and Digest authentication.
- Retry logic for handling HTTP errors with configurable retry counts.
- Backoff strategies for retrying requests with exponential backoff.
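
The `HttpStep` docstring in this commit gives example delays such as 2, 6, and 18 seconds for `initial_delay=2` and `backoff=3`. A minimal sketch of that arithmetic follows, assuming the delay before retry `n` (counting from zero) is `initial_delay * backoff**n`. `retry_delays` is a hypothetical helper used for illustration only and is not part of koheesio.

```python
from typing import List


def retry_delays(initial_delay: int, backoff: int, max_retries: int) -> List[int]:
    """Return the assumed wait (in seconds) before each retry attempt."""
    return [initial_delay * backoff**attempt for attempt in range(max_retries)]


growing = retry_delays(initial_delay=2, backoff=3, max_retries=3)
constant = retry_delays(initial_delay=2, backoff=1, max_retries=3)
```

With `backoff=1` the delay stays constant, which matches the last example in the docstring.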

## Related Issue
#162 

## Motivation and Context
This pull request addresses potential data leaks regarding authorization headers. The changes ensure that sensitive information is handled securely and that the `HttpStep` class can handle various HTTP methods.
The comprehensive unit tests help prevent regressions and ensure that the class behaves as expected under different conditions.

---------

Co-authored-by: Danny Meijer <[email protected]>
dannymeijer and dannymeijer authored Feb 26, 2025
1 parent 522fd70 commit 744fab6
Showing 2 changed files with 125 additions and 32 deletions.
92 changes: 74 additions & 18 deletions src/koheesio/steps/http.py
@@ -1,5 +1,5 @@
"""
This module contains a few simple HTTP Steps that can be used to perform API Calls to HTTP endpoints
This module contains several HTTP Steps that can be used to perform API Calls to HTTP endpoints
Example
-------
@@ -64,9 +64,34 @@ class HttpStep(Step, ExtraParamsMixin):
"""
Can be used to perform API Calls to HTTP endpoints
Authorization
-------------
The optional `auth_header` parameter in HttpStep allows you to pass an authorization header, such as a bearer token.
For example: `auth_header = "Bearer <token>"`.
The `auth_header` value is stored as a `SecretStr` object to prevent sensitive information from being displayed in logs.
Of course, authorization can also just be passed as part of the regular `headers` parameter.
For example, either one of these parameters would semantically be the same:
```python
headers = {
    "Authorization": "Bearer <token>",
    "Content-Type": "application/json"
}
# or
auth_header = "Bearer <token>"
```
The `auth_header` parameter is useful when you want to keep the authorization separate from the other headers, for
example when your implementation requires you to pass some custom headers in addition to the authorization header.
> Note: The `auth_header` parameter can accept any authorization header value, including basic authentication
tokens, digest authentication strings, NTLM, etc.
Understanding Retries
----------------------
This class includes a built-in retry mechanism for handling temporary issues, such as network errors or server
downtime, that might cause the HTTP request to fail. The retry mechanism is controlled by three parameters:
`max_retries`, `initial_delay`, and `backoff`.
@@ -91,6 +116,38 @@ class HttpStep(Step, ExtraParamsMixin):
`6 seconds`, and `12 seconds`. If you set `initial_delay=2` and `backoff=3`, the delays before the retries would be
`2 seconds`, `6 seconds`, and `18 seconds`. If you set `initial_delay=2` and `backoff=1`, the delays before the
retries would be `2 seconds`, `2 seconds`, and `2 seconds`.
Parameters
----------
url : str, required
API endpoint URL.
headers : Dict[str, Union[str, SecretStr]], optional, default={"Content-Type": "application/json"}
Request headers.
auth_header : Optional[SecretStr], optional, default=None
Authorization header. An optional parameter that can be used to pass an authorization, such as a bearer token.
data : Union[Dict[str, str], str], optional, default={}
Data to be sent along with the request.
timeout : int, optional, default=3
Request timeout. Defaults to 3 seconds.
method : Union[str, HttpMethod], required, default='get'
What type of Http call to perform. One of 'get', 'post', 'put', 'delete'. Defaults to 'get'.
session : requests.Session, optional, default=requests.Session()
Existing requests session object to be used for making HTTP requests. If not provided, a new session object
will be created.
params : Optional[Dict[str, Any]]
Set of extra parameters that should be passed to the HTTP request. Note: any kwargs passed to the class will be
added to this dictionary.
Output
------
response_raw : Optional[requests.Response]
The raw requests.Response object returned by the appropriate requests.request() call.
response_json : Optional[Union[Dict, List]]
The JSON response for the request.
raw_payload : Optional[str]
The raw response for the request.
status_code : Optional[int]
The status return code of the request.
"""

url: str = Field(
@@ -103,27 +160,30 @@ class HttpStep(Step, ExtraParamsMixin):
description="Request headers",
alias="header",
)
bearer_token: Optional[SecretStr] = Field(
auth_header: Optional[SecretStr] = Field(
default=None,
description="Bearer token for authorization",
alias="token",
repr=False,
description="[Optional] Authorization header",
alias="authorization_header",
examples=["Bearer <token>"],
)
data: Optional[Union[Dict[str, str], str]] = Field(
data: Union[Dict[str, str], str] = Field(
default_factory=dict, description="[Optional] Data to be sent along with the request", alias="body"
)
params: Optional[Dict[str, Any]] = Field( # type: ignore[assignment]
default_factory=dict,
description="[Optional] Set of extra parameters that should be passed to HTTP request",
)
timeout: Optional[int] = Field(default=3, description="[Optional] Request timeout")
timeout: int = Field(default=3, description="[Optional] Request timeout")
method: Union[str, HttpMethod] = Field(
default=HttpMethod.GET,
description="What type of Http call to perform. One of 'get', 'post', 'put', 'delete'. Defaults to 'get'.",
)
session: requests.Session = Field(
default_factory=requests.Session,
description="Requests session object to be used for making HTTP requests",
description=(
"Existing requests session object to be used for making HTTP requests. If not provided, a new session "
"object will be created."
),
exclude=True,
repr=False,
)
@@ -164,18 +224,15 @@ def get_proper_http_method_from_str_value(cls, method_value: str) -> str:
return method_value

@model_validator(mode="after")
def encode_sensitive_headers(self) -> dict:
def encode_sensitive_headers(self) -> "HttpStep":
"""
Encode potentially sensitive data into pydantic.SecretStr class to prevent them
being displayed as plain text in logs.
"""
if token := self.bearer_token:
_secret_token = token.get_secret_value()
if auth_header := self.auth_header:
# ensure the token is preceded with the word 'Bearer'
self.headers["Authorization"] = (
_secret_token if _secret_token.startswith("Bearer") else f"Bearer {_secret_token}"
)
del self.bearer_token
self.headers["Authorization"] = auth_header
del self.auth_header
if auth := self.headers.get("Authorization"):
self.headers["Authorization"] = auth if isinstance(auth, SecretStr) else SecretStr(auth)
return self
@@ -263,10 +320,9 @@ def _request(
The last exception that was caught if `requests.request()` fails after `self.max_retries` attempts.
"""
_method = (method or self.method).value.upper()
self.log.debug(f"Making {_method} request to {self.url} with headers {self.headers}")
options = self.get_options()

self.log.debug(f"Making {_method} request to {options['url']} with headers {options['headers']}")

with self.session.request(method=_method, **options, stream=stream) as response:
response.raise_for_status()
self.log.debug(f"Received response with status code {response.status_code} and body {response.text}")
65 changes: 51 additions & 14 deletions tests/steps/test_http.py
@@ -6,6 +6,8 @@
import requests_mock
from urllib3 import Retry

from koheesio.logger import LoggingFactory
from koheesio.models import SecretStr
from koheesio.steps.http import (
HttpDeleteStep,
HttpGetStep,
@@ -25,6 +27,8 @@
STATUS_503_ENDPOINT = f"{BASE_URL}/status/503"


log = LoggingFactory.get_logger(name="test_delta", inherit_from_koheesio=True)

@pytest.mark.parametrize(
"endpoint,step,method,return_value,expected_status_code",
[
@@ -138,28 +142,61 @@ def test_http_step_request():
assert response.status_code == 200



EXAMPLE_DIGEST_AUTH = (
'Digest username="Zaphod", realm="[email protected]", nonce="42answer", uri="/dir/restaurant.html", '
'response="dontpanic42", opaque="babelfish"'
)

@pytest.mark.parametrize(
"params",
"params, expected",
[
dict(url=GET_ENDPOINT, headers={"Authorization": "Bearer token", "Content-Type": "application/json"}),
dict(url=GET_ENDPOINT, headers={"Content-Type": "application/json"}, bearer_token="token"),
dict(url=GET_ENDPOINT, headers={"Content-Type": "application/json"}, token="token"),
pytest.param(
dict(url=GET_ENDPOINT, headers={"Authorization": "Bearer token", "Content-Type": "application/json"}),
"Bearer token",
id="bearer_token_in_headers"
),
pytest.param(
dict(url=GET_ENDPOINT, headers={"Content-Type": "application/json"}, auth_header="Bearer token"),
"Bearer token",
id="bearer_token_through_auth_header",
),
pytest.param(
dict(url=GET_ENDPOINT, auth_header=EXAMPLE_DIGEST_AUTH),
EXAMPLE_DIGEST_AUTH,
id="digest_auth_with_nonce",
),
],
)
def test_get_headers(params):
def test_get_headers(params: dict, expected: str, caplog: pytest.LogCaptureFixture) -> None:
"""
Authorization headers are being converted into SecretStr under the hood to avoid dumping any
sensitive content into logs. However, when calling the `get_headers` method, the SecretStr is being
converted back to string, otherwise sensitive info would have looked like '**********'.
"""
# Arrange and Act
step = HttpStep(**params)
with requests_mock.Mocker() as rm:
rm.get(params["url"], status_code=int(200)) # Mock the request to be always successful
step = HttpStep(**params)
caplog.set_level("DEBUG", logger=step.log.name)
auth = step.headers.get("Authorization")
step.execute()

# Check that the token doesn't accidentally leak in the logs
assert len(caplog.records) > 1, "No logs were generated"
for record in caplog.records:
assert expected not in record.message

# Ensure that the Authorization header is properly parsed to a SecretStr
assert auth is not None, "Authorization header is missing"
assert isinstance(auth, SecretStr)
assert str(auth) == "**********"
assert auth.get_secret_value() == expected

# Ensure that the Content-Type header is properly parsed while not being a SecretStr
assert step.headers["Content-Type"] == "application/json"


# Assert
actual_headers = step.get_headers()
assert actual_headers["Authorization"] != "**********"
assert actual_headers["Authorization"] == "Bearer token"
assert actual_headers["Content-Type"] == "application/json"


@pytest.mark.parametrize(
@@ -171,7 +208,7 @@ def test_get_headers(params):
pytest.param(17, STATUS_404_ENDPOINT, 503, 1, HTTPError, id="max_retries_17_404"),
],
)
def test_max_retries(max_retries, endpoint, status_code, expected_count, error_type):
def test_max_retries(max_retries: int, endpoint: str, status_code: int, expected_count: int, error_type: Exception) -> None:
session = requests.Session()
retry_logic = Retry(total=max_retries, status_forcelist=[status_code])
session.mount("https://", HTTPAdapter(max_retries=retry_logic))
@@ -180,7 +217,7 @@ def test_max_retries(max_retries, endpoint, status_code, expected_count, error_t

step = HttpGetStep(url=endpoint, session=session)

with pytest.raises(error_type):
with pytest.raises(error_type): # type: ignore
step.execute()

first_pool = [v for _, v in session.adapters["https://"].poolmanager.pools._container.items()][0]
@@ -197,7 +234,7 @@ def test_max_retries(max_retries, endpoint, status_code, expected_count, error_t
pytest.param(1, [0, 2, 4], id="backoff_1"),
],
)
def test_initial_delay_and_backoff(monkeypatch, backoff, expected):
def test_initial_delay_and_backoff(monkeypatch: pytest.FixtureRequest, backoff: int, expected: list) -> None:
session = requests.Session()
retry_logic = Retry(total=3, backoff_factor=backoff, status_forcelist=[503])
session.mount("https://", HTTPAdapter(max_retries=retry_logic))