-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add proxy, token refresh, and multi_endpoint features to generators #878
base: main
Are you sure you want to change the base?
Conversation
au70ma70n
commented
Sep 4, 2024
- add proxy support to rest generator
- add token refresh support to rest generator
- add multi_endpoint_rest to generators
- add proxy support to multi_endpoint_rest generator
- add token refresh support to multi_endpoint_rest generator
- add token refresh support to rest generator - add multi_endpoint_rest to generators - add proxy support to multi_endpoint_rest generator - add token refresh support to multi_endpoint_rest generator
DCO Assistant Lite bot All contributors have signed the DCO ✍️ ✅ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the progress, this PR tries to accomplish a number of goals at once.
A number of items to consider refactoring:
- The new class seems to fit in the existing
rest
module, therest.MultiEndpointGenerator
class can be specified on the command line while leaving theRestGenerator
as default. - Configuration specific to a single plugin should not be added as top level options.
- Since the current implementation of
proxy
support isrest
specific the config can be consolidated as part of thegenerator_options_file
. The same applies toverify_ssl
. - Configuration of token refresh as a separate json config file creates complexity for a feature that is not
globally
applicable.
- Since the current implementation of
- Please add an example of how all the new functionality configuration should be formatted, this could be something like an added
json
segment in the class description for when aproxy
ortoken_refresh
is to be utilized. - Since the parameter names for each
stage
can be identified byprefix
using the existingrest.Generator
as a web request client can greatly reduce the code required. There is an offered example for possibly nesting the config for each endpoint to avoid the need forprefix
values on the keys that might defer more of the code required. - As implemented
verify_ssl
only applies to token refresh, if aproxies
value is passed ssl control likely either needs to beon/off
for all request or to have configurable values for each request made.
garak/cli.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the arguments here make sense to be top level cli
entires. If exposed at this level users may expect them to apply to all plugins
. Prefer options that can only be utilized by a specific plugin
object be configured for that object specifically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The arguments were added as top level options to provide the opportunity to integrate the functionality into other parts of the project, however if you'd rather they be moved into the class configuration file that's simple enough.
|
||
# Load the token refresh configuration | ||
if hasattr(config_root.transient.cli_args, "token_refresh_config"): | ||
self.token_refresh_path = config_root.transient.cli_args.token_refresh_config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__init__
should not access global _config
directly this ties in with my comment about the cil
options not being top level items. The concept
of token refresh is rest
generator specific and the level of complexity here in requiring a separate configuration is not desired.
Consider all configuration for a specific plugin class should be provided via one configuration object applied via the Configurable
pattern, validation of the values set on the object post load is valid.
# Load the token refresh configuration | ||
if hasattr(config_root.transient.cli_args, "token_refresh_config"): | ||
self.token_refresh_path = config_root.transient.cli_args.token_refresh_config | ||
with open(self.token_refresh_path, "r") as f: | ||
self.token_refresh_config = json.load(f) | ||
if not isinstance(self.token_refresh_config["method"], str): | ||
raise ValueError("token_refresh_config set but does not contain method") | ||
if not isinstance(self.token_refresh_config["required_secrets"], list): | ||
raise ValueError("token_refresh_config set but does not contain required_secrets list") | ||
|
||
if len(self.token_refresh_config["required_secrets"]) == 0: | ||
raise ValueError("token_refresh_config required_secrets list is empty") | ||
|
||
self.token_refresh_http_function = getattr(requests, self.token_refresh_config["method"].lower()) | ||
|
||
secrets = {} | ||
for secret in self.token_refresh_config["required_secrets"]: | ||
if secret in os.environ: | ||
secrets[secret] = os.environ[secret] | ||
else: | ||
raise ValueError(f"token_refresh_config required secret: {secret} not found in environment") | ||
self.token_refresh_config["secrets"] = secrets | ||
|
||
|
||
if (hasattr(self, "req_template_json_object") and self.req_template_json_object is not None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue here, the __init__
should not need to load the file, and should not access transient
. Configuration for the class should be fully contained in the Configurable
pattern from the initial config file.
response_fields = [] | ||
for required_return in self.first_stage_required_returns: | ||
field_path_expr = jsonpath_ng.parse(required_return["json_field"]) | ||
tmp_output_var = field_path_expr.find(first_stage_response_object) | ||
if len(tmp_output_var) == 1: | ||
tmp = { | ||
"name": required_return["name"], | ||
"value": tmp_output_var[0].value | ||
} | ||
response_fields.append(tmp) | ||
else: | ||
logging.error( | ||
"RestGenerator JSONPath in first_stage_required_returns yielded nothing. Response content: %s" | ||
% repr(first_stage_response_object) | ||
) | ||
|
||
# Populate second stage request body with the first stage response fields | ||
self._populate_second_stage(response_fields) | ||
second_stage_request_data = self._populate_template(self.second_stage_req_template, prompt) | ||
# Populate second stage request headers | ||
second_stage_request_headers = dict(self.second_stage_headers) | ||
# Populate placeholders in the second stage headers | ||
for k, v in self.second_stage_headers.items(): | ||
second_stage_request_headers[k] = self._populate_template( | ||
v, prompt) | ||
|
||
second_stage_req_kArgs = { | ||
second_stage_data_kw: second_stage_request_data, | ||
"headers": second_stage_request_headers, | ||
"timeout": self.request_timeout, | ||
"proxies": self.proxies, | ||
"verify": self.verify_ssl, | ||
} | ||
|
||
second_stage_resp = self.second_stage_http_function(self.second_stage_uri, **second_stage_req_kArgs) | ||
|
||
|
||
if second_stage_resp.status_code in self.ratelimit_codes: | ||
raise RateLimitHit( | ||
f"Rate limited: {second_stage_resp.status_code} - {second_stage_resp.reason}") | ||
|
||
elif str(second_stage_resp.status_code)[0] == "3": | ||
raise NotImplementedError( | ||
f"REST URI redirection: {second_stage_resp.status_code} - {second_stage_resp.reason}" | ||
) | ||
|
||
elif str(second_stage_resp.status_code)[0] == "4": | ||
# Token is expired, refresh it | ||
if first_stage_resp.status_code == 401: | ||
self.need_token_refresh = True | ||
raise RateLimitHit( | ||
f"Rate limited: {first_stage_resp.status_code} - {first_stage_resp.reason}") | ||
else: | ||
raise ConnectionError( | ||
f"REST URI client error: {first_stage_resp.status_code} - {first_stage_resp.reason}" | ||
) | ||
|
||
elif str(second_stage_resp.status_code)[0] == "5": | ||
error_msg = f"REST URI server error: {second_stage_resp.status_code} - {second_stage_resp.reason}" | ||
if self.retry_5xx: | ||
raise IOError(error_msg) | ||
else: | ||
raise ConnectionError(error_msg) | ||
|
||
second_stage_response_object = json.loads(second_stage_resp.content) | ||
|
||
|
||
# if response_json_field starts with a $, treat is as a JSONPath | ||
assert (self.second_stage_response_json), "second_stage_response_json must be True at this point; if False, we should have returned already" | ||
assert isinstance(self.second_stage_response_json_field,str), "second_stage_response_json_field must be a string" | ||
assert (len(self.second_stage_response_json_field) >0), "second_stage_response_json_field needs to be complete if second_stage_response_json is true; ValueError should have been raised in constructor" | ||
if self.second_stage_response_json_field[0] != "$": | ||
second_stage_json_extraction_result = [ | ||
second_stage_response_object[self.second_stage_response_json_field]] | ||
else: | ||
field_path_expr = jsonpath_ng.parse(self.second_stage_response_json_field) | ||
second_stage_json_extraction_results = field_path_expr.find(second_stage_response_object) | ||
if len(second_stage_json_extraction_results) == 1: | ||
response_value = second_stage_json_extraction_results[0].value | ||
if isinstance(response_value, str): | ||
second_stage_json_extraction_result = [response_value] | ||
elif isinstance(response_value, list): | ||
second_stage_json_extraction_result = response_value | ||
elif len(second_stage_json_extraction_results) > 1: | ||
second_stage_json_extraction_result = [ | ||
r.value for r in second_stage_json_extraction_results] | ||
else: | ||
logging.error( | ||
"MultiEndpointGenerator JSONPath in response_json_field yielded nothing. Response content: %s" | ||
% repr(second_stage_response_object) | ||
) | ||
return [None] | ||
|
||
return second_stage_json_extraction_result | ||
|
||
################################################################################ | ||
|
||
job_id = self.prompt_sender._call_model(prompt, generations_this_call) | ||
self.response_retriever.uri = f"{self.get_uri}/{job_id}" | ||
return self.response_retriever._call_model(job_id, generations_this_call) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of this code could be deferred to the existing RestGenerator
object, if it cannot reuse RestGenerator
at a minimum it should be refactored into methods reflecting the common error handling patterns here.
secrets = {} | ||
for secret in self.token_refresh_config["required_secrets"]: | ||
if secret in os.environ: | ||
secrets[secret] = os.environ[secret] | ||
else: | ||
raise ValueError(f"token_refresh_config required secret: {secret} not found in environment") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Values extracted from os.environ
are handled in _validate_env_var()
, this consolidates mapping for values from environment variables in a common location and allows the configurable process to have already set values or defer them to os.environ
, this is another reason a plugin should have a single consolidated configuration.
"verify": self.verify_ssl, | ||
} | ||
|
||
# TODO: add error handling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider extracting token refresh as a method call that can then be tagged for backoff
and provide consolidated error handling that can offer a clear indicator that authentication refresh
caused the failure.
This might even make sense to be a class
that is mixed in to RestGenerator
to provide periodic authentication capabilities.
"multi_endpoint_rest": { | ||
"MultiEndpointGenerator": { | ||
"name": "example service", | ||
"post_uri": "https://example.ai/llm", | ||
"post_headers": { | ||
"X-Authorization": "$KEY" | ||
}, | ||
"post_req_template_json_object": { | ||
"text": "$INPUT" | ||
}, | ||
"post_response_json": true, | ||
"post_response_json_field": "job_id", | ||
"get_uri": "https://example.ai/llm", | ||
"get_headers": { | ||
"X-Authorization": "$KEY" | ||
}, | ||
"get_req_template_json_object": { | ||
"text": "$INPUT" | ||
}, | ||
"post_response_json": true, | ||
"post_response_json_field": "text" | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example config does not match the class expectations.
{
"multi_endpoint_rest": {
"MultiEndpointGenerator": {
"name": "example service",
"first_stage_uri": "https://example.ai/llm",
"first_stage_headers": {
"X-Authorization": "$KEY"
},
"first_stage_req_template_json_object": {
"text": "$INPUT"
},
"first_stage_response_json": true,
"first_stage_response_json_field": "job_id",
"second_stage_uri": "https://example.ai/llm",
"second_stage_headers": {
"X-Authorization": "$KEY"
},
"second_stage_req_template_json_object": {
"text": "$INPUT"
},
"second_stage_response_json": true,
"second_stage_response_json_field": "text"
}
}
}
first_stage_
& second_stage_
prefix are very generic would there be value in a more specific network concepts focused prefix like send_
& recv_
?
{
"multi_endpoint_rest": {
"MultiEndpointGenerator": {
"name": "example service",
"send_uri": "https://example.ai/llm",
"send_headers": {
"X-Authorization": "$KEY"
},
Another option might be to organize the config in line with reuse of the existing rest generator.
{
"multi_endpoint_rest": {
"MultiEndpointGenerator": {
"first_stage": {
"name": "request service",
"uri": "https://example.ai/llm",
"method": "post",
"headers": {
"X-Authorization": "$KEY",
},
"req_template_json_object": {
"text": "$INPUT"
},
"response_json": true,
"response_json_field": "text"
},
"second_stage": {
"name": "response service",
"uri": "https://example.ai/llm",
"method": "post",
"headers": {
"X-Authorization": "$KEY",
},
"req_template_json_object": {
"text": "$INPUT"
},
"response_json": true,
"response_json_field": "text"
}
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first and second stage variable names were intentionally generic for flexibility, as it can perform any two calls to any two systems. A post and delete for example, or two puts (one to make a change and the other to revert it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this for? What's the use-case? Where can it be tested? Why should we integrate this and maintain it? Where are the docs (should be in .rst format under docs/
)?
-- Not saying we shouldn't include this, but without understanding what's gained by this, it's hard to triage the value of this PR. Can we get some clarity around what this brings to the project and why we should add it to the set of things we're maintaining?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this for?
This is for supporting LLM application flows that require multiple endpoint interactions.
What's the use case?
The use case is left intentionally generic to support as many use cases as possible, however some example use cases would be the following:
- An LLM application requires a POST request to generate a response, and a GET request to fetch it
- Testing against an LLM application without permissions to create new responses, and using two PUT requests, one to change an initial conversation, the other to change it back
- Testing against an LLM application in production, and using an additional DELETE request to clean up old conversation histories.
Where can it be tested?
This could be tested against anything that you are able to interact with via a rest API. I've tested with ChatGPT, however I'm sure there are many other suitable candidates.
Where are the docs?
Docs have yet to be added as I suspected there would be many refactoring changes, as @jmartin-tech has pointed out.
Can we get some clarity around what this brings to the project and why we should add it to the set of things we're maintaining?
In my opinion the main thing this PR brings to the project is flexibility. This functionality will be required for the testing of many internally developed LLM applications, which I suspect the project values. I know of several LLM applications across various companies that were developed internally and require multiple api interactions across various endpoints to use. This PR is primarily designed to address those applications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This functionality will be required for the testing of many internally developed LLM applications, which I suspect the project values. I know of several LLM applications across various companies that were developed internally and require multiple api interactions across various endpoints to use.
I think the primary question here is how and should the project support something that does not have a public example? Anything accepted into a release has to be maintained to some extent. If there is no public reference implementation, it can get very difficult to support consumers as reproduction of issues becomes a high effort activity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The project claims to be "nmap for LLMs." If that's the case, then yes the project should support features required by industry professionals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question posed is reasonable, how can the project establish the value of supporting this access pattern? Is there a reference example where this is actually encountered in a professional setting that the project can use for testing?
The answers can be I don't know
or Nothing that can be publicly shared
, simply having information about patterns seen in the wild can influence value.
If there is an available tool or application that follows this pattern for access to inference, this generator has more value and more reason to be maintained and supported.
Supporting the three use cases you provided may meet the criteria, the second and third especially sound interesting and examples of applications that have these patterns may even help this project and it's community further research and build guidance on using this type of access pattern.
Note the third use case is not yet supported. As implemented the second stage response needs to contain the result of generation, I can see it as viable that a DETELE request may not return the data deleted which leads me to suggest there may be another configuration option needed provide a mechanism to return a value from the first stage as the result
of the attempt to be evaluated by detectors, and obtain another value from the response to use as the identifier posted in the second stage to remove the history. I am not suggesting this would be required to land this PR, only that this factors into the equation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reference example where this is actually encountered in a professional setting that the project can use for testing?
The answer is yes, I developed this PR specifically for the deployment of garak in an internal, professional setting. Unfortunately the details of the application cannot be publicly shared. Also yes, the third use case is not currently supported. In the future I'd like to see this functionality become more modular, supporting n number of requests and containing configuration options for how to get and return the response data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, OK. Something we have on the roadmap is for custom plugins to go in user-local custom dirs instead of in the garak package directory. This might be a sensible route for developing this.
As maintainers, when we merge a PR, we agree to take on the technical debt for the concepts and features in that PR for the lifetime of the project. As humans, our time and bandwidth is finite, so each new external-origin feature means a permanent degradation in our ability to steer and progress the project, and a permanent reduction in the velocity we have on our roadmap. I hope you can appreciate, therefore, how seriously we take this sort of thing, especially when it's an agreement to support a target that is invisible to us (and all the other garak users).
For example, because we cannot test things we can't observe, architectural changes are prone to breaking functions like this, even despite our best efforts: we won't have "real" tests, so we'll be unable to perfectly detect when things break. Given that, what kind of SLA would you expect if an architectural change silently broke this plugin? What would be fair? What would be the incentive from either side to engage in that feature/project at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Architectural changes within garak or externally? I wouldn't expect external architectural changes to affect the garak project, however things like changing how responses are processed for example would. I think a reasonable SLA would be maintaining the data flow into and out of the generator, and I would expect the community to patch the generator to fit architectural changes to their target applications.
I have read the DCO Document and I hereby sign the DCO |
I find additional documentation to rarely be a bad thing, however if you disagree then we can remove it Co-authored-by: Jeffrey Martin <[email protected]> Signed-off-by: au70ma70n <[email protected]>
I mean changes in garak architecture that cause alterations in how
high-level components operate, pretty day-to-day stuff in a software
project.
…On Fri, Sep 6, 2024, 19:05 au70ma70n ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
On garak/generators/multi_endpoint_rest.py
<#878 (comment)>:
Architectural changes within garak or externally? I wouldn't expect
external architectural changes to affect the garak project, however things
like changing how responses are processed for example would. I think a
reasonable SLA would be maintaining the data flow into and out of the
generator, and I would expect the community to patch the generator to fit
architectural changes to their target applications.
—
Reply to this email directly, view it on GitHub
<#878 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAA5YTS2ADNMYUKBDYTVJDTZVHOEJAVCNFSM6AAAAABNTRM2FKVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDEOBWG43TKMBWGU>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
Circling back here, given the number of changes requested could this be separated to isolate the various changes? I can see the following possible independent PRs:
|