
Add decoder for calling Anthropic models via Amazon Bedrock #151

Merged 5 commits into tatsu-lab:main on Oct 29, 2023

Conversation

@billcai (Contributor) commented Oct 25, 2023

This PR adds a decoder for Amazon Bedrock-hosted Anthropic models, providing another option for interacting with Anthropic Claude models.

The decoder uses the AWS Python SDK (boto3) to call the Bedrock service. The current code relies on boto3's default credential resolution chain; see this link for more details.
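
For context, here is a minimal sketch of the kind of invoke_model call such a decoder makes. It assumes the anthropic.claude-v2 model ID and the pre-Messages text-completions request format; the names and parameters below are illustrative, not the PR's actual code:

# Minimal sketch: calling Claude on Bedrock via boto3 (illustrative, not the PR's code).
import json

import boto3

# boto3 resolves credentials from the environment, shared config files,
# or an attached IAM role (its default resolution chain).
client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Claude on Bedrock (text-completions format) expects the Human/Assistant
# prompt template and a max_tokens_to_sample cap.
body = json.dumps(
    {
        "prompt": "\n\nHuman: What is Amazon Bedrock?\n\nAssistant:",
        "max_tokens_to_sample": 300,
        "temperature": 0.7,
    }
)

response = client.invoke_model(
    modelId="anthropic.claude-v2",
    contentType="application/json",
    accept="application/json",
    body=body,
)

# The response body is a streaming payload; parse it to get the completion text.
completion = json.loads(response["body"].read())["completion"]
print(completion)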

Added tests that passed with valid IAM credentials:

python3.10 -m pytest -v --slow tests/integration_tests/test_decoders_integration.py
platform darwin -- Python 3.10.13, pytest-7.4.3, pluggy-1.3.0 -- /opt/homebrew/opt/[email protected]/bin/python3.10
cachedir: .pytest_cache
rootdir: ****
configfile: pytest.ini
plugins: skip-slow-0.0.5, anyio-3.7.1
collected 1 item

tests/integration_tests/test_decoders_integration.py::test_bedrock_anthropic_completions_integration PASSED 

@YannDubs (Collaborator)

LGTM @billcai, although I haven't tested it.
It would be great to have at least one model using this decoding code so people know how to use it! Did you try evaluating bedrock_claude_2? If so, can you also push the results?
Thanks!

@billcai (Contributor, Author) commented Oct 28, 2023

Hi @YannDubs, I've added the Bedrock Claude outputs and an evaluation by Bedrock Claude of those outputs (compared against the default baseline). They were generated using the default commands:

alpaca_eval evaluate_from_model --model_configs bedrock_claude --annotators_config bedrock_claude
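
For anyone preferring the Python API over the CLI, this is roughly the equivalent call (a sketch, assuming evaluate_from_model is exported at the package level, as the CLI name suggests):

# Hypothetical programmatic equivalent of the CLI call above; the CLI
# dispatches to the same evaluate_from_model function.
from alpaca_eval import evaluate_from_model

evaluate_from_model(
    model_configs="bedrock_claude",
    annotators_config="bedrock_claude",
)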

The results look similar to Claude-on-Claude:

                win_rate  standard_error  n_total  avg_length
bedrock_claude     76.83            1.48      805        1278

I won't add a new leaderboard for now, since this uses Bedrock Claude as the evaluator.

@YannDubs merged commit 6e6d11f into tatsu-lab:main on Oct 29, 2023
2 checks passed
@YannDubs (Collaborator)

Great work, thanks @billcai!
