Minor fixes to the documentation (#104)
shchur authored Jan 17, 2024
1 parent 8f2dc63 commit 2a8f757
Showing 2 changed files with 78 additions and 25 deletions.
16 changes: 8 additions & 8 deletions docs/index.md
@@ -54,11 +54,11 @@ test_data = pd.read_csv("https://autogluon.s3.amazonaws.com/datasets/Inc/test.cs
test_data.drop(columns=["class"], inplace=True)
predictor_init_args = {
    "label": "class"
-} # init args you would pass to AG TabularPredictor
+} # args used when creating TabularPredictor()
predictor_fit_args = {
    "train_data": train_data,
    "time_limit": 120
-} # fit args you would pass to AG TabularPredictor
+} # args passed to TabularPredictor.fit()
cloud_predictor = TabularCloudPredictor(cloud_output_path="YOUR_S3_BUCKET_PATH")
cloud_predictor.fit(
    predictor_init_args=predictor_init_args, predictor_fit_args=predictor_fit_args
@@ -85,10 +85,10 @@ test_data = pd.read_parquet("https://autogluon-text.s3-accelerate.amazonaws.com/
test_data.drop(columns=["label"], inplace=True)
predictor_init_args = {
    "label": "label"
-} # init args you would pass to AG MultiModalPredictor
+} # args used when creating MultiModalPredictor()
predictor_fit_args = {
    "train_data": train_data
-} # fit args you would pass to AG MultiModalPredictor
+} # args passed to MultiModalPredictor.fit()
cloud_predictor = MultiModalCloudPredictor(cloud_output_path="YOUR_S3_BUCKET_PATH")
cloud_predictor.fit(
    predictor_init_args=predictor_init_args, predictor_fit_args=predictor_fit_args
@@ -117,11 +117,11 @@ target="target"

predictor_init_args = {
    "target": target
-} # init args you would pass to AG TimeSeriesCloudPredictor
+} # args used when creating TimeSeriesPredictor()
predictor_fit_args = {
    "train_data": data,
    "time_limit": 120
-} # fit args you would pass to AG TimeSeriesCloudPredictor
+} # args passed to TimeSeriesPredictor.fit()
cloud_predictor = TimeSeriesCloudPredictor(cloud_output_path="YOUR_S3_BUCKET_PATH")
cloud_predictor.fit(
    predictor_init_args=predictor_init_args,
@@ -158,7 +158,7 @@ result = cloud_predictor.predict(
pip install -U pip
pip install -U setuptools wheel
pip install --pre autogluon.cloud # You don't need to install autogluon itself locally
-pip install --upgrade sagemaker # This is required to ensure the information about newly released containers is available.
+pip install -U sagemaker # This is required to ensure the information about newly released containers is available.
```

```{toctree}
@@ -191,4 +191,4 @@ hidden:
TabularCloudPredictor <api/autogluon.cloud.TabularCloudPredictor>
MultiModalCloudPredictor <api/autogluon.cloud.MultiModalCloudPredictor>
TimeSeriesCloudPredictor <api/autogluon.cloud.TimeSeriesCloudPredictor>
-```
+```
87 changes: 70 additions & 17 deletions docs/tutorials/autogluon-cloud.md
@@ -5,7 +5,10 @@ The containers can be used to train models with CPU and GPU instances and deploy

We offer the [autogluon.cloud](https://github.com/autogluon/autogluon-cloud) module, which uses those containers and [AWS SageMaker](https://aws.amazon.com/sagemaker/) under the hood to train and deploy AutoGluon-backed models with simple APIs.

-**Costs for running cloud compute are managed by AWS SageMaker, and storage costs are managed by AWS S3. AutoGluon-Cloud is a wrapper to these services at no additional charge. While AutoGluon-Cloud makes an effort to simplify the usage of these services, it is ultimately the user's responsibility to monitor compute usage within their account to ensure no unexpected charges.**
+```{attention}
+Costs for running cloud compute are managed by AWS SageMaker, and storage costs are managed by AWS S3. AutoGluon-Cloud is a wrapper around these services at no additional charge. While AutoGluon-Cloud makes an effort to simplify the usage of these services, it is ultimately the user's responsibility to monitor compute usage within their account to avoid unexpected charges.
+```


## Installation
`autogluon.cloud` does not come with the default `autogluon` installation. You can install it via:
@@ -17,34 +20,84 @@ pip3 install autogluon.cloud
Also ensure that the latest version of the SageMaker Python SDK is installed via:

```bash
-pip3 install --upgrade sagemaker
+pip3 install -U sagemaker
```

This is required to ensure the information about newly released containers is available.
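To double-check that the upgrade took effect, you can print the installed versions. A minimal sketch, assuming both packages expose the usual `__version__` attribute:

```python
# Sanity check (optional): confirm both packages are importable and recent
# enough to know about the newest AutoGluon container images.
import autogluon.cloud
import sagemaker

print("autogluon.cloud:", autogluon.cloud.__version__)
print("sagemaker:", sagemaker.__version__)
```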

## Prepare an AWS Role with Necessary Permissions
`autogluon.cloud` utilizes various AWS resources to operate.
-To help you to setup the necessary permissions, you can generate trust relationship and iam policy with our utils through
+To help you set up the necessary permissions, you can generate the trust relationship and IAM policy with our utils through

```python
from autogluon.cloud import TabularCloudPredictor # Can be other CloudPredictor as well

TabularCloudPredictor.generate_default_permission(
backend="BACKNED_YOU_WANT" # We currently support sagemaker and ray_aws
backend="BACKNED_YOU_WANT", # We currently support "sagemaker" and "ray_aws"
account_id="YOUR_ACCOUNT_ID", # The AWS account ID you plan to use for CloudPredictor.
cloud_output_bucket="S3_BUCKET" # S3 bucket name where intermediate artifacts will be uploaded and trained models should be saved. You need to create this bucket beforehand.
)
```

The util function above will generate two JSON files describing the trust relationship and the IAM policy.
-**Make sure you review those files and make necessary changes according to your use case before applying them.**
+```{note}
+Make sure you review the trust relationship and IAM policy files, and make necessary changes according to your use case before applying them.
+```
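
For orientation, a SageMaker trust relationship typically grants the service permission to assume the role. The following is only an illustrative sketch; the actual generated file may contain additional principals or conditions, so always review the real output:

```python
import json

# Hypothetical example of a trust relationship; the file generated by
# generate_default_permission may differ from this sketch.
trust_relationship = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}
print(json.dumps(trust_relationship, indent=2))
```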

We recommend creating an IAM role for your IAM user to assume, as an IAM role doesn't have permanent long-term credentials and is used to interact directly with AWS services. Here is how this can be done using the AWS CLI (a boto3 equivalent of steps 1-3 is sketched after the list).

```{note}
Make sure to replace `AUTOGLUON-ROLE-NAME` with your desired role name, `AUTOGLUON-POLICY-NAME` with your desired policy name, and `222222222222` with your AWS account number.
```

1. Create the IAM role.
```bash
aws iam create-role --role-name AUTOGLUON-ROLE-NAME --assume-role-policy-document file://ag_cloud_sagemaker_trust_relationship.json
```
This command returns the **role ARN**, which looks similar to `arn:aws:iam::222222222222:role/AUTOGLUON-ROLE-NAME`. Keep it for future reference.

2. Create the IAM policy.
```bash
aws iam create-policy --policy-name AUTOGLUON-POLICY-NAME --policy-document file://ag_cloud_sagemaker_iam_policy.json
```
This command returns the **policy ARN**, which looks similar to `arn:aws:iam::222222222222:policy/AUTOGLUON-POLICY-NAME`. Keep it for future reference.

3. Attach the IAM policy to the role.
```bash
aws iam attach-role-policy --role-name AUTOGLUON-ROLE-NAME --policy-arn "arn:aws:iam::222222222222:policy/AUTOGLUON-POLICY-NAME"
```

4. Assume the IAM role using the AWS CLI or boto3.

<details><summary>AWS CLI</summary>

-We recommend you to create an IAM Role for your IAM User to delegate as IAM Role doesn't have permanent long-term credentials and is used to directly interact with AWS services.
-Refer to this [tutorial](https://aws.amazon.com/premiumsupport/knowledge-center/iam-assume-role-cli/) to
+See section "Assume the IAM role" in this [tutorial](https://repost.aws/knowledge-center/iam-assume-role-cli).

</details>

<details><summary>Python/boto3</summary>

```python
import boto3

session = boto3.Session()
response = session.client("sts").assume_role(
    RoleArn="arn:aws:iam::222222222222:role/AUTOGLUON-ROLE-NAME",
    RoleSessionName="AutoGluonCloudSession",
)
credentials = response["Credentials"]
boto3.setup_default_session(
    aws_access_key_id=credentials["AccessKeyId"],
    aws_secret_access_key=credentials["SecretAccessKey"],
    aws_session_token=credentials["SessionToken"],
)
```
Now when you use `autogluon.cloud` in the same Python script / Jupyter notebook, the correct IAM role will be used.

</details>
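
If you prefer to stay in Python for steps 1-3 as well, the same role setup can be sketched with boto3; the `iam` client calls below mirror the CLI commands and reuse the placeholder names from above:

```python
import boto3

iam = boto3.client("iam")

# Step 1: create the role from the generated trust relationship file.
with open("ag_cloud_sagemaker_trust_relationship.json") as f:
    role_arn = iam.create_role(
        RoleName="AUTOGLUON-ROLE-NAME",
        AssumeRolePolicyDocument=f.read(),
    )["Role"]["Arn"]

# Step 2: create the policy from the generated IAM policy file.
with open("ag_cloud_sagemaker_iam_policy.json") as f:
    policy_arn = iam.create_policy(
        PolicyName="AUTOGLUON-POLICY-NAME",
        PolicyDocument=f.read(),
    )["Policy"]["Arn"]

# Step 3: attach the policy to the role.
iam.attach_role_policy(RoleName="AUTOGLUON-ROLE-NAME", PolicyArn=policy_arn)
print(role_arn, policy_arn)
```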



For more details on setting up IAM roles and policies, refer to this [tutorial](https://repost.aws/knowledge-center/iam-assume-role-cli).

-1. create the IAM Role with the trust relationship and iam policy you generated above
-2. setup the credential
-3. assume the role

## Training
Using `autogluon.cloud` to train AutoGluon-backed models is simple and not much different from training an AutoGluon predictor directly.
@@ -61,8 +114,8 @@ cloud_predictor = TabularCloudPredictor(
).fit(
    predictor_init_args=predictor_init_args,
    predictor_fit_args=predictor_fit_args,
-    instance_type="ml.m5.2xlarge" # Checkout supported instance and pricing here: https://aws.amazon.com/sagemaker/pricing/
-    wait=True # Set this to False to make it an unblocking call and immediately return
+    instance_type="ml.m5.2xlarge", # Check out supported instances and pricing here: https://aws.amazon.com/sagemaker/pricing/
+    wait=True, # Set this to False to make it a non-blocking call that returns immediately
)
```
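
If you would rather not block on training, pass `wait=False` and check on the job later. A minimal sketch, assuming `CloudPredictor.info()` reports general information about the predictor, including its most recent fit job:

```python
# Launch training without blocking; fit() returns immediately.
cloud_predictor.fit(
    predictor_init_args=predictor_init_args,
    predictor_fit_args=predictor_fit_args,
    instance_type="ml.m5.2xlarge",
    wait=False,
)

# Later: inspect the predictor's general information (assumed to include
# the status of the most recent fit job).
print(cloud_predictor.info())
```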

@@ -86,7 +139,7 @@ If you want to deploy a predictor as a SageMaker endpoint, which can be used to
```python
cloud_predictor.deploy(
instance_type="ml.m5.2xlarge", # Checkout supported instance and pricing here: https://aws.amazon.com/sagemaker/pricing/
wait=True # Set this to False to make it an unblocking call and immediately return
wait=True, # Set this to False to make it an unblocking call and immediately return
)
```
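
Once the endpoint is in service, it can answer real-time requests. A hedged sketch, assuming the CloudPredictor methods `predict_real_time` (real-time inference against the deployed endpoint) and `cleanup_deployment` (endpoint teardown); verify both against the API reference of your installed version:

```python
# Real-time inference against the deployed SageMaker endpoint.
result = cloud_predictor.predict_real_time("test.csv")  # a DataFrame works as well
print(result)

# Delete the endpoint when you are done, so it stops accruing hourly charges.
cloud_predictor.cleanup_deployment()
```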
@@ -182,7 +235,7 @@ result = cloud_predictor.predict(
    # If False, returns nothing. You will have to download results separately via cloud_predictor.download_predict_results
    download=True,
    persist=True, # If True and download=True, the results file will also be saved to local disk.
-    save_path=None # Path to save the downloaded results. If None, CloudPredictor will create one with the batch inference job name.
+    save_path=None, # Path to save the downloaded results. If None, CloudPredictor will create one with the batch inference job name.
)
```
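
If the job was launched with `download=False`, the results can be fetched afterwards via `cloud_predictor.download_predict_results`, mentioned in the comments above. The `save_path` argument in this sketch is an assumption; consult the API reference for the exact signature:

```python
# Fetch the results of the batch inference job after the fact.
# NOTE: save_path is an assumed parameter name; check the API reference.
cloud_predictor.download_predict_results(save_path="batch_results")
```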
@@ -200,14 +253,14 @@ To perform batch inference and get prediction probabilities:
```python
result = cloud_predictor.predict_proba(
    'test.csv', # can be a DataFrame as well, and the results will be stored in the S3 bucket
-    include_predict=True # Will return a tuple (prediction, prediction probability). Set this to False to get prediction probability only.
+    include_predict=True, # Will return a tuple (prediction, prediction probability). Set this to False to get prediction probability only.
    instance_type="ml.m5.2xlarge", # Check out supported instances and pricing here: https://aws.amazon.com/sagemaker/pricing/
    wait=True, # Set this to False to make it a non-blocking call that returns immediately
    # If True, returns a Pandas Series object of predictions.
    # If False, returns nothing. You will have to download results separately via cloud_predictor.download_predict_results
    download=True,
    persist=True, # If True and download=True, the results file will also be saved to local disk.
-    save_path=None # Path to save the downloaded results. If None, CloudPredictor will create one with the batch inference job name.
+    save_path=None, # Path to save the downloaded results. If None, CloudPredictor will create one with the batch inference job name.
)
```
