diff --git a/README.md b/README.md
index 60c363aa..87e555d5 100644
--- a/README.md
+++ b/README.md
@@ -38,6 +38,7 @@ The solution consists of #1 a configuration environment (either Google Sheet or
   - **Dataflow** enabled
   - **Cloud storage** enabled
   - **Cloud scheduler** enabled
+  - **App Engine** enabled
 - At least one of:
   - **Google Ads** API Access
   - **Campaign Manager** API Access
@@ -59,7 +60,7 @@ Those are the minimum roles necessary to deploy Megalista:
 - Service Consumer
 
 ### APIs
-Required APIs will depend on upload endpoints in use. We recomend you to enable all of them:
+Required APIs will depend on upload endpoints in use.
 - Google Sheets (required if using Sheets configuration) [[link]](https://console.cloud.google.com/apis/library/sheets.googleapis.com)
 - Google Analytics [[link]](https://console.cloud.google.com/apis/library/analytics.googleapis.com)
 - Google Analytics Reporting [[link]](https://console.cloud.google.com/apis/library/analyticsreporting.googleapis.com)
@@ -67,138 +68,17 @@ Required APIs will depend on upload endpoints in use. We recomend you to enable
 - Campaign Manager [[link]](https://console.cloud.google.com/apis/library/dfareporting.googleapis.com)
 - Google Cloud Firestore [[link]](https://console.cloud.google.com/apis/library/firestore.googleapis.com)
 
-
-## Installation
-
-### Configure Megalista
+## Configure Megalista
 Megalista can be configured via Google Sheets, a JSON file or a Google Cloud Firestore collection. Expected data schemas (Sources) and metadata (Destinations) for each use case defined in [the Megalista Wiki](https://github.com/google/megalista/wiki).
 
-To configure using Google Sheets:
-  - Make a copy of the [Sheets template](https://docs.google.com/spreadsheets/d/1Mu0yj7RWw_cr3bevWCnjXyODlCbdmabWDLsBvPv2EVY?resourcekey=0-ZqbZ72FJVIa8X4YgelHUHA)
-  - In the "Intro" sheet provide Account IDs for Google Ads, Analytics, CM360, etc.
-  - Configure source/input data in the "Sources" sheet
-  - Configure destinations in the "Destinations" sheet
-    + The "Help" sheet has a chart to document the required metadata for each use case.
-  - Configure connections from Sources to Destinations in the "Connect" sheet
-  - Make note of the Sheet's ID in the URL: `docs.google.com/spreadsheets/d/`**EXAMPLEIDHERE**`/edit`
-
-To configure using JSON:
-  - Make a copy of the [JSON template](https://github.com/google/megalista/tree/main/cloud_config/configuration_sample.json)
-  - Provide Account IDs for Google Ads, Analytics, CM360, etc.
-  - Configure Sources and Destinations, separating entries with commas
-  - Connect Sources to Destinations in the "Connections" field
-  - Save and upload this JSON file to a Google Cloud Storage bucket (default bucket options are fine)
-  - Make note of the files's Authenticated URL in the Cloud Storage UI: `storage.cloud.google.com/`**mybucketname/myconfig.json**
-
-To configure using Firestore (recommended for use with web interface - WIP):
-  - Verify that the Firestore API is enabled on Google Cloud (see the APIs section)
-  - Create a [collection](https://firebase.google.com/docs/firestore/data-model) in Firestore, with the name of your choice
-  - Inside the collection, create a Firestore document named "account_config", and provide Account IDs for Google Ads, Analytics, CM360, etc. See schema below.
-  - Inside the collection, create separate Firestore documents for each Source/Destination pair. See schema below.
-
-#### Firestore document schemas
-
-**account_config document schema:**
-
-| Field name | Description |
-|---|---|
-| google_ads_id | Google Ads account ID (customer ID) used by default |
-| mcc_trix | TRUE/FALSE. Indicates whether the Google Ads account is an MCC (parent account) or not |
-| google_analytics_account_id | Google Analytics account ID |
-| campaign_manager_account_id | Google Campaign Manager account ID |
-| app_id | Android/iOS app ID for Google Ads or Appsflyer used by default |
-
-**Source+Destination document schema:**
-
-Universal parameters (mandatory for any upload type):
-| Field name | Description |
-|---|---|
-| active | yes/no. Enables or disables scheduled uploads for the Source/Destination pair |
-| bq_dataset | Name of the souce Big Query dataset |
-| bq_table | Name of the source Big Query table |
-| source | BIG_QUERY. Data source for the uploads. |
-| source_name | Display-only name for the source, shown on logs. Example: "Top spending customers" |
-| destination_name | Display-only name for the destination, shown on logs. Example: "Customer match" |
-
-Google Ads conversions
-| Field name | Description |
-|---|---|
-| gads_conversion_name | Name of the Google Ads conversion registered on the platform |
-| type | ADS_OFFLINE_CONVERSION |
-
-Google Ads Store Sales Direct
-| Field name | Description |
-|---|---|
-| gads_conversion_name | Name of the Google Ads conversion registered on the platform |
-| gads_external_upload_id | External upload ID |
-| type | ADS_SSD_UPLOAD |
-
-Google Ads Customer Match - Contact
-| Field name | Description |
-|---|---|
-| gads_audience_name | Name of the Google Ads audience registered on the platform |
-| gads_operation | ADD/REMOVE. Indicates whether the user list should be added or removed from the audience |
-| gads_hash | TRUE/FALSE. Enables hashing on data |
-| gads_account | (Optional) Google Ads account ID (Customer ID). Overlaps the default account ID in the account_config document |
-| type | ADS_CUSTOMER_MATCH_CONTACT_INFO_UPLOAD |
-
-Google Ads Customer Match - Mobile
-| Field name | Description |
-|---|---|
-| gads_audience_name | Name of the Google Ads audience registered on the platform |
-| gads_operation | ADD/REMOVE. Indicates whether the user list should be added or removed from the audience |
-| gads_account | (Optional) Google Ads account ID (Customer ID). Overlaps the default account ID in the account_config document |
-| gads_app_id | (Optional) Android/iOS app ID for Google Ads. Overlaps the default app ID in the account_config document |
-| type | ADS_CUSTOMER_MATCH_MOBILE_DEVICE_ID_UPLOAD |
-
-Google Ads Customer Match - User ID
-| Field name | Description |
-|---|---|
-| gads_audience_name | Name of the Google Ads audience registered on the platform |
-| gads_operation | ADD/REMOVE. Indicates whether the user list should be added or removed from the audience |
-| gads_account | (Optional) Google Ads account ID (Customer ID). Overlaps the default account ID in the account_config document |
-| gads_hash | TRUE/FALSE. Enables hashing on data |
-| type | ADS_CUSTOMER_MATCH_USER_ID_UPLOAD |
-
-Google Analytics - Measurement Protocol
-| Field name | Description |
-|---|---|
-| google_analytics_property_id | Google Analytics property ID (UA) |
-| google_analytics_non_interaction | 1/0. Indicates whether the event is a non-interaction hit |
-| type | GA_MEASUREMENT_PROTOCOL |
-
-Google Analytics - Data Import
-| Field name | Description |
-|---|---|
-| google_analytics_property_id | Google Analytics property ID (UA) |
-| google_analytics_data_import_name | Name of the data import set in Google Analytics |
-| type | GA_DATA_IMPORT |
-
-Google Analytics - User List
-| Field name | Description |
-|---|---|
-| google_analytics_property_id | Google Analytics property ID (UA) |
-| google_analytics_view_id | Google Analytics view ID in the property selected |
-| google_analytics_data_import_name | Name of the data import set in Google Analytics |
-| google_analytics_user_id_list_name | Name of the user ID list |
-| google_analytics_user_id_custom_dim | User ID custom dimension |
-| google_analytics_buyer_custom_dim | Buyer custom dimension |
-| type | GA_USER_LIST_UPLOAD |
-
-Campaign Manager
-| Field name | Description |
-|---|---|
-| campaign_manager_floodlight_activity_id | Floodlight activity ID |
-| campaign_manager_floodlight_configuration_id | Floodlight configuration ID |
-| type | CM_OFFLINE_CONVERSION |
-
-Appsflyer S2S events
-| Field name | Description |
-|---|---|
-| appsflyer_app_id | |
-| type | APPSFLYER_S2S_EVENTS |
+Instructions for each configuration method can be found in the Megalista wiki:
+- [Google Sheets](https://github.com/google/megalista/wiki)
+- [JSON](https://github.com/google/megalista/wiki)
+- [Firestore](https://github.com/google/megalista/wiki)
+## Deployment
+This guide assumes the steps are performed in the Google Cloud Platform Console.
 
 ### Creating required access tokens
 To access campaigns and user lists on Google's platforms, this dataflow will need OAuth tokens for an account that can authenticate in those systems.
@@ -206,75 +86,32 @@ To access campaigns and user lists on Google's platforms, this dataflow will nee
 In order to create it, follow these steps:
   - Access GCP console
   - Go to the **API & Services** section on the top-left menu.
-  - On the **OAuth Consent Screen** and configure an *Application name*
+  - On the **OAuth Consent Screen** page, configure an *Internal* consent screen
   - Then, go to the **Credentials** and create an *OAuth client Id* with Application type set as *Desktop App*
-  - This will generate a *Client Id* and a *Client secret*
+  - This will generate a *Client Id* and a *Client secret*. Save these values, as they are required during deployment.
   - Run the **generate_megalista_token.sh** script in this folder providing these two values and follow the instructions
     - Sample: `./generate_megalista_token.sh client_id client_secret`
   - This will generate the *Access Token* and the *Refresh token*
-
-### Creating a bucket on Cloud Storage
-This bucket will hold the deployed code for this solution. To create it, navigate to the *Storage* link on the top-left menu on GCP and click on *Create bucket*. You can use Regional location and Standard data type for this bucket.
-
-## Running Megalista
-
-We recommend first running it locally and make sure that everything works.
-Make some sample tables on BigQuery for one of the uploaders and make sure that the data is getting correctly to the destination.
-After that is done, upload the Dataflow template to GCP and try running it manually via the UI to make sure it works.
-Lastly, configure the Cloud Scheduler to run Megalista in the frequency desired and you'll have a fully functional data integration pipeline.
-
-### Running locally
-Only set one configuration parameter (setup_sheet_id, setup_json_url or setup_firestore_collection)
-```bash
-python3 megalista_dataflow/main.py \
-  --runner DirectRunner \
-  --developer_token ${GOOGLE_ADS_DEVELOPER_TOKEN} \
-  --setup_sheet_id ${CONFIGURATION_SHEET_ID} \
-  --setup_json_url ${CONFIGURATION_JSON_URL} \
-  --setup_firestore_collection ${CONFIGURATION_FIRESTORE_COLLECTION}
-  --refresh_token ${REFRESH_TOKEN} \
-  --access_token ${ACCESS_TOKEN} \
-  --client_id ${CLIENT_ID} \
-  --client_secret ${CLIENT_SECRET} \
-  --bq_ops_dataset %{BQ_OPS_DATASET} \
-  --project ${GCP_PROJECT_ID} \
-  --region us-central1 \
-  --temp_location gs://{$GCS_BUCKET}/tmp
-```
+  - The user who opened the generated link and clicked *Allow* must have access to the platforms Megalista will integrate with, including the configuration Sheet if Google Sheets is the chosen configuration method.
 
 ### Deploying Pipeline
 To deploy the full Megalista pipeline, use the following command from the root folder:
 `./terraform_deploy.sh`
+The script requires several parameters, including:
+- Auxiliary BigQuery dataset to be created for Megalista operations
+  - This dataset will be used to store operational data and will be created by Terraform.
+- Google Cloud Storage bucket to be created
+  - This bucket will be used to store the compiled Megalista binary, metadata and temporary files, and will be created by Terraform.
+- *Setup Firestore collection*, *URL for JSON configuration* and *Setup Sheet Id*
+  - Fill in only the one that matches the chosen configuration method and leave the other two blank.
 
-#### Manually executing pipeline using Dataflow UI
-To execute the pipeline, use the following steps:
-- Go to **Dataflow** on GCP console
-- Click on *Create job from template*
-- On the template selection dropdown, select *Custom template*
-- Find the *megalista* file on the bucket you've created, on the templates folder
-- Fill in the parameters required and execute
 
-### Scheduling pipeline
-To schedule daily/hourly runs, go to **Cloud Scheduler**:
-- Click on *create job*
-- Add a name and frequency as desired
-- For *target* set as HTTP
-- Configure a *POST* for url: https://dataflow.googleapis.com/v1b3/projects/${YOUR_PROJECT_ID}/locations/${LOCATION}/templates:launch?gcsPath=gs://${BUCKET_NAME}/templates/megalista, replacing the params with the actual values
-- For a sample on the *body* of the request, check **cloud_config/scheduler_sample.json**
-- Add OAuth Headers
-- Scope: https://www.googleapis.com/auth/cloud-platform
 
-#### Creating a Service Account
-It's recommended to create a new Service Account to be used with the Cloud Scheduler
-- Go to IAM & Admin > Service Accounts
-- Create a new Service Account with the following roles:
-    - Cloud Dataflow Service Agent
-    - Dataflow Admin
-    - Storage Objects Viewer
+### Updating the Binary
 
 ## Usage
-Every upload method expects as source a BigQuery data with specific fields, in addition to specific configuration metadata. For details on how to setup your upload routines, refer to the [Megalista Wiki](https://github.com/google/megalista/wiki) or the [Megalista user guide](https://github.com/google/megalista/blob/main/documentation/Megalista%20-%20Technical%20User%20Guide%20-%20EXTERNAL.pdf).
+Every upload method expects a BigQuery table with specific fields as its source, in addition to specific configuration metadata. For details on how to set up your upload routines, refer to the [Megalista Wiki](https://github.com/google/megalista/wiki).
 
 ## Note about Google Ads API access
 Calls to the Google Ads API will fail if the user that generated the OAuth2 credentials (Access Token and Refresh Token) doesn't have direct access to the Google Ads account which the calls are being directed to. It's not enough for the user to have access to a MCC above this account and being able to access the account through the interface, it's required that the user has permissions on the account itself.
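The deployment parameters in this diff say that only one of *Setup Firestore collection*, *URL for JSON configuration* and *Setup Sheet Id* should be filled in. The check below is a minimal Python sketch of that rule, not code taken from Megalista or `terraform_deploy.sh`. It assumes the three values map to the `setup_firestore_collection`, `setup_json_url` and `setup_sheet_id` options of the local-run command this diff removes, and the function `chosen_config_method` is hypothetical.

```python
from typing import Mapping, Optional

# Assumed option names, taken from the removed local-run command.
# This is an illustration, not Megalista's own validation code.
CONFIG_OPTIONS = ("setup_sheet_id", "setup_json_url", "setup_firestore_collection")


def chosen_config_method(params: Mapping[str, Optional[str]]) -> str:
    """Return the single configuration option that is filled in (hypothetical helper).

    Missing keys, None and whitespace-only strings all count as blank.
    Raises ValueError when zero or more than one option is filled in.
    """
    filled = [name for name in CONFIG_OPTIONS if (params.get(name) or "").strip()]
    if len(filled) != 1:
        raise ValueError(
            f"exactly one of {', '.join(CONFIG_OPTIONS)} must be set, got: {filled or 'none'}"
        )
    return filled[0]


if __name__ == "__main__":
    # Sheets-based configuration: the JSON URL and Firestore collection stay blank.
    print(chosen_config_method({
        "setup_sheet_id": "EXAMPLEIDHERE",
        "setup_json_url": "",
        "setup_firestore_collection": None,
    }))
```

Treating whitespace-only values as blank is a choice made for this sketch, so that a stray space typed into one field does not count as a second configuration source.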