This data pipeline extracts experience analytics data from Sitecore's Experience Database (xDB) and pushes it into Amazon Redshift in near real time. An Amazon Kinesis Data Firehose delivery stream serves as the backbone of the pipeline, and Amazon S3 is used as the staging area for the data before it is copied into a Redshift table. The pipeline includes checkpointing logic that records the last processed datapoint (based on a timestamp), with Amazon DynamoDB used as the checkpoint store.
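For orientation, the sketch below shows how the DynamoDB checkpoint and the Firehose delivery stream fit together. This is not the solution code: the table name, stream name, key attributes, and the `fetch_xdb_interactions` helper are hypothetical placeholders standing in for the real xConnect query and resource names.

```python
# Illustrative sketch only: read the last checkpoint from DynamoDB, push new
# records to Kinesis Data Firehose, then advance the checkpoint.
import json
from datetime import datetime, timezone

import boto3

dynamodb = boto3.resource("dynamodb")
firehose = boto3.client("firehose")

CHECKPOINT_TABLE = "sitecore-checkpoint"   # hypothetical DynamoDB table name
DELIVERY_STREAM = "sitecore-xdb-stream"    # hypothetical Firehose stream name


def read_checkpoint(table):
    """Return the timestamp of the last processed datapoint, if any."""
    item = table.get_item(Key={"PipelineId": "xdb-analytics"}).get("Item")
    return item["LastProcessedUtc"] if item else None


def write_checkpoint(table, timestamp):
    """Persist the new high-water mark after a successful push."""
    table.put_item(Item={"PipelineId": "xdb-analytics", "LastProcessedUtc": timestamp})


def push_batch(records):
    """Send a batch of JSON records (max 500 per call) to the delivery stream."""
    firehose.put_record_batch(
        DeliveryStreamName=DELIVERY_STREAM,
        Records=[{"Data": (json.dumps(r) + "\n").encode("utf-8")} for r in records],
    )


def run_once(fetch_xdb_interactions):
    """One pipeline iteration: fetch everything newer than the checkpoint."""
    table = dynamodb.Table(CHECKPOINT_TABLE)
    since = read_checkpoint(table)
    records = fetch_xdb_interactions(since)  # placeholder for the xConnect query
    if records:
        push_batch(records)
        write_checkpoint(table, datetime.now(timezone.utc).isoformat())
```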
These instructions are designed to get you up and running on your local dev machine. CloudFormation is used to set up all the AWS resources (see AWS CloudFormation Getting Started for more info).
- You can leverage an existing Redshift dev cluster or create a new single-node cluster using the RedshiftClusterStack.yaml CloudFormation template.
- Connect to your Redshift dev database and create the destination table using the redshift_schema.sql script (see the psycopg2 sketch below the setup list).
- Deploy the SitecoreStreamingRedshiftDestinationStack.yaml CloudFormation template, which creates the Kinesis Firehose delivery stream, S3 staging bucket, DynamoDB table, and required IAM resources. You need to provide the following parameters when creating the CloudFormation stack (see the create_stack sketch below the setup list):
- Kinesis delivery stream name
- DynamoDB checkpoint table name
- S3 staging bucket name
- Redshift database connection string
- Redshift username
- Redshift password
- Redshift destination table name
- Upload redshift-jsonpaths.json to the S3 staging bucket (see the upload example below the setup list).
- Override settings.xml with your xConnect and AWS settings (see my blog post on how to configure the xConnect client).
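For the schema step, a minimal sketch of applying redshift_schema.sql to the dev database with psycopg2 follows; the host, database name, and credentials are placeholders for your own cluster, and any SQL client works just as well.

```python
# Run redshift_schema.sql against the Redshift dev database (placeholders below).
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.xxxxxxxx.us-east-1.redshift.amazonaws.com",  # placeholder endpoint
    port=5439,
    dbname="dev",
    user="awsuser",
    password="change-me",  # placeholder
)
with conn, conn.cursor() as cur:
    with open("redshift_schema.sql") as f:
        cur.execute(f.read())  # create the destination table
conn.close()
```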
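For the stack deployment step, here is a hedged boto3 sketch of creating the stack with the listed parameters. The ParameterKey names are assumptions and should be checked against the Parameters section of SitecoreStreamingRedshiftDestinationStack.yaml; the AWS console or CLI work equally well, and the same pattern applies to RedshiftClusterStack.yaml if you create a new cluster.

```python
# Deploy the destination stack with boto3 (parameter keys and values are assumed).
import boto3

cfn = boto3.client("cloudformation")

with open("SitecoreStreamingRedshiftDestinationStack.yaml") as f:
    template_body = f.read()

cfn.create_stack(
    StackName="sitecore-streaming-redshift",  # hypothetical stack name
    TemplateBody=template_body,
    Capabilities=["CAPABILITY_IAM"],          # the stack creates IAM resources
    Parameters=[
        {"ParameterKey": "DeliveryStreamName", "ParameterValue": "sitecore-xdb-stream"},
        {"ParameterKey": "CheckpointTableName", "ParameterValue": "sitecore-checkpoint"},
        {"ParameterKey": "StagingBucketName", "ParameterValue": "my-sitecore-staging-bucket"},
        {"ParameterKey": "RedshiftConnectionString", "ParameterValue": "jdbc:redshift://<cluster-endpoint>:5439/dev"},
        {"ParameterKey": "RedshiftUsername", "ParameterValue": "awsuser"},
        {"ParameterKey": "RedshiftPassword", "ParameterValue": "change-me"},
        {"ParameterKey": "RedshiftTableName", "ParameterValue": "sitecore_interactions"},
    ],
)
cfn.get_waiter("stack_create_complete").wait(StackName="sitecore-streaming-redshift")
```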
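For the jsonpaths step, an equivalent boto3 upload is shown below; the bucket name and object key are assumptions and must match what the delivery stream's Redshift COPY command expects.

```python
# Upload the jsonpaths manifest to the S3 staging bucket (names are placeholders).
import boto3

boto3.client("s3").upload_file(
    "redshift-jsonpaths.json",         # local file
    "my-sitecore-staging-bucket",      # staging bucket created by the stack
    "redshift-jsonpaths.json",         # object key (assumed)
)
```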
xConnect requires admin permissions, so you must run the solution as an administrator.