
Secure Databricks cluster with Data exfiltration Protection and Privatelink for Storage, KeyVault and EventHub using Bicep




Architecture and Key Features  ·  To Do  ·  How To Use  ·  Credits  ·  Support  ·  Reference  ·  License

Why Bicep?

Bicep is free, is supported by Microsoft support, and is a fun, easy, and productive way to build and deploy complex infrastructure on Azure. If you currently use ARM templates, you will love Bicep's simple syntax. Bicep also supports declaring existing resources. More resources are available at this Link

Architecture and Key Features

Architecture

  • Based on best practices from Azure Databricks Best Practices and the template from the Anti-Data-Exfiltration Reference architecture
  • Hub and Spoke VNETs. Link
  • Databricks cluster created in the spoke VNET. Link
  • Firewall with UDR to allow only the required Databricks endpoints. Link
  • Storage account with Private endpoint. Link
  • Azure Key Vault with Private endpoint. Link
  • Create Databricks-backed secret scope.
  • Azure Event Hub with Private endpoint. Link
  • Create cluster with cluster logging and an init script for monitoring. Link
  • Import sample Databricks notebooks into the workspace.
  • Secured Windows virtual machine with RDP (protects data from export). Link
  • Configure Log Analytics workspace and collect metrics from Spark worker nodes
    • Configure diagnostic logging. Link
    • Configure sending logs to Azure Monitor using mspnp/spark-monitoring
    • Configure Overwatch for fine-grained monitoring. Link
  • Create Azure ML workspace for the model registry and to assist in deploying models to AKS
  • Create AKS compute for AML for real-time model inference/scoring

To Do

  • Create Databricks secret scope backed by Azure Key Vault. Link
  • Create Azure SQL with Private link. Link
  • Create an integrated ADF pipeline
  • Integrate into Azure DevOps
  • Create Databricks performance dashboards
  • Create and configure an external metastore
  • Restrict Databricks access to specific IPs only
  • More sample Databricks notebooks
  • Add descriptions to all parameters

Prerequisites

  • The Managed Identity resource provider must be registered in your Azure subscription.
  • For the bash script, jq must be installed.
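
The two prerequisites above could be checked before running the build script with something like the sketch below. The function names `require_cmd` and `check_prerequisites` are hypothetical helpers, not part of this repo, and the sketch assumes that "Managed Identity" refers to the `Microsoft.ManagedIdentity` provider namespace.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named command is on PATH,
# otherwise print a hint to stderr and fail.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "Missing prerequisite: $1" >&2
  return 1
}

# Hypothetical helper: verify the tools, then register the provider.
check_prerequisites() {
  require_cmd jq || return 1
  require_cmd az || return 1
  # Assumption: the provider namespace is Microsoft.ManagedIdentity.
  az provider register --namespace Microsoft.ManagedIdentity
}

# Usage (requires a prior `az login`):
#   check_prerequisites && ./build.sh
```

Registration can take a few minutes to complete, so it may be worth confirming the provider state in the portal or with the Azure CLI before deploying.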

Client password

  • Client PC password complexity requirements: the supplied password must be 8 to 123 characters long and must satisfy at least 3 of the following complexity requirements:
    • Contains an uppercase character
    • Contains a lowercase character
    • Contains a numeric digit
    • Contains a special character
    • Control characters are not allowed
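
To catch a rejected password before a deployment starts, the rules above could be pre-checked locally. This is a minimal sketch, not the validation Azure itself performs: `check_vm_password` is a hypothetical helper, it treats any punctuation character as a "special character", and it reads "control characters are not allowed" as a hard rule rather than one of the three-of-four classes.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the password satisfies the rules
# listed above, 1 otherwise.
check_vm_password() {
  cvp_pw=$1
  cvp_len=${#cvp_pw}
  cvp_classes=0

  # Length must be between 8 and 123 characters.
  if [ "$cvp_len" -lt 8 ] || [ "$cvp_len" -gt 123 ]; then
    return 1
  fi

  # Assumption: control characters are rejected outright.
  case $cvp_pw in *[[:cntrl:]]*) return 1 ;; esac

  # Count how many character classes are present.
  case $cvp_pw in *[[:upper:]]*) cvp_classes=$((cvp_classes + 1)) ;; esac
  case $cvp_pw in *[[:lower:]]*) cvp_classes=$((cvp_classes + 1)) ;; esac
  case $cvp_pw in *[[:digit:]]*) cvp_classes=$((cvp_classes + 1)) ;; esac
  # Assumption: "special character" means any punctuation character.
  case $cvp_pw in *[[:punct:]]*) cvp_classes=$((cvp_classes + 1)) ;; esac

  # At least 3 of the 4 classes are required.
  [ "$cvp_classes" -ge 3 ]
}

# Usage:
#   check_vm_password "$ADMIN_PASSWORD" || echo "Password too weak" >&2
```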

How To Use

To clone and run this repo, you'll need Git, Bicep, and azure-cli installed on your computer. Editing the files in VS Code with the Bicep extension installed (instructions) is strongly recommended for IntelliSense and other completions.

Option 1:

Deploy to Azure

Click on the above link to deploy the template.

Option 2:

If you need to customize the template, run the following commands from your command line:

# Clone this repository
$ git clone https://github.com/lordlinus/databricks-all-in-one-bicep-template.git

# Go into the repository
$ cd databricks-all-in-one-bicep-template

# Update main.bicep file with variables as required. Default is for southeastasia region.
# Refer to Azure Databricks UDR section under References for region specific parameters.
$ code main.bicep

# Run the build shell script to create the resources
$ ./build.sh

Note: The build script assumes a Linux environment. If you're using Windows, see this guide on running Linux.

Credits

This template is based on ARM templates from the below repo:

Support

The code in this repo is provided as-is. If you need help or support with Bicep, reach out to the Azure support team (Bicep is supported by Microsoft support and is 100% free to use).

Reference

License

MIT


GitHub @lordlinus  ·  Twitter @lordlinus  ·  Linkedin Sunil Sattiraju
