This ARM template deploys multiple HDInsight clusters (Spark + Kafka) in the same Virtual Network. Spark's storage is primarily backed by Azure Data Lake Store, while Kafka uses Blob Storage.
Since ADLS on HDInsight requires a Service Principal with a certificate, we've created a Bash script to automate the entire deployment. The script creates a self-signed certificate and converts it to PKCS12 format.
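The certificate step can be sketched with standard `openssl` commands. This is a minimal illustration, not the actual contents of `deploy.sh`: the file names, the `CERT_PASSWORD` variable, the one-year validity, and the 2048-bit RSA key are all assumptions chosen for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical values for illustration; the real script may use different names.
CLUSTER_NAME="${1:-demo-cluster}"
CERT_PASSWORD="${CERT_PASSWORD:-ChangeMe123!}"
WORK_DIR="$(mktemp -d)"

# 1. Generate a self-signed certificate and an unencrypted private key.
#    The subject CN, key size, and validity period are assumptions.
openssl req -x509 -newkey rsa:2048 -nodes \
  -subj "/CN=${CLUSTER_NAME}" \
  -days 365 \
  -keyout "${WORK_DIR}/key.pem" \
  -out "${WORK_DIR}/cert.pem"

# 2. Bundle the key and certificate into a password-protected PKCS12 file.
openssl pkcs12 -export \
  -inkey "${WORK_DIR}/key.pem" \
  -in "${WORK_DIR}/cert.pem" \
  -out "${WORK_DIR}/cert.pfx" \
  -passout "pass:${CERT_PASSWORD}"

echo "PKCS12 bundle written to ${WORK_DIR}/cert.pfx"
```

The resulting `.pfx` file and its password are the kind of inputs a Service Principal with certificate-based authentication would need. How they are passed to the template is specific to the deployment script.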
- For simplicity, we've named as many resources as possible $CLUSTER_NAME. The VNet address space, VM sizes, and number of Head/Worker/Zookeeper nodes are hardcoded inside the template.
./deploy.sh <CLUSTER_NAME>
Provide a password when prompted. It will be used for accessing all dashboards and SSH.
It takes ~20 minutes to deploy all resources.
- It's not possible to create a Service Principal inside an ARM template, since it resides outside resource groups.
- As of now, ADLS is only available in these regions.
- Kafka doesn't support ADLS as primary storage.
- HDInsight doesn't allow direct connection to Kafka over the public internet.
- Once an HDInsight cluster is provisioned, only the number of worker nodes can be scaled, not the size of the VMs.
- An existing HDInsight cluster cannot join a new VNet.