Skip to content

Latest commit

 

History

History
220 lines (163 loc) · 9.34 KB

README.md

File metadata and controls

220 lines (163 loc) · 9.34 KB

Experimental Project header

GitHub forks GitHub stars GitHub watchers

GitHub all releases GitHub release (latest by date) GitHub last commit GitHub Release Date

GitHub issues GitHub issues closed GitHub pull requests GitHub pull requests closed

New Relic integration for Apache Spark

This New Relic standalone integration polls the Apache Spark REST API for metrics and pushes them into New Relic using Metrics API It uses the New Relic Telemetry sdk for go

Installation

Requires Apache Spark runnning in standalone mode (YARN and mesos not yet supported)

  1. Download the latest package from Release.

  2. Install the NR Spark Metric plugin plugin using the following command

    sudo tar -xvzf /tmp/nri-spark-metric.tar.gz -C /

    The following files will be installed

    /etc/nri-spark-metric/
    /etc/nri-spark-metric/nr-spark-metric
    /etc/systemd/system/
    /etc/systemd/system/nr-spark-metric.service
    /etc/init/nr-spark-metric.conf
    

Installation

The integration can be deployed independently on linux 64 system or as a databricks integration using a notebook. The sections below suggests each.

Standalone deployment
  1. Create "nr-spark-metric-settings.yml" file in the the folder "/etc/nri-spark-metric/" using the following format

    sparkmasterurl: "http://localhost:8080"  <== FQDN ofspark master URL
    clustername:  mylocalcluster             <== Name of the cluster
    insightsapikey: xxxx                     <== Insights api key
    pollinterval: 5                          <== Polling interval
    clustermode:                             <== Set mode to *spark_driver_mode* for Single Node clusters
    tags:                                    <== Additional tags to be added to metrics
      nr_sample_tag_org: newrelic_labs
      nr_sample_tag_practice: odp
    
  2. Run the following command.

    service nr-spark-metric start

  3. Check for metrics in "Metric" event type in Insights

Databricks Init script creator notebook

This notebook and configuration is for reference purpose only, deployment should customize this to fulfill the needs

  1. Create a new notebook to deploy the cluster intialization script
  2. Copy the relevant script below. You do not need to set or touch the $DB_ values in the script, Databricks populates these for us. a Optional : Based on cluster install mode, uncommment SingleNodeCluster install , comment Standalone b Optional : Install infra agent, update with latest version
  3. Replace > with your New Relic Insights Insert Key.
  4. Add/Remove/Update tags require in the tag section, sample tags are configured using nr_sample_tag*
  5. Run this notebook to create to deploy the new_relic_install.sh script in dbfs in configured folder.
  6. Ensure the script is attached to your cluster and is listed in the notebooks of the cluster
  7. Running this script will create the file at dbfs:/nr/nri-spark-metric.sh
  8. Configure target cluster with the newrelic_install.sh cluster-scoped init script using the UI, Databricks CLI, or by invoking the Clusters API. This setting is found in Cluster configuration tab -> Advanced Options -> Init Scripts
  9. Add dbfs:/nr/nri-spark-metric.sh and click add.
  10. Restart your cluster
  11. Metrics should start reporting under the Metrics section in New Relic with the prefix of spark.X.X - you should get Job, Stage Executors and Stream metrics.
dbutils.fs.put("dbfs:/nr/nri-spark-metric.sh",""" 
#!/bin/sh
echo ">>> Check if this is driver? $DB_IS_DRIVER"
echo ">>> Spark Driver ip: $DB_DRIVER_IP"

#Create Cluster init script
cat <<EOF >> /tmp/start_spark-metric.sh

#!/bin/sh

if [ \$DB_IS_DRIVER ]; then
  # Root user detection
  if [ \$(echo "$UID") = "0" ];                                      
  then                                                                     
    sudo=''                                                                
  else
    sudo='sudo'                                                        
  fi
  
  echo ">>> Check if this is driver? $DB_IS_DRIVER"
  echo ">>> Spark Driver ip: $DB_DRIVER_IP"
    
# Optional install infra agent
  # Add license key 
  echo "license_key: <<NR LICENCE KEY >>" | \$sudo tee -a /etc/newrelic-infra.yml         
 
  #Determine OS version. Assuming this is Ubuntu
  OS_VERSION=\$(grep VERSION_ID /etc/os-release | cut -d = -f 2 | xargs echo | cut -d "." -f 1)
  echo ">>> OS_VERSION: \$OS_VERSION"
  
  #add Newrelic GPG key 
  \$sudo curl -s https://download.newrelic.com/infrastructure_agent/gpg/newrelic-infra.gpg | sudo apt-key add -
 
  #Add the infrastructure monitoring agent repository, midify this if OS version changes
  if [ \$OS_VERSION = "18" ];
  then 
    echo ">>> Bionic release"
    \$sudo printf "deb https://download.newrelic.com/infrastructure_agent/linux/apt bionic main" | sudo tee -a /etc/apt/sources.list.d/newrelic-infra.list
  else
    echo ">>> Other release, customize script"
  fi
 
  #Refresh repos 
  \$sudo apt-get update

 #install newreli-infra
 \$sudo apt-get install newrelic-infra -y

 ## adding logs configuration 
    echo "logs:
  - name: databricks.\$DB_CLUSTER_NAME
    file: /databricks/driver/logs/*.log
    attributes:
      nrlabs: data
      entity: databricks
      clustername: \$DB_CLUSTER_NAME
      IP: $DB_DRIVER_IP" > /etc/newrelic-infra/logging.d/spark.yml

# end of infra agent install   

# Install nr-spark-metric integration
   #Download nr-spark-metric integration
  \$sudo wget https://github.com/hsinghkalsi/nri-spark/releases/download/1.2.0/nri-spark-metric.tar.gz -P /tmp

  #Extract the contents to right place
  \$sudo tar -xvzf /tmp/nri-spark-metric.tar.gz -C /
  
  # Check which mode is the cluster running in  
  # Start of  SingleNodeCluster install , using "spark_driver_mode"', uncomment this section and comment out Standalone cluster
  #  echo '  > SingleNodeCluster, using "spark_driver_mode"'
  #  DB_DRIVER_PORT=\$(grep -i "CONF_UI_PORT" /tmp/driver-env.sh | cut -d'=' -f2)
  #  SPARK_CLUSTER_MODE='spark_driver_mode'
  # end of SingleNodeCluster install
    
  # Start of Standalone Cluster, use the below section 
  # Identify driver port in standalone mode 
    echo '  > Standalone cluster, using "spark_standalone_mode", waiting for master-params...'
    while [ -z \$is_available ]; do
      if [ -e "/tmp/master-params" ]; then
        DB_DRIVER_PORT=\$(cat /tmp/master-params | cut -d' ' -f2)
        SPARK_CLUSTER_MODE=''
        is_available=TRUE
      fi
      sleep 2
    done
  # end of Standalone Cluster section
  
  # Configure nr-spark-metric-settings.yml file 

  echo "sparkmasterurl: http://\$DB_DRIVER_IP:\$DB_DRIVER_PORT
clustername: \$DB_CLUSTER_NAME
insightsapikey: NRII-XXXXXXXXXXXXXXXX	
pollinterval: 5
clustermode: \$SPARK_CLUSTER_MODE
tags:
  nr_sample_tag_org: newrelic_labs
  nr_sample_tag_practice: odp" > /etc/nri-spark-metric/nr-spark-metric-settings.yml

  echo ' >>> Configured  nr-spark-metric-settings.yml \n $(</etc/nri-spark-metric/nr-spark-metric-settings.yml)'

#Enable the service
 \$sudo systemctl enable nr-spark-metric.service

  #Start the service 
  
 \$sudo systemctl start nr-spark-metric.service
 \$sudo start nr-spark-metric

fi
EOF

# Start 
if [ \$DB_IS_DRIVER ]; then
  chmod a+x /tmp/start_spark-metric.sh
  /tmp/start_spark-metric.sh >> /tmp/start_spark-metric.log 2>&1 & disown
fi

""",True)

Support

New Relic has open-sourced this project. This project is provided AS-IS WITHOUT WARRANTY OR DEDICATED SUPPORT. Issues and contributions should be reported to the project here on GitHub.

We encourage you to bring your experiences and questions to the Explorers Hub where our community members collaborate on solutions and new ideas.

Contributing

We encourage your contributions to improve [project name]! Keep in mind when you submit your pull request, you'll need to sign the CLA via the click-through using CLA-Assistant. You only have to sign the CLA one time per project. If you have any questions, or to execute our corporate CLA, required if your contribution is on behalf of a company, please drop us an email at [email protected].

License

New Relic Infrastructure Integration for Apache Spark is licensed under the Apache 2.0 License.