
java.lang.NoClassDefFoundError: org/apache/atlas/ApplicationProperties #299

Open
terriblegirl opened this issue May 27, 2020 · 4 comments

@terriblegirl

Spark version: 2.4.5
Atlas version: 2.0.0

I built the connector with Maven:

    mvn package -DskipTests

The build succeeded. I copied 1100-spark_model.json to <ATLAS_HOME>/models/1000-Hadoop, then launched:

    spark-shell --jars spark-atlas-connector_2.11-0.1.0-SNAPSHOT.jar \
      --conf spark.extraListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
      --conf spark.sql.queryExecutionListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
      --conf spark.sql.streaming.streamingQueryListeners=com.hortonworks.spark.atlas.SparkAtlasStreamingQueryEventTracker

The build compiled successfully, so why does the shell fail with java.lang.NoClassDefFoundError: org/apache/atlas/ApplicationProperties? What can I do?

@shivsood

Looks like you missed supplying the application properties file.

@dhineshns

Any updates on this?

@YanXiangSong

This is due to a missing jar. Instead of the thin connector jar, use the fat jar spark-atlas-connector-assembly-0.1.0-SNAPSHOT.jar under the spark-atlas-connector-assembly/target directory.
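
For example (a sketch reusing the listener configs from the original post; the jar path relative to the repo root is an assumption):

    spark-shell --jars spark-atlas-connector-assembly/target/spark-atlas-connector-assembly-0.1.0-SNAPSHOT.jar \
      --conf spark.extraListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
      --conf spark.sql.queryExecutionListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
      --conf spark.sql.streaming.streamingQueryListeners=com.hortonworks.spark.atlas.SparkAtlasStreamingQueryEventTracker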
With the assembly jar in place, though, I'm now hitting this exception:
    java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
        at com.hortonworks.spark.atlas.AtlasClientConf.get(AtlasClientConf.scala:50)
        at com.hortonworks.spark.atlas.types.AtlasEntityUtils$class.clusterName(AtlasEntityUtils.scala:29)
        at com.hortonworks.spark.atlas.sql.CommandsHarvester$.clusterName(CommandsHarvester.scala:45)
        at com.hortonworks.spark.atlas.types.AtlasEntityUtils$class.tableToEntity(AtlasEntityUtils.scala:60)
        at com.hortonworks.spark.atlas.sql.CommandsHarvester$.tableToEntity(CommandsHarvester.scala:45)
        at com.hortonworks.spark.atlas.sql.CommandsHarvester$InsertIntoHiveTableHarvester$.harvest(CommandsHarvester.scala:56)
        at com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor$$anonfun$2.apply(SparkExecutionPlanProcessor.scala:126)
        at com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor$$anonfun$2.apply(SparkExecutionPlanProcessor.scala:89)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
        at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
        at com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor.process(SparkExecutionPlanProcessor.scala:89)
        at com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor.process(SparkExecutionPlanProcessor.scala:63)
        at com.hortonworks.spark.atlas.AbstractEventProcessor$$anonfun$eventProcess$1.apply(AbstractEventProcessor.scala:72)
        at com.hortonworks.spark.atlas.AbstractEventProcessor$$anonfun$eventProcess$1.apply(AbstractEventProcessor.scala:71)
        at scala.Option.foreach(Option.scala:257)
        at com.hortonworks.spark.atlas.AbstractEventProcessor.eventProcess(AbstractEventProcessor.scala:71)
        at com.hortonworks.spark.atlas.AbstractEventProcessor$$anon$1.run(AbstractEventProcessor.scala:38)

@kennydataml

kennydataml commented Mar 30, 2021

> Looks like you missed supplying the application properties file.

This is partially correct. As per the README, atlas-application.properties needs to be discoverable by Spark, i.e. it needs to be on the classpath (in cluster mode, use --files to ship it to the executors).

You also need to either:

  1. provide the Apache Atlas jars (atlas-intg, plus its many transitive dependencies) to spark-submit, or
  2. use the fat jar under spark-atlas-connector-assembly/target (see the sketch after this list).
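
A minimal sketch of option 2 (the paths, the job jar name, and combining --files with the fat jar are my assumptions, not verbatim README instructions):

    spark-submit \
      --jars spark-atlas-connector-assembly/target/spark-atlas-connector-assembly-0.1.0-SNAPSHOT.jar \
      --files /path/to/atlas-application.properties \
      --conf spark.extraListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
      --conf spark.sql.queryExecutionListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
      your-job.jar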

NOTE: I am trying to make this work in Azure Databricks, which requires an init script.

I am only using RestAtlasClient.scala, which leverages AtlasClientConf.scala, which in turn uses ApplicationProperties.java. Take a look at ApplicationProperties.java in the Atlas repo: when the ATLAS_CONFIGURATION_DIRECTORY_PROPERTY system property is null, it searches the classpath using ApplicationProperties.class.getClassLoader(), which seems completely useless here because that class falls under the webapp section of Atlas. So does that mean there is an assumption that Spark workloads run on the same VM as the Atlas web app? This is unclear to me.
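
Roughly, the resolution order is as follows (a Scala paraphrase of the behavior described above and of the log lines further down, not verbatim Atlas code):

    // Sketch: how ApplicationProperties locates atlas-application.properties.
    val confDir = System.getProperty("atlas.conf") // ATLAS_CONFIGURATION_DIRECTORY_PROPERTY
    val url =
      if (confDir != null)
        // explicit directory wins: <confDir>/atlas-application.properties
        new java.io.File(confDir, "atlas-application.properties").toURI.toURL
      else
        // otherwise search the classpath of ApplicationProperties' own class loader
        classOf[org.apache.atlas.ApplicationProperties].getClassLoader
          .getResource("atlas-application.properties")
    // if url is still null here, loading fails with
    // "Cannot locate configuration source null" (see the error below)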

If you look at the static variables of the ApplicationProperties class, you can see that ATLAS_CONFIGURATION_DIRECTORY_PROPERTY is set to the Java system property "atlas.conf". This Stack Overflow post has a comment showing that if you set System.setProperty("atlas.conf", "<path to your properties>") in your Spark job, then it will work.
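
In driver code that would look something like this (a sketch; the folder path is a placeholder, it must point at the directory containing atlas-application.properties, and the property must be set before the connector first touches ApplicationProperties):

    // Hypothetical driver-side setup: set "atlas.conf" before building the session,
    // since the listeners are instantiated during SparkContext initialization.
    System.setProperty("atlas.conf", "/path/to/properties-folder/")
    val spark = org.apache.spark.sql.SparkSession.builder()
      .appName("sac-test") // arbitrary example name
      .getOrCreate()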

Spark Conf

extra class path (not working)

I've tried setting the following spark conf options during spark-submit:

  • --conf "spark.driver.extraClassPath=path/to/properties-folder/*"
  • --conf "spark.executor.extraClassPath=path/to/properties-folder/*"

I tried multiple variations of the folder path: with the file name, without the file name, using local:/folderpath, and so on. None of them work.
Log output:

21/03/30 18:54:46 INFO ApplicationProperties: Looking for atlas-application.properties in classpath
21/03/30 18:54:46 INFO ApplicationProperties: Looking for /atlas-application.properties in classpath
21/03/30 18:54:46 INFO ApplicationProperties: Loading atlas-application.properties from null

Summarized error:

ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Exception when registering SparkListener
...
Caused by: org.apache.atlas.AtlasException: Failed to load application properties
...
Caused by: org.apache.commons.configuration.ConfigurationException: Cannot locate configuration source null

We can see that the url variable is null.

extra java options (working)

I then tried setting Java system properties, specifically atlas.conf. There are two ways to do this:

  1. via spark-defaults.conf; the default Spark properties file is $SPARK_HOME/conf/spark-defaults.conf (see the sketch below), or
  2. via --conf "spark.driver.extraJavaOptions=-Datlas.conf=path/to/properties-folder/"
         --conf "spark.executor.extraJavaOptions=-Datlas.conf=path/to/properties-folder/"

I opted for --conf, which worked successfully.
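
For the spark-defaults.conf route, the equivalent entries would be (a sketch; the folder path is a placeholder):

    # $SPARK_HOME/conf/spark-defaults.conf
    spark.driver.extraJavaOptions      -Datlas.conf=path/to/properties-folder/
    spark.executor.extraJavaOptions    -Datlas.conf=path/to/properties-folder/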

Modified source code

I also tried setting the system property (tied to an environment variable) inside the class constructor of AtlasClientConf and its companion object AtlasClientConf, but that didn't work either. Setting the Java system property through the Spark conf is the solution.
