Issue writing in Synapse Spark 3.2 #43

Open
siege089 opened this issue Feb 1, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@siege089

siege089 commented Feb 1, 2024

I'm using Azure Synapse, and nothing I try lets me save models. I've explicitly included spark-avro in my pom file and loaded the spark-avro package into the Spark pool workspace.

    <properties>
        <spark.version>3.2.0</spark.version>
        <scala.version.major>2.12</scala.version.major>
        <scala.version.minor>15</scala.version.minor>
    </properties>
    <dependencies>
        <dependency>
            <groupId>com.linkedin.isolation-forest</groupId>
            <artifactId>isolation-forest_${spark.version}_${scala.version.major}</artifactId>
            <version>3.0.3</version>
        </dependency>

        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version.major}.${scala.version.minor}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version.major}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version.major}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_${scala.version.major}</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-avro_${scala.version.major}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <dependency>
            <groupId>com.microsoft.azure.synapse</groupId>
            <artifactId>synapseutils_${scala.version.major}</artifactId>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.jmockit</groupId>
            <artifactId>jmockit</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.scalatest</groupId>
            <artifactId>scalatest_${scala.version.major}</artifactId>
        </dependency>
    </dependencies>

2024-01-30 01:31:47,163 INFO ApplicationMaster [shutdown-hook-0]: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: org.apache.spark.sql.AnalysisException:  Failed to find data source: com.databricks.spark.avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".        
	at org.apache.spark.sql.errors.QueryCompilationErrors$.failedToFindAvroDataSourceError(QueryCompilationErrors.scala:1028)
	at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:666)
	at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:720)
	at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:876)
	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:275)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:241)
	at com.linkedin.relevance.isolationforest.IsolationForestModelReadWrite$IsolationForestModelWriter.saveImplHelper(IsolationForestModelReadWrite.scala:262)
	at com.linkedin.relevance.isolationforest.IsolationForestModelReadWrite$IsolationForestModelWriter.saveImpl(IsolationForestModelReadWrite.scala:241)
	at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:168)
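
One configuration worth trying, offered as an untested assumption rather than a confirmed fix: the pom above bundles `spark-core`, `spark-sql` and `spark-avro` into the application (default `compile` scope) while the Synapse pool also supplies its own Spark and, per the post, spark-avro. Declaring them `provided` keeps the application jar from carrying a second copy of those classes. The versions below just reuse the pom's `${spark.version}` and would have to match the pool's actual runtime.

```xml
<dependencies>
    <!-- Assumption: the Synapse Spark pool supplies these at runtime,
         so they are compiled against but not packaged into the jar. -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.version.major}</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.version.major}</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>
    <!-- Relies on spark-avro having been loaded into the pool workspace,
         as described in the post; otherwise keep the default scope. -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-avro_${scala.version.major}</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>
</dependencies>
```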

@jverbus
Contributor

jverbus commented Feb 12, 2024

I just created a fix.

#44

@jverbus
Contributor

jverbus commented Feb 12, 2024

Try this

<dependency>
  <groupId>com.linkedin.isolation-forest</groupId>
  <artifactId>isolation-forest_3.2.4_2.12</artifactId>
  <version>3.0.4</version>
</dependency>
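
The suggested artifact encodes Spark 3.2.4 in its name, whereas the original pom builds the isolation-forest artifactId from `${spark.version}` (3.2.0). A sketch of keeping the two versions separate, assuming a new property named `isolation-forest.spark.version` (a hypothetical name, not from the thread):

```xml
<properties>
    <spark.version>3.2.0</spark.version>
    <scala.version.major>2.12</scala.version.major>
    <!-- Hypothetical property: the Spark version the isolation-forest
         artifact was built against, independent of spark.version. -->
    <isolation-forest.spark.version>3.2.4</isolation-forest.spark.version>
</properties>
<dependencies>
    <dependency>
        <groupId>com.linkedin.isolation-forest</groupId>
        <artifactId>isolation-forest_${isolation-forest.spark.version}_${scala.version.major}</artifactId>
        <version>3.0.4</version>
    </dependency>
</dependencies>
```

This way `spark.version` can keep tracking the pool's runtime for the Spark dependencies while the isolation-forest coordinate follows whatever build is published.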

@siege089
Author

siege089 commented Mar 8, 2024

Still getting the same error with this new version.

@jverbus jverbus self-assigned this May 30, 2024
@jverbus jverbus added the bug Something isn't working label May 30, 2024
@jverbus
Contributor

jverbus commented Dec 17, 2024

I haven't been able to reproduce this error. Are you still running into the issue?
