
build: Cross-compile to Scala 2.12 #726

Closed
nightscape wants to merge 1 commit into master from scala_2.12

Conversation

nightscape
Contributor

@nightscape nightscape commented Oct 27, 2019

Done:

To Do:

  • Make tests succeed
  • Test doc generation
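The cross-compilation itself is typically driven by sbt's `crossScalaVersions` setting; a minimal sketch (the patch versions are assumptions, not necessarily what this PR uses):

```scala
// build.sbt sketch: build against both Scala lines.
// Patch versions are illustrative; match them to Spark's own Scala versions.
crossScalaVersions := Seq("2.11.12", "2.12.10")
```

With this in place, `sbt +compile` / `sbt +test` run each task once per listed Scala version, and `sbt ++2.12.10 test` targets a single one.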

@welcome

welcome bot commented Oct 27, 2019

💖 Thanks for opening your first pull request! 💖 We use semantic commit messages to streamline the release process. Before your pull request can be merged, you should make sure your first commit and PR title start with a semantic prefix. This helps us to create release messages and credit you for your hard work!
Examples of commit messages with semantic prefixes:

  • fix: Fix LightGBM crashes with empty partitions
  • feat: Make HTTP on Spark back-offs configurable
  • docs: Update Spark Serving usage
  • build: Add codecov support
  • perf: improve LightGBM memory usage
  • refactor: make python code generation rely on classes
  • style: Remove nulls from CNTKModel
  • test: Add test coverage for CNTKModel

Make sure to check out the developer guide for guidance on testing your change.
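The prefix convention above can be checked locally before pushing; here is a minimal sketch (the function name is hypothetical, and the prefix list simply mirrors the examples above):

```shell
# check_semantic: verify a commit message starts with a semantic prefix.
# Prefix list mirrors the examples above; extend it if the project adds more.
check_semantic() {
  case "$1" in
    fix:*|feat:*|docs:*|build:*|perf:*|refactor:*|style:*|test:*) return 0 ;;
    *) return 1 ;;
  esac
}

check_semantic "build: Cross-compile to Scala 2.12" && echo "ok"
# → ok
```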

@msftclas

msftclas commented Oct 27, 2019

CLA assistant check
All CLA requirements met.

@nightscape nightscape changed the title WIP: Cross-compile to Scala 2.12 WIP: build: Cross-compile to Scala 2.12 Oct 27, 2019
@nightscape
Contributor Author

@drdarshan @mhamilton723 a lot of tests are failing because there seem to be missing datasets, possibly/probably caused by this

ERROR: The subscription of 'ce1dee05-8cf6-4ad6-990a-9c80868800ba' doesn't exist in cloud 'AzureCloud'.

I assume this subscription requires permissions to log in and retrieve the datasets, correct?
If so, is there a way I can get the datasets to test locally, or do you have to test this in your environment?

@mhamilton723
Collaborator

Hey @nightscape thanks so much for this great PR.

Datasets can be downloaded from
https://mmlspark.blob.core.windows.net/installers/datasets-2019-05-02.tgz
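For local testing, the archive can be fetched and unpacked with something like the following (the target directory is an assumption — check the developer guide for where the build actually expects the datasets):

```shell
# fetch_datasets: download and unpack the MMLSpark test datasets.
# Usage: fetch_datasets [target-dir]; requires curl and network access.
fetch_datasets() {
  dir="${1:-$HOME/.mmlspark/datasets}"
  url="https://mmlspark.blob.core.windows.net/installers/datasets-2019-05-02.tgz"
  mkdir -p "$dir"
  curl -fsSL "$url" | tar -xz -C "$dir"
}
```

Calling `fetch_datasets` with no argument unpacks into `~/.mmlspark/datasets`.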

@mhamilton723
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@codecov

codecov bot commented Oct 29, 2019

Codecov Report

Merging #726 into master will increase coverage by 2.84%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #726      +/-   ##
==========================================
+ Coverage   82.35%   85.20%   +2.84%     
==========================================
  Files         189      188       -1     
  Lines        8693     8643      -50     
  Branches      517      532      +15     
==========================================
+ Hits         7159     7364     +205     
+ Misses       1534     1279     -255     
Impacted Files Coverage Δ
...com/microsoft/ml/spark/cognitive/AzureSearch.scala 87.87% <100.00%> (ø)
...icrosoft/ml/spark/io/binary/BinaryFileFormat.scala 97.75% <100.00%> (+0.02%) ⬆️
.../microsoft/ml/spark/io/powerbi/PowerBIWriter.scala 88.57% <100.00%> (ø)
...ql/execution/streaming/DistributedHTTPSource.scala 84.35% <100.00%> (ø)
...om/microsoft/ml/spark/io/http/SharedVariable.scala 66.66% <0.00%> (-16.67%) ⬇️
...com/microsoft/ml/spark/cognitive/RESTHelpers.scala 35.00% <0.00%> (-15.00%) ⬇️
...n/scala/org/apache/spark/ml/param/ArrayParam.scala 60.00% <0.00%> (-10.00%) ⬇️
...ain/scala/com/microsoft/ml/spark/nn/BallTree.scala 82.85% <0.00%> (-4.77%) ⬇️
.../microsoft/ml/spark/core/schema/Categoricals.scala 82.10% <0.00%> (-4.36%) ⬇️
...oft/ml/spark/recommendation/RankingEvaluator.scala 88.00% <0.00%> (-4.31%) ⬇️
... and 34 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 840781a...540554a. Read the comment docs.

@nightscape
Contributor Author

Hi @mhamilton723, I had a look at the errors in the build, but at first glance I don’t see how they could be caused by the Scala 2.12 switch.
Unfortunately, I’ve been pulled into other topics and can’t spend much more time on this.
Would it be ok to chicken out here?

@mhamilton723
Collaborator

@nightscape, appreciate you taking it this far. Just rebased on top of some of the latest test-flakiness fixes, so we'll see how it goes :)

@mhamilton723
Copy link
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@nightscape nightscape marked this pull request as draft April 21, 2020 16:22
@nightscape nightscape changed the title WIP: build: Cross-compile to Scala 2.12 build: Cross-compile to Scala 2.12 Apr 21, 2020
@nightscape
Contributor Author

/azp run

@azure-pipelines

Commenter does not have sufficient privileges for PR 726 in repo Azure/mmlspark

@imatiach-msft
Contributor

/azp run

@azure-pipelines

Pull request contains merge conflicts.

@nightscape
Contributor Author

For some reason, GitHub didn't pick up the changes I made to my branch.
Possibly caused by the recent hiccups: https://www.githubstatus.com/
Will try changing something and force-pushing again.

@nightscape
Contributor Author

@imatiach-msft, the build seems stuck...
Could you trigger it again?
Thank you!

@imatiach-msft
Contributor

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@imatiach-msft
Contributor

done!

@imatiach-msft
Contributor

seems like a bunch of our dependencies don't have 2.12 versions yet?

@nightscape
Contributor Author

@imatiach-msft right. I assume this is a new dependency; as far as I remember, I had this branch compiling locally before I rebased on master.
Will try to get a 2.12 build for https://github.com/linkedin/isolation-forest

@nightscape
Contributor Author

Asked for a 2.12 build here: linkedin/isolation-forest#14

@imatiach-msft
Contributor

also adding @eisber for the isolation forest dependency

@imatiach-msft
Contributor

I wonder if there is some way to make this an optional dependency for the Scala 2.12 build for now...
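One way to sketch that in sbt is to gate the dependency on the Scala binary version (the coordinates and version number below are illustrative, not the project's actual ones):

```scala
// build.sbt sketch: include isolation-forest only on Scala 2.11
// until a 2.12 artifact is published. Coordinates are illustrative.
libraryDependencies ++= {
  CrossVersion.partialVersion(scalaVersion.value) match {
    case Some((2, 11)) =>
      Seq("com.linkedin.isolation-forest" % "isolation-forest_2.4.3_2.11" % "0.3.0")
    case _ =>
      Seq.empty // Scala 2.12: code using it would need to be excluded or stubbed
  }
}
```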

@eisber
Collaborator

eisber commented Apr 24, 2020

@jverbus should be able to help with the dependency. He built the artifacts for Spark 2.3.0 and 2.4.3.

https://search.maven.org/search?q=g:com.linkedin.isolation-forest%20AND%20a:isolation-forest_2.4.3_2.10

@nightscape nightscape force-pushed the scala_2.12 branch 2 times, most recently from 96dea67 to 269912f Compare June 2, 2020 07:46
@nightscape nightscape marked this pull request as ready for review June 2, 2020 07:50
@nightscape
Contributor Author

@imatiach-msft @eisber I updated to the newly released isolation-forest version for Scala 2.12 (thanks @jverbus !) and rebased on master.
Unfortunately, I don't see any results from the CI run. Does that have to be triggered manually?

@eisber
Collaborator

eisber commented Jun 4, 2020

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@nightscape
Contributor Author

@eisber unfortunately I still don't see the exact error in the Azure pipeline.
Would you be so kind as to paste the relevant parts here?
Thank you!

@eisber
Collaborator

eisber commented Jun 15, 2020

@nightscape sorry for the long delay

Info Provided - Suite ClassBalancerSuite took 4.357s
[info] - yield proper weights
[info] + Test yield proper weights took 0.844s
Info Provided - Shutting down spark session
[info] CacherSuite:
20/06/04 12:10:09 WARN CacheManager: Asked to cache already cached data.
20/06/04 12:10:10 WARN CacheManager: Asked to cache already cached data.
20/06/04 12:10:10 WARN CacheManager: Asked to cache already cached data.
[info] - Serialization Fuzzing
[info] + Creating a spark session for suite CacherSuite
[info] + Test Serialization Fuzzing took 0.416s
20/06/04 12:10:10 WARN CacheManager: Asked to cache already cached data.
[info] - Experiment Fuzzing
[info] + Test Experiment Fuzzing took 0.002s
20/06/04 12:10:10 WARN CacheManager: Asked to cache already cached data.
[info] - Be the identity operation
[info] + Test Be the identity operation took 0.027s
Info Provided - Suite CacherSuite took 0.445s
Info Provided - Shutting down spark session
[info] SummarizeDataSuite:
20/06/04 12:10:10 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
[info] - Serialization Fuzzing
[info] + Creating a spark session for suite SummarizeDataSuite
[info] + Test Serialization Fuzzing took 7.856s
[info] - Experiment Fuzzing
[info] + Test Experiment Fuzzing took 1.413s
[info] - Smoke test for summarizing basic DF - schema transform
[info] + Test Smoke test for summarizing basic DF - schema transform took 0.017s
[info] - Smoke test for summary params
[info] + Test Smoke test for summary params took 0.007s
[info] - Smoke test for summarizing basic DF
[info] + Test Smoke test for summarizing basic DF took 1.611s
[info] - Smoke test for summarizing missings DF
[info] + Test Smoke test for summarizing missings DF took 1.552s
[info] - Smoke test for subset summarizing missings DF
[info] + Test Smoke test for subset summarizing missings DF took 0.47s
Alert Provided - Suite SummarizeDataSuite took 12.926s
Info Provided - Shutting down spark session
[info] RepartitionSuite:
[info] - Serialization Fuzzing
[info] + Creating a spark session for suite RepartitionSuite
[info] + Test Serialization Fuzzing took 0.389s
[info] - Experiment Fuzzing
[info] + Test Experiment Fuzzing took 0.002s
[info] - Work for several values of n
[info] + Test Work for several values of n took 0.058s
[info] - Should allow a user to set the partitions specifically in pipeline transform
[info] + Test Should allow a user to set the partitions specifically in pipeline transform took 0.058s
Info Provided - Suite RepartitionSuite took 0.507s
Info Provided - Shutting down spark session
[info] BatchIteratorSuite:
ratio: 1.0039155890709441, Batched: 2338ms, normal: 2329ms

[error] Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
[error] OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/mml-natives8324893102437100607/libopencv_java320.so which might have disabled stack guard. The VM will try to fix the stack guard now.
[error] It's highly recommended that you fix the library with 'execstack -c ', or link it with '-z noexecstack'.
[info] Could not generate getters and setters for class com.microsoft.ml.spark.automl.BestModel due to no default constructor
[info] Could not generate getters and setters for class com.microsoft.ml.spark.automl.TuneHyperparametersModel due to no default constructor
[info] Could not generate getters and setters for class com.microsoft.ml.spark.featurize.AssembleFeaturesModel due to no default constructor
[info] Could not generate getters and setters for class com.microsoft.ml.spark.featurize.CleanMissingDataModel due to no default constructor
[info] Could not generate getters and setters for class org.apache.spark.ml.PipelineModel due to no default constructor
[info] Could not generate getters and setters for class com.microsoft.ml.spark.featurize.text.TextFeaturizerModel due to no default constructor
[info] Could not generate getters and setters for class com.microsoft.ml.spark.isolationforest.IsolationForestModel due to no default constructor
[info] Could not generate getters and setters for class com.microsoft.ml.spark.lightgbm.LightGBMClassificationModel due to no default constructor
[info] Could not generate getters and setters for class com.microsoft.ml.spark.lightgbm.LightGBMRankerModel due to no default constructor
[info] Could not generate getters and setters for class com.microsoft.ml.spark.lightgbm.LightGBMRegressionModel due to no default constructor
[info] Could not generate getters and setters for class org.apache.spark.ml.PipelineModel due to no default constructor
[info] Could not generate getters and setters for class com.microsoft.ml.spark.stages.TimerModel due to no default constructor
[info] Could not generate getters and setters for class com.microsoft.ml.spark.train.TrainedClassifierModel due to no default constructor
[info] Could not generate getters and setters for class com.microsoft.ml.spark.train.TrainedRegressorModel due to no default constructor

@nightscape
Contributor Author

nightscape commented Jun 15, 2020

Hmm, the only errors I see are the ones about OpenCV:

[error] Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
[error] OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/mml-natives8324893102437100607/libopencv_java320.so which might have disabled stack guard. The VM will try to fix the stack guard now.
[error] It's highly recommended that you fix the library with 'execstack -c ', or link it with '-z noexecstack'.

I honestly don't have any clue what to do with them.
Could anyone from the Azure team take it from here
@mhamilton723 @imatiach-msft @eisber ?

@utsavavlino

@mhamilton723 any update on the status? Will there be Scala 2.12 support for mmlspark soon?

@yosit

yosit commented Sep 22, 2020

Any chance to push this forward?

@yosit

yosit commented Oct 8, 2020

@nightscape a small fix to your branch would be to:
sed -i 's/scala-2.11/scala-2.12/' src/it/scala/com/microsoft/ml/spark/codegen/CodegenConfig.scala

@nightscape
Contributor Author

Overall I think this needs to be pushed over the line by someone from the Azure team like @eisber, @imatiach-msft or @mhamilton723.
If someone from the Azure team tells me "Look, make these 3 changes and we'll take it from there", I'd be happy to.
But right now I won't invest more time until there is a commitment to actually get this merged.

@yosit

yosit commented Oct 10, 2020

@nightscape Thank you for your contribution!

@eisber , @imatiach-msft , @mhamilton723 - We have been using this successfully at ZipRecruiter.
Happy to see these small changes merged.

@yosit yosit mentioned this pull request Oct 13, 2020
@srowen

srowen commented Oct 30, 2020

Hi @imatiach-msft - greetings from Spark-land. I know a few of our customers love mmlspark and would love to use it on Spark 3. I see this and a related PR basically have that working. If I can help review, glad to - they'd love to get it released when you have a moment!

@hanbing1587

hanbing1587 commented Nov 9, 2020

@nightscape I tried using "sbt clean package" to build mmlspark from this branch, which generates a jar file, but I can't use the Python API for LightGBM. The error is "ModuleNotFoundError: No module named 'mmlspark.lightgbm._LightGBMClassifier'". I suspect my build command is incorrect. What is the correct command to build mmlspark with the Python API?

@nightscape
Contributor Author

@hanbing1587 unfortunately I don't know either...
I just changed the Scala version settings and fixed all compile errors.

@juanpaulo

related to #912?

@nightscape
Contributor Author

Definitely. I have no idea why the other PR got closed though, or why this one doesn't receive any love...
The Open Source part of this repo leaves room for improvement 😝

@imatiach-msft
Contributor

closing as we already support scala 2.12
