Releases: dotnet/machinelearning
ML.NET 1.3.1
New Features
- Deep Neural Networks Training (PREVIEW) (#4057)
Introduces the in-preview Microsoft.ML.DNN package (version 0.15.1), which enables full DNN model retraining and transfer learning in .NET using the C# bindings for TensorFlow provided by TensorFlow.NET. The goal of this package is to support high-level DNN training and scoring tasks such as image classification, text classification, and object detection through simple yet powerful APIs that are framework agnostic, although TensorFlow is currently the only backend. The APIs below are in early preview, and we hope to get customer feedback that we can incorporate in the next iteration.
    public static DnnEstimator RetrainDnnModel(
        this ModelOperationsCatalog catalog,
        string[] outputColumnNames,
        string[] inputColumnNames,
        string labelColumnName,
        string tensorFlowLabel,
        string optimizationOperation,
        string modelPath,
        int epoch = 10,
        int batchSize = 20,
        string lossOperation = null,
        string metricOperation = null,
        string learningRateOperation = null,
        float learningRate = 0.01f,
        bool addBatchDimensionInput = false,
        DnnFramework dnnFramework = DnnFramework.Tensorflow)

    public static DnnEstimator ImageClassification(
        this ModelOperationsCatalog catalog,
        string featuresColumnName,
        string labelColumnName,
        string outputGraphPath = null,
        string scoreColumnName = "Score",
        string predictedLabelColumnName = "PredictedLabel",
        string checkpointName = "_retrain_checkpoint",
        Architecture arch = Architecture.InceptionV3,
        DnnFramework dnnFramework = DnnFramework.Tensorflow,
        int epoch = 10,
        int batchSize = 20,
        float learningRate = 0.01f,
        bool measureTrainAccuracy = false)
- Database Loader (PREVIEW) (#4035)
Introduces a database loader that enables training on databases. The loader supports any relational database supported by System.Data in .NET Framework or .NET Core, which means you can use many RDBMSs such as SQL Server, Azure SQL Database, Oracle, PostgreSQL, and MySQL. This feature is in early preview and can be accessed via the Microsoft.ML.Experimental NuGet package.
    public static DatabaseLoader CreateDatabaseLoader(
        this DataOperationsCatalog catalog,
        params DatabaseLoader.Column[] columns)
Bug Fixes
Serious
- SaveOnnxCommand appears to ignore predictors when saving a model to ONNX format: This broke the export-to-ONNX functionality. (#3974)
- Unable to use fasterrcnn onnx model. (#3963)
- PredictedLabel is always true for Anomaly Detection: This bug disabled scenarios like fraud detection using binary classification/PCA. (#4039)
- Update build certifications: This bug broke the official builds because outdated certificates were being used. (#4059)
Other
- Stop LightGbm Warning for Default Metric Input: Fixes the warning "LightGBM Warning Unknown parameter metric=" that was produced when the default metric is used. (#3965)
Samples
Breaking Changes
None
Enhancements
CLI and AutoML API
- Bug fixes.
Remarks
- The paper "Machine Learning at Microsoft with ML.NET" is presented in the KDD 2019 proceedings.
ML.NET v1.2.0
General Availability
- Microsoft.ML.TimeSeries
  - Anomaly detection algorithms (Spike and Change Point):
    - Independent and identically distributed.
    - Singular spectrum analysis.
    - Spectral residual from Azure Anomaly Detector/Kensho team.
  - Forecasting models:
    - Singular spectrum analysis.
  - Prediction Engine for online learning
    - Enables updating a time series model with new observations at scoring time, so that the user does not have to retrain the model on old data each time.
- Microsoft.ML.OnnxTransformer
Enables scoring of ONNX models in the learning pipeline. Uses ONNX Runtime v0.4.
- Microsoft.ML.TensorFlow
Enables scoring of TensorFlow models in the learning pipeline. Uses TensorFlow v1.13. Very useful for image and text classification: users can featurize images or text using DNN models and feed the result into a classical machine learning model such as a decision tree or logistic regression trainer.
New Features
- Tree-based featurization (#3812)
Generating features from tree structure has been a popular technique in data mining. It is useful for capturing feature interactions when creating a stacked model, for dimensionality reduction, or for featurizing towards an alternative label. ML.NET's tree featurization trains a tree-based model and then maps the input feature vector to several non-linear feature vectors. The generated feature vectors are:
- The leaves the input falls into. This is a binary vector with ones at the indexes of the reached leaves,
- The paths that the input vector passes through before hitting the leaves, and
- The values of the reached leaves.
References:
- p. 9 (a Kaggle solution adopted by FB below).
- Section 3. (Facebook)
- Section of Entity-level personalization with GLMix. (LinkedIn)
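The first of the generated vectors (the leaf indicators) can be illustrated with a small, dependency-free sketch. This is not ML.NET's implementation or API: the `Node` type, the hand-built trees, and all method names below are hypothetical and exist only to show how an input is routed to one leaf per tree and encoded as a concatenated binary vector.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch only; none of these types come from ML.NET.
public static class TreeLeafFeaturizerSketch
{
    // A node is either a split on one feature or a leaf with an id local to its tree.
    public sealed class Node
    {
        public int FeatureIndex;   // used by split nodes only
        public float Threshold;    // go left when x[FeatureIndex] <= Threshold
        public Node Left, Right;   // both null for a leaf
        public int LeafId = -1;    // >= 0 for leaves only
        public bool IsLeaf => Left == null && Right == null;
    }

    public static Node Leaf(int id) => new Node { LeafId = id };

    public static Node Split(int feature, float threshold, Node left, Node right) =>
        new Node { FeatureIndex = feature, Threshold = threshold, Left = left, Right = right };

    // Walks one tree from the root and returns the id of the leaf the input reaches.
    public static int ReachedLeaf(Node root, float[] x)
    {
        var node = root;
        while (!node.IsLeaf)
            node = x[node.FeatureIndex] <= node.Threshold ? node.Left : node.Right;
        return node.LeafId;
    }

    // Concatenates one one-hot block per tree; exactly one entry per block is set to 1.
    public static float[] LeafIndicators(
        IReadOnlyList<Node> trees, IReadOnlyList<int> leafCounts, float[] x)
    {
        int total = 0;
        foreach (int count in leafCounts) total += count;

        var result = new float[total];
        int offset = 0;
        for (int t = 0; t < trees.Count; t++)
        {
            result[offset + ReachedLeaf(trees[t], x)] = 1f;
            offset += leafCounts[t];
        }
        return result;
    }

    public static void Main()
    {
        // Two tiny hand-built trees standing in for a trained ensemble.
        var tree0 = Split(0, 0.5f, Leaf(0), Split(1, 2.0f, Leaf(1), Leaf(2)));
        var tree1 = Split(1, 1.0f, Leaf(0), Leaf(1));

        var features = LeafIndicators(
            new[] { tree0, tree1 }, new[] { 3, 2 }, new[] { 0.9f, 1.5f });

        Console.WriteLine(string.Join(", ", features)); // prints "0, 1, 0, 0, 1"
    }
}
```

The resulting vector has one slot per leaf across all trees, so a downstream linear model can learn a weight per leaf, which is how the interactions captured by the trees are passed on in a stacked model.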
- Microsoft.Extensions.ML integration package. (#3827)
This package makes it easier to use ML.NET with app models that support Microsoft.Extensions, such as ASP.NET and Azure Functions.
Specifically it contains functionality for:
- Dependency Injection
- Pooling PredictionEngines
- Reloading models when the file or URI has changed
- Hooking ML.NET logging to Microsoft.Extensions.Logging
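As a rough sketch of the dependency injection and pooling pieces, the snippet below registers a pooled prediction engine and consumes it from a service. It assumes the `AddPredictionEnginePool`, `FromFile`, and `Predict` members as documented for the Microsoft.Extensions.ML package; the exact surface in the version shipped with this release may differ. The `ReviewInput` and `ReviewPrediction` types, the `model.zip` path, and the helper class names are hypothetical.

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.ML;
using Microsoft.ML.Data;

// Hypothetical input/output schema; a real model defines its own columns.
public class ReviewInput
{
    public string Text { get; set; }
}

public class ReviewPrediction
{
    [ColumnName("PredictedLabel")]
    public bool IsPositive { get; set; }
}

public static class MlServiceRegistration
{
    // Registers a pool of PredictionEngine instances backed by a model file.
    // The second argument asks the pool to reload the model when the file changes.
    public static IServiceCollection AddReviewModel(this IServiceCollection services)
    {
        services.AddPredictionEnginePool<ReviewInput, ReviewPrediction>()
            .FromFile("model.zip", true); // hypothetical path, watchForChanges = true
        return services;
    }
}

// The pool is injected like any other service and hands out engines per call.
public class ReviewScorer
{
    private readonly PredictionEnginePool<ReviewInput, ReviewPrediction> _pool;

    public ReviewScorer(PredictionEnginePool<ReviewInput, ReviewPrediction> pool) => _pool = pool;

    public bool IsPositive(string text) =>
        _pool.Predict(new ReviewInput { Text = text }).IsPositive;
}
```

Pooling matters because a single PredictionEngine is not thread-safe, so sharing one instance across concurrent web requests is not an option and creating one per request is expensive.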
Bug Fixes
Serious
- Time series Sequential Transform needs to have a binding mechanism: This bug made it impossible to use time series in NimbusML. (#3875)
- Build errors resulting from upgrading to VS2019 compilers: The default CMAKE_C_FLAG for the debug configuration sets /ZI to generate a PDB capable of edit and continue. In the new compilers, this is incompatible with /guard:cf, which we set for security reasons. (#3894)
- LightGBM Evaluation metric parameters: If a user specified EvaluateMetricType.Default for LightGbm, the metric was not added to the options Dictionary, and LightGbmWrappedTraining threw because of that. (#3815)
- Change default EvaluationMetric for LightGbm: In ML.NET, the default EvaluationMetric for LightGbm was set to EvaluateMetricType.Error for multiclass, EvaluateMetricType.LogLoss for binary, and so on. This led to inconsistent behavior from the user's perspective. (#3859)
Other
- CustomGains should allow multiple values in argument attribute. (#3854)
Breaking Changes
None
Enhancements
- Changes the hardcoded sigmoid value of -0.5 to the value specified during training. (#3850)
- Fix TextLoader constructor and add exception message. (#3788)
- Introduce the FixZero argument to the LogMeanVariance normalizer. (#3916)
- Ensemble trainers now work with ITrainerEstimators instead of ITrainers. (#3796)
- LightGBM Unbalanced Data Argument. (#3925)
- Tree based trainers implement ICanGetSummaryAsIDataView. (#3892)
- CLI and AutoML API
Documentation and Samples
- Samples for applying ONNX model to in-memory images. (#3851)
- Reformatted all ~200 samples to 85 character width so the horizontal scrollbar does not appear on docs webpage. (#3930, 3941, 3949, 3950, 3947, 3943, 3942, 3946, 3948)
Remarks
- Roughly 200 GitHub issues were closed; the open count decreased from ~550 to 351. Most of the issues were resolved by the release of the stable API and the availability of samples.
ML.NET v1.1.0
New Features
- Image type support in IDataView
PR#3263 added support for in-memory images as a type in IDataView. Previously it was not possible to use an image directly in IDataView; the user had to specify the file path as a string and load the image using a transform. The feature resolved the following issues: 3162, 3723, 3369, 3274, 445, 3460, 2121, 2495, 3784. Image type support in IDataView was a much-requested feature.
Samples: Sample to convert gray scale image in-memory | Sample for custom mapping with in-memory using custom type
- Spectral Residual (SR) based Anomaly Detector (preview, please provide feedback)
PR#3693 adds a new anomaly detection algorithm to the Microsoft.ML.TimeSeries nuget. This algorithm (SRCNN) is based on Spectral Residual and a Convolutional Neural Network, and it was accepted at the KDD 2019 conference as an oral presentation. One advantage of this algorithm is that it does not require any prior training. In benchmarks that used a grid parameter search to find upper bounds, it outperforms the Independent and Identically Distributed (IID) and Singular Spectrum Analysis (SSA) based anomaly detection algorithms in recall and F1. This contribution comes from the Azure Anomaly Detector team.

| Algo | Precision | Recall | F1 | #TruePositive | #Positives | #Anomalies | Fine tuned parameters |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SSA (requires training) | 0.582 | 0.585 | 0.583 | 2290 | 3936 | 3915 | Confidence=99, PValueHistoryLength=32, Season=11, and use half the data of each series to do the training. |
| IID | 0.668 | 0.491 | 0.566 | 1924 | 2579 | 3915 | Confidence=99, PValueHistoryLength=56 |
| SR | 0.601 | 0.670 | 0.634 | 2625 | 4370 | 3915 | WindowSize=64, BackAddWindowSize=5, LookaheadWindowSize=5, AveragingWindowSize=3, JudgementWindowSize=64, Threshold=0.45 |

Samples: Sample for anomaly detection by SRCNN | Sample for anomaly detection by SRCNN using batch prediction
- Time Series Forecasting (preview, please provide feedback)
PR#1900 introduces a framework for time series forecasting models and exposes an API for a Singular Spectrum Analysis (SSA) based forecasting model in the Microsoft.ML.TimeSeries nuget. The framework makes it possible to forecast with or without confidence intervals, update the model with new observations, and save/load the model to/from persistent storage. This closes issues 929 and 3151 and has been a much-requested feature by the GitHub community since September 2018. With this change, the Microsoft.ML.TimeSeries nuget is feature complete for RTM.
Samples: Sample for forecasting | Sample for forecasting using confidence intervals
Bug Fixes
Serious
- Math Kernel Library fails to load with latest libomp: Fixed by PR#3721. This bug made it impossible for anyone to check code into the master branch because it was causing build failures.
- Transform Wrapper fails at deserialization: Fixed by PR#3700. This bug affected a first party (1P) customer. A model trained using NimbusML (Python bindings for ML.NET) and then loaded for scoring/inferencing using ML.NET would hit this bug.
- Index out of bounds exception in KeyToVector transformer: Fixed by PR#3763. This closes the following GitHub issues: 3757, 1751, 2678. The bug affected a first party customer as well as GitHub users.
Other
- Download images only when not present on disk and print warning messages when converting unsupported pixel format by PR#3625
- ML.NET source code does not build in VS2019 by PR#3742
- Fix SoftMax precision by utilizing double in the internal calculations by PR#3676
- Fix to the official build due to API Compat tool change by PR#3667
- Check for number of input columns in concat transform by PR#3809
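To illustrate the SoftMax item in the list above, here is a minimal sketch of a softmax that takes and returns `float` values but does its internal arithmetic in `double`. It is not the code from PR#3676: the class and method names are hypothetical, and subtracting the maximum score for numerical stability is a common practice added here as an assumption.

```csharp
using System;

// Illustrative sketch only; not ML.NET's implementation.
public static class SoftMaxSketch
{
    public static float[] SoftMax(float[] scores)
    {
        if (scores == null || scores.Length == 0)
            throw new ArgumentException("scores must be non-empty", nameof(scores));

        // Subtracting the maximum keeps Math.Exp from overflowing on large scores.
        double max = scores[0];
        for (int i = 1; i < scores.Length; i++)
            max = Math.Max(max, scores[i]);

        // Exponentials and their sum are kept in double so that rounding error
        // does not accumulate in single precision.
        var exps = new double[scores.Length];
        double sum = 0.0;
        for (int i = 0; i < scores.Length; i++)
        {
            exps[i] = Math.Exp(scores[i] - max);
            sum += exps[i];
        }

        // Only the final probabilities are narrowed back to float.
        var result = new float[scores.Length];
        for (int i = 0; i < scores.Length; i++)
            result[i] = (float)(exps[i] / sum);
        return result;
    }

    public static void Main()
    {
        float[] probabilities = SoftMax(new[] { 1f, 2f, 3f });
        Console.WriteLine(string.Join(", ", probabilities));
    }
}
```

Keeping the public signature in `float` means callers are unaffected, while the precision-sensitive steps (the exponentials, their sum, and the division) run in double precision.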
Breaking Changes
None
Enhancements
- API Compat tool by PR#3623 ensures future changes to ML.NET will not break the stable API released in 1.0.0.
- Upgrade the TensorFlow version from 1.12.0 to 1.13.1 by PR#3758
- API for saving time series model to stream by PR#3805
Documentation and Samples
- L1-norm and L2-norm regularization documentation by PR#3586
- Sample for data save and load from text and binary files by PR#3745
- Sample for LoadFromEnumerable with a SchemaDefinition by PR#3696
- Sample for LogLossPerClass metric for multiclass trainers by PR#3724
- Sample for WithOnFitDelegate by PR#3738
- Sample for loading data using text loader using various techniques by PR#3793
Remarks
- The Microsoft.ML.TensorFlow, Microsoft.ML.TimeSeries, Microsoft.ML.OnnxConverter, and Microsoft.ML.OnnxTransformer nugets are expected to be promoted to release status in ML.NET 1.2. Please give them a try and provide feedback.
ML.NET v1.0.0
ML.NET is now 1.0.0. 🍰
This is our stable API. In this final sprint we have worked mainly on improving the documentation. Please let us know what you like about ML.NET and what we can improve to make your use of machine learning easier in .NET. With this release we are committed to staying backward compatible.
ML.NET v1.0.0-preview
This is the RC1 release for ML.NET version 1.0.0. The work on the API project has been concluded. The focus before releasing version 1.0.0 will be on enhancing documentation and samples as well as addressing any critical issues. Please note that NuGets now have 1.0.0-preview as well as 0.12.0-preview versions, depending on which one will become the stable release. Also, IDataView is now in the Microsoft.ML namespace. As always, thank you so much for being an awesome community of Machine Learning enthusiasts.
ML.NET v0.11
A lot more API clean up as well as many fixes are packed in this release! We are quickly approaching the RC1 release for ML.NET, and our first priority is to complete the API related work. Thank you for being patient while we get closer to our stable surface. We are super excited to work through the remaining issues and ship V0.1. In fact, we are so excited that the release notes mentioned that FastTree has a new package now. That is partially true, as you can see in our nightly builds, but 0.11 still does not have a separate package. Oh well! :)
ML.NET v0.10
More API clean up as well as many fixes are in this release. We are preparing for our stable API in the 1.0 release and greatly appreciate the community feedback and engagement. Please note that IDataView is now in Microsoft.Data.DataView (#2220). Also, please note that #2239 has changed the order of parameters, and your existing code needs to be updated.
ML.NET v0.9
This release brings many fixes as well as significant API clean up. We have removed the API that was marked obsolete. Explainability features of ML.NET have also got some improvements as originally planned. Thanks to all the great support as we improve the API for 1.0 release.
ML.NET v0.8
ML.NET 0.8 is here with some very exciting features. Explainability, stateful time series, implicit feedback in recommendations and better debuggability as well as many bug fixes are included in this release. Please note that the legacy API has been marked obsolete and will be removed in the next release. Many thanks to the awesome users and community contributors for your continuous support.
ML.NET v0.7
ML.NET 0.7 brings multiple enhancements such as anomaly detection, matrix factorization, x86 builds, as well as custom transforms. We continue to refine our API with many exciting extensions. Thanks to everyone for your massive support and contributions in this release.