Version 2016.08.31.0
MLDB is the Machine Learning Database. It's the best way to get machine learning or AI into your applications or personal projects. Head on over to MLDB.ai to try it right now or see Running MLDB for installation details.
We're happy to announce the immediate availability of MLDB version 2016.08.31.0.
This release contains 114 new commits and modifies 366 files. In addition to many bug fixes and performance improvements, here are some of the highlights of this release:
MLPaint: the Real-Time Handwritten Digit Recognizer plugin
We're very excited to present MLPaint, the Real-Time Handwritten Digit Recognizer, a web app that runs on MLDB. It was made by Jonathan, the awesome intern we had with us this summer. Check out the video demo here: https://www.youtube.com/embed/WGdLCXDiDSo
The two demos below go into the technical details of how this plugin was built. The plugin is hosted on GitHub if you want to check out the implementation.
New demos
- The Image Processing with Convolutions demo explains what convolutions are and shows different ways of doing them with MLDB, including using TensorFlow's 2D convolution operator directly in SQL.
- The Recognizing Handwritten Digits demo explains the machine learning steps that went into creating the MLPaint plugin.
Classifier testing procedure now fully supports weights
Weighting examples correctly is a crucial part of training machine learning models that will generalize well. It can be used to compensate for sampling bias, class imbalance, etc. This is well supported for training in two ways:
- specifying the weight for each example by using the weight column in the trainingData query
- using the equalizationFactor parameter, which specifies the amount by which to adjust weights so that all classes have an equal total weight
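To make the idea of equalizing class weights concrete, here is a minimal Python sketch. It is an illustration only, not MLDB's implementation: the `equalize_weights` and `class_totals` helpers and the rule "divide each weight by its class total raised to the power `factor`" are assumptions chosen so that a factor of 0 changes nothing and a factor of 1 gives every class the same total weight.

```python
from collections import defaultdict


def class_totals(labels, weights):
    """Sum the example weights per class label."""
    totals = defaultdict(float)
    for label, weight in zip(labels, weights):
        totals[label] += weight
    return dict(totals)


def equalize_weights(labels, weights, factor=1.0):
    """Rescale weights by (class total weight) ** factor.

    Assumed rule for illustration, not taken from MLDB's source:
    factor=0 leaves weights unchanged, factor=1 gives every class
    a total weight of 1.
    """
    totals = class_totals(labels, weights)
    return [w / (totals[l] ** factor) for l, w in zip(labels, weights)]


# Four examples of class 0 and a single example of class 1.
labels = [0, 0, 0, 0, 1]
weights = [1.0, 1.0, 1.0, 1.0, 1.0]

balanced = equalize_weights(labels, weights, factor=1.0)
print(class_totals(labels, balanced))
```

With a factor of 1, the single class 1 example ends up carrying as much total weight as the four class 0 examples combined, which is the kind of compensation for class imbalance described above.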
Weights can also be useful for testing. For instance, the cost of making mistakes for certain examples can be much less than for others. Having the metrics take that into consideration will help deliver a clearer picture of the performance expectations you can have for the model.
All the metrics reported by the classifier.test procedure now fully take the weight of each example into account. You can specify the weight of each example by using the weight column in the testingData query.
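As a rough illustration of what taking weights into account means for test metrics, the sketch below accumulates example weights instead of counts in a binary confusion matrix. The `weighted_confusion` and `weighted_metrics` helpers are hypothetical names written for this post. They show the general technique and are not the code behind classifier.test.

```python
def weighted_confusion(labels, predictions, weights):
    """Accumulate weight (rather than counts) into each confusion-matrix cell."""
    cells = {"tp": 0.0, "fp": 0.0, "tn": 0.0, "fn": 0.0}
    for label, predicted, weight in zip(labels, predictions, weights):
        if predicted and label:
            cells["tp"] += weight
        elif predicted and not label:
            cells["fp"] += weight
        elif not predicted and label:
            cells["fn"] += weight
        else:
            cells["tn"] += weight
    return cells


def weighted_metrics(cells):
    """Derive accuracy, precision and recall from weighted cells."""
    total = sum(cells.values())
    predicted_pos = cells["tp"] + cells["fp"]
    actual_pos = cells["tp"] + cells["fn"]
    return {
        "accuracy": (cells["tp"] + cells["tn"]) / total if total else 0.0,
        "precision": cells["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": cells["tp"] / actual_pos if actual_pos else 0.0,
    }


labels = [True, True, False, False]
predictions = [True, False, False, True]

# Unweighted: two of the four examples are misclassified.
uniform = weighted_metrics(weighted_confusion(labels, predictions, [1.0] * 4))

# Weighted: the two misclassified examples are cheap mistakes (weight 1),
# the two correctly classified ones matter more (weight 4).
costed = weighted_metrics(
    weighted_confusion(labels, predictions, [4.0, 1.0, 4.0, 1.0]))

print(uniform["accuracy"], costed["accuracy"])
```

The same predictions score differently once the cost of each mistake is reflected in the weights, which is why weighted metrics can give a clearer picture of expected performance.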
Credentials
MLDB makes it very easy to access secured resources using a variety of protocols like http, sftp or even s3. MLDB can store credentials and supply them transparently whenever required when accessing protected files.
We fixed an issue that occurred when a credentials file was loaded from a remote resource while launching MLDB from the command line with the add-credentials-from-url flag. This is mostly used in production scenarios. Error messages related to the handling of credentials files were also improved so they're clearer.
Updated pymldb to version 0.7.1
The pymldb library is an open-source pure-Python module which provides a wrapper library that makes it easy to work with MLDB from Python. Version 0.7.1 is a minor update that changes the way the query function sends requests to MLDB. Instead of passing the query using the query string, it now sends it in the JSON payload. This makes it possible to send big feature vectors without hitting the query-string size limit.
Check out the Using pymldb Tutorial notebook for more info.
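To show why moving the query out of the query string matters, here is a small standalone sketch that builds the same request both ways and compares sizes. It does not use pymldb or contact a server. The host, the `q` parameter name, the `my_model` function in the SQL text and both helper functions are assumptions made for illustration.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080"  # hypothetical MLDB host


def query_via_query_string(sql):
    """Build a GET URL that carries the SQL text in the query string."""
    return BASE_URL + "/v1/query?" + urlencode({"q": sql})


def query_via_json_payload(sql):
    """Build a (url, headers, body) triple that carries the SQL in a JSON body."""
    body = json.dumps({"q": sql}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return BASE_URL + "/v1/query", headers, body


# A query embedding a large feature vector (my_model is a made-up function name).
features = ", ".join("%.6f" % (i / 1000.0) for i in range(5000))
sql = "SELECT my_model({features: [%s]}) AS prediction" % features

url_get = query_via_query_string(sql)
url_json, headers, body = query_via_json_payload(sql)

# The query-string URL grows with the query; the JSON variant keeps the URL short.
print(len(url_get), len(url_json), len(body))
```

In the first form the whole feature vector ends up in the URL, where servers and proxies commonly impose length limits. In the second form the URL stays constant and only the request body grows.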
Improvements for C++ plugin developers
MLDB allows its functionality to be extended with plugins. While we often showcase Python plugins, like MLPaint mentioned at the top of this post, it's also possible to write plugins in C++.
And so C++ plugin developers rejoice! It is now easier to take advantage of MLDB's powerful SQL engine from C++ by using the new eval_sql function. It makes running queries easier and faster.
You can now also specify built-in functions by using SQL from C++. This allows for much more compact code and less boilerplate.
Shout out: Golang interface
We'd like to shout out to ZzEeKkAa who developed a very nice Golang interface for MLDB. Check it out if you're into Golang!
If you created a plugin or library that works with MLDB, make sure to reach out!
Exciting and upcoming
We have been hard at work on a new LiDAR MLDB plugin. This enables MLDB to process 3D point cloud data and do voxel rendering. It makes it possible to visualize raw and voxelized data from any point of view. Combined with our existing TensorFlow integration, it opens the door to solving cutting-edge deep learning image recognition problems with MLDB.
It is also now possible to build MLDB on 32-bit and 64-bit ARM architectures. This will enable us to target a wider range of hardware. Think of smartphones, Raspberry Pi, or even Nvidia's Jetson TX1. This is a stepping stone toward having MLDB run on-device.
Other changes and fixes
- It is now possible to compile MLDB with clang
- Cleanup of the logarithm functions:
  - ln(dp or numeric): natural logarithm
  - log(dp or numeric): base 10 logarithm
  - log(b numeric, x numeric): logarithm to base b
- Calling the following functions is now valid: sqrt(-1)=nan, log(0)=-inf, log(-1)=nan.
- Fixed default timestamps coherence
- Improved speed and fixes to JOIN operations
- Fixed a bug that prevented credentials from being deleted
- In Python script error messages, proper file paths are now returned
- The dataset-specific query route (/v1/datasets/<dataset_name>/query) has been removed. Use /v1/query instead.
- We've unified the way GET endpoints accept parameters. Some routes will take parameters from either the query-string or as a JSON payload. We now enforce that all parameters should be sent one way or the other, not a mix of both.
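For readers who want to reason about the cleaned-up logarithm functions listed above, the following Python sketch mimics the described behaviour: ln is the natural logarithm, one-argument log is base 10, two-argument log takes the base first, and out-of-domain inputs return nan or -inf instead of failing. The `sql_sqrt`, `sql_ln` and `sql_log` helpers are hypothetical stand-ins written for this post, not MLDB code, and the two-argument form only handles valid positive inputs.

```python
import math


def sql_sqrt(x):
    """sqrt that returns nan for negative input instead of raising."""
    return math.sqrt(x) if x >= 0 else math.nan


def sql_ln(x):
    """Natural logarithm: nan for negative input, -inf for zero."""
    if x < 0:
        return math.nan
    if x == 0:
        return -math.inf
    return math.log(x)


def sql_log(*args):
    """log(x) is base 10; log(b, x) is the logarithm of x to base b."""
    if len(args) == 1:
        x = args[0]
        if x < 0:
            return math.nan
        if x == 0:
            return -math.inf
        return math.log10(x)
    b, x = args
    # Only valid positive inputs are handled in the two-argument form.
    return math.log(x, b)


print(sql_sqrt(-1), sql_log(0), sql_log(-1))
print(sql_log(100), sql_log(2, 8), sql_ln(math.e))
```

Note that Python's own math.sqrt and math.log raise ValueError on these inputs, which is why the helpers check the domain explicitly before delegating.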