Version 2016.10.05.0
MLDB is the Machine Learning Database. It's the best way to get machine learning or AI into your applications or personal projects. Head on over to MLDB.ai to try it right now or see Running MLDB for installation details.
We're happy to announce the immediate availability of MLDB version 2016.10.05.0.
This release contains 141 new commits and modifies 903 files. In addition to many bug fixes and performance improvements, here are some of the highlights of this release:
New MongoDB interface
A big new feature is support for importing and exporting data to and from MongoDB, a popular NoSQL database. Although MongoDB can be very useful for certain use cases, it doesn't have any machine learning capabilities. We want to make it as easy as possible for our users to get their data into MLDB, so we have added the following new MLDB entities for interfacing with MongoDB:
- mongodb.import procedure: used to import a MongoDB collection into an MLDB dataset
- mongodb.dataset dataset: read-only MLDB dataset based on a MongoDB collection
- mongodb.record dataset: write-only MLDB dataset that writes to a MongoDB collection
- mongodb.query function: function to perform an MLDB SQL query against a MongoDB collection
Updated TensorFlow to 0.10.0
We updated the TensorFlow version shipped with MLDB to version 0.10.0. The new version includes many bug fixes and performance improvements. We're now also shipping MLDB with different TensorFlow kernels, each optimized for a different instruction set. For instance, the kernel with AVX2 instructions will be used if the processor on which MLDB is run supports it.
If you're interested in deep learning, make sure to check out the Tensorflow Image Recognition Tutorial and the Transfer Learning with Tensorflow demo to see how easy it is to run trained models with MLDB.
Updated V8 to Release 5.0
We have updated V8, the JavaScript engine used in MLDB, to Release 5.0. This brings in many improvements and new features, such as improved ECMAScript 2015 (ES6) support, as well as better performance. It now also compiles for the ARM architecture, which is an important step as we're working towards having MLDB run on embedded architectures.
One feature that benefits from this update is the jseval function, which makes it possible to execute arbitrary JavaScript code inline in an SQL query.
Check out the Executing JavaScript Code Directly in SQL Queries Using the jseval Function Tutorial for great examples of how jseval can be used.
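As a rough illustration of an inline jseval call, the sketch below builds a query URL for MLDB's REST query endpoint without sending it. The host address, the helper name build_query_url, and the sample JavaScript snippet are assumptions made for this example. The call follows the jseval(script, argument names, argument values...) form, which should be checked against the jseval documentation for your MLDB version.

```python
from urllib.parse import urlencode

# Hypothetical MLDB address; adjust to wherever your instance is running.
MLDB_HOST = "http://localhost:8080"

# jseval takes the JavaScript source, a comma-separated list of argument
# names, then one value per argument name.
SQL = "SELECT jseval('return x * y;', 'x,y', 6, 7) AS product"


def build_query_url(host, sql):
    """Build a GET URL for the /v1/query endpoint (illustrative helper)."""
    return host + "/v1/query?" + urlencode({"q": sql})


url = build_query_url(MLDB_HOST, SQL)
print(url)
```

The resulting URL can then be fetched with any HTTP client, for example from a notebook or with curl.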
Fixes and improvements to import procedures
The SELECT statement of the import.text procedure has been improved to support the CASE keyword. This adds extra flexibility to process data as it is being imported.
We also fixed a bug when using the NAMED clause with the import.json procedure that could cause undesired behaviour.
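To make the CASE support concrete, here is a minimal sketch of an import.text procedure configuration that derives a column while importing. The file URL, dataset name, and column names (name, age, age_group) are hypothetical, and the parameter names should be verified against the import.text documentation for your version.

```python
import json

# Hypothetical configuration: file, dataset and column names are made up
# for illustration only.
config = {
    "type": "import.text",
    "params": {
        "dataFileUrl": "file://people.csv",
        "outputDataset": "people",
        # CASE is evaluated for every row as it is imported.
        "select": (
            "name, "
            "CASE WHEN age >= 18 THEN 'adult' ELSE 'minor' END AS age_group"
        ),
        "runOnCreation": True,
    },
}

# JSON body that would be sent when creating the procedure.
payload = json.dumps(config, indent=2)
print(payload)
```

Deriving the column at import time avoids a second pass over the dataset with a separate transformation step.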
Updates to the classifier configuration
We have improved the user experience around configuring supervised algorithms in two ways.
First, we have clarified the documentation by creating a new Classifier configuration section that contains the information related to the configuration of supervised models. Whether you train models with the classifier.train or the classifier.experiment procedure, all the information you need to configure your algorithm now lives in one place.
Second, we have made the training more robust to configuration errors by having better validation of elements meant to control hyper-parameters. Incorrect parameters will now trigger errors.
New vector space functions
We added two new vector space functions:
First, the new reshape(val, shape) function takes an n-dimensional embedding and reinterprets it as an N-dimensional embedding of the provided shape containing all of the elements. This allows, for example, a 1-dimensional vector to be reinterpreted as a 2-dimensional array. The shape argument is an embedding containing the size of each dimension.
Second, the new shape(val) function takes an n-dimensional embedding and returns the size of each dimension as an array.
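The behaviour described above can be sketched in plain Python using nested lists as embeddings. This is only an illustration of the semantics, not MLDB's implementation: it assumes rectangular nested lists and row-major element order, and the helper flatten is introduced here for the example.

```python
from math import prod


def shape(val):
    """Return the size of each dimension of a rectangular nested list."""
    dims = []
    while isinstance(val, list):
        dims.append(len(val))
        val = val[0] if val else None
    return dims


def flatten(val):
    """Collect all elements of a nested list in row-major order."""
    if not isinstance(val, list):
        return [val]
    out = []
    for item in val:
        out.extend(flatten(item))
    return out


def reshape(val, new_shape):
    """Reinterpret val with the given shape, keeping every element."""
    flat = flatten(val)
    if prod(new_shape) != len(flat):
        raise ValueError("new shape must contain exactly the same elements")

    def build(items, dims):
        if not dims:
            return items[0]
        step = len(items) // dims[0] if dims[0] else 0
        return [build(items[i * step:(i + 1) * step], dims[1:])
                for i in range(dims[0])]

    return build(flat, list(new_shape))


matrix = reshape([1, 2, 3, 4, 5, 6], [2, 3])
print(matrix)         # [[1, 2, 3], [4, 5, 6]]
print(shape(matrix))  # [2, 3]
```

Note that the two functions are inverses in the sense that shape(reshape(val, s)) gives back s whenever the element counts match.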
Other changes and fixes
- The COLUMN EXPR expression now supports the STRUCTURED keyword. By default, COLUMN EXPR returns a flattened representation. Adding the STRUCTURED keyword returns the structured representation.
- The tsne.train procedure now has a learningRate configuration option.
- Improved speed and fixes to JOIN operations.
- Columns and rows can now be named with an empty string.
- When evaluating a model using the classifier.test or the classifier.experiment procedure, the F1-score was returned in a key named f. The key has been renamed to f1Score.
- The HTTP layer now correctly handles the HTTP 1.1 100 CONTINUE request header.
- Fixed the ordering of paths when mixing Unicode and digits.
- The user function fetcher is now available as a built-in function fetch.
- Fixed a bug with the levenshtein_distance() function where it did not work properly with UTF-8 characters.
- Fixed an issue with the JavaScript plugin's serveStaticFolder() function where the path to serve would not be considered relative to the plugin's installation directory.
- Added the optional argument sortField to the string_agg(expr, separator [, sortField]) function, which allows sorting the returned values by sortField.
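As an illustration of the new string_agg sortField argument mentioned in the last item, the following Python sketch mimics the described behaviour on in-memory lists. It is not MLDB code: the function signature, the use of parallel lists, and keeping input order when no sort field is given are all assumptions made for the example.

```python
def string_agg(values, separator, sort_fields=None):
    """Join values with separator, optionally ordered by a parallel sort field."""
    if sort_fields is not None:
        if len(sort_fields) != len(values):
            raise ValueError("values and sort_fields must have the same length")
        # Order the values by their matching sort field (stable sort).
        order = sorted(range(len(values)), key=lambda i: sort_fields[i])
        values = [values[i] for i in order]
    return separator.join(str(v) for v in values)


words = ["c", "a", "b"]
ranks = [3, 1, 2]

print(string_agg(words, ", "))         # c, a, b
print(string_agg(words, ", ", ranks))  # a, b, c
```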