Commit
Merge tag 'v2016.12.16.0' into release_latest
Conflicts:
	container_files/demos/Transfer Learning with Tensorflow.ipynb
	mongodb/doc/MongoDataset.md
	mongodb/doc/MongoImport.md
	mongodb/doc/MongoQueryFunction.md
	mongodb/doc/MongoRecord.md
Francois Maillet committed Dec 16, 2016
2 parents 1f737de + 595ab2d commit ec66477
Showing 822 changed files with 28,326 additions and 13,389 deletions.
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -46,3 +46,5 @@ models
mldb_data
models
container_provis/packer
.ipynb_checkpoints
container_files/nvidia/files/*.deb
14 changes: 13 additions & 1 deletion .gitmodules
@@ -3,7 +3,7 @@
url = [email protected]:datacratic/baseimage-docker.git
[submodule "ext/tensorflow"]
path = ext/tensorflow
url = https://github.com/mldbai/tensorflow
url = git@github.com:mldbai/tensorflow
[submodule "ext/re2"]
path = ext/re2
url = https://github.com/mldbai/re2.git
@@ -55,3 +55,15 @@
[submodule "ext/giflib"]
path = ext/giflib
url = https://github.com/mldbai/giflib.git
[submodule "ext/pffft"]
path = ext/pffft
url = https://github.com/mldbai/pffft.git
[submodule "ext/JSONTestSuite"]
path = ext/JSONTestSuite
url = https://github.com/mldbai/JSONTestSuite.git
[submodule "ext/zstd"]
path = ext/zstd
url = https://github.com/mldbai/zstd.git
[submodule "testing/mldb_sample_plugin"]
path = testing/mldb_sample_plugin
url = [email protected]:mldbai/mldb_sample_plugin.git
1 change: 1 addition & 0 deletions AUTHORS
@@ -0,0 +1 @@
mldb.ai Inc
88 changes: 63 additions & 25 deletions Building.md
@@ -9,12 +9,33 @@ It will take around **45 minutes on a 32-core machine with 244GB of RAM** to run
For C++ code to compile and the Python modules to install correctly, the following system packages need to be installed:

```bash
apt-get install -y git valgrind build-essential libboost-all-dev \
libgoogle-perftools-dev liblzma-dev libcrypto++-dev libblas-dev \
liblapack-dev python-virtualenv libcurl4-openssl-dev libssh2-1-dev \
libpython-dev libgit2-dev libarchive-dev libffi-dev \
libfreetype6-dev libpng12-dev libcap-dev autoconf libtool unzip \
language-pack-en libyaml-cpp-dev libsasl2-dev
apt-get install -y \
git \
autoconf \
build-essential \
language-pack-en \
libarchive-dev \
libblas-dev \
libboost-all-dev \
libcap-dev \
libcrypto++-dev \
libcurl4-openssl-dev \
libffi-dev \
libfreetype6-dev \
libgit2-dev \
libgoogle-perftools-dev \
liblapack-dev \
liblzma-dev \
libpng12-dev \
libpq-dev \
libpython-dev \
libsasl2-dev \
libssh2-1-dev \
libtool \
libyaml-cpp-dev \
python-virtualenv \
unzip \
valgrind
```
## Installing Docker

@@ -215,14 +236,16 @@ If the mldb_base layer does a lot of packages upgrade during its creation, it wo
To do so, run the following commands from the top of the mldb repo:

```
docker pull ubuntu:14.04
make baseimage
docker tag quay.io/datacratic/baseimage:0.9.17 quay.io/datacratic/baseimage:latest
docker push quay.io/datacratic/baseimage:latest
```

## S3 Credentials

Some tests require S3 credentials in order to run, as they access public files in the
`dev.mldb.datacratic.com` S3 bucket. These credentials are
`public-mldb-ai` S3 bucket. These credentials are
nothing special: they simply require read-only access to public S3 buckets.
But they need to be enabled for full test coverage.

@@ -365,34 +388,39 @@ make ... WITH_CUDA=1
First, the machine needs to be set up with cross compilers:

```
sudo apt-get install libc6-arm64-cross libc6-dev-arm64-cross linux-libc-dev-arm64-cross g++-aarch64-linux-gnu gcc-aarch64-linux-gnu
sudo apt-get install \
g++-aarch64-linux-gnu \
gcc-aarch64-linux-gnu \
libc6-arm64-cross \
libc6-dev-arm64-cross \
linux-libc-dev-arm64-cross
```

Then we need to add arm64 to Debian's multiarch support so that it can find the packages for an arm64 target system:
Then we need to modify the system's apt sources.list to add the `ubuntu-ports` repository for the `arm64` architecture:

```
sudo dpkg --add-architecture arm64
sudo apt-add-repository 'deb http://ports.ubuntu.com/ubuntu-ports/ trusty main restricted'
sudo apt-add-repository 'deb [arch=arm64] http://ports.ubuntu.com/ubuntu-ports/ trusty main restricted multiverse universe'
sudo apt-add-repository 'deb [arch=arm64] http://ports.ubuntu.com/ubuntu-ports/ trusty-updates main restricted multiverse universe'
sudo apt-get update
```

Thirdly, we need to download the cross development environment for the
target platform. This will be installed under build/aarch64/osdeps
target platform. This will be installed under `build/aarch64/osdeps`

```
make port_deps ARCH=aarch64
make -j$(nproc) port_deps ARCH=aarch64
```

Fourthly, we need to make the build tools for the host architecture

```
make build_tools
make -j$(nproc) build_tools
```

Finally, we can build the port itself:

```
make -j8 -k compile ARCH=aarch64
make -j$(nproc) compile ARCH=aarch64
```

Note that currently no version of the v8 javascript engine is available
@@ -404,35 +432,45 @@ from Debian for arch64. We are working on a solution.
First, the machine needs to be set up with cross compilers:

```
sudo apt-get install libc6-armhf-cross libc6-dev-armhf-cross linux-libc-dev-armhf-cross g++-arm-linux-gnueabihf gcc-arm-linux-gnueabihf g++-4.8-multilib-arm-linux-gnueabihf gcc-4.8-multilib-arm-linux-gnueabihf g++-4.8-arm-linux-gnueabihf gcc-4.8-arm-linux-gnueabihf
sudo apt-get install \
g++-4.8-arm-linux-gnueabihf \
g++-4.8-multilib-arm-linux-gnueabihf \
g++-arm-linux-gnueabihf \
gcc-4.8-arm-linux-gnueabihf \
gcc-4.8-multilib-arm-linux-gnueabihf \
gcc-arm-linux-gnueabihf \
libc6-armhf-cross \
libc6-dev-armhf-cross \
linux-libc-dev-armhf-cross
```

Then we need to add armhf to Debian's multiarch support so that it can find the packages for an armhf target system:
Then we need to modify the system's apt sources.list to add the `ubuntu-ports` repository for the `armhf` architecture:

```
sudo dpkg --add-architecture armhf
sudo apt-add-repository 'deb http://ports.ubuntu.com/ubuntu-ports/ trusty main restricted'
sudo apt-add-repository 'deb [arch=armhf] http://ports.ubuntu.com/ubuntu-ports/ trusty main restricted multiverse universe'
sudo apt-add-repository 'deb [arch=armhf] http://ports.ubuntu.com/ubuntu-ports/ trusty-updates main restricted multiverse universe'
sudo apt-get update
```

Thirdly, we need to download the cross development environment for the
target platform. This will be installed under build/aarch64/osdeps
target platform. This will be installed under `build/arm/osdeps`

```
make port_deps ARCH=arm
make -j$(nproc) port_deps ARCH=arm
```

Fourthly, we need to make the build tools for the host architecture
Fourthly, we need to make the build tools for the host architecture. Unfortunately this
takes quite a lot of time as a lot of Tensorflow is required in order to build itself.

```
make build_tools
make -j$(nproc) build_tools ### NOTE: do _not_ specify `ARCH=arm` here. ###
```

Finally, we can build the port itself:

```
make -j8 -k compile ARCH=arm
make -j$(nproc) compile ARCH=arm
```

The version of MLDB will be placed in build/arm/bin and build/arm/lib
The version of MLDB will be placed in `build/arm/bin` and `build/arm/lib`

4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -14,7 +14,7 @@ cores, how much RAM?
* What are you trying to do, what happened, and what would you have expected to
have happened instead?

MLDB is being actively developed by [Datacratic](http://datacratic.com/) and does
MLDB is being actively developed by [MLDB.ai](http://mldb.ai) and does
not have a public roadmap at the moment. If you would like to contribute code or
new features to MLDB, the best first step would be to send an email to
[email protected] to discuss how this might fit into existing development plans.
[email protected] to discuss how this might fit into existing development plans.
16 changes: 8 additions & 8 deletions arch/abort.cc
@@ -16,7 +16,7 @@
using namespace std;


namespace ML {
namespace MLDB {

namespace {

@@ -25,14 +25,14 @@ namespace {
/* COMPILE SETTING */
/******************************************************************************/

#ifndef JML_ABORT
# define JML_ABORT 0
#ifndef MLDB_ABORT
# define MLDB_ABORT 0
#else
# undef JML_ABORT
# define JML_ABORT 1
# undef MLDB_ABORT
# define MLDB_ABORT 1
#endif

enum { COMPILE_STATE = JML_ABORT };
enum { COMPILE_STATE = MLDB_ABORT };


/******************************************************************************/
@@ -44,7 +44,7 @@ struct AbortState {
AbortState() :
state(COMPILE_STATE)
{
state = state || getenv("JML_ABORT") != NULL;
state = state || getenv("MLDB_ABORT") != NULL;
}

bool state;
@@ -74,4 +74,4 @@ void set_abort_state(bool b)



} // namepsace ML
} // namepsace MLDB
16 changes: 7 additions & 9 deletions arch/abort.h
@@ -1,10 +1,11 @@
// This file is part of MLDB. Copyright 2015 Datacratic. All rights reserved.
//

/** abort.h -*- C++ -*-
Rémi Attab, 13 Nov 2012
Copyright (c) 2012 Datacratic. All rights reserved.
Utilities related to the abort() function.
This file is part of MLDB. Copyright 2015 Datacratic. All rights reserved.
These functions are meant to be used as debugging helpers so that the
program can be stopped as soon as an error is detected. This is mainly
@@ -14,15 +15,14 @@
*/

#ifndef __jml__utils__abort_h__
#define __jml__utils__abort_h__
#pragma once

namespace ML {
namespace MLDB {

/** Calls abort() if one of the following criterias are met:
- The environment variable JML_ABORT is set.
- The macro JML_ABORT is defined.
- The environment variable MLDB_ABORT is set.
- The macro MLDB_ABORT is defined.
- set_abort_state(true) is called.
Note that the value passed to set_abort_state() will override all other
@@ -39,6 +39,4 @@ bool get_abort_state();
void set_abort_state(bool b);


} // ML

#endif // __jml__utils__abort_h__
} // MLDB
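
For readers unfamiliar with the pattern being renamed here, the following is a minimal, self-contained sketch of how a compile-time macro and an environment variable can both feed one abort flag, as the `abort.cc` and `abort.h` hunks describe. It is not the MLDB implementation: the `sketch` namespace, the `SKETCH_ABORT` macro and environment variable, and the `abortState()` helper are hypothetical names chosen for illustration.

```cpp
#include <cstdlib>

// Hypothetical stand-in for the MLDB_ABORT compile setting: 1 when the
// macro is defined at build time (e.g. -DSKETCH_ABORT), 0 otherwise.
#ifdef SKETCH_ABORT
#  define SKETCH_ABORT_ENABLED 1
#else
#  define SKETCH_ABORT_ENABLED 0
#endif

namespace sketch {

// The flag starts out true if either the build-time macro was defined or
// the SKETCH_ABORT environment variable is set (to any value).
struct AbortState {
    AbortState()
        : state(SKETCH_ABORT_ENABLED != 0
                || std::getenv("SKETCH_ABORT") != nullptr)
    {
    }

    bool state;
};

// Function-local static so the flag is initialised on first use.
inline AbortState & abortState()
{
    static AbortState instance;
    return instance;
}

inline bool get_abort_state() { return abortState().state; }

// An explicit call overrides whatever the macro or environment chose.
inline void set_abort_state(bool b) { abortState().state = b; }

} // namespace sketch
```

Under these assumptions, calling `sketch::set_abort_state(false)` turns the flag off even when the environment variable is set, which matches the override behaviour noted in the header comment.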
8 changes: 4 additions & 4 deletions arch/arch.h
@@ -9,13 +9,13 @@
#pragma once

#if defined(__i386__) || defined(__amd64__)
# define JML_INTEL_ISA 1
# define MLDB_INTEL_ISA 1
#elif defined (__aarch64__) || defined(__arm__)
# define JML_ARM_ISA 1
# define MLDB_ARM_ISA 1
#endif // intel ISA

# if defined(__amd64__) || defined(__aarch64__)
# define JML_BITS 64
# define MLDB_BITS 64
# else
# define JML_BITS 32
# define MLDB_BITS 32
# endif // 32/64 bits
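
As a rough illustration of how downstream code might consume macros like the ones renamed in this hunk, the sketch below mirrors the same preprocessor tests under hypothetical `SKETCH_*` names and exposes the result as ordinary C++ values. The `sketch` namespace, `kBits`, and `isaName()` are assumptions made for this example and do not appear in the commit.

```cpp
// Same compiler-defined tests as the hunk above, under hypothetical names.
#if defined(__i386__) || defined(__amd64__)
#  define SKETCH_INTEL_ISA 1
#elif defined(__aarch64__) || defined(__arm__)
#  define SKETCH_ARM_ISA 1
#endif

#if defined(__amd64__) || defined(__aarch64__)
#  define SKETCH_BITS 64
#else
#  define SKETCH_BITS 32
#endif

namespace sketch {

// Word size selected by the preprocessor, usable in constant expressions.
constexpr int kBits = SKETCH_BITS;

// Human-readable name of the detected instruction-set family; falls back
// to "unknown" when neither ISA macro was defined.
inline const char * isaName()
{
#if defined(SKETCH_INTEL_ISA)
    return "intel";
#elif defined(SKETCH_ARM_ISA)
    return "arm";
#else
    return "unknown";
#endif
}

} // namespace sketch
```

Wrapping the macros in typed constants like this keeps most call sites free of `#if` blocks, although code that needs ISA-specific intrinsics would still have to test the macros directly.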
10 changes: 0 additions & 10 deletions arch/arch.mk
@@ -48,16 +48,6 @@ $(eval $(call library,exception_hook,exception_hook.cc,arch dl))
$(eval $(call library,node_exception_tracing,node_exception_tracing.cc,exception_hook arch dl))


ifeq ($(CUDA_ENABLED),1)

LIBARCH_CUDA_SOURCES := cuda.cc
LIBARCH_CUDA_LINK := arch OcelotIr OcelotParser OcelotExecutive OcelotTrace OcelotAnalysis hydrazine

$(eval $(call library,arch_cuda,$(LIBARCH_CUDA_SOURCES),$(LIBARCH_CUDA_LINK)))

endif # CUDA_ENABLED


ifeq ($(CAL_ENABLED),1)

LIBARCH_CAL_SOURCES := cal.cc
4 changes: 2 additions & 2 deletions arch/backtrace.cc
@@ -24,7 +24,7 @@

using namespace std;

namespace ML {
namespace MLDB {

size_t backtrace(char * buffer, size_t bufferSize, int num_to_skip)
{
@@ -172,4 +172,4 @@ backtrace(const BacktraceInfo & info,
return result;
}

} // namespace ML
} // namespace MLDB
12 changes: 4 additions & 8 deletions arch/backtrace.h
@@ -1,19 +1,17 @@
// This file is part of MLDB. Copyright 2015 Datacratic. All rights reserved.

/* backtrace.h -*- C++ -*-
Jeremy Barnes, 26 February 2009
Copyright (c) 2009 Jeremy Barnes. All rights reserved.
This file is part of MLDB. Copyright 2015 Datacratic. All rights reserved.
Interface to a bactrace function.
*/

#include <iostream>
#include <vector>

#ifndef __jml__arch__backtrace_h__
#define __jml__arch__backtrace_h__
#pragma once

namespace ML {
namespace MLDB {

/** Basic backtrace information */
struct BacktraceInfo {
@@ -71,6 +69,4 @@ std::vector<BacktraceFrame> backtrace(int num_to_skip);
std::vector<BacktraceFrame>
backtrace(const BacktraceInfo & info, int num_to_skip);

} // namespace ML

#endif /* __jml__arch__backtrace_h__ */
} // namespace MLDB