This repository has been archived by the owner on Dec 21, 2023. It is now read-only.

Add more documentation datasets (#251)
afranklin authored Feb 7, 2018
1 parent 28f01cc commit 39e7f27
Showing 15 changed files with 25 additions and 21 deletions.
4 changes: 2 additions & 2 deletions userguide/activity_classifier/README.md
@@ -8,7 +8,7 @@ The activity classifier in Turi Create creates a deep learning model capable of

#### Introductory Example

In this example we create a model to classify physical activities done by users of a handheld phone, using both accelerometer and gyroscope data. We will use data from the [HAPT experiment](http://archive.ics.uci.edu/ml/datasets/Smartphone-Based+Recognition+of+Human+Activities+and+Postural+Transitions) which contains recording sessions of multiple users, each performing certain physical activities. The performed activities are walking, climbing up stairs, climbing down stairs, sitting, standing, and laying.
In this example we create a model to classify physical activities done by users of a handheld phone, using both accelerometer and gyroscope data. We will use data from the [HAPT experiment](http://archive.ics.uci.edu/ml/datasets/Smartphone-Based+Recognition+of+Human+Activities+and+Postural+Transitions) which contains recording sessions of multiple users, each performing certain physical activities.[<sup>1</sup>](../datasets.md) The performed activities are walking, climbing up stairs, climbing down stairs, sitting, standing, and laying.

Sensor data can be collected at varying frequencies. In the HAPT dataset, the sensors were sampled at 50Hz each - meaning 50 times per second. However, most applications would want to show outputs to the user at larger intervals. We control the output prediction rate via the ```prediction_window``` parameter. For example, if we want to produce a prediction every 5 seconds, and the sensors are sampled at 50Hz - we would set the ```prediction_window``` to 250 (5 sec * 50 samples per second).
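
The arithmetic above can be checked with a short sketch: the window size is the desired prediction interval multiplied by the sampling rate. The helper name `prediction_window_size` is hypothetical and is not part of Turi Create; it only illustrates the calculation.

```python
def prediction_window_size(interval_seconds, sampling_rate_hz):
    """Return the number of sensor samples spanned by one prediction interval.

    Hypothetical helper for illustration only; not a Turi Create function.
    """
    return int(round(interval_seconds * sampling_rate_hz))


# One prediction every 5 seconds, with sensors sampled at 50Hz
window = prediction_window_size(5, 50)
print(window)  # 250
```

The resulting value is the number that would be supplied as the ```prediction_window``` parameter.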

@@ -66,4 +66,4 @@ We've seen how we can quickly create an activity classifier given recorded sessi

* [Advanced usage](advanced-usage.md)
* [Deployment via Core ML](export_coreml.md)
* [How does it work](how-it-works.md)
* [How does it work](how-it-works.md)
4 changes: 2 additions & 2 deletions userguide/activity_classifier/data-preperation.md
@@ -1,6 +1,6 @@
# HAPT Data Preparation

In this section we will see how to get the [HAPT experiment](http://archive.ics.uci.edu/ml/datasets/Smartphone-Based+Recognition+of+Human+Activities+and+Postural+Transitions) data into the SFrame format expected by the activity classifier.
In this section we will see how to get the [HAPT experiment](http://archive.ics.uci.edu/ml/datasets/Smartphone-Based+Recognition+of+Human+Activities+and+Postural+Transitions) data into the SFrame format expected by the activity classifier.[<sup>1</sup>](../datasets.md)

First we need to download the data from [here](http://archive.ics.uci.edu/ml/machine-learning-databases/00341/HAPT%20Data%20Set.zip) in zip format. The code below assumes the data was unzipped into a directory named `HAPT Data Set`. This folder contains 3 types of files - a file containing the performed activities for each experiment, files containing the collected accelerometer samples, and files containing the collected gyroscope samples.

@@ -93,4 +93,4 @@ data = data.remove_column('activity_id')
data.save('hapt_data.sframe')
```

To learn more about the expected input format of the activity classifier please visit the [advanced usage](advanced-usage.md) section.
To learn more about the expected input format of the activity classifier please visit the [advanced usage](advanced-usage.md) section.
2 changes: 1 addition & 1 deletion userguide/clustering/dbscan.md
@@ -50,7 +50,7 @@ advantages:

To illustrate the basic usage of DBSCAN and how the results can differ from
K-means, we simulate non-spherical, low-dimensional data using the scikit-learn
datasets module.
datasets module.[<sup>1</sup>](../datasets.md)

```python
import turicreate as tc
5 changes: 3 additions & 2 deletions userguide/clustering/kmeans.md
@@ -28,8 +28,9 @@ distance from point $$x$$ to center $$B$$ when assigning $$x$$ to a cluster.

#### Basic Usage

We illustrate usage of Turi Create K-means with a dataset used to classify
schizophrenic subjects based on MRI scans. The original data consists of
We illustrate usage of Turi Create K-means with the dataset from the [June
2014 Kaggle competition to classify schizophrenic subjects based on MRI
scans](https://www.kaggle.com/c/mlsp-2014-mri). Download **Train.zip** from the data tab.[<sup>1</sup>](../datasets.md) The original data consists of
two sets of features: functional network connectivity (FNC) features and
source-based morphometry (SBM) features, which we incorporate into a single
[`SFrame`](https://apple.github.io/turicreate/docs/api/generated/turicreate.SFrame.html)
2 changes: 2 additions & 0 deletions userguide/datasets.md
@@ -0,0 +1,2 @@
# User Guide Datasets
Apple has provided links to certain datasets for reference purposes only and on an “as is” basis. You are solely responsible for your use of the datasets and for complying with applicable terms and conditions, including any use restrictions and attribution requirements. Apple shall not be liable for, and specifically disclaims any warranties, express or implied, in connection with, the use of the datasets, including any warranties of fitness for a particular purpose or non-infringement.
11 changes: 6 additions & 5 deletions userguide/image_classifier/README.md
@@ -12,16 +12,16 @@ create a high quality image classifier model.

#### Loading Data

Suppose we have a dataset containing labeled cat and dog images.
The [Kaggle Cats and Dogs Dataset](https://www.microsoft.com/en-us/download/details.aspx?id=54765) provides labeled cat and dog images.[<sup>1</sup>](../datasets.md) After downloading and decompressing the dataset, navigate to the main **kagglecatsanddogs** folder, which contains a **PetImages** subfolder.

```python
import turicreate as tc

# Load images
data = tc.image_analysis.load_images('train', with_path=True)
# Load images (Note: you can ignore 'Not a JPEG file' errors)
data = tc.image_analysis.load_images('PetImages', with_path=True)

# From the path-name, create a label column
data['label'] = data['path'].apply(lambda path: 'dog' if 'dog' in path else 'cat')
data['label'] = data['path'].apply(lambda path: 'dog' if '/Dog' in path else 'cat')

# Save the data for future use
data.save('cats-dogs.sframe')
@@ -44,7 +44,8 @@ data = tc.SFrame('cats-dogs.sframe')
# Make a train-test split
train_data, test_data = data.random_split(0.8)

# Automatically picks the right model based on your data.
# Automatically pick the right model based on your data.
# Note: Because the dataset is large, model creation may take hours.
model = tc.image_classifier.create(train_data, target='label')

# Save predictions to an SArray
2 changes: 1 addition & 1 deletion userguide/image_similarity/README.md
@@ -13,7 +13,7 @@ unsupervised.
In this example, we use the [Caltech-101
dataset](http://www.vision.caltech.edu/Image_Datasets/Caltech101/)
which contains images objects belonging to 101 categories with about 40
to 800 images per category.
to 800 images per category.[<sup>1</sup>](../datasets.md)

```python
import turicreate as tc
2 changes: 1 addition & 1 deletion userguide/recommender/README.md
@@ -9,7 +9,7 @@ interaction data and use that model to make recommendations.
Creating a recommender model typically requires a data set to use for
training the model, with columns that contain the user IDs, the item
IDs, and (optionally) the ratings. For this example, we use the [MovieLens
dataset](https://grouplens.org/datasets/movielens/).
20M dataset](https://grouplens.org/datasets/movielens/20m/).[<sup>1</sup>](../datasets.md)

```python
import turicreate as tc
2 changes: 1 addition & 1 deletion userguide/sframe/sframe-intro.md
@@ -13,7 +13,7 @@ A very common data format is the comma separated value (csv) file, which
is what we'll use for these examples. We will use some preprocessed data from
the
[Million Song Dataset](https://labrosa.ee.columbia.edu/millionsong/) to
aid our SFrame-related examples. The first table contains metadata
aid our SFrame-related examples.[<sup>1</sup>](../datasets.md) The first table contains metadata
about each song in the database. Here's how we load it into an SFrame:

```python
2 changes: 1 addition & 1 deletion userguide/supervised-learning/boosted_trees_classifier.md
@@ -10,7 +10,7 @@ decision trees.

##### Introductory Example

In this example, we will use the [Mushrooms dataset](https://archive.ics.uci.edu/ml/datasets/mushroom).
In this example, we will use the [Mushrooms dataset](https://archive.ics.uci.edu/ml/datasets/mushroom).[<sup>1</sup>](../datasets.md)
```python
import turicreate as tc

2 changes: 1 addition & 1 deletion userguide/supervised-learning/boosted_trees_regression.md
@@ -51,7 +51,7 @@ The algorithm simply fit a new decision tree to the residual at each iteration.

##### Introductory Example

In this example, we will use the [Mushrooms dataset](https://archive.ics.uci.edu/ml/datasets/mushroom).
In this example, we will use the [Mushrooms dataset](https://archive.ics.uci.edu/ml/datasets/mushroom).[<sup>1</sup>](../datasets.md)

```python
import turicreate as tc
2 changes: 1 addition & 1 deletion userguide/supervised-learning/decision_tree_classifier.md
@@ -8,7 +8,7 @@ on decision trees.

##### Introductory Example

In this example, we will use the [Mushrooms dataset](https://archive.ics.uci.edu/ml/datasets/mushroom).
In this example, we will use the [Mushrooms dataset](https://archive.ics.uci.edu/ml/datasets/mushroom).[<sup>1</sup>](../datasets.md)
```python
import turicreate as tc

2 changes: 1 addition & 1 deletion userguide/supervised-learning/decision_tree_regression.md
@@ -11,7 +11,7 @@ for more details).

##### Introductory Example

In this example, we will use the [Mushrooms dataset](https://archive.ics.uci.edu/ml/datasets/mushroom).
In this example, we will use the [Mushrooms dataset](https://archive.ics.uci.edu/ml/datasets/mushroom).[<sup>1</sup>](../datasets.md)

```python
import turicreate as tc
2 changes: 1 addition & 1 deletion userguide/supervised-learning/random_forest_classifier.md
@@ -8,7 +8,7 @@ forests.

##### Introductory Example

In this example, we will use the [Mushrooms dataset](https://archive.ics.uci.edu/ml/datasets/mushroom).
In this example, we will use the [Mushrooms dataset](https://archive.ics.uci.edu/ml/datasets/mushroom).[<sup>1</sup>](../datasets.md)
```python
import turicreate as tc

2 changes: 1 addition & 1 deletion userguide/supervised-learning/random_forest_regression.md
@@ -24,7 +24,7 @@ forests, all the base models are constructed independently using a

##### Introductory Example

In this example, we will use the [Mushrooms dataset](https://archive.ics.uci.edu/ml/datasets/mushroom).
In this example, we will use the [Mushrooms dataset](https://archive.ics.uci.edu/ml/datasets/mushroom).[<sup>1</sup>](../datasets.md)

```python
import turicreate as tc
