DAX data set URL: https://developer.ibm.com/exchanges/data/all/thematic-clustering-of-sentences/
- Open https://github.com/CODAIT/watson-studio-gallery-dax-project-template
- Click Use this template and create a new dataset project repository in https://github.com/CODAIT/ naming it watson-studio-gallery-XXX-project, replacing XXX with the data set short name.
- Create an issue in https://github.com/CODAIT/DAX-Datasets/issues/new/choose tracking development/publication progress.
- Add a link to the newly created project in this status document.
- Review the content of this sample project https://dataplatform.cloud.ibm.com/exchange/public/entry/view/a7432f0c29c5bda2fb42749f363bd9ff to familiarize yourself with the typical content of a DAX project:
  - Description
  - README
  - Data assets
  - Notebooks
The source for this project is located here: https://github.ibm.com/CODAIT/watson-studio-gallery-dax-weather-project
This is a one-time setup of your "development" environment. (Most notebooks will use a proprietary Watson Studio package to load and store data files. Therefore notebook development should not be performed in a local Jupyter environment.)
- Download the compressed data set archive (.tar.gz) from Cloud Object Storage into a temporary directory.
- Extract the archive.
- If the data files are not of type .csv, DO NOT PROCEED.
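
The extract-and-check steps above can be sketched in Python as follows. This is a minimal sketch, not part of the project tooling: the helper name `extract_and_check_csv` and both path arguments are hypothetical, and it assumes the archive has already been downloaded from Cloud Object Storage to local disk.

```python
import tarfile
from pathlib import Path


def extract_and_check_csv(archive_path, target_dir):
    """Extract a .tar.gz archive and split its files by type.

    Hypothetical helper for illustration only.
    Returns a tuple (csv_files, other_files) of sorted Path lists.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    # Only extract archives that come from a trusted source.
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(target)

    files = sorted(p for p in target.rglob("*") if p.is_file())
    csv_files = [p for p in files if p.suffix.lower() == ".csv"]
    other_files = [p for p in files if p.suffix.lower() != ".csv"]
    return csv_files, other_files
```

Anything returned in `other_files` needs a manual look: a license or README file is probably harmless, but data files in another format mean the remaining steps do not apply.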
- Log in to Watson Studio and create an empty project.
- Choose a meaningful name.
- Add a short project description.
- Uncheck Restrict who can be a collaborator.
- Add a project token.
- Click Settings.
- Add an Access token (any name, role must be Editor).
- Add the extracted (raw) data set files.
- Click Assets.
- Click Add to project > Data and add each raw data set file (e.g. .csv) to the project.
- Try to export the data assets. If an error is raised because the archive is larger than 500MB, use the Part 0 - Import Data.ipynb notebook. Customize the notebook as follows:
  - Change the dataset_download_url.
  - Change the data_path_name.
  - In the last code cell, customize the if file.suffix != '.tgz': filter as needed if the extracted archive contains files that should not be added as data assets.
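
The content of the last code cell in Part 0 - Import Data.ipynb is not reproduced here, so the following is only a sketch of how the suffix filter might be widened. It assumes `file` is a `pathlib.Path`; the names `EXCLUDED_SUFFIXES` and `select_data_assets` are hypothetical and do not come from the notebook.

```python
from pathlib import Path

# Hypothetical: suffixes of files that should not become data assets.
# Note that Path("x.tar.gz").suffix is ".gz", not ".tar.gz".
EXCLUDED_SUFFIXES = {".tgz", ".gz", ".txt"}


def select_data_assets(directory, excluded_suffixes=EXCLUDED_SUFFIXES):
    """Return the files directly under `directory` whose suffix is not excluded."""
    selected = []
    for file in sorted(Path(directory).iterdir()):
        if not file.is_file():
            continue
        # Same idea as `if file.suffix != '.tgz':`, extended to several suffixes.
        if file.suffix not in excluded_suffixes:
            selected.append(file)
    return selected
```

Using a set keeps the condition to a single membership test, so excluding another file type only means adding one entry.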
Review the notebook development instructions in /notebooks.
- Add new notebooks ("from file") to the project using the template-notebook.ipynb in /notebooks.
- Before saving the notebook in GHE, make sure to complete the following steps in Watson Studio:
  - Remove the first cell, which should look as follows:

        # @hidden_cell
        # The project token is an authorization token that is used to access project resources like data sources, connections, and used by platform APIs.
        from project_lib import Project
        project = Project(project_id='4...', project_access_token='p...')
        pc = project.project_context

    This cell is automatically inserted when a user imports the project from the Watson Studio gallery. The project id and token are different for every user and every project instance, so the content that works for you will not work for another user.
  - Clear all output cells.
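
The two cleanup steps above are performed by hand in Watson Studio. As an optional cross-check on an exported .ipynb file, they can be sketched in Python as shown below. This is a minimal sketch under stated assumptions: the helper name `clean_notebook` is hypothetical, the notebook is assumed to be in the standard Jupyter nbformat 4 JSON layout, and the token cell is recognized only by the `@hidden_cell` marker and the `project_access_token` argument.

```python
import json


def clean_notebook(notebook):
    """Return a cleaned copy of a notebook dict (nbformat 4 layout assumed).

    Hypothetical helper for illustration only.
    """
    cleaned = json.loads(json.dumps(notebook))  # deep copy via JSON round trip
    cells = cleaned.get("cells", [])

    # Drop the first cell only if it looks like the inserted project token cell.
    if cells and cells[0].get("cell_type") == "code":
        source = "".join(cells[0].get("source", []))
        if "@hidden_cell" in source and "project_access_token" in source:
            cells.pop(0)

    # Clear the outputs and execution counters of all remaining code cells.
    for cell in cells:
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned
```

To apply it to a file, load the notebook with `json.load`, pass the resulting dict to `clean_notebook`, and write the result back with `json.dump`.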
We use ReviewNB to better visualize updates to notebooks in Github. Because of several restrictions on the use of this tool, follow this process to get notebooks ready for review:
- Create a new repository in github.com/codait if you are migrating the project from Github Enterprise.
- Make the repository private.
- Copy in all of the files created thus far in the project's Github Enterprise repository (we are only able to use the public Github version of ReviewNB at this time, hence the need for this step).
- Add a branch called production to the new repository.
  - In the repository's Settings/Branches page, make the production branch the default (base) branch (Watson Studio currently can only push commits directly to the master branch of a repository, hence the need for this step).
- Make sure you are assigned the Admin role in the Watson Studio project that you will push code to Github from:
  - If you are not assigned this role yet, you can have a current Admin grant you this privilege.
- If your Watson Studio account does not yet have Github integration set up with your public Github account:
  - Add a Github personal access token in Watson Studio's Settings/Integration page.
  - You can create this token by visiting your public Github's Settings/Developer settings/Personal access tokens page and clicking Generate new token. Make sure to give the token repo scope.
- Inside of the Watson Studio project's Settings page:
  - Scroll to the Integrations/Github repository section and add the link to the new Github repository you created to which you will push your code.
- Now you are able to push commits to the master branch of the new repository you created.
  - To push a commit, open a notebook in edit mode, click the Github integration button in the top menu bar, and click Publish on Github.
  - In the dialogue box, ensure the target path points to ./notebooks/your_target_notebook.ipynb, add a commit message, select All content except hidden code cells, and click Publish.
  - Follow this set of steps every time you need to make a commit.
- Once you are ready for your notebook to be reviewed:
  - Open a PR from the master branch against the production branch.
  - Within this PR, ReviewNB will automatically add a button to Check out this pull request on ReviewNB.
  - Make sure ReviewNB is an Authorized Github App in your Github account's Settings/Applications/Authorized Github Apps page to be able to use the tool to add comments and code suggestions to individual cells of a notebook.
Use this github repository to store all the artifacts that will be used to create the Watson Studio project for this data set.
- Copy the raw data set files into /data_assets following the instructions in /data_assets.
- Copy the downloaded notebook files into /notebooks following the instructions in /notebooks.
- Customize the metadata files in /metadata following the instructions in /metadata.
- Complete the legal documents in /legal following the instructions in /legal.
- Follow the packaging instructions in dist.
- Make sure you have completed the packaging instructions.
- Complete the notebook publication checklist for each notebook.
  - In the checklist document add a (company-wide readable) link to the completed data set publication approval request form.
  - In the checklist document add a (company-wide readable) link to the data set publication approval.
  - Save a copy of each completed document in the legal directory.
- Send an email to @gdq:
  - subject: Publication approval request for DAX/[insert-dataset-name] notebooks
  - recipient: [email protected]
  - cc: [email protected]
  - body: Requesting your approval to publish and license the following notebooks, including their source code, under the terms of the MIT license.
    - for each notebook:
      - include a link to the corresponding file in the source GH repository
      - attach the completed publication checklist
    - include a link to the data set's legal approval request form
    - include a link to the data set's legal approval document
- Once the request has been approved, the content team will work with the legal team to obtain clearance for the notebooks.
-
...