EXAMPLES

How to use bigboy

├── bigboy
├── bigboy.json
└── examples
    ├── basic
    │   ├── extract.sql
    │   └── target.json
    ├── nest
    │   ├── extract.sql
    │   ├── nest.sql
    │   └── target.json
    ├── params
    │   ├── extract.sql
    │   └── target.json
    ├── prefetch
    │   ├── extract.sql
    │   ├── prefetch.sql
    │   └── target.json
    └── split
        ├── extract.sql
        └── target.json

Note: bigboy is a binary executable. Can be downloaded from the list of releases

All examples listed above are using the publically available Rfam MySQL Database

bigboy.json

{
  "connections": {
    "mysql_rfam": {
      "driver": "mysql",
      "server": "tcp(mysql-rfam-public.ebi.ac.uk:4497)",
      "database": "Rfam",
      "port": 4497,
      "user": "rfamro",
      "password": null
    }
  }
}

Basic

Minimal target setup

extract.sql

SELECT ncbi_id, scientific_name, created
FROM genome
LIMIT 5

target.json

{
  "connection": "mysql_rfam"
}

Run bigboy examples/basic in terminal Output is saved with json extension under out/examples/basic with current date filename. Output:

{"created":"2017-06-06 15:11:02","ncbi_id":"12892","scientific_name":"Potato spindle tuber viroid"}
{"created":"2017-06-06 15:11:07","ncbi_id":"12901","scientific_name":"Columnea latent viroid"}
{"created":"2017-06-06 15:11:12","ncbi_id":"53194","scientific_name":"Tomato apical stunt viroid-S"}
{"created":"2017-06-06 15:11:17","ncbi_id":"12885","scientific_name":"Tomato apical stunt viroid"}
{"created":"2017-06-06 15:11:21","ncbi_id":"32618","scientific_name":"Cucumber yellows virus"}

Prefetch

Allows to run SQL extract in parallel

extract.sql

SELECT ncbi_id, scientific_name, created
FROM genome
WHERE ncbi_id IN (%s)
LIMIT 5

prefetch.sql

SELECT ncbi_id
FROM genome
WHERE created >= '2020-01-01'

target.json

{
  "connection": "mysql_rfam",
  "prefetch": true
}

Run bigboy examples/prefetch in terminal Output is saved with json extension under out/examples/prefetch with current date filename. Output:

{"created":"2020-09-11 00:21:31","ncbi_id":"11053","scientific_name":"Dengue virus 1"}
{"created":"2020-09-11 00:21:36","ncbi_id":"11053","scientific_name":"Dengue virus 1"}
{"created":"2020-09-11 00:22:17","ncbi_id":"11053","scientific_name":"Dengue virus 1"}
{"created":"2020-09-11 00:22:30","ncbi_id":"11053","scientific_name":"Dengue virus 1"}
{"created":"2020-09-11 00:24:18","ncbi_id":"11053","scientific_name":"Dengue virus 1"}

Nest

Combines two query results into one Note: the parent column has to be numeric. Currently can't perform nesting by string

nesting objects

extract.sql

SELECT rfam_acc, CONVERT(SUBSTRING(rfam_acc, 3, 5),UNSIGNED INTEGER) id
FROM family
WHERE rfam_acc IN ("RF00511", "RF00516", "RF01099", "RF01173", "RF01214")

nest.sql

SELECT CONVERT(SUBSTRING(rfam_acc, 3, 5),UNSIGNED INTEGER) _parent, author_id, name
FROM family_author
LEFT JOIN author USING(author_id)
WHERE CONVERT(SUBSTRING(rfam_acc, 3, 5),UNSIGNED INTEGER) IN (%s)
ORDER BY 1, 2;

target.json

{
  "connection": "mysql_rfam",
  "nest": [
    {
      "connection": "mysql_rfam",
      "childKey": "authors",
      "parentKey": "id"
    }
  ]
}

Run bigboy examples/nest in terminal Output is saved with json extension under out/examples/nest with current date filename. Output:

{"authors":[{"author_id":"39","name":"Moxon SJ"}],"id":"511","rfam_acc":"RF00511"}
{"authors":[{"author_id":"5","name":"Barrick JE"},{"author_id":"9","name":"Breaker RR"}],"id":"516","rfam_acc":"RF00516"}
{"authors":[{"author_id":"38","name":"Moore B"},{"author_id":"61","name":"Wilkinson A"}],"id":"1099","rfam_acc":"RF01099"}
{"authors":[{"author_id":"61","name":"Wilkinson A"},{"author_id":"62","name":"Eberhardt R"}],"id":"1173","rfam_acc":"RF01173"}
{"authors":[{"author_id":"11","name":"Burge SW"},{"author_id":"61","name":"Wilkinson A"}],"id":"1214","rfam_acc":"RF01214"}

nesting values

If you modify the nest.sql to use _value column like this:

SELECT CONVERT(SUBSTRING(rfam_acc, 3, 5),UNSIGNED INTEGER) _parent, name _value
FROM family_author
LEFT JOIN author USING(author_id)
WHERE CONVERT(SUBSTRING(rfam_acc, 3, 5),UNSIGNED INTEGER) IN (%s)
ORDER BY 1;

The output childKey property would have an array of strings rather than objects:

{"authors":["Moxon SJ"],"id":"511","rfam_acc":"RF00511"}
{"authors":["Barrick JE","Breaker RR"],"id":"516","rfam_acc":"RF00516"}
{"authors":["Wilkinson A","Moore B"],"id":"1099","rfam_acc":"RF01099"}
{"authors":["Wilkinson A","Eberhardt R"],"id":"1173","rfam_acc":"RF01173"}
{"authors":["Burge SW","Wilkinson A"],"id":"1214","rfam_acc":"RF01214"}

params

Customizing the query using CLI arguments or default params

extract.sql

SELECT ncbi_id, scientific_name, created, updated
FROM genome
WHERE updated BETWEEN ? AND ?
LIMIT 5;

target.json

{
  "connection": "mysql_rfam",
  "timezone": "UTC",
  "params": [
    {
      "name": "updated",
      "type": "date"
    },
    {
      "name": "updated",
      "type": "date",
      "default": "today"
    }
  ]
}

Run bigboy examples/params -1y in terminal Output is saved with json extension under out/examples/params with current date filename. Output:

{"created":"2017-06-06 15:11:02","ncbi_id":12892,"scientific_name":"Potato spindle tuber viroid","updated":"2020-04-23 11:46:08"}
{"created":"2017-06-06 15:11:07","ncbi_id":12901,"scientific_name":"Columnea latent viroid","updated":"2020-04-23 11:46:08"}
{"created":"2017-06-06 15:11:12","ncbi_id":53194,"scientific_name":"Tomato apical stunt viroid-S","updated":"2020-04-23 11:47:10"}
{"created":"2017-06-06 15:11:17","ncbi_id":12885,"scientific_name":"Tomato apical stunt viroid","updated":"2020-04-23 11:46:08"}
{"created":"2017-06-06 15:11:21","ncbi_id":32618,"scientific_name":"Cucumber yellows virus","updated":"2020-04-23 11:46:28"}

split

Allows to split output into multiple files (currently by date)

extract.sql

SELECT upid, description, created
FROM genome
WHERE date BETWEEN '2020-01-01' AND '2020-01-07'

target.json

{
  "connection": "mysql_rfam",
  "split": {
    "by": "date",
    "layout": "2006-01-02 15:04:05",
    "value": "created"
  }
}

Run bigboy examples/split in terminal Output is saved in multiple files with json extension under out/examples/split each has records created in a corresponding date.

13548 rows written to 4 files in 1 seconds

out
└── examples
    └── split
        ├── 2018-03-12.json
        ├── 2018-03-13.json
        ├── 2018-03-19.json
        └── 2018-03-27.json

Output of the out/examples/split/2018-03-27.json:

{"created":"2018-03-27 16:25:52","description":"Mesorhizobium huakuii 7653R genome","upid":"UP000027109"}
{"created":"2018-03-27 16:28:19","description":"Yersinia pseudotuberculosis IP 32953, complete genome.","upid":"UP000031850"}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

EXAMPLES.md

EXAMPLES.md

EXAMPLES

Table of Contents

Project Structure

Basic

Prefetch

Nest

nesting objects

nesting values

params

split

Files

EXAMPLES.md

Latest commit

History

EXAMPLES.md

File metadata and controls

EXAMPLES

Table of Contents

Project Structure

Basic

Prefetch

Nest

nesting objects

nesting values

params

split