
Data Manipulation Use Case

Ilya Sher edited this page May 25, 2019 · 2 revisions

NGS Use Case: Data Manipulation

A typical flow is to get data into NGS, process it, and output the result. The following examples show how to do each step.

Getting Data in

NGS has several ways to acquire data. The following are the most useful ones.

Getting Data in - fetch()

To process data, you first need to get it in. NGS has several methods and syntax constructs for this; some of them are listed below.

The fetch() method reads data and parses it into a data structure. Currently, fetch() handles JSON. Like almost everything else in NGS, fetch() can easily be extended; in the case of fetch(), an extension could add support for other data formats.

Fetch (read and parse) from URL:

$ ngs -p 'fetch("https://api.myip.com")'
{ip=1.2.3.4, country=COUNTRY_NAME, cc=COUNTRY_CODE}

Fetch (read and parse) from URL and access a field:

$ ngs -p 'fetch("https://api.myip.com").ip'  # Also: ngs -p 'myip()'
1.2.3.4

Fetch (read and parse) standard input:

$ curl https://api.myip.com | ngs -p 'fetch().ip'
1.2.3.4
  • See fetch() in the official documentation.
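The idea of a fetch() that reads input, parses it into a data structure, and can be extended with new formats can be sketched in Python. This is a hypothetical analogue for illustration only: the names `fetch_text`, `fetch_stream`, `register_parser` and the `PARSERS` registry are assumptions, not part of NGS, and NGS's own extension mechanism is different.

```python
import json
import sys
from typing import Any, Callable, Dict, Optional, TextIO

# Hypothetical registry: format name -> parser function (not an NGS API)
PARSERS: Dict[str, Callable[[str], Any]] = {"json": json.loads}


def register_parser(fmt: str, parser: Callable[[str], Any]) -> None:
    """Extend fetch_text() with support for another data format."""
    PARSERS[fmt] = parser


def fetch_text(text: str, fmt: str = "json") -> Any:
    """Parse already-read text into a data structure."""
    if fmt not in PARSERS:
        raise ValueError(f"no parser registered for format {fmt!r}")
    return PARSERS[fmt](text)


def fetch_stream(stream: Optional[TextIO] = None, fmt: str = "json") -> Any:
    """Read a whole stream (standard input by default) and parse it."""
    stream = stream if stream is not None else sys.stdin
    return fetch_text(stream.read(), fmt)


# Example extension: a "lines" format that yields one string per line
register_parser("lines", lambda text: text.splitlines())
```

Here `fetch_text('{"ip": "1.2.3.4"}')["ip"]` plays the role of `fetch().ip` on piped input.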

Getting Data in - Run a Program and Parse Output

The syntax for running an external program and parsing its output is ``my_program arg1 arg2 ...``.

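Note that NGS knows about the AWS CLI: when you run it using ``aws ...``, NGS not only decodes the JSON output but also gives you the relevant part of the response. In the example below, that is the list of VPCs rather than the {"Vpcs": [...]} object that the AWS CLI itself prints.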

$ ngs -p '``aws ec2 describe-vpcs``'
[{CidrBlock=..., DhcpOptionsId=dopt-..., State=available, VpcId=vpc-..., ...}, ...]
$ aws ec2 describe-vpcs
{"Vpcs": [ ... ]}
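Running a program, decoding its JSON output, and returning the relevant part can be approximated in Python. This is an assumption-laden analogue, not how NGS works internally: the `unwrap_single_key` rule (return the list when the response is an object with exactly one list-valued key, as in `{"Vpcs": [...]}`) is a simplified heuristic chosen for illustration, and NGS's actual AWS CLI handling may differ.

```python
import json
import subprocess
from typing import Any, List


def unwrap_single_key(data: Any) -> Any:
    # Hypothetical heuristic: {"Vpcs": [...]} -> [...]; anything else unchanged
    if isinstance(data, dict) and len(data) == 1:
        (value,) = data.values()
        if isinstance(value, list):
            return value
    return data


def run_and_parse(argv: List[str]) -> Any:
    # Run the program; check=True raises CalledProcessError on non-zero exit
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return unwrap_single_key(json.loads(result.stdout))
```

With a configured AWS CLI, `run_and_parse(["aws", "ec2", "describe-vpcs"])` would return the list of VPC objects.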

Run a program and immediately access a field of all elements:

# Get PublicIpAddress of all elements
$ ngs -p '``aws ec2 describe-instances``.PublicIpAddress'
( Exception ... KeyNotFound -- at least one of the instances had no public address )

Run a program and immediately access a field of all elements but only for elements that do have it:

# Get PublicIpAddress of all elements that have it
$ ngs -p '``aws ec2 describe-instances``.get("PublicIpAddress")'
[1.2.3.4,1.2.3.5,...]

# Get PublicIpAddress of all elements that have it, using default
# value where PublicIpAddress is not present.
$ ngs -p '``aws ec2 describe-instances``.get("PublicIpAddress", "-")'
[1.2.3.4,-,1.2.3.5,-,-,...]
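In Python terms, the three variants above differ only in how they treat elements that lack the field. The helpers `pluck` and `pluck_get` are hypothetical names for this illustration; `pluck` raises `KeyError` where NGS throws KeyNotFound.

```python
from typing import Any, Dict, List

_NO_DEFAULT = object()  # sentinel: distinguishes "no default" from default=None


def pluck(items: List[Dict[str, Any]], field: str) -> List[Any]:
    # Like items.field: every element must have the field, else KeyError
    return [item[field] for item in items]


def pluck_get(items: List[Dict[str, Any]], field: str, default: Any = _NO_DEFAULT) -> List[Any]:
    if default is _NO_DEFAULT:
        # Like items.get(field): skip elements without the field
        return [item[field] for item in items if field in item]
    # Like items.get(field, default): substitute default where missing
    return [item.get(field, default) for item in items]
```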


# Parse "locate" output into an array of strings, keep the paths matching
# the /ecs/ regex, take the last one ([-1]), fetch() (read and parse) that
# JSON file, and show the names of its operations
$ ngs -p '``locate service-2.json``.filter(/ecs/)[-1].fetch().operations.keys()'
[CreateCluster,CreateService,...]

Processing Data - Accessing / Extracting

Following short examples demonstrate how to access various parts of structured data.

  • my_arr[idx] syntax: a=[10, 20]; a[0] (result: 10)
  • my_arr.the_one() - use when the array is expected to have exactly one element. Returns the value of that element. Throws an exception if the array doesn't have exactly one element.
  • my_hash[prop_name] syntax: {"a": 10}["a"] (result: 10)
  • my_hash.my_prop syntax: {"a": 10}.a (result: 10)
  • my_array_of_hashes.my_prop syntax:
    • [{"a": 10}, {"a": 20, "b": 100}].a (result: [10,20])
    • ``aws ec2 describe-instances``.State.Name (result: [running,running,running,stopped, ...])
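Two of the accessors above have no one-line Python equivalent, so here is a rough analogue: `the_one` mirrors the_one(), and `pluck_path` is a hypothetical helper for chained access like `.State.Name` over an array of hashes. Raising `ValueError` is a choice made for this sketch.

```python
from typing import Any, Iterable, List


def the_one(items: Iterable[Any]) -> Any:
    # Return the single element; raise if there are zero or several
    elements = list(items)
    if len(elements) != 1:
        raise ValueError(f"expected exactly one element, got {len(elements)}")
    return elements[0]


def pluck_path(items: Iterable[Any], *path: str) -> List[Any]:
    # Like items.State.Name: follow the same key path inside every element
    result = []
    for item in items:
        for key in path:
            item = item[key]
        result.append(item)
    return result
```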

( TODO: Show numerical and predicate range )

Processing Data - Filtering

The filter() method is typically used to filter arrays. Hashes are filtered using filter(), filterk() (by key) and filterv() (by value).

Each of these methods has a reject counterpart: reject(), rejectk(), rejectv(). In the filter() group, a predicate returning true means the element is kept. In the reject() group, a predicate returning true means the element is thrown away.

  • The general form of filter() and reject() methods:

    • SOMETHING.filter(PREDICATE)
    • SOMETHING.filterk(PREDICATE)
    • SOMETHING.filterv(PREDICATE)
    • SOMETHING.reject(PREDICATE)
    • SOMETHING.rejectk(PREDICATE)
    • SOMETHING.rejectv(PREDICATE)
  • Filtering arrays

    • [1,2,3,4].filter(F(x) x > 2) (result: [3,4])
    • [1,2,3,4].filter(X > 2) (result: [3,4])
    • [1,2,3,4].first(X > 2) (result: 3)
    • [1,2,3,4].reject(X > 2) (result: [1,2])
    • [{"a": 10}, {"a": 20, "b": 100}, {"a": 30, "c": 200}].filter({"a": X>10}) (result: [{a=20, b=100},{a=30, c=200}])
    • [{"a": 10}, {"a": 20, "b": 100}, {"a": 30, "c": 200}].filter({"a": 20}) (result: [{a=20, b=100}])
    • ``aws ec2 describe-instances``.filter({"State": {"Name": "stopped"}}).InstanceId (result: [i-111,i-222,i-333])
    • $(ls /usr).lines().filter(/bin/) (result: [bin,sbin])
    • $(ls /usr).lines().first(/bin/) (result: bin)
    • $(ls /usr).lines().reject(/bin/) (result: [lib,libexec,local,...])
  • Filtering hashes (aka maps / dictionaries)

    • Filter by values
      • {"abc": 1, "abx": 2, "def": 3}.filter(F(k, v) v > 1) (result: {abx=2, def=3})
      • {"abc": 1, "abx": 2, "def": 3}.filterv(F(v) v > 1) (same)
      • {"abc": 1, "abx": 2, "def": 3}.filterv(X > 1) (same)
    • Filter by keys
      • {"abc": 1, "abx": 2, "def": 3}.filter(F(k, v) k ~ /^a/) (result: {abc=1, abx=2})
      • {"abc": 1, "abx": 2, "def": 3}.filterk(F(k) k ~ /^a/) (same)
      • {"abc": 1, "abx": 2, "def": 3}.filterk(/^a/) (same)
    • Filter by both keys and values
      • {"abc": 1, "abx": 2, "def": 3}.filter(F(k, v) k ~ /^a/ and v > 1) (result: {abx=2})
  • Partitioning - partition(). partition() splits the given array into two arrays: one with the elements for which the predicate returned true and one with the elements for which it returned false.

      # Start with specific region
      servers_ordered_by_launch_time = ...
      p = servers_ordered_by_launch_time.partition(F(i) i.Region == START_REGION)
      servers = p[0] + p[1]
      # deploy to "servers" in order	
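For comparison, here is a Python sketch of the filter() family. It is an analogue, not NGS code: `matches` is a simplified, hypothetical stand-in for NGS pattern matching (a callable is used as a predicate, a compiled regex is searched in strings, a dict matches field by field, anything else is compared with ==), and the other helper names mirror the NGS methods only for readability.

```python
import re
from typing import Any, Callable, Dict, List, Tuple


def matches(value: Any, pattern: Any) -> bool:
    # Simplified, hypothetical stand-in for NGS pattern matching
    if callable(pattern):
        return bool(pattern(value))
    if isinstance(pattern, re.Pattern):
        return isinstance(value, str) and pattern.search(value) is not None
    if isinstance(pattern, dict):
        return isinstance(value, dict) and all(
            k in value and matches(value[k], p) for k, p in pattern.items()
        )
    return value == pattern


def filter_(items: List[Any], pattern: Any) -> List[Any]:
    return [x for x in items if matches(x, pattern)]


def reject(items: List[Any], pattern: Any) -> List[Any]:
    return [x for x in items if not matches(x, pattern)]


def first(items: List[Any], pattern: Any, default: Any = None) -> Any:
    # Returning a default when nothing matches is this sketch's choice
    return next((x for x in items if matches(x, pattern)), default)


def filterk(h: Dict[Any, Any], pattern: Any) -> Dict[Any, Any]:
    return {k: v for k, v in h.items() if matches(k, pattern)}


def filterv(h: Dict[Any, Any], pattern: Any) -> Dict[Any, Any]:
    return {k: v for k, v in h.items() if matches(v, pattern)}


def filter_kv(h: Dict[Any, Any], pred: Callable[[Any, Any], bool]) -> Dict[Any, Any]:
    # Like h.filter(F(k, v) ...): the predicate sees both key and value
    return {k: v for k, v in h.items() if pred(k, v)}


def partition(items: List[Any], pattern: Any) -> Tuple[List[Any], List[Any]]:
    # (matching elements, non-matching elements), order preserved
    yes: List[Any] = []
    no: List[Any] = []
    for x in items:
        (yes if matches(x, pattern) else no).append(x)
    return yes, no
```

The start-region example then becomes `in_region, rest = partition(servers, lambda s: s["Region"] == start_region)` followed by `in_region + rest`, where `servers` and `start_region` are placeholders.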
    

Processing Data - Mapping

WIP

  • Mapping arrays and array-like values using map().

    • [".svn", ".git"].map(F(x) "my_repo" / x) (result: [my_repo/.svn,my_repo/.git])
    • "abc".map(ord) (result: [97,98,99])
  • map

  • map_base_idx

  • map_idx_key_val

  • map_idx_val

  • mapk

  • mapkv

  • mapo

  • mapv

  • pmap
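A rough Python analogue of a few of the methods listed above. The behaviour of map_idx_val(), mapk(), mapv(), mapkv() and pmap() here is assumed from their names and from the filterk()/filterv() naming convention (k for keys, v for values, p for parallel); the NGS documentation is the authority on exact semantics, in particular on what the mapkv() callback must return.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable, List, Tuple


def map_(items: Iterable[Any], fn: Callable[[Any], Any]) -> List[Any]:
    return [fn(x) for x in items]


def map_idx_val(items: Iterable[Any], fn: Callable[[int, Any], Any]) -> List[Any]:
    # Callback receives the index and the value
    return [fn(i, x) for i, x in enumerate(items)]


def mapk(h: Dict[Any, Any], fn: Callable[[Any], Any]) -> Dict[Any, Any]:
    # Transform keys, keep values
    return {fn(k): v for k, v in h.items()}


def mapv(h: Dict[Any, Any], fn: Callable[[Any], Any]) -> Dict[Any, Any]:
    # Transform values, keep keys
    return {k: fn(v) for k, v in h.items()}


def mapkv(h: Dict[Any, Any], fn: Callable[[Any, Any], Tuple[Any, Any]]) -> Dict[Any, Any]:
    # Callback returns a (new_key, new_value) pair (assumed convention)
    return dict(fn(k, v) for k, v in h.items())


def pmap(items: Iterable[Any], fn: Callable[[Any], Any], workers: int = 4) -> List[Any]:
    # Apply fn in a thread pool; results keep the input order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))
```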

Processing Data - Handling Missing Values

WIP

  • get() - arrays and hashes
  • dflt()
  • Box type
  • +? operator
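Two of the techniques listed above have close Python counterparts, shown here as a sketch. `get_idx` is a hypothetical helper for a get()-style array lookup with a default, and the grouping example assumes that NGS's dflt() behaves like Python's `dict.setdefault()`: it stores the default only when the key is missing and returns the value stored under the key.

```python
from typing import Any, Dict, List, Sequence


def get_idx(items: Sequence[Any], idx: int, default: Any = None) -> Any:
    # get()-style lookup: return default instead of raising IndexError
    try:
        return items[idx]
    except IndexError:
        return default


def group_by_first_letter(words: List[str]) -> Dict[str, List[str]]:
    groups: Dict[str, List[str]] = {}
    for word in words:
        # dflt()-style (assumed): create the list on first use, then append
        groups.setdefault(word[0], []).append(word)
    return groups
```

For hashes, Python's built-in `dict.get(key, default)` already covers the get()-with-default case.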

Processing Data - Converting

  • Hash()
  • Arr()
  • Set()

Processing Data - Collecting

WIP

  • collect

Processing Data - Iterating

Iterating over array and array-like values:

[10,20,30].each(echo)

[10,20,30].each(F(x) echo("Value: $x"))

[10,20,30].each_idx_val(F(i, x) echo("Value at index ${i}: $x"))

for x in [10,20,30] {
    echo("Value: $x")
}

# Print each letter on its own line
"abc".each(echo)

Iterating over hashes (aka maps & dictionaries):

h = {"a": 1, "b": 2}
h.each(F(k, v) echo("${k}=${v}"))
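The same iteration patterns in Python, as a rough comparison: `enumerate()` plays the role of each_idx_val(), and `dict.items()` yields the key/value pairs that h.each(F(k, v) ...) receives. The helper names are hypothetical, and they collect lines instead of printing them so the result is easy to check.

```python
from typing import Any, Dict, Iterable, List


def describe_values(values: Iterable[Any]) -> List[str]:
    # Like values.each_idx_val(F(i, x) echo(...))
    return [f"Value at index {i}: {x}" for i, x in enumerate(values)]


def describe_hash(h: Dict[Any, Any]) -> List[str]:
    # Like h.each(F(k, v) echo("${k}=${v}"))
    return [f"{k}={v}" for k, v in h.items()]


if __name__ == "__main__":
    for line in describe_values([10, 20, 30]):
        print(line)
    for letter in "abc":  # strings iterate one character at a time
        print(letter)
```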

Data insights

WIP

Need to know something about all the items together?

[1,2,3].all(F(x) x > 0)   # true
[1,2,3].all(F(x) x > 10)  # false
[1,2,3].all(X > 0)        # true
[1,2,3].all(X > 10)       # false
  • any()
  • count()
  • Stats type
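Python's built-in `all()` and `any()` behave like the NGS methods when given a generator of predicate results. The `count` helper is hypothetical, and `collections.Counter` is offered as the nearest Python analogue of the Stats type on the assumption that Stats counts occurrences of values; the NGS documentation describes its actual interface.

```python
from collections import Counter
from typing import Any, Callable, Iterable


def count(items: Iterable[Any], pred: Callable[[Any], bool]) -> int:
    # Number of elements for which the predicate is true
    return sum(1 for x in items if pred(x))


nums = [1, 2, 3]
print(all(x > 0 for x in nums))      # True
print(all(x > 10 for x in nums))     # False
print(any(x > 2 for x in nums))      # True
print(count(nums, lambda x: x > 1))  # 2

states = Counter(["running", "running", "stopped"])
print(states.most_common())  # [('running', 2), ('stopped', 1)]
```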

Inspect data

  • inspect()
  • typeof()

Getting data out

WIP

NGS has several command line options for convenient output control. See the official documentation for more information about command line options.

Simple print:

# ngs -p EXPR -- "print"
$ ngs -p '``aws ec2 describe-instances``.get("PublicIpAddress")'
[1.2.3.4,1.2.3.5,...]

Print each item on its own line:

# ngs -pl EXPR -- "print lines"
$ ngs -pl '``aws ec2 describe-instances``.get("PublicIpAddress")'
1.2.3.4
1.2.3.5
...
$ ngs -pl 0..3  # A Range object, for example, can also be printed one item per line
0
1
2

Print JSON

# ngs -pj EXPR -- "print JSON"
$ ngs -pj '``aws ec2 describe-instances``.get("PublicIpAddress")'
[ "1.2.3.4", "1.2.3.5", ... ]

Print each item on its own line as JSON

# ngs -pjl EXPR -- "print JSON lines"
$ ngs -pjl '``aws ec2 describe-vpcs``'
{ "CidrBlock": ..., "DhcpOptionsId": ..., ...}
{ "CidrBlock": ..., "DhcpOptionsId": ..., ...}
...
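The difference between the line-per-item, JSON, and JSON-lines output modes can be sketched in Python with the standard `json` module. The function names are hypothetical, and NGS's exact formatting (spacing, indentation) may differ from the `json.dumps()` defaults used here.

```python
import json
from typing import Any, Iterable


def as_lines(items: Iterable[Any]) -> str:
    # Like -pl: one item per line
    return "\n".join(str(x) for x in items)


def as_json(value: Any) -> str:
    # Like -pj: the whole value as one JSON document
    return json.dumps(value)


def as_json_lines(items: Iterable[Any]) -> str:
    # Like -pjl: each item as its own JSON document on its own line
    return "\n".join(json.dumps(x) for x in items)


print(as_lines(range(0, 3)))
print(as_json(["1.2.3.4", "1.2.3.5"]))
print(as_json_lines([{"a": 1}, {"b": 2}]))
```

JSON lines output stays easy to process further, since each line can be parsed on its own.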

Print table

# ngs -pt EXPR -- "print table"
# (uses the Table type, which is also available for your own use)
$ ngs -pt '``aws ec2 describe-vpcs``'
( prints human-readable, nicely aligned table )

$ ngs -pt '[{"a": 1, "b": 2}, {"a": 10}, {"b": 20}, {"c": "AHA!"}]'
a   b   c
1   2   -
10  -   -
-   20  -
-   -   AHA!
a   b   c
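The table layout above (columns taken from all keys, "-" for missing values, each column padded to its widest cell) can be sketched in Python. This is an illustrative analogue, not NGS's Table implementation; the two-space gap and first-appearance column order are assumptions that happen to reproduce the alignment of the example rows.

```python
from typing import Any, Dict, List


def format_table(rows: List[Dict[str, Any]], missing: str = "-", gap: int = 2) -> str:
    # Columns: every key, in order of first appearance across all rows
    columns: List[str] = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    # Cell text for each row, using `missing` where a key is absent
    cells = [[str(row.get(col, missing)) for col in columns] for row in rows]
    # Each column is as wide as its widest cell (header included)
    widths = [max([len(col)] + [len(r[i]) for r in cells]) for i, col in enumerate(columns)]

    def fmt(values: List[str]) -> str:
        padded = (v.ljust(w) for v, w in zip(values, widths))
        return (" " * gap).join(padded).rstrip()

    return "\n".join([fmt(columns)] + [fmt(r) for r in cells])


print(format_table([{"a": 1, "b": 2}, {"a": 10}, {"b": 20}, {"c": "AHA!"}]))
```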
  • encode_json()

  • store()