feat: support terminology codes #106

cmdoret · 2024-10-01T13:15:54Z

Summary

This adds support for terminology codes instead of free text for specific metadata fields, along with autocomplete suggestions in the terminal.

Currently, the following properties/terminologies are used:

"cell_type": "https://purl.obolibrary.org/obo/cl.owl",
"source_material": "https://purl.obolibrary.org/obo/uberon.owl",
"taxon_id": "https://purl.obolibrary.org/obo/ncbitaxon/subsets/taxslim.owl"

Major changes

Refactor CLI code: move prompt utilities to dedicated module (modos.prompt)
Add code-matching module (modos.codes)
Implement CodeMatcher protocol with two members (remote/local)
- Remote is used if an endpoint is provided (completion runs on server, faster)
- Local used as fallback (ontology downloaded and runs on client machine)
fuzon-http service added as a service in modos server deployment
pyfuzon added as extra dependency for local code matching
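
The remote/local split described above could look roughly like the sketch below. It is not the code in modos.codes: the method name `find_top`, the `fetch` callable standing in for the HTTP call, and the `get_matcher` helper are hypothetical, and difflib replaces the real matching library so the example has no dependencies.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Optional, Protocol


@dataclass(frozen=True)
class Code:
    """A terminology entry: human-readable label plus its URI."""

    label: str
    uri: str


class CodeMatcher(Protocol):
    """Anything that can rank codes against a free-text query."""

    def find_top(self, query: str, num: int = 50) -> list[Code]: ...


class LocalCodeMatcher:
    """Fallback: rank an in-memory list of codes on the client machine."""

    def __init__(self, codes: list[Code]):
        self.codes = codes

    def find_top(self, query: str, num: int = 50) -> list[Code]:
        def score(code: Code) -> float:
            # Stand-in similarity measure (assumption, not the real matcher).
            return SequenceMatcher(None, query.lower(), code.label.lower()).ratio()

        return sorted(self.codes, key=score, reverse=True)[:num]


class RemoteCodeMatcher:
    """Delegate ranking to a server; `fetch` is a placeholder for the HTTP request."""

    def __init__(self, endpoint: str, fetch: Callable[[str, str, int], list[Code]]):
        self.endpoint = endpoint
        self.fetch = fetch

    def find_top(self, query: str, num: int = 50) -> list[Code]:
        return self.fetch(self.endpoint, query, num)


def get_matcher(
    endpoint: Optional[str],
    codes: list[Code],
    fetch: Optional[Callable[[str, str, int], list[Code]]] = None,
) -> CodeMatcher:
    """Use the remote matcher when an endpoint is given, else fall back to local."""
    if endpoint and fetch is not None:
        return RemoteCodeMatcher(endpoint, fetch)
    return LocalCodeMatcher(codes)
```

Because both classes satisfy the same protocol, the prompt code only needs to call `find_top` and does not care where the ranking runs.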

Trying it out

To test local autocomplete in terminal:

modos create data/example
modos add data/example sample

To rely on the server for autocomplete:

make deploy
modos --endpoint=http://localhost create data/example
modos --endpoint=http://localhost add data/example sample

Notes

Codes are recommended based on similarity between user input and labels, but only the URIs are persisted in metadata.
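
A minimal sketch of that label/URI split, assuming the suggestions are available as a label-to-URI mapping. The URIs, the `persist_choice` helper and the metadata layout are made up for illustration.

```python
# Hypothetical suggestions as shown in the prompt: label -> URI.
suggestions = {
    "erythrocyte": "http://example.org/codes/0001",
    "neuron": "http://example.org/codes/0002",
}


def persist_choice(metadata: dict, prop: str, label: str, suggestions: dict) -> dict:
    """Return a copy of metadata storing the URI behind the chosen label."""
    if label not in suggestions:
        raise ValueError(f"{label!r} is not one of the suggested codes")
    return {**metadata, prop: suggestions[label]}


# The user picks a label in the terminal; only its URI ends up in the metadata.
sample = persist_choice({"id": "sample1"}, "cell_type", "erythrocyte", suggestions)
```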

Follow up (separate issues):

speed up download when using local completer
- terminology caching (+async download in background)

Open questions

When creating a modos from an input YAML file instead of interactively (see data/ex_config.yaml), URIs are now required for the 3 properties above.

It may be painful for users to find out which URIs to put in the YAML. Should we provide some kind of subcommand just to look up the codes (basically a fuzon wrapper)?
Perhaps something along these lines:

# modos codes <property> <query>
modos codes cell_type "red blood cell"
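
The URI requirement for YAML input could also be checked before creation. The sketch below assumes the YAML has already been parsed into a dict per element. The helper names are hypothetical and this is not how modos validates its input.

```python
from urllib.parse import urlparse

# The three properties that take terminology codes in this PR.
CODE_PROPERTIES = ("cell_type", "source_material", "taxon_id")


def is_uri(value) -> bool:
    """True if value looks like an absolute http(s) URI."""
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def free_text_fields(element: dict) -> list[str]:
    """List the code properties of an element that still hold free text."""
    return [
        prop
        for prop in CODE_PROPERTIES
        if prop in element and not is_uri(element[prop])
    ]
```

Reporting the offending fields up front would at least tell users which values to look up.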

deploy/fuzon/Dockerfile

deploy/docker-compose.yml

supermaxiste · 2024-10-16T08:29:16Z

deploy/nginx/default.conf.template

+  # terminology code matching server
+  location /fuzon {
+    rewrite ^/fuzon/(.*) /$1 break ;
+    proxy_pass ${FUZON_LOCAL_URL} ;
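
For readers less familiar with nginx, the rewrite rule above strips the /fuzon prefix before the request is proxied. The Python sketch below only mimics the regex substitution on the path; the function name is hypothetical and query strings are left out.

```python
import re


def strip_fuzon_prefix(path: str) -> str:
    """Mimic `rewrite ^/fuzon/(.*) /$1 break;` on a request path."""
    return re.sub(r"^/fuzon/(.*)", r"/\1", path, count=1)


# "/fuzon/some/path" is forwarded upstream as "/some/path".
# "/fuzon" (no trailing slash) does not match the regex and stays unchanged.
```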


question(clarification): the config template provides both a public and a local setup. How exactly is this config used to determine whether we're dealing with the public or the local setup? I'm mainly asking whether it would make more sense to separate the two into different files, or to make it clearer what needs to be modified in each case.

the "public" urls are the service addresses outside the compose network, whereas the "local" urls are their addresses within the compose network.

Generally, there are two options:

a service is deployed as part of the compose (default):
- public url: http://<server-name>/<service-name>
- local url: http://<service-name>:<service-port>

a service is deployed externally (e.g. an AWS S3 bucket):
- public url: http://<external-service>:<service-port>
- local url: http://<external-service>:<service-port>

I'll try to make that clearer in the docs
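
The two cases above can be written as a small helper. This is only an illustration of the URL scheme: the function, its parameters and the example port are hypothetical and not part of the deployment code.

```python
from typing import Optional


def service_urls(
    service: str,
    port: int,
    server_name: Optional[str] = None,
    external_host: Optional[str] = None,
) -> dict:
    """Return the public and local URL of a service for both deployment options."""
    if external_host is not None:
        # Externally deployed: the same address is used inside and outside compose.
        url = f"http://{external_host}:{port}"
        return {"public": url, "local": url}
    if server_name is None:
        raise ValueError("server_name is required for services inside the compose")
    return {
        # Reached from outside through the reverse proxy.
        "public": f"http://{server_name}/{service}",
        # Reached from other containers on the compose network.
        "local": f"http://{service}:{port}",
    }
```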

Documented in 06ca00b
@marftn do you think this whole thing makes sense, or is this some kind of antipattern?

modos/codes.py

modos/prompt.py

supermaxiste · 2024-10-16T08:47:46Z

Code review

Mostly suggestions, recommendations and clarifications.
To add to my other comments: Makefile L42 has a typo in the S3_PUBLIC_URL:

modos-api/Makefile

Line 42 in 8d9deb2

    
           cd deploy; S3_PUBLC_URL="http://$(LOCAL_IP):9000" docker compose up --build --force-recreate

Co-authored-by: supermaxiste <[email protected]>

supermaxiste · 2024-10-21T14:42:05Z

Deployment review

praise: with make deploy everything was smooth
praise: the autocomplete feature is not only fast, but works extremely smoothly 🤓

issue(minor, non-blocking): the recommendations can sometimes point to blank nodes. When blank nodes are selected, modos throws an error because we're not providing a proper URI. See screenshot as an example.

[...]

question(minor, non-blocking): when working with modos objects, there's no list command and it becomes a bit confusing to work with objects that you created but forgot about. Maybe I missed a command, but as far as I can see everything assumes that we know the objects we're working with.

$ cli:~/modo-api$ modos --help
Usage: modos [OPTIONS] COMMAND [ARGS]...

  Multi-Omics Digital Objects command line interface.

Options:
  --endpoint TEXT  URL of modos server.  [env var: MODOS_ENDPOINT]
  --version        Print version of modos client
  --help           Show this message and exit.

Commands:
  add      Add elements to a modo.
  create   Create a modo interactively or from a file.
  publish  Export a modo as linked data.
  remove   Removes an element and its files from the modo.
  show     Show the contents of a modo.
  stream   Stream genomic file from a remote modo into stdout.
  update   Update a modo based on a yaml file.

I will pre-approve the PR since all the points I raised can be addressed in a separate PR.

cmdoret · 2024-10-23T00:32:02Z

blank nodes: good catch! Addressed upstream in sdsc-ordes/fuzon#31, as it makes no sense for fuzon to even keep these in memory (they have an undetermined URI).
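
Until the upstream fix is picked up, a client-side guard could drop such entries. The sketch assumes blank nodes surface with the usual `_:` identifier prefix and that suggestions are (label, uri) pairs. Both are assumptions, since the screenshot is not reproduced here.

```python
def is_blank_node(identifier: str) -> bool:
    """Blank node identifiers are conventionally written with a `_:` prefix."""
    return identifier.startswith("_:")


def drop_blank_nodes(codes: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep only (label, uri) suggestions that carry a usable URI."""
    return [(label, uri) for label, uri in codes if not is_blank_node(uri)]
```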

cmdoret · 2024-10-23T08:43:19Z

list objects: added a modos list command to list remote objects on the endpoint
Remembering local objects on the filesystem is probably not worth it, as it would come with many edge cases and we mostly intend to use modos with remote objects anyway.

cmdoret · 2024-10-23T11:52:57Z

I've also added a modos search-codes command to provide an easy way to find code URIs

cmdoret added 13 commits September 25, 2024 15:59

feat: code matching client

8b17610

feat(remote): register fuzon endpoint in client

42d8635

refactor(cli): prompt logic to dedicated prompt module

48cd565

refactor(cli): prompt logic to dedicated prompt module

b90072d

chore(deps): pyfuzon as extra dep

5c93ea4

chore: update deps

024bc2e

feat(cli): support code completion in modos add

fc886b2

perf(codes): limit suggestions to 50 codes

8d59c3e

fix(cli): use labels in recommendations

acac519

refactor(codes): custom Code struct

37f5150

chore(deps): bump modos-schema version

8e528a4

feat(cli): prompt autocompletes text, persists uris

1363d6c

fix(cli): disable unnecessary autocomplete on modos create

49a2ad6

cmdoret self-assigned this Oct 1, 2024

cmdoret linked an issue Oct 1, 2024 that may be closed by this pull request

[Feature request]: Support terminology codes #103

Closed

cmdoret added 10 commits October 1, 2024 15:25

test(data): use uris when required

2169165

fix(codes): fuzon-http api parameters

53dbc84

chore(make): document deploy recipe

ab0a807

feat(compose): add fuzon service

2ab52c7

feat(nginx): register fuzon in reverse proxy

85e3ea5

feat(fuzon): dockerized fuzon-http setup

653c73a

fix(compose): add envvar for fuzon service

a85dc51

fix(compose): fuzon envvars

5afc4ef

fix(nginx): typo

8df4cf3

chore(deps): add prompt-toolkit

b4eeb47

cmdoret commented Oct 2, 2024


cmdoret requested a review from supermaxiste October 2, 2024 15:00

fix(deploy): pin fuzon to tag 0.2.3

8d9deb2