Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support terminology codes #106

Merged
merged 39 commits into from
Oct 23, 2024
Merged

feat: support terminology codes #106

merged 39 commits into from
Oct 23, 2024

Conversation

cmdoret
Copy link
Member

@cmdoret cmdoret commented Oct 1, 2024

Summary

This adds support for terminology codes instead of free text for specific metadata fields, along with autocomplete suggestion in the terminal.

Currently, the following properties/terminologies are used:

Major changes

  • Refactor CLI code: move prompt utilities to dedicated module (modos.prompt)
  • Add code-matching module (modos.codes)
  • Implement CodeMatcher protocol with two members (remote/local)
    • Remote is used if an endpoint is provided (completion runs on server, faster)
    • Local used as fallback (ontology downloaded and runs on client machine)
  • fuzon-http service added as a service in modos server deployment
  • pyfuzon added as extra dependency for local code matching

Trying it out

To test local autocomplete in terminal:

modos create data/example
modos add data/example sample

To rely on the server for autocomplete:

make deploy
modos --endpoint=http://localhost create data/example
modos --endpoint=http://localhost add data/example sample

Notes

Codes are recommended based similarity between user input and labels, but only the URIs are persisted in metadata.

Follow up (separate issues):

  • speed up download when using local completer
    • terminology caching (+async download in background)

Open questions

When creating a modos from input yaml (instead of interactively) (see data/ex_config.yaml), URIs are now required for the 3 properties above.

It may be painful for users to find out what URIs to input in the yaml. Should we provide some kind of subcommand just to get the codes (basically a fuzon wrapper)?
Perhaps something along the lines of this

# modos codes <property> <query>
modos codes cell_type "red blood cell"

@cmdoret cmdoret self-assigned this Oct 1, 2024
@cmdoret cmdoret linked an issue Oct 1, 2024 that may be closed by this pull request
deploy/fuzon/Dockerfile Outdated Show resolved Hide resolved
@cmdoret cmdoret requested a review from supermaxiste October 2, 2024 15:00
# terminology code matching server
location /fuzon {
rewrite ^/fuzon/(.*) /$1 break ;
proxy_pass ${FUZON_LOCAL_URL} ;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question(clarification): the config template provides both a public & local setup. How is this config used exactly to make sure that we're dealing with either public or local setup? The main reason I'm asking is if it makes more sense to separate the two into different files or make it clearer what needs to be modified in which case.

Copy link
Member Author

@cmdoret cmdoret Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the "public" urls are the service addresses outside the compose network, whereas the "local" urls are their addresses within the compose network.

Generally, there are two options:

  • a service is deployed as part of the compose (default),
    • public url: http://<server-name>/<service-name>
    • local url: http://<service-name>:<service-port>
  • a service is deployed externally (e.g. an aws s3 bucket):
    • public url: http://<external-service>:<service-port>
    • local url: http://<external-service>:<service-port>

I'll try to make that clearer in the docs

Copy link
Member Author

@cmdoret cmdoret Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documented in 06ca00b
@marftn do you think this whole thing makes sense, or is this some kind of antipattern?

modos/codes.py Outdated Show resolved Hide resolved
modos/prompt.py Outdated Show resolved Hide resolved
@supermaxiste
Copy link
Member

Code review

Mostly suggestions, recommendations and clarifications.
To add to my other comments: Makefile L42 has a typo in the S3_PUBLIC_URL:

cd deploy; S3_PUBLC_URL="http://$(LOCAL_IP):9000" docker compose up --build --force-recreate

@cmdoret cmdoret requested a review from supermaxiste October 18, 2024 14:21
@supermaxiste
Copy link
Member

Deployment review

praise: with make deploy everything was smooth
praise: the autocomplete feature is not only fast, but works extremely smoothly 🤓

issue(minor, non-blocking): the recommendations can sometimes point to blank nodes. When blank nodes are selected, modos throws an error because we're not providing a proper URI. See screenshot as an example.

Screenshot 2024-10-21 at 16 28 55
[...]
Screenshot 2024-10-21 at 16 36 29

question(minor, non-blocking): when working with modos objects, there's no list command and it becomes a bit confusing to work with objects that you created but forgot about. Maybe I missed a command, but as far as I can see everything assumes that we know the objects we're working with.

$ cli:~/modo-api$ modos --help
Usage: modos [OPTIONS] COMMAND [ARGS]...

  Multi-Omics Digital Objects command line interface.

Options:
  --endpoint TEXT  URL of modos server.  [env var: MODOS_ENDPOINT]
  --version        Print version of modos client
  --help           Show this message and exit.

Commands:
  add      Add elements to a modo.
  create   Create a modo interactively or from a file.
  publish  Export a modo as linked data.
  remove   Removes an element and its files from the modo.
  show     Show the contents of a modo.
  stream   Stream genomic file from a remote modo into stdout.
  update   Update a modo based on a yaml file.

I will pre-approve the PR since all the points I raised might be a separate PR.

@cmdoret
Copy link
Member Author

cmdoret commented Oct 23, 2024

blank nodes: good catch! Addressed upstream, as it makes no sense for fuzon to even keep these in memory (they have an undetermined URI) sdsc-ordes/fuzon#31

@cmdoret
Copy link
Member Author

cmdoret commented Oct 23, 2024

list objects: added a modos list command to list remote objects on the endpoint
remembering local objects on the filesystem is probably not worth it as it will come with many edge cases and we mostly intent to use modos with remote objects anyways.

@cmdoret
Copy link
Member Author

cmdoret commented Oct 23, 2024

I've also added a modos search-codes command to provide an easy way to find code URIs

@cmdoret cmdoret merged commit eef83b0 into main Oct 23, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature request]: Support terminology codes
2 participants