Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import functions from the ilab repository #11

Closed
8 of 9 tasks
aakankshaduggal opened this issue Jun 4, 2024 · 9 comments · Fixed by #89
Closed
8 of 9 tasks

Import functions from the ilab repository #11

aakankshaduggal opened this issue Jun 4, 2024 · 9 comments · Fixed by #89
Milestone

Comments

@aakankshaduggal
Copy link
Member

aakankshaduggal commented Jun 4, 2024

Some functions that are being called in the generate_data.py file are in this file - https://github.com/instructlab/instructlab/blob/main/src/instructlab/utils.py


to-do list:

  • DEFAULT_MULTIPROCESSING_START_METHOD
    • Used in lab.py but also this is just a variable
  • DEFAULT_API_KEY
    • Used in many places but also this is just a variable
  • DEFAULT_MODEL_OLD
    • Used in lab.py but also this is just a variable
  • get_model_family()

From utils, we import the following:

@russellb
Copy link
Member

russellb commented Jun 4, 2024

related to #6

@nathan-weinberg
Copy link
Member

@aakankshaduggal @russellb is this something we want to try and preserve commits for? if not I'm happy to take this (and the other issue I guess, seems one PR can resolve both)

@russellb
Copy link
Member

russellb commented Jun 4, 2024

@aakankshaduggal @russellb is this something we want to try and preserve commits for? if not I'm happy to take this (and the other issue I guess, seems one PR can resolve both)

In general, yes, I would always prefer keeping history. I can also walk you through how I would do it.

I haven't looked at these specifics. If the code in question is only used here, then moving it here seems like an easy decision. If it's used in other places, too, we should discuss further.

@nathan-weinberg
Copy link
Member

ack @russellb - fine either way - AFAIK the code in question is only used in the CLI currently

@russellb
Copy link
Member

russellb commented Jun 4, 2024

ack @russellb - fine either way - AFAIK the code in question is only used in the CLI currently

question is it called from only code other than what got moved here

@nathan-weinberg
Copy link
Member

Okay looked into this a bit - we import from two locations within instructlab - config and utils

From config, we import the following:

  • DEFAULT_MULTIPROCESSING_START_METHOD
    • Used in lab.py but also this is just a variable
  • DEFAULT_API_KEY
    • Used in many places but also this is just a variable
  • DEFAULT_MODEL_OLD
    • Used in lab.py but also this is just a variable
  • get_model_family()
    • Used by generate_data.py but also server.py

From utils, we import the following:

  • chunk_document()
    • Only used by generate_data.py
  • max_seed_example_tokens()
    • Only used by generate_data.py
  • num_chars_from_tokens()
    • Only used by generate_data.py
  • read_taxonomy()
    • Used by generate_data.py but also lab.py
  • get_sysprompt()
    • Used by generate_data.py but also lab.py, chat.py and make_data.py

@tiran
Copy link
Contributor

tiran commented Jun 20, 2024

You can write rules for Ruff or PyLint to detect these types of imports and raise an error. InstructLab uses Ruff for that:

[tool.ruff.lint.flake8-tidy-imports.banned-api]
"yamllint".msg = "yamllint is for CLI usage only."

russellb added a commit to russellb/sdg that referenced this issue Jun 24, 2024
This code was only used by instructlab.sdg, so move it over here
instead of leaving it back in the `instructlab` repo.

Part of issue instructlab#11

Signed-off-by: Russell Bryant <[email protected]>
markmc added a commit to markmc/sdg that referenced this issue Jun 27, 2024
Part of instructlab#11

sdg appears to be the main user of this, along with `ilab taxonomy diff`.

We want to adapt the output of read_taxonomy() to be better suited to
what sdg needs.

This is the majority of src/instructlab/utils.py from commit commit 4737feb
with read_taxonomy() and TaxonomyReadingException as the public API.

Signed-off-by: Mark McLoughlin <[email protected]>
russellb pushed a commit to russellb/sdg that referenced this issue Jun 28, 2024
Part of instructlab#11

sdg appears to be the main user of this, along with `ilab taxonomy diff`.

We want to adapt the output of read_taxonomy() to be better suited to
what sdg needs.

This is the majority of src/instructlab/utils.py from commit commit 4737feb
with read_taxonomy() and TaxonomyReadingException as the public API.

Signed-off-by: Mark McLoughlin <[email protected]>
markmc added a commit to russellb/sdg that referenced this issue Jun 28, 2024
Part of instructlab#11

sdg appears to be the main user of this, along with `ilab taxonomy diff`.

We want to adapt the output of read_taxonomy() to be better suited to
what sdg needs.

This is the majority of src/instructlab/utils.py from commit commit 4737feb
with read_taxonomy() and TaxonomyReadingException as the public API.

Temporarily disable logging-fstring-interpolation to get lint passing.

Signed-off-by: Mark McLoughlin <[email protected]>
@RobotSail
Copy link
Member

It looks like DEFAULT_MULTIPROCESSING_START_METHOD, DEFAULT_API_KEY, and DEFAULT_MODEL_OLD are no longer present in SDG, I'll mark those as complete for this epic.

russellb added a commit to russellb/sdg that referenced this issue Jul 6, 2024
This was the last import from the main `instructlab` package to
remove. All it did was return this string constant, so just copy it
over.

Closes instructlab#11

Signed-off-by: Russell Bryant <[email protected]>
@russellb
Copy link
Member

russellb commented Jul 6, 2024

The change to remove the last import from the main instructlab package is here: #89

@russellb russellb added this to the 0.1.0 milestone Jul 8, 2024
aakankshaduggal pushed a commit that referenced this issue Jul 8, 2024
This was the last import from the main `instructlab` package to
remove. All it did was return this string constant, so just copy it
over.

Closes #11

Signed-off-by: Russell Bryant <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants