
Replace IterBlock with LLMBlock #58

Closed

wants to merge 10 commits

Conversation

npalaska (Contributor) commented Jul 1, 2024

This PR is rebased on top of #53

  • This removes the IterBlock and replaces it with LLMBlock.
  • The number of iterations of the IterBlock can instead be specified in the LLMBlock's gen_kwargs using OpenAI's n parameter.
  • The n parameter determines the number of output choices generated by the LLM service per sample in the batch (see the sketch after this list).
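As a rough illustration only (the exact configuration shape in default_flows.py may differ), the repetition that IterBlock used to provide is now expressed through the LLMBlock's gen_kwargs:

    # Hypothetical block configuration sketch; the key names here are
    # illustrative, not necessarily the real default_flows.py entries.
    # Instead of wrapping an LLMBlock in an IterBlock that loops num_iters
    # times, the repeat count is OpenAI's "n" inside gen_kwargs.
    llmblock_config = {
        "block_type": "LLMBlock",
        "gen_kwargs": {
            "max_tokens": 2048,
            "temperature": 0.7,
            "n": 5,  # five completions generated per input sample
        },
    }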

@mergify mergify bot added the ci-failure label Jul 1, 2024
"gen_kwargs": {
"max_tokens": 2048,
"temperature": 0.7,
"n": 1
Contributor

Should this be self.num_iters?

Member

No, this is a vLLM-specific parameter; we are replacing the IterBlock with this. This parameter essentially controls the number of returned sequences.

Contributor Author

No, these parameters get passed to the OpenAI client directly as kwargs; see OpenAI's n parameter: https://platform.openai.com/docs/api-reference/completions/create#completions-create-n. In theory, OpenAI's n gives us the same functionality as what we were doing with num_iters.
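For context, here is a minimal sketch of what that forwarding looks like with the standard openai Python client; the endpoint, model name, and prompt below are placeholders:

    # Minimal sketch, assuming the openai>=1.0 client; gen_kwargs such as
    # "n" are passed straight through to the completions API call.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    gen_kwargs = {"max_tokens": 2048, "temperature": 0.7, "n": 3}
    response = client.completions.create(
        model="placeholder-model",
        prompt="### Instruction:\nWrite a question about France.\n\n### Response:\n",
        **gen_kwargs,
    )
    # With a backend that honors "n" (e.g. vLLM), this prints 3.
    print(len(response.choices))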

Contributor

Running ilab data generate with this branch gives:

  File "/home/markmc/sdg/src/instructlab/sdg/generate_data.py", line 257, in generate_data
    sdg_knowledge, sdg_freeform_skill, sdg_grounded_skill = _sdg_init(
                                                            ^^^^^^^^^^
  File "/home/markmc/sdg/src/instructlab/sdg/generate_data.py", line 144, in _sdg_init
    [
  File "/home/markmc/sdg/src/instructlab/sdg/generate_data.py", line 146, in <listcomp>
    flow_type(
TypeError: Flow.__init__() takes from 4 to 5 positional arguments but 6 were given

(after I fix the iterblock import error)

What should the num_instructions_to_generate parameter to generate_data() be used for?

Contributor Author

@shivchander or @oindrillac, any thoughts here? As far as I know, the generate_data.py file still contains the older sdg code, right?

Contributor Author

Yeah, it seems like generate_data.py has been updated to use the newer pipeline structure, so I guess I can go ahead and fix the num_iter handling in the generate_data() API call. I just want to get confirmation from @oindrillac / @aakankshaduggal.

Contributor

I think the point is that num_iters was a constructor arg to the Flow class before, and it's not clear why that changed. Is the intention to change this from being parameterized to always being 1? If it's always 1, then there's no need to specify n at all. If it stays as a param to the Flow class, it might be best to rename it, since num_iters isn't really correct; something more like num_parallel_outputs_per_input would be better.

Contributor

Thanks @njhill

Again, @npalaska, the question is: how should the num_instructions_to_generate parameter to generate_data() be implemented?

(Is it not by specifying n?)

npalaska (Contributor Author) commented Jul 2, 2024

Alright, after discussing with @shivchander, we think that the client's num_instructions_to_generate can be captured by OpenAI's n parameter. Currently, num_samples is hardcoded in default_flows.py, which overrides the num_instructions_to_generate / num_iters passed from generate_data(). This should be fixed in the latest commit.

russellb (Member) commented Jul 2, 2024

Does that n parameter imply that the server supports batching? llama-cpp does not. Either way, we just need to make sure it works with that backend too, not just vLLM.

@@ -123,8 +123,11 @@ def generate(self, samples, **gen_kwargs) -> Dataset:
    outputs = [self._generate([sample], **gen_kwargs)[0] for sample in samples]
    logger.debug("Generated outputs: {}".format(outputs))

    num_parallel_samples = gen_kwargs.get("n", 1)
    n_samples = [item for item in samples for i in range(num_parallel_samples)]
Contributor

I find this super confusing.

What is the "n" in n_samples supposed to stand for? It reads to me like "number of samples", suggesting an integer.

And the nested list comprehension is ... too clever.

Suggestion:

  extended_samples = []
  for item in samples:
      extended_samples.extend([item] * num_parallel_samples)

Contributor Author

Yeah, sorry, I couldn't come up with a better name. I agree extended_samples is better.

Some context for this change:
OpenAI's n parameter determines how many responses to generate for each input sample. This change fixes the zip pairing we use just below this code.

If n is higher than 1, we end up with a scenario like this:

>>> input = ["s1", "s2"]
>>> output = ["o11", "o12", "o13", "o21", "o22", "o23"] # n==3
>>> c = zip(input, output)
>>> for item in c:
...     print(item)
... 
('s1', 'o11')
('s2', 'o12')

Instead, we want

>>> n_samples = [item for item in input for i in range(3)]
>>> n_samples
['s1', 's1', 's1', 's2', 's2', 's2']

>>> c = zip(n_samples, output)
>>> 
>>> for item in c:
...     print(item)
... 
('s1', 'o11')
('s1', 'o12')
('s1', 'o13')
('s2', 'o21')
('s2', 'o22')
('s2', 'o23')

Contributor

IMO just some code comments would help with this.
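For example, a commented version of the duplication step along the lines suggested above (the names follow the review suggestion and the trailing loop is illustrative, not the actual block code):

    num_parallel_samples = gen_kwargs.get("n", 1)

    # OpenAI's "n" returns num_parallel_samples completions per input,
    # flattened into a single outputs list. Repeat each input sample the
    # same number of times so zip() pairs every output with the sample
    # that produced it.
    extended_samples = []
    for item in samples:
        extended_samples.extend([item] * num_parallel_samples)

    for sample, output in zip(extended_samples, outputs):
        ...  # attach the generated output to its originating sample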

markmc (Contributor) commented Jul 1, 2024

The commit log could be much improved. The logical changes I see, with the explanations I'd expect:

  1. Convert filter values to a float - does this fix a bug? Details of how the bug manifested would help others learn.
  2. Combine question and context in the grounded skills flow at the end - why? What's the before-and-after behavior?
  3. Template updates to get the full pipeline working - these look like important but subtle changes; what details can you provide to help a person in future understand what to look out for when making further changes?
  4. Change from ' to "
  5. Replace IterBlock and use OpenAI's 'n' parameter instead

markmc (Contributor) commented Jul 1, 2024

Ah, I see the first three of those are from #53

@@ -39,12 +39,6 @@ def generate(self, dataset) -> Dataset:
    drop_duplicates_cols = block_prop.get("drop_duplicates", False)
    block = block_type(**block_config)

    if block_type == IterBlock:
Contributor

e2e is failing with:

  File "/home/runner/work/sdg/sdg/venv/lib/python3.11/site-packages/instructlab/sdg/pipeline.py", line 6, in <module>
    from .iterblock import IterBlock
ModuleNotFoundError: No module named 'instructlab.sdg.iterblock'

see above warnings

E0401: Unable to import 'instructlab.sdg.iterblock' (import-error)
W0611: Unused IterBlock imported from iterblock (unused-import)

Contributor Author

Ah yeah, the pipeline file changes didn't get pushed, sigh!

npalaska force-pushed the remove_iterblock branch from 07f876a to 6bf23ce on July 1, 2024 at 18:43
njhill (Contributor) commented Jul 1, 2024

@npalaska you should rebase this on the latest main branch or merge the latest main into this branch; right now the diff/PR includes unrelated changes.

markmc (Contributor) commented Jul 2, 2024

  • n parameter determines the number of output choices generated by the LLM service per sample in the batch.

Does the llama-cpp backend support this parameter?

See the instructlab changelog:

Add vLLM backend to serve, chat and generate commands.
Add --backend flag to ilab model serve command to allow for specifying the backend to use when serving a model. This is useful when you have multiple backends installed and want to specify which one to use. Currently, the only supported backend are llama-cpp and vllm.

i.e., the llama-cpp backend will need to be supported.

njhill (Contributor) commented Jul 2, 2024

  • n parameter determines the number of output choices generated by the LLM service per sample in the batch.

Does the llama-cpp backend support this parameter?

I don't think so, the best thing here would be to wrap the client in this case - the generate method can be overridden to just make sequential calls in a loop and return the equivalent response when n>1.
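A rough sketch of that wrapper idea, with all names hypothetical (the real client interface in sdg may differ):

    # Hypothetical wrapper: when the backend ignores "n", fall back to n
    # sequential completion calls and merge the choices so the caller
    # still receives n outputs in one response object.
    class SequentialNCompletions:
        def __init__(self, client):
            self._client = client  # an openai.OpenAI-style client

        def create(self, *, n=1, **kwargs):
            if n <= 1:
                return self._client.completions.create(n=1, **kwargs)
            responses = [
                self._client.completions.create(n=1, **kwargs)
                for _ in range(n)
            ]
            merged = responses[0]
            for extra in responses[1:]:
                merged.choices.extend(extra.choices)
            return merged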

markmc (Contributor) commented Jul 2, 2024

  • n parameter determines the number of output choices generated by the LLM service per sample in the batch.

Does the llama-cpp backend support this parameter?

I don't think so, the best thing here would be to wrap the client in this case - the generate method can be overridden to just make sequential calls in a loop and return the equivalent response when n>1.

Right, so something like this in IterBlock:

    if self.client.supports_n:
        gen_kwargs['n'] = num_iters
        generated_samples = self.block.generate(samples, **gen_kwargs)
    else:
        for _ in range(num_iters):
            generated_samples.extend(self.block.generate(samples, **gen_kwargs))

(I don't actually know the best way to detect whether the client supports this parameter though)

mergify bot commented Jul 2, 2024

This pull request has merge conflicts that must be resolved before it can be merged. @npalaska please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jul 2, 2024
oindrillac and others added 9 commits July 2, 2024 17:47
This is a step towards converting it to the final format (messages) required for training.

Signed-off-by: Oindrilla Chatterjee <[email protected]>
Co-authored-by: Aakanksha Duggal <[email protected]>
Co-authored-by: Shiv <[email protected]>
Changed the prompt templates and alignment with expected outputs. Conducted stress testing across various leaf nodes to ensure accuracy and relevance.

Signed-off-by: Oindrilla Chatterjee <[email protected]>
Co-authored-by: Aakanksha Duggal <[email protected]>
Co-authored-by: Shiv <[email protected]>
npalaska force-pushed the remove_iterblock branch from 261cbbe to 0c90804 on July 2, 2024 at 21:47
@mergify mergify bot removed the needs-rebase label Jul 2, 2024
npalaska (Contributor Author) commented Jul 2, 2024

  • n parameter determines the number of output choices generated by the LLM service per sample in the batch.

Does the llama-cpp backend support this parameter?

I don't think so, the best thing here would be to wrap the client in this case - the generate method can be overridden to just make sequential calls in a loop and return the equivalent response when n>1.

n is an OpenAI parameter, and llama.cpp supports the OpenAI protocol. It might be worth testing these changes with llama.cpp.

markmc (Contributor) commented Jul 3, 2024

Ruff checks are failing:

ERROR: one or more checks have failed.
Run 'tox -e ruff' to auto-correct all fixable errors.

markmc (Contributor) commented Jul 3, 2024

n is an OpenAI parameter, and llama.cpp supports the OpenAI protocol.

It may only support a subset of parameters; we can't assume it supports n.

It might be worth testing these changes with llama.cpp.

This PR should not merge without that testing.

Since NUM_INSTRUCTIONS=5 is what's used in the e2e test, this looks promising:

INFO 2024-07-02 22:47:01,977 serve.py:100: serve Serving model 'models/merlinite-7b-lab-Q4_K_M.gguf' with llama-cpp
...
INFO 2024-07-02 22:47:46,166 generate_data.py:304: generate_data Generated 5 samples

but also note:

INFO 2024-07-02 22:48:01,726 generate_data.py:304: generate_data Generated 6 samples
...
INFO 2024-07-02 22:54:41,264 generate_data.py:304: generate_data Generated 110 samples

Just a little more investigation is required to confirm that llama-cpp is handling this correctly

markmc (Contributor) commented Jul 3, 2024

Since NUM_INSTRUCTIONS=5 is what's used in the e2e test, this looks promising:

INFO 2024-07-02 22:47:01,977 serve.py:100: serve Serving model 'models/merlinite-7b-lab-Q4_K_M.gguf' with llama-cpp
...
INFO 2024-07-02 22:47:46,166 generate_data.py:304: generate_data Generated 5 samples

Hmm, maybe this isn't so promising: it looks like we're creating one sample per seed example? There are 5 seed examples for the freeform skills test.

@mergify mergify bot added the needs-rebase label Jul 3, 2024
mergify bot commented Jul 3, 2024

This pull request has merge conflicts that must be resolved before it can be merged. @npalaska please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

njhill (Contributor) commented Jul 3, 2024

  • n parameter determines the number of output choices generated by the LLM service per sample in the batch.

Does the llama-cpp backend support this parameter?

I don't think so, the best thing here would be to wrap the client in this case - the generate method can be overridden to just make sequential calls in a loop and return the equivalent response when n>1.

Right, so something like this in IterBlock:

    if self.client.supports_n:
        gen_kwargs['n'] = num_iters
        generated_samples = self.block.generate(samples, **gen_kwargs)
    else:
        for _ in range(num_iters):
            generated_samples.extend(self.block.generate(samples, **gen_kwargs))

(I don't actually know the best way to detect whether the client supports this parameter though)

@markmc yes, exactly, except that again "num_iters" might not be the best name, since the fact that they are iterations in one of the cases is an implementation detail. Also, we could have a wrapper for the client that encapsulates this, so it isn't needed in the block implementations (and wouldn't need to use an IterBlock).

npalaska (Contributor Author) commented Jul 3, 2024

It may only support a subset of parameters, we can't assume it supports n

It might be worth testing these changes with llama.cpp.

This PR should not merge without this testing

@markmc Yeah, I tested with a model served by llama.cpp. The OpenAI client accepts it without breaking the request, but the server ignores the n in the request and generates only one completion regardless of its value.

curl -X 'POST'   'http://127.0.0.1:8000/v1/completions'   -H 'accept: application/json'   -H 'Content-Type: application/json'   -d '{
  "prompt": "\n\n### Instructions:\nWhat is the capital of France?\n\n### Response:\n",
  "stop": [
    "\n",
    "###"
  ], "max_tokens": 7, "temperature": 0, "n": 10 
}'
{"id":"cmpl-1180cf33-23e5-4a23-9ecc-40e1f485d98f","object":"text_completion","created":1720021232,"model":"/models/merlinite-7b-lab-Q4_K_M.gguf","choices":[{"text":"Paris.","index":0,"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":21,"completion_tokens":4,"total_tokens":25}}

I guess the next thing would be to find the best way to know whether the server supports the n parameter. The OpenAI client we use in our code to communicate with the server handles it gracefully. I am not sure this can be done with an if self.client.supports_n: check; we probably need another argument to distinguish between llama.cpp and vLLM.

markmc (Contributor) commented Jul 3, 2024

I guess the next thing would be to find the best way to know whether the server supports the n parameter. The OpenAI client we use in our code to communicate with the server handles it gracefully. I am not sure this can be done with an if self.client.supports_n: check; we probably need another argument to distinguish between llama.cpp and vLLM.

Yep. The most straightforward thing is to pass down a serve_backend parameter to the library and have vLLM vs. llama-cpp logic.

I'd prefer something more like model_serve_capabilities.n = True or something ... but that would probably take more design work than we have appetite for right now, and I guess it could be added later.
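Purely as an illustration of that capabilities idea (not a proposed API), it could look something like this:

    from dataclasses import dataclass

    @dataclass
    class ModelServeCapabilities:
        # True when the serving backend honors the OpenAI "n" parameter
        n: bool = False

    vllm_caps = ModelServeCapabilities(n=True)
    llama_cpp_caps = ModelServeCapabilities(n=False)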

npalaska (Contributor Author) commented Jul 3, 2024

Yep. The most straightforward thing is to pass down a serve_backend parameter to the library and have vLLM vs. llama-cpp logic.

Do you mean adding a new serve_backend parameter to the generate CLI and then passing it down?

markmc (Contributor) commented Jul 3, 2024

Yep. The most straightforward thing is to pass down a serve_backend parameter to the library and have vLLM vs. llama-cpp logic.

Do you mean adding a new serve_backend parameter to the generate CLI and then passing it down?

No, take it from the existing serve.backend config and pass it to generate_data() and then on to the Flow constructor.

njhill (Contributor) commented Jul 3, 2024

I worked with @npalaska to refactor and clean things up a bit and have opened a new PR, #77. This doesn't require configuring the server type; it makes a test request to determine whether the server supports batched inputs.
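A hedged sketch of that probe idea (not the actual #77 implementation):

    # Send a tiny completion request with n=2 and count the choices that
    # come back: vLLM honors "n" and returns 2, while llama-cpp ignores
    # it and returns 1.
    def server_supports_n(client, model: str) -> bool:
        resp = client.completions.create(
            model=model, prompt="test", max_tokens=1, n=2
        )
        return len(resp.choices) == 2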

npalaska (Contributor Author) commented Jul 3, 2024

No, take it from the existing serve.backend config and pass it to generate_data() and then on to the Flow constructor.

@markmc We are doing it a different way: we determine whether the backend server supports batching by making a test request, so we don't need to pass any server config. @njhill has opened a new PR, #77, with these changes on top of the changes from this PR, and I will close this one.

markmc (Contributor) commented Jul 11, 2024

Replaced by #105

markmc closed this on Jul 11, 2024
jwm4 pushed a commit to jwm4/sdg that referenced this pull request Dec 13, 2024