
Replace IterBlock with LLMBlock #58

Closed

wants to merge 10 commits

Conversation

npalaska (Contributor) commented Jul 1, 2024

This PR is rebased on top of #53

  • This removes the IterBlock and replaces it with LLMBlock.
  • The number of iterations of the IterBlock can instead be specified in the LLMBlock's gen_kwargs using OpenAI's n parameter.
  • The n parameter determines the number of output choices generated by the LLM service per sample in the batch (see the sketch after this list).
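As a rough illustration only (the exact configuration shape in default_flows.py may differ), the repetition that IterBlock used to provide is now expressed through the LLMBlock's gen_kwargs:

    # Hypothetical block configuration sketch; the key names here are
    # illustrative, not necessarily the real default_flows.py entries.
    # Instead of wrapping an LLMBlock in an IterBlock that loops num_iters
    # times, the repeat count is OpenAI's "n" inside gen_kwargs.
    llmblock_config = {
        "block_type": "LLMBlock",
        "gen_kwargs": {
            "max_tokens": 2048,
            "temperature": 0.7,
            "n": 5,  # five completions generated per input sample
        },
    }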

@mergify mergify bot added the ci-failure label Jul 1, 2024
"gen_kwargs": {
"max_tokens": 2048,
"temperature": 0.7,
"n": 1
Contributor

Should this be self.num_iters?

Member

No, this is a vLLM-specific parameter; we are replacing the IterBlock with this. This parameter essentially controls the number of returned sequences.

Contributor Author

No, these parameters get passed to the OpenAI client directly as kwargs; see OpenAI's n parameter: https://platform.openai.com/docs/api-reference/completions/create#completions-create-n. In theory, OpenAI's n gives us the same functionality as what we were doing with num_iters.
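For context, here is a minimal sketch of what that forwarding looks like with the standard openai Python client; the endpoint, model name, and prompt below are placeholders:

    # Minimal sketch, assuming the openai>=1.0 client; gen_kwargs such as
    # "n" are passed straight through to the completions API call.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    gen_kwargs = {"max_tokens": 2048, "temperature": 0.7, "n": 3}
    response = client.completions.create(
        model="placeholder-model",
        prompt="### Instruction:\nWrite a question about France.\n\n### Response:\n",
        **gen_kwargs,
    )
    # With a backend that honors "n" (e.g. vLLM), this prints 3.
    print(len(response.choices))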

Contributor

Running ilab data generate with this branch gives:

  File "/home/markmc/sdg/src/instructlab/sdg/generate_data.py", line 257, in generate_data
    sdg_knowledge, sdg_freeform_skill, sdg_grounded_skill = _sdg_init(
                                                            ^^^^^^^^^^
  File "/home/markmc/sdg/src/instructlab/sdg/generate_data.py", line 144, in _sdg_init
    [
  File "/home/markmc/sdg/src/instructlab/sdg/generate_data.py", line 146, in <listcomp>
    flow_type(
TypeError: Flow.__init__() takes from 4 to 5 positional arguments but 6 were given

(after I fix the iterblock import error)

What should the num_instructions_to_generate parameter to generate_data() be used for?

Contributor Author

@shivchander or @oindrillac, any thoughts here? As far as I know, the generate_data.py file still contains the older sdg code, right?

Contributor Author

Yeah, it seems like generate_data.py has been updated to use the newer pipeline structure, so I guess I can go ahead and fix the num_iter handling in the generate_data() API call. I just want to get confirmation from @oindrillac / @aakankshaduggal.

Contributor

I think the point is that num_iters was a constructor arg to the Flow class before, and it's not clear why that changed. Is the intention to change this from being parameterized to always being 1? If it's always 1, then there's no need to specify n at all. If it stays as a param to the Flow class, it might be best to rename it, since num_iters isn't really correct; something more like num_parallel_outputs_per_input would be better.

Contributor

Thanks @njhill

Again, @npalaska, the question is: how should the num_instructions_to_generate parameter to generate_data() be implemented?

(Is it not by specifying n?)

npalaska (Contributor Author) commented Jul 2, 2024

Alright, after discussing with @shivchander, we think that the client's num_instructions_to_generate can be captured by OpenAI's n parameter. Currently, num_samples is hardcoded in default_flows.py, which overrides the num_instructions_to_generate / num_iters passed from generate_data(). This should be fixed in the latest commit.

russellb (Member) commented Jul 2, 2024

Does that n parameter imply that the server supports batching? llama-cpp does not. Either way, we just need to make sure it works with that backend too, not just vLLM.

@@ -123,8 +123,11 @@ def generate(self, samples, **gen_kwargs) -> Dataset:
    outputs = [self._generate([sample], **gen_kwargs)[0] for sample in samples]
    logger.debug("Generated outputs: {}".format(outputs))

    num_parallel_samples = gen_kwargs.get("n", 1)
    n_samples = [item for item in samples for i in range(num_parallel_samples)]
Contributor

I find this super confusing.

What is the "n" in n_samples supposed to stand for? It reads to me like "number of samples", suggesting an integer.

And the nested list comprehension is ... too clever.

Suggestion:

  extended_samples = []
  for item in samples:
      extended_samples.extend([item] * num_parallel_samples)

Contributor Author

Yeah, sorry, I couldn't come up with a better name. I agree extended_samples is better.

Some context for this change:
OpenAI's n parameter determines how many responses to generate for each input sample. This change fixes the zip pairing we use just below this code.

If n is higher than 1, we end up with a scenario like this:

>>> input = ["s1", "s2"]
>>> output = ["o11", "o12", "o13", "o21", "o22", "o23"] # n==3
>>> c = zip(input, output)
>>> for item in c:
...     print(item)
... 
('s1', 'o11')
('s2', 'o12')

Instead, we want

>>> n_samples = [item for item in input for i in range(3)]
>>> n_samples
['s1', 's1', 's1', 's2', 's2', 's2']

>>> c = zip(n_samples, output)
>>> 
>>> for item in c:
...     print(item)
... 
('s1', 'o11')
('s1', 'o12')
('s1', 'o13')
('s2', 'o21')
('s2', 'o22')
('s2', 'o23')

Contributor

IMO just some code comments would help with this.
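For example, a commented version of the duplication step along the lines suggested above (the names follow the review suggestion and the trailing loop is illustrative, not the actual block code):

    num_parallel_samples = gen_kwargs.get("n", 1)

    # OpenAI's "n" returns num_parallel_samples completions per input,
    # flattened into a single outputs list. Repeat each input sample the
    # same number of times so zip() pairs every output with the sample
    # that produced it.
    extended_samples = []
    for item in samples:
        extended_samples.extend([item] * num_parallel_samples)

    for sample, output in zip(extended_samples, outputs):
        ...  # attach the generated output to its originating sample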

markmc (Contributor) commented Jul 1, 2024

The commit log could be much improved. The logical changes I see, with the explanations I'd expect:

  1. Convert filter values to a float - does this fix a bug? Details of how the bug manifested would help others learn.
  2. Combine question and context in the grounded skills flow at the end - why? What's the before-and-after behavior?
  3. Template updates to get the full pipeline working - these look like important but subtle changes; what details can you provide to help a person in future understand what to look out for when making further changes?
  4. Change from ' to "
  5. Replace IterBlock and use OpenAI's 'n' parameter instead

markmc (Contributor) commented Jul 1, 2024

Ah, I see the first three of those are from #53

@@ -39,12 +39,6 @@ def generate(self, dataset) -> Dataset:
    drop_duplicates_cols = block_prop.get("drop_duplicates", False)
    block = block_type(**block_config)

    if block_type == IterBlock:
Contributor

e2e is failing with:

  File "/home/runner/work/sdg/sdg/venv/lib/python3.11/site-packages/instructlab/sdg/pipeline.py", line 6, in <module>
    from .iterblock import IterBlock
ModuleNotFoundError: No module named 'instructlab.sdg.iterblock'

see above warnings

E0401: Unable to import 'instructlab.sdg.iterblock' (import-error)
W0611: Unused IterBlock imported from iterblock (unused-import)

Contributor Author

Ah yeah, the pipeline file changes didn't get pushed, sigh!

npalaska force-pushed the remove_iterblock branch from 07f876a to 6bf23ce on July 1, 2024 at 18:43
njhill (Contributor) commented Jul 1, 2024

@npalaska you should rebase this on the latest main branch or merge the latest main into this branch; right now the diff/PR includes unrelated changes.

markmc (Contributor) commented Jul 2, 2024

  • n parameter determines the number of output choices generated by the LLM service per sample in the batch.

Does the llama-cpp backend support this parameter?

See the instructlab changelog:

Add vLLM backend to serve, chat and generate commands.
Add --backend flag to ilab model serve command to allow for specifying the backend to use when serving a model. This is useful when you have multiple backends installed and want to specify which one to use. Currently, the only supported backend are llama-cpp and vllm.

i.e., the llama-cpp backend will need to be supported.

njhill (Contributor) commented Jul 2, 2024

  • n parameter determines the number of output choices generated by the LLM service per sample in the batch.

Does the llama-cpp backend support this parameter?

I don't think so, the best thing here would be to wrap the client in this case - the generate method can be overridden to just make sequential calls in a loop and return the equivalent response when n>1.
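A rough sketch of that wrapper idea, with all names hypothetical (the real client interface in sdg may differ):

    # Hypothetical wrapper: when the backend ignores "n", fall back to n
    # sequential completion calls and merge the choices so the caller
    # still receives n outputs in one response object.
    class SequentialNCompletions:
        def __init__(self, client):
            self._client = client  # an openai.OpenAI-style client

        def create(self, *, n=1, **kwargs):
            if n <= 1:
                return self._client.completions.create(n=1, **kwargs)
            responses = [
                self._client.completions.create(n=1, **kwargs)
                for _ in range(n)
            ]
            merged = responses[0]
            for extra in responses[1:]:
                merged.choices.extend(extra.choices)
            return merged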

markmc (Contributor) commented Jul 2, 2024

  • n parameter determines the number of output choices generated by the LLM service per sample in the batch.

Does the llama-cpp backend support this parameter?

I don't think so, the best thing here would be to wrap the client in this case - the generate method can be overridden to just make sequential calls in a loop and return the equivalent response when n>1.

Right, so something like this in IterBlock:

    if self.client.supports_n:
        gen_kwargs['n'] = num_iters
        generated_samples = self.block.generate(samples, **gen_kwargs)
    else:
        for _ in range(num_iters):
            generated_samples.extend(self.block.generate(samples, **gen_kwargs))

(I don't actually know the best way to detect whether the client supports this parameter though)

mergify bot commented Jul 2, 2024

This pull request has merge conflicts that must be resolved before it can be merged. @npalaska please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jul 2, 2024
oindrillac and others added 9 commits July 2, 2024 17:47
This is a step towards converting it to the final format (messages) required for training.

Signed-off-by: Oindrilla Chatterjee <[email protected]>
Co-authored-by: Aakanksha Duggal <[email protected]>
Co-authored-by: Shiv <[email protected]>
Changed the prompt templates and alignment with expected outputs. Conducted stress testing across various leaf nodes to ensure accuracy and relevance.

Signed-off-by: Oindrilla Chatterjee <[email protected]>
Co-authored-by: Aakanksha Duggal <[email protected]>
Co-authored-by: Shiv <[email protected]>
npalaska force-pushed the remove_iterblock branch from 261cbbe to 0c90804 on July 2, 2024 at 21:47
@mergify mergify bot removed the needs-rebase label Jul 2, 2024
npalaska (Contributor Author) commented Jul 2, 2024

  • n parameter determines the number of output choices generated by the LLM service per sample in the batch.

Does the llama-cpp backend support this parameter?

I don't think so, the best thing here would be to wrap the client in this case - the generate method can be overridden to just make sequential calls in a loop and return the equivalent response when n>1.

n is an OpenAI parameter, and llama.cpp supports the OpenAI protocol. It might be worth testing these changes with llama.cpp.

markmc (Contributor) commented Jul 3, 2024

Ruff checks are failing:

ERROR: one or more checks have failed.
Run 'tox -e ruff' to auto-correct all fixable errors.

markmc (Contributor) commented Jul 3, 2024

n is an OpenAI parameter, and llama.cpp supports the OpenAI protocol.

It may only support a subset of parameters; we can't assume it supports n.

It might be worth testing these changes with llama.cpp.

This PR should not merge without that testing.

Since NUM_INSTRUCTIONS=5 is what's used in the e2e test, this looks promising:

INFO 2024-07-02 22:47:01,977 serve.py:100: serve Serving model 'models/merlinite-7b-lab-Q4_K_M.gguf' with llama-cpp
...
INFO 2024-07-02 22:47:46,166 generate_data.py:304: generate_data Generated 5 samples

but also note:

INFO 2024-07-02 22:48:01,726 generate_data.py:304: generate_data Generated 6 samples
...
INFO 2024-07-02 22:54:41,264 generate_data.py:304: generate_data Generated 110 samples

Just a little more investigation is required to confirm that llama-cpp is handling this correctly

markmc (Contributor) commented Jul 3, 2024

Since NUM_INSTRUCTIONS=5 is what's used in the e2e test, this looks promising:

INFO 2024-07-02 22:47:01,977 serve.py:100: serve Serving model 'models/merlinite-7b-lab-Q4_K_M.gguf' with llama-cpp
...
INFO 2024-07-02 22:47:46,166 generate_data.py:304: generate_data Generated 5 samples

Hmm, maybe this isn't so promising: it looks like we're creating one sample per seed example? There are 5 seed examples for the freeform skills test.

@mergify mergify bot added the needs-rebase label Jul 3, 2024
mergify bot commented Jul 3, 2024

This pull request has merge conflicts that must be resolved before it can be merged. @npalaska please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

njhill (Contributor) commented Jul 3, 2024

  • n parameter determines the number of output choices generated by the LLM service per sample in the batch.

Does the llama-cpp backend support this parameter?

I don't think so, the best thing here would be to wrap the client in this case - the generate method can be overridden to just make sequential calls in a loop and return the equivalent response when n>1.

Right, so something like this in IterBlock:

    if self.client.supports_n:
        gen_kwargs['n'] = num_iters
        generated_samples = self.block.generate(samples, **gen_kwargs)
    else:
        for _ in range(num_iters):
            generated_samples.extend(self.block.generate(samples, **gen_kwargs))

(I don't actually know the best way to detect whether the client supports this parameter though)

@markmc yes, exactly, except that again "num_iters" might not be the best name, since the fact that they are iterations in one of the cases is an implementation detail. Also, we could have a wrapper for the client that encapsulates this, so it isn't needed in the block implementations (and wouldn't need to use an IterBlock).

npalaska (Contributor Author) commented Jul 3, 2024

It may only support a subset of parameters, we can't assume it supports n

It might be worth testing these changes with llama.cpp.

This PR should not merge without this testing

@markmc Yeah, I tested with a model served by llama.cpp. The OpenAI client accepts it without breaking the request, but the server ignores the n in the request and generates only one completion regardless of its value.

curl -X 'POST'   'http://127.0.0.1:8000/v1/completions'   -H 'accept: application/json'   -H 'Content-Type: application/json'   -d '{
  "prompt": "\n\n### Instructions:\nWhat is the capital of France?\n\n### Response:\n",
  "stop": [
    "\n",
    "###"
  ], "max_tokens": 7, "temperature": 0, "n": 10 
}'
{"id":"cmpl-1180cf33-23e5-4a23-9ecc-40e1f485d98f","object":"text_completion","created":1720021232,"model":"/models/merlinite-7b-lab-Q4_K_M.gguf","choices":[{"text":"Paris.","index":0,"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":21,"completion_tokens":4,"total_tokens":25}}

I guess the next thing would be to find the best way to know whether the server supports the n parameter. The OpenAI client we use in our code to communicate with the server handles it gracefully. I am not sure this can be done with an if self.client.supports_n: check; we probably need another argument to distinguish between llama.cpp and vLLM.

markmc (Contributor) commented Jul 3, 2024

I guess the next thing would be to find the best way to know whether the server supports the n parameter. The OpenAI client we use in our code to communicate with the server handles it gracefully. I am not sure this can be done with an if self.client.supports_n: check; we probably need another argument to distinguish between llama.cpp and vLLM.

Yep. The most straightforward thing is to pass down a serve_backend parameter to the library and have vLLM vs. llama-cpp logic.

I'd prefer something more like model_serve_capabilities.n = True or something ... but that would probably take more design work than we have appetite for right now, and I guess it could be added later.
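Purely as an illustration of that capabilities idea (not a proposed API), it could look something like this:

    from dataclasses import dataclass

    @dataclass
    class ModelServeCapabilities:
        # True when the serving backend honors the OpenAI "n" parameter
        n: bool = False

    vllm_caps = ModelServeCapabilities(n=True)
    llama_cpp_caps = ModelServeCapabilities(n=False)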

npalaska (Contributor Author) commented Jul 3, 2024

Yep. The most straightforward thing is to pass down a serve_backend parameter to the library and have vLLM vs. llama-cpp logic.

Do you mean adding a new serve_backend parameter to the generate CLI and then passing it down?

markmc (Contributor) commented Jul 3, 2024

Yep. The most straightforward thing is to pass down a serve_backend parameter to the library and have vLLM vs. llama-cpp logic.

Do you mean adding a new serve_backend parameter to the generate CLI and then passing it down?

No, take it from the existing serve.backend config and pass it to generate_data() and then on to the Flow constructor.

njhill (Contributor) commented Jul 3, 2024

I worked with @npalaska to refactor and clean things up a bit and have opened a new PR, #77. This doesn't require configuring the server type; it makes a test request to determine whether the server supports batched inputs.
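A hedged sketch of that probe idea (not the actual #77 implementation):

    # Send a tiny completion request with n=2 and count the choices that
    # come back: vLLM honors "n" and returns 2, while llama-cpp ignores
    # it and returns 1.
    def server_supports_n(client, model: str) -> bool:
        resp = client.completions.create(
            model=model, prompt="test", max_tokens=1, n=2
        )
        return len(resp.choices) == 2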

npalaska (Contributor Author) commented Jul 3, 2024

No, take it from the existing serve.backend config and pass it to generate_data() and then on to the Flow constructor.

@markmc We are doing it a different way: we determine whether the backend server supports batching by making a test request, so we don't need to pass any server config. @njhill has opened a new PR, #77, with these changes on top of the changes from this PR, and I will close this one.

markmc (Contributor) commented Jul 11, 2024

Replaced by #105

markmc closed this on Jul 11, 2024
jwm4 pushed a commit to jwm4/sdg that referenced this pull request Dec 13, 2024