
StaticWhisperPipeline change to work with optimum models #1103

Merged
merged 3 commits into openvinotoolkit:master on Oct 31, 2024

Conversation

@eshiryae (Contributor) commented Oct 29, 2024

Fixes #895

@github-actions bot added the category: whisper (Whisper pipeline) and category: sampling (Sampling / Decoding algorithms) labels on Oct 29, 2024
Review threads:
src/cpp/src/whisper_pipeline.cpp (outdated, resolved)
src/cpp/src/whisper_pipeline_static.cpp (resolved)
src/cpp/src/whisper_pipeline_static.cpp (outdated, resolved)
src/cpp/src/whisper_pipeline.cpp (outdated, resolved)
@github-actions bot added the category: samples (GenAI samples) label and removed the category: whisper (Whisper pipeline) label on Oct 30, 2024
@eshiryae eshiryae marked this pull request as ready for review October 30, 2024 16:26
@ilya-lavrenov ilya-lavrenov added this to the 2024.5 milestone Oct 30, 2024
preprocessor.input(tensor.get_any_name()).tensor().set_element_type(ov::element::Type_t::f16);
preprocessor.input(tensor.get_any_name()).preprocess().convert_element_type();

// if (tensor.get_any_name().find(".value") != std::string::npos) {
Contributor: Seems redundant, do we need to keep it?

preprocessor.output(tensor.get_any_name()).tensor().set_element_type(ov::element::Type_t::f16);
preprocessor.output(tensor.get_any_name()).postprocess().convert_element_type();

// if (tensor.get_any_name().find(".value") != std::string::npos) {
Contributor: Same here.
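
For context, a minimal self-contained sketch of how these PrePostProcessor calls compose; the helper name and the tensor-name filters are assumptions for illustration, not the code in this PR:

#include "openvino/core/model.hpp"
#include "openvino/preprocess/pre_post_process.hpp"

// Hypothetical helper: expose the KV-cache tensors as f16 at the request boundary
// and let PrePostProcessor insert the element-type conversions into the model.
std::shared_ptr<ov::Model> set_kv_cache_precision(std::shared_ptr<ov::Model> model) {
    ov::preprocess::PrePostProcessor preprocessor(model);
    for (const auto& input : model->inputs()) {
        const auto& name = input.get_any_name();
        if (name.find("past_key_values") != std::string::npos) {  // assumed name filter
            preprocessor.input(name).tensor().set_element_type(ov::element::f16);
            preprocessor.input(name).preprocess().convert_element_type();
        }
    }
    for (const auto& output : model->outputs()) {
        const auto& name = output.get_any_name();
        if (name.find("present") != std::string::npos) {  // assumed name filter
            preprocessor.output(name).tensor().set_element_type(ov::element::f16);
            preprocessor.output(name).postprocess().convert_element_type();
        }
    }
    return preprocessor.build();  // rebuilds the model with the conversions applied
}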

@@ -160,10 +171,9 @@ int64_t decode_with_past(ov::InferRequest& decoder_with_past,
const std::vector<int64_t>& generated_tokens) {
// FIXME: Avoid this cast to i32. Why isn't the model using i64 precision?
decoder_with_past.get_tensor("input_ids").data<int32_t>()[0] = static_cast<int32_t>(input_id);

Since optimum creates i64 dtype for input_ids, do we still need this cast? This cast was initially required to support NPU-friendly models.

Collaborator: I believe it's an attempt to align the model generated on the fly with the NPU-friendly one. Perhaps just leaving it as i64 is fine.
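
As a sketch of that suggestion (assuming the optimum-exported model keeps its native i64 input_ids; this is not the merged code), the cast would simply disappear:

// Hypothetical: write the token directly once input_ids stays i64.
decoder_with_past.get_tensor("input_ids").data<int64_t>()[0] = input_id;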

@TolyaTalamanov (Collaborator) left a comment:

A slight revision of the preprocessing code is perhaps still needed here, but it's not critical for now, so let's merge it!

void preprocess_encoder(std::shared_ptr<ov::Model> model) {
ov::preprocess::PrePostProcessor preprocessor(model);

preprocessor.input("input_features").tensor().set_element_type(ov::element::Type_t::f32);
Collaborator: It's already f32, isn't it?

pm.run_passes(model);
}

void reshape_to_static(std::shared_ptr<ov::Model> model, const uint32_t input_size, const uint32_t kvcache_size) {
Collaborator: Perhaps there should be a separate function for every model to avoid confusion.

I agree. Having separate reshape_to_static for decoder and decoder with past will be helpful.
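
A hypothetical shape of that split (function names, tensor names, and dimension layout are illustrative assumptions, not this PR's code):

#include <map>
#include "openvino/core/model.hpp"

// Sketch: one reshape helper per model instead of a single shared one.
void reshape_to_static_decoder(std::shared_ptr<ov::Model> model, const uint32_t input_size) {
    std::map<std::string, ov::PartialShape> new_shapes;
    for (const auto& input : model->inputs()) {
        const auto& input_name = input.get_any_name();
        ov::PartialShape new_shape = input.get_partial_shape();
        if (input_name.find("input_ids") != std::string::npos) {
            new_shape = ov::PartialShape({1, input_size});
        } else if (input_name.find("encoder_hidden_states") != std::string::npos) {
            new_shape[0] = 1;     // batch
            new_shape[1] = 1500;  // encoder sequence length
        }
        new_shapes.emplace(input_name, new_shape);
    }
    model->reshape(new_shapes);
}

void reshape_to_static_decoder_with_past(std::shared_ptr<ov::Model> model, const uint32_t kvcache_size) {
    std::map<std::string, ov::PartialShape> new_shapes;
    for (const auto& input : model->inputs()) {
        const auto& input_name = input.get_any_name();
        ov::PartialShape new_shape = input.get_partial_shape();
        if (input_name.find("input_ids") != std::string::npos) {
            new_shape = ov::PartialShape({1, 1});  // one token per step
        } else if (input_name.find("attention_mask") != std::string::npos) {
            new_shape = ov::PartialShape({1, kvcache_size});
        } else if (input_name.find("past_key_values") != std::string::npos) {
            new_shape[0] = 1;                 // batch
            new_shape[2] = kvcache_size - 1;  // past length (assumed [B, H, S, D] layout)
        }
        new_shapes.emplace(input_name, new_shape);
    }
    model->reshape(new_shapes);
}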

// preprocessor.output(tensor.get_any_name()).tensor().set_layout(ov::Layout("NCWH"));
// preprocessor.output(tensor.get_any_name()).model().set_layout(ov::Layout("NCHW"));
//} else if (tensor.get_any_name().find(".key") != std::string::npos) {
// preprocessor.output(tensor.get_any_name()).tensor().set_layout(ov::Layout("NCHW"));
Collaborator: Let's remove it in the refactoring PR.

const auto& partial_shape = input.get_partial_shape();
new_shape = partial_shape;
new_shape[0] = 1; // batch_dim
new_shape[1] = 1500; // FIXME: should this be taken from the encoder output 'last_hidden_state'?

Encoder hidden states are not needed as an input to the decoder with past.

It is, however, a required input for the decoder model, and the static shapes are 1 for batch and 1500 for encoder sequence length, as you have added. However, the last dimension is also dynamic and varies with the model (checked with an optimum-exported model using transformers v4.45.2 and optimum-intel v1.20.0).

Ideally, the encoder output 'last_hidden_state' dimension can be used to reshape the encoder_hidden_states input to the decoder. This would be straightforward.
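
A sketch of that idea (the variable names here are assumptions): query the encoder output's shape once and reuse its dimensions instead of hard-coding them.

// Illustrative only: derive the decoder's encoder_hidden_states shape from the
// encoder model's 'last_hidden_state' output rather than hard-coding 1500.
const auto encoder_out_shape = encoder_model->output("last_hidden_state").get_partial_shape();
ov::PartialShape new_shape = input.get_partial_shape();
new_shape[0] = 1;                     // batch
new_shape[1] = encoder_out_shape[1];  // encoder sequence length (1500 for Whisper)
new_shape[2] = encoder_out_shape[2];  // hidden size, varies across model variants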

Collaborator: This function should be split into 3 (encoder, decoder, decoder_with_past) to avoid further confusion; let's do this cleanup after the main part is merged.

pm.run_passes(model);
}

void add_attention_mask_input(std::shared_ptr<ov::Model> model) {

Why are there 2 add_attention_mask functions?

new_shape = ov::PartialShape({1, input_size});
} else if (input_name.find("attention_mask") != std::string::npos) {
new_shape = ov::PartialShape({1, kvcache_size + 1});
} else if (input_name.find("position_ids") != std::string::npos) {

position_ids is now deprecated as an input, replaced with cache_position. Can it be removed?

Collaborator: I believe it's left over from non-optimum-cli models; it should be removed, of course.

if (input_name.find("input_ids") != std::string::npos) {
new_shape = ov::PartialShape({1, input_size});
} else if (input_name.find("attention_mask") != std::string::npos) {
new_shape = ov::PartialShape({1, kvcache_size + 1});

Since 448 is passed as kvcache_size, using kvcache_size + 1 creates an attention_mask of size [1, 449], which is wrong, as 448 is the maximum supported by the model.

Use [1, kvcache_size] for the mask, as past_key_values is later reshaped to kvcache_size - 1.

Collaborator: I believe it's a matter of naming. The real KV cache size is 449 in this case, though.
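
To make the naming question concrete, a minimal illustration (variable names assumed, not the merged code): whatever kvcache_size denotes, the attention mask must span the past cache plus the token currently being decoded, so the mask length is always the past length plus one.

// Illustration of the relationship under discussion.
const uint32_t past_kv_len = kvcache_size - 1;  // length of the past_key_values tensors
const uint32_t mask_len    = past_kv_len + 1;   // == kvcache_size positions in total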

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 30, 2024
@ilya-lavrenov ilya-lavrenov added this pull request to the merge queue Oct 31, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 31, 2024
@ilya-lavrenov ilya-lavrenov added this pull request to the merge queue Oct 31, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 31, 2024
@Wovchena Wovchena added this pull request to the merge queue Oct 31, 2024
@andrei-kochin andrei-kochin removed this pull request from the merge queue due to a manual request Oct 31, 2024
@andrei-kochin andrei-kochin merged commit cb2d527 into openvinotoolkit:master Oct 31, 2024
49 checks passed
Labels: category: samples (GenAI samples), category: sampling (Sampling / Decoding algorithms), Code Freeze

Successfully merging this pull request may close this issue: Dynamic Shape Issue When Run Whisper On NPU (#895)

6 participants