[Example] ggml: add Qwen2-VL example #166

Status: Open · wants to merge 3 commits into base: master
Conversation

dm4 (Member) commented Dec 24, 2024

juntao (Member) commented Dec 24, 2024

Hello, I am a code review agent on flows.network. Here are my reviews of changed source code files in this PR.


.github/workflows/llama.yml

Potential issues

  1. Repeated Model Downloads: Multiple jobs download the same model files (e.g., llama-2-7b-chat.Q5_K_M.gguf) from Hugging Face, leading to redundant network requests and potential storage bloat. Consider caching these models or using a shared cache step.

  2. Hardcoded Environment Variables: The NGL environment variable is set in the Set environment variable step but also directly used in some job steps without ensuring it's always defined (e.g., --env n_gpu_layers="$NGL"). This can lead to undefined behavior if the variable isn't properly set for all jobs.

  3. Inconsistent Input Formats: Different jobs use varying input formats and prompts, which may confuse or cause errors in models expecting a specific format (e.g., '<start_of_turn>user Where is the capital of Japan? <end_of_turn><start_of_turn>model' vs. $'[INST] <<SYS>>\nYou are a helpful...). Standardizing input formats would improve reliability and maintainability.
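
Points 1 and 2 above could be addressed with a shared cache step and a shell fallback. A minimal sketch, assuming GitHub-hosted runners (the cache path, cache key, and invocation are illustrative, not the workflow's actual values):

```yaml
- name: Cache GGUF models
  uses: actions/cache@v4
  with:
    path: ~/.cache/models
    key: gguf-models-${{ hashFiles('.github/workflows/llama.yml') }}

- name: Run inference
  run: |
    # Fall back to 0 GPU layers when NGL was not exported for this job.
    wasmedge --dir .:. --env n_gpu_layers="${NGL:-0}" wasmedge-ggml-qwen2vl.wasm
```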

Summary of changes

  • Key Changes:
    • Updated wasmedge version from ["0.13.5", "0.14.0"] to ["0.14.1"].
    • Updated plugin to [wasi_nn-ggml-b4419].
    • Added a new job for Qwen2-VL, including downloading models, building the project, and running inference on an image.

wasmedge-ggml/qwen2vl/Cargo.toml

Potential issues

  1. The edition field is set to "2021", which is in fact a valid Rust edition; TOML requires the value to be a quoted string, so edition = "2021" is correct as written and needs no change.
  2. The dependency wasmedge-wasi-nn = "0.7.1" might be outdated; consider checking for a newer version to benefit from bug fixes and improvements.
  3. The name field uses hyphens, which is allowed and common for Cargo package names; note that Cargo exposes the crate in code with hyphens mapped to underscores, i.e., as wasmedge_ggml_qwen2vl.
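
A hedged sketch of what the manifest might look like after addressing these points; the fields and versions are taken from this review's summary, and the versions should be pinned to whatever the project actually tests against:

```toml
[package]
name = "wasmedge-ggml-qwen2vl"
version = "0.1.0"
edition = "2021"

[dependencies]
serde_json = "1.0"
wasmedge-wasi-nn = "0.7.1"
```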

Summary of changes

  • Added a new Cargo.toml file for package configuration.
  • Defined the package name as "wasmedge-ggml-qwen2vl" and set its version to "0.1.0".
  • Specified dependencies including serde_json version "1.0" and wasmedge-wasi-nn version "0.7.1".

wasmedge-ggml/qwen2vl/src/main.rs

Potential issues

  1. Error Handling in get_options_from_env: The function uses unwrap() on the result of serde_json::from_str, which can panic if the JSON is malformed. Replace unwrap() with proper error handling to prevent panics.

  2. Tensor Data Size in set_data_to_context: The function does not check if the input data size matches the tensor shape, which could lead to unexpected behavior or errors. Consider adding a size validation check.

  3. Metadata Parsing in get_metadata_from_context: The function assumes that specific keys exist in the metadata JSON without checking, which can cause runtime errors if the keys are missing. Add key existence checks before accessing them.
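
Taken together, the three points above amount to replacing panics with Results and validating inputs before use. A minimal std-only sketch of the patterns; the function names and the HashMap metadata stand-in are illustrative, not the PR's actual API:

```rust
use std::collections::HashMap;

// 1. Return a Result instead of calling unwrap() on a fallible parse
//    (the same pattern applies to serde_json::from_str in the PR).
fn parse_ctx_size(raw: &str) -> Result<usize, String> {
    raw.parse::<usize>()
        .map_err(|e| format!("invalid ctx_size {:?}: {}", raw, e))
}

// 2. Validate that the input byte length matches the tensor shape
//    before handing the buffer to the inference context.
fn validate_tensor_size(shape: &[usize], data_len: usize, elem_size: usize) -> Result<(), String> {
    let expected: usize = shape.iter().product::<usize>() * elem_size;
    if expected == data_len {
        Ok(())
    } else {
        Err(format!("tensor expects {} bytes, got {}", expected, data_len))
    }
}

// 3. Look up metadata keys without assuming they exist: None instead of a panic.
fn metadata_usize(meta: &HashMap<String, String>, key: &str) -> Option<usize> {
    meta.get(key)?.parse().ok()
}
```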

Summary of changes

  • Key Changes:
    • Added Qwen2-VL Example Code: Introduced a new Rust program that integrates with the wasmedge_wasi_nn library to handle multimedia projection and image processing, specifically for the Qwen2-VL model.
    • Environment Variable Handling: Implemented functions to read required (mmproj, image) and optional environment variables (enable_log, ctx_size, n_gpu_layers) to configure the model's behavior and execution context dynamically.
    • Interactive Prompt Processing: Added interactive prompt handling where users can input text, and the system processes it through a Qwen2-VL model in an ongoing dialogue format, resetting the context when necessary if the prompt becomes too long or the context is full.
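
The context-reset behavior described above can be sketched as follows; the helper name, the character-count budget (a stand-in for real token counting), and the use of Qwen's <|im_start|> chat markers are illustrative assumptions, not the PR's exact code:

```rust
// Hypothetical sketch: append a user turn to the running dialogue,
// clearing the history first if it would overflow the context budget.
fn build_prompt(history: &mut String, user_input: &str, ctx_budget: usize) -> String {
    let turn = format!(
        "<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n",
        user_input
    );
    // Rough character-count budget as a stand-in for token counting.
    if history.len() + turn.len() >= ctx_budget {
        history.clear(); // reset the context when the dialogue grows too long
    }
    history.push_str(&turn);
    history.clone()
}
```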

wasmedge-ggml/qwen2vl/wasmedge-ggml-qwen2vl.wasm

Potential issues

  1. Memory Access Issue: The code uses hard-coded memory addresses and indices (the compiled binary's symbols are not human-readable), which can lead to undefined behavior if the data structure layout changes, making the code fragile and difficult to maintain.

  2. Magic Numbers: The presence of numerous unexplained numeric constants without clear definitions or comments makes it hard to understand their significance, increasing the risk of errors during maintenance.

  3. Lack of Error Checking: The code does not check for errors after critical function calls, which can lead to silent failures and make debugging more challenging.

Summary of changes

  • Added a new example configuration for Qwen2-VL.
  • Integrated necessary model-specific parameters and settings for Qwen2-VL.
  • Included sample code or instructions to run the Qwen2-VL example.

@dm4 dm4 marked this pull request as draft December 25, 2024 08:03
@dm4 dm4 marked this pull request as ready for review January 2, 2025 08:36