Partial fix phi3 device mapping #1002

Closed

cdoko (Contributor) commented Dec 24, 2024

I've made a partial fix for device mapping issues on Phi3. Previously, device mapping didn't work across various models, including Phi2, Phi3, Mistral, and Llama.

The fix moves tensors that have to be combined in one operation onto the same device. I chose the device that holds the cache, on the assumption that moving the cache would be slower than moving the other tensors. With this change Phi3 can be loaded across devices; I've tested it with 2 GPUs and with 1 GPU + 1 CPU.
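As a rough illustration of the approach described above (not the code from this PR), here is a minimal, self-contained sketch. `Device`, `Tensor`, `to_device`, `add`, and `add_on_cache_device` are hypothetical stand-ins written for this example, not the project's or any tensor library's real API. The sketch only shows the policy: when two operands live on different devices, move the non-cache operand to the cache's device before combining them.

```rust
// Hypothetical stand-ins for a tensor library's types; NOT the project's real API.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Device {
    Cpu,
    Cuda(usize),
}

#[derive(Clone, Debug, PartialEq)]
pub struct Tensor {
    pub data: Vec<f32>,
    pub device: Device,
}

impl Tensor {
    pub fn new(data: Vec<f32>, device: Device) -> Self {
        Self { data, device }
    }

    /// Return a copy of this tensor tagged with `device`.
    /// (A real backend would perform a host/device or device/device copy here.)
    pub fn to_device(&self, device: Device) -> Tensor {
        Tensor { data: self.data.clone(), device }
    }

    /// Element-wise add that rejects mixed-device operands.
    pub fn add(&self, other: &Tensor) -> Result<Tensor, String> {
        if self.device != other.device {
            return Err(format!(
                "device mismatch: {:?} vs {:?}",
                self.device, other.device
            ));
        }
        if self.data.len() != other.data.len() {
            return Err("shape mismatch".to_string());
        }
        let data: Vec<f32> = self
            .data
            .iter()
            .zip(&other.data)
            .map(|(a, b)| a + b)
            .collect();
        Ok(Tensor { data, device: self.device })
    }
}

/// Assumed policy: leave the cache where it is and move the other operand to it.
pub fn add_on_cache_device(cache: &Tensor, x: &Tensor) -> Result<Tensor, String> {
    let x = if x.device == cache.device {
        x.clone()
    } else {
        x.to_device(cache.device)
    };
    cache.add(&x)
}
```

With a cache on `Device::Cuda(1)` and an input on `Device::Cuda(0)`, calling `cache.add(&x)` directly returns an error in this sketch, while `add_on_cache_device(&cache, &x)` returns a result that lives on `Device::Cuda(1)`. Keeping the cache in place reflects the assumption stated above that the cache is the more expensive tensor to move.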

The fix is only partial: Phi3 can now be loaded without issues, but the other models still hit a CUDA_ERROR_ILLEGAL_ADDRESS error that prevents them from loading.

The CUDA_ERROR_ILLEGAL_ADDRESS error occurs in different scenarios for each model. For example, in the Mistral model, calling contiguous() on a tensor causes this error, and moving a tensor across devices also triggers it. I found it unusual that Phi3 is the only model that works with this fix, and certain operations like contiguous() work fine on Phi3 but not on other models.

One thing is still broken: sending a second request with the same prompt produces gibberish output. The same thing currently happens when running with --no-paged-attn on a single device, so the issue is not introduced by this fix. I suspect a bug in the cache manager. PagedAttention (PA), on the other hand, does not have this issue.

I appreciate any feedback you can provide. I look forward to your review!


github-actions bot commented Dec 24, 2024

Code Metrics Report
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 C Header                2           35           28            0            7
 Dockerfile              1           41           22           10            9
 JSON                   12          105          104            0            1
 Python                 63         2706         2338           71          297
 Shell                   1           57           22           18           17
 Plain Text              3         3723            0         2413         1310
 TOML                   18          603          536            2           65
 YAML                    2           21           19            2            0
-------------------------------------------------------------------------------
 Jupyter Notebooks       4            0            0            0            0
 |- Markdown             2           77           32           31           14
 |- Python               2          205          178            1           26
 (Total)                            282          210           32           40
-------------------------------------------------------------------------------
 Markdown               43         3324            0         2520          804
 |- BASH                 6          101           98            0            3
 |- JSON                 1           12           12            0            0
 |- Python               7          121          109            0           12
 |- Rust                12          406          344            0           62
 |- TOML                 2           75           63            0           12
 (Total)                           4039          626         2520          893
-------------------------------------------------------------------------------
 Rust                  289        87939        78969         1804         7166
 |- Markdown           139         1534           25         1395          114
 (Total)                          89473        78994         3199         7280
===============================================================================
 Total                 438        98554        82038         6840         9676
===============================================================================
  

@cdoko cdoko closed this Dec 26, 2024