source instead of exec in run-readme-pr-macos.yml #1476
Conversation
`source` test commands instead of executing them. (Possible fix for pytorch#1315.)
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1476
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 unrelated failure) As of commit 6a5c3a7 with merge base 201411c.
BROKEN TRUNK - The following job failed but was present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
source instead of exec
Somebody pushed all the model exports into exportedModels, but... we never create the directory. We should do that, and also add it to the user instructions, because storing into a directory that doesn't exist is not good :)
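The missing-directory fix is a one-liner; a minimal sketch, with the directory name taken from the comment above:

```shell
# Create the export directory before writing into it; -p makes this
# idempotent, so it is safe to run even when the directory exists.
mkdir -p exportedModels
echo "export target ready: exportedModels"
```

Adding the same line to the documented user instructions keeps the docs and the CI extraction of those docs in sync.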
@Jack-Khuu when it rains it pours: that showed more false positives per #1315 than anybody could anticipate! I added the directory that all the examples use (but don't create!), or we can just remove the directory... |
@Jack-Khuu Ideally we start a run for 1476, and in parallel commit 1409, 1410, 1417, 1439, 1455, 1466. PS: In a nutshell, failures from the doc-based runs haven't bubbled up because a failure inside a shell script executed with bash did not pass failure information upstream. Using source to run the multiple layers rectifies this, and may be a pragmatic answer to restoring full test coverage. (I think right now we've caught some of the failures that have not bubbled up to hud.pytorch.org because of the exec/bash dichotomy by eyeballing, which is not a healthy long-term solution.) |
Yup, making my way through those CI PRs, then we'll rebase this one. Our current coverage has plenty of gaps and has honestly been ad hoc, so revamping the CI and creating a comprehensive unittest system is a P0 KR for us this half (working on it with @Gasoonjia). Thanks again for grinding through these!! |
pip3 not found. I guess we use conda for this environment. That's interesting. How do we deal with conda like that? Or is it just pip vs pip3? (https://github.com/pytorch/torchchat/actions/runs/12942557291/job/36137959147?pr=1476) |
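One hedged way to cope with runners that ship only `pip`, only `pip3`, or a conda environment without either on PATH (a sketch, not what the workflow currently does):

```shell
# Pick whichever pip binary exists on this runner; fall back to
# invoking the module through python3, which also covers conda
# environments where only 'python3' is guaranteed.
if command -v pip3 >/dev/null 2>&1; then
  PIP="pip3"
elif command -v pip >/dev/null 2>&1; then
  PIP="pip"
else
  PIP="python3 -m pip"
fi
echo "using: $PIP"
```

`$PIP install ...` then works the same way regardless of how the environment was provisioned.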
The multimodal doc needed an end-of-tests comment.
Need to download files before using them, lol. We expect the users to do this, but we should say so explicitly. Plus, if we extract the commands for testing, then it obviously fails without the download.
`(` triggers an "unexpected token" error in macOS zsh.
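A minimal illustration of that zsh quoting issue (assumed reproduction; quoting the argument is the fix):

```shell
# In zsh, an unquoted '(' starts a glob/subshell construct, so a doc
# command like `echo (see docs)` dies with "unexpected token".
# Quoting keeps the literal parenthesis and works in zsh, bash, and sh:
echo "(see docs)"
```

Documented commands that contain parentheses should therefore carry quotes so they survive extraction into the macOS (zsh) runners.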
# metadata does not install properly on macos # .ci/scripts/run-docs multimodal
https://hud.pytorch.org/pr/pytorch/torchchat/1476#36153855033 |
install wget
echo ".ci/scripts/run-docs native DISABLED" # .ci/scripts/run-docs native
pip3 command not found. This is called from |
Looks like #1362 for the mismatched group size is finally marked as failing properly. cc: @Gasoonjia |
Some issues: we can't find pip3 and/or conda. https://github.com/pytorch/torchchat/actions/runs/12996559809/job/36252363658
https://ossci-raw-job-status.s3.amazonaws.com/log/pytorch/torchchat/36252362180
https://ossci-raw-job-status.s3.amazonaws.com/log/pytorch/torchchat/36252360312 |
The following is an issue specific to the use of stories..., because the feature sizes aren't a multiple of the 256 group size. Originally, I had included padding or other support for handling this (Embeddings quantization just handles an "unfull" group, for example). Since moving to torchao, we're insisting that the feature size is a multiple of the group size.
Options: I'll assume that (3) might take a while for discussion and implementation, so going with (1) or (2) is probably the pragmatic solution. (With the caveat that (2) won't test gs=256; it may be the quickest to implement, and I'm not sure what the smallest model for (1) is. I haven't looked at Stories 110M, which may be an acceptable stand-in re: feature sizes being a multiple of 256, although it will drive up the runtime of our tests....) Resolved via (2) for now |
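The divisibility constraint behind this discussion can be sketched as a pre-flight check; the numbers here are illustrative (288 stands in for a small hidden size that 256 does not divide):

```shell
# Linear feature dims must divide evenly by the quantization group
# size; torchao no longer pads "unfull" groups, so check up front.
features=288      # illustrative small-model hidden size
groupsize=256
if (( features % groupsize != 0 )); then
  echo "gs=$groupsize does not divide $features; falling back to gs=32"
  groupsize=32
fi
(( features % groupsize == 0 )) && echo "using groupsize=$groupsize"
```

This is the shape of option (2): pick a group size (32) small enough to divide the feature dims of every model the docs exercise.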
switch to gs=32 quantization (requires consolidated run-docs of pytorch#1439)
add gs=32 cuda quantization for use w/ stories15M
add gs=32 for stories15M
https://github.com/pytorch/torchchat/actions/runs/13021784616/job/36327025188?pr=1476 test-advanced-any |
try to install pip & pip3
debug: `which pip || true`; `which pip3 || true`; `which conda || true`
debug info: `which pip || true`; `which pip3 || true`; `which conda || true`
use group size 32 which works on all models
Cleanup, comment non-working tests
Uncomment test code requiring unavailable pip3
comment non-working tests
comment out test code requiring pip3
Avoid nested quotes
Enable distributed test
Remove extraneous debug messages from install_requirements.sh
remove debug
Comment out failing quantization-any (glibc version issue) and distributed (nccl usage)
@Jack-Khuu can you please restart the tests? I fixed the runner for x86 (cpu/gpu) and enabled the first few basic tests. Two issues: pip/pip3 did not exist, and error status was not propagated. A few of the tests now work reliably when error status is checked properly and pip is available. While errors were ignored (see #1315), all sorts of other tests started to fail (the usual bit rot when tests don't actually execute). Once the first few are enabled, we can then look at tests individually and assign failures to subsystem owners. (Nobody will feel compelled to fix things when 30+ tests fail, because how can they know it's their problem and not some underlying common issue?) |
@Jack-Khuu Is torchao working elsewhere? |
@Jack-Khuu readme and quantization are both failing with torchao build issues. I thought we previously were downloading prebuilt torchao? In any event, same issue as previously.
We might disable these tests (which means we'd have disabled all tests!) and land the framework for testing commands from documentation, and then create separate PRs for each test that we can assign to subject-matter experts? Let me know how you'd like to proceed. |
Working as in wheels are being built properly?
Previously we were pinned to major releases from AO, but when we moved to using AO nightly (which doesn't have macOS builds), we build from source (torchchat/install/install_requirements.sh, line 122, at b57b2be).
Since the README tests are giving us no signal, disabling them is honestly the best move. We can use this PR to do so: Disable all README tests that provide no signal
This sounds like a working approach. Let's get a fresh issue spun up with details so we can link the subject-matter experts to it. |
Disable remaining tests
enable readme
remove run of readme
The failing tests are failing on main so I'm looking into it; hold tight |
Thanks again for all this
@@ -82,17 +82,17 @@ Here are some examples of quantization configurations
```
* Only quantize linear layers
```
- --quantize '{"linear:a8w4dq": {"groupsize" : 256}}'
+ --quantize '{"linear:a8w4dq": {"groupsize" : 32}}'
Just to jog my memory: is the only reason we use 32 here instead of 256 so that the stories model works?
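For reference, the single-quoting pattern that keeps the JSON config intact in the shell (the torchchat invocation is commented out and hypothetical here; only the quoting sanity check actually runs):

```shell
# Single-quote the JSON so the inner double quotes reach the program
# intact (cf. the "Avoid nested quotes" commit above).
QUANTIZE='{"linear:a8w4dq": {"groupsize": 32}}'

# Sanity-check that what the shell will pass is valid JSON:
echo "$QUANTIZE" | python3 -m json.tool

# Hypothetical use with the documented flag from the diff above:
# python3 torchchat.py generate stories15M --quantize "$QUANTIZE"
```

Storing the config in a variable also sidesteps nested-quote problems when the command is embedded in a YAML workflow step.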