
Add CI Testing for Example Notebooks and Python Scripts #190

Closed
wants to merge 6 commits

Conversation

@mnrozhkov (Contributor) commented Jul 26, 2024

Overview:
This PR aims to open a discussion on an approach to automated tests and quality checks for examples (in Jupyter notebooks and Python scripts in the examples/ directory), ensuring compatibility with the latest datachain library.
Approach:

  • use papermill to run examples in Jupyter notebooks (and commit updates - TBD)
  • use datachain query scriptname.py to test scripts designed for Studio
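In practice, these two checks boil down to roughly the following per example (a sketch only; file names are placeholders, and in CI the notebook step is driven by a shell script rather than Python):

    # Illustration of the two checks above for one example directory.
    # File names here are placeholders, not files added by this PR.
    import subprocess

    import papermill as pm

    # 1. Run a notebook end-to-end; papermill raises if any cell errors out.
    pm.execute_notebook("1-explore.ipynb", "1-explore.out.ipynb")

    # 2. Run a Studio-format script through the datachain CLI, not plain python.
    subprocess.run(["datachain", "query", "scripts/1-train.py"], check=True)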

Context:
Right now, most of our effort goes into examples for the open-source library.
However, we also have clients, and we need to keep examples consistent with Studio so they are simple to copy-paste into Studio (along with any additional imported modules). It seems reasonable to keep those scripts close to the notebooks, reuse code, and update both together. Both notebooks and scripts are used for customer workshops.

These considerations were taken into account during the work on examples/computer_vision/fashion_product_images:

  • Scripts in the scripts/ dir are designed specifically for Studio!
  • They are in "Studio format" (no if __name__ ... guard, with a query as the last line)
  • To test them, we need to use the datachain query ... CLI, not python ...
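For illustration, a Studio-format script could look roughly like the sketch below (the dataset path and DataChain calls are placeholders, and the import path may differ between datachain versions):

    # Hypothetical scripts/1-train.py in "Studio format": no __main__ guard,
    # the module runs top to bottom, and the last line is the query that
    # Studio (or `datachain query`) evaluates and displays.
    from datachain.lib.dc import DataChain

    images = DataChain.from_storage("gs://datachain-demo/fashion-product-images")

    # Final expression is a query, not wrapped in `if __name__ == "__main__":`.
    images.limit(10)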

How It Works:
The CI workflow .github/workflows/examples.yaml tests examples as follows:

  1. Triggers on pushes to specified branches and PRs affecting examples/ or the workflow file.
  2. Uses a matrix strategy to test multiple directories concurrently.
  3. For each directory in the matrix:
    • Checks for .ipynb and .py files
    • Sets up a Python environment with required dependencies
    • Executes Jupyter notebooks with papermill via the tests/examples/test_notebooks.sh script
    • Runs Python scripts with datachain query
  4. An example passes if all its scripts and notebooks execute without errors.
  5. Uploads test outputs and logs as artifacts for review.

This approach ensures comprehensive testing of all specified example directories, maintaining their functionality with the latest datachain version.
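Roughly, each matrix job does the equivalent of the following (sketched in Python for readability; the real workflow uses tests/examples/test_notebooks.sh plus datachain query calls, and also handles environment setup and caching):

    # Rough sketch of what one matrix job does for a single example directory.
    # This mirrors the workflow steps; it is not a script added by this PR.
    import subprocess
    from pathlib import Path

    import papermill as pm


    def run_example_dir(example_dir: str) -> None:
        root = Path(example_dir)

        # Execute every notebook in the main example directory with papermill.
        for nb in sorted(root.glob("*.ipynb")):
            pm.execute_notebook(str(nb), str(nb.with_suffix(".out.ipynb")))

        # Run Studio-format scripts via the datachain CLI, in alphabetical order,
        # so artifacts from earlier scripts (e.g. a trained model) are available later.
        for script in sorted((root / "scripts").glob("*.py")):
            subprocess.run(["datachain", "query", str(script)], check=True)


    run_example_dir("examples/computer_vision/fashion_product_images")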

Pros:

  • Ensures examples work with the latest datachain version
  • Uses a separate virtual environment for each example dir to ensure isolation
  • Tests both Jupyter notebooks and Python scripts
  • Stores output logs and artifacts for easy diagnostics
  • Uses caching to speed up dependency installation

Cons:

  • The CI process might be time-consuming for complex examples.
  • Resource-intensive if many tests run in parallel.
  • Requires maintenance of requirements.txt files for each example.

Requirements for Example directory:
To comply with this testing framework, each example must adhere to the following standards:

  1. Directory Structure: Jupyter notebooks live in the main example directory; Studio scripts live in the scripts/ subdirectory.
  2. Dependencies: All dependencies should be specified in a requirements.txt file in the same directory as the notebooks.
  3. Linear Execution: Notebooks and scripts must be able to run from start to finish without manual intervention.
  4. Datachain Compatibility: Python scripts should be compatible with the datachain query command.
  5. Ordering: Scripts within a matrix directory run alphabetically, so use numeric prefixes in notebook and script filenames. This allows artifacts to be shared between scripts (train -> model.pth -> inference); an example layout follows below.
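For illustration, a layout that satisfies these requirements could look like this (file names are hypothetical):

    examples/computer_vision/fashion_product_images/
    ├── 1-explore.ipynb        # notebooks in the main dir, numbered for ordering
    ├── requirements.txt       # dependencies for this example only
    └── scripts/
        ├── 1-train.py         # Studio-format script, run via `datachain query`
        └── 2-inference.py     # consumes model.pth produced by 1-train.py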

I hope this draft makes sense and helps us design an appropriate approach. Looking forward to your feedback and alternatives!
cc @shcheklein @mattseddon @dberenbaum @volkfox @tibor-mach

cloudflare-workers-and-pages bot commented Jul 26, 2024

Deploying datachain-documentation with Cloudflare Pages

Latest commit: d8db0d3
Status: ✅ Deploy successful!
Preview URL: https://91ba8618.datachain-documentation.pages.dev
Branch Preview URL: https://test-examples-ci.datachain-documentation.pages.dev

@skshetry (Member) commented Jul 27, 2024

I have some suggestions regarding the setup:

  1. It might be better to drive all the tests via pytest instead of using custom scripts and YAML files. The datachain query script commands currently executed inside .github/workflows can be incorporated into pytest test cases. Similarly, you can execute notebooks through pytest.
  2. Consider creating a nox session to execute notebooks and save outputs.
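For illustration, the pytest-driven variant could look something like the sketch below (the module path, directory layout, and test names are assumptions, not what this PR or #188 implements):

    # tests/examples/test_examples.py -- hypothetical pytest-driven variant.
    import subprocess
    from pathlib import Path

    import pytest

    EXAMPLES = Path("examples")
    NOTEBOOKS = sorted(EXAMPLES.rglob("*.ipynb"))
    SCRIPTS = sorted(EXAMPLES.rglob("scripts/*.py"))


    @pytest.mark.parametrize("notebook", NOTEBOOKS, ids=str)
    def test_notebook_executes(notebook, tmp_path):
        # papermill raises if any notebook cell fails.
        papermill = pytest.importorskip("papermill")
        papermill.execute_notebook(str(notebook), str(tmp_path / notebook.name))


    @pytest.mark.parametrize("script", SCRIPTS, ids=str)
    def test_studio_script_executes(script):
        # Studio-format scripts go through the CLI, not plain `python`.
        subprocess.run(["datachain", "query", str(script)], check=True)

A nox session could then create the per-example virtual environment from its requirements.txt and invoke this module, keeping the workflow YAML itself thin.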

Additionally, I have a question: Is running the script and relying on the exit code sufficient, or should we implement some snapshot testing as well with syrupy?

@mattseddon (Member) commented:

Context: Right now, most efforts are focused on examples of open source libraries. However, we also have clients and need to keep examples consistent with Studio, which is simple to copy-paste into Studio (along with additional imported modules). It seems reasonable to keep them closer to "notebooks" and reuse code and update altogether. Both notebooks and scripts are used for customer workshops.

These considerations were taken into account during the work on examples/computer_vision/fashion_product_images:

  • Scripts in scripts/ dir are designed specifically for Studio!
  • they are in "Studio format" (without if __name__ .... and with a query at the last line)
  • to test them, we need to use datachain query ... CLI, not python ...

@mnrozhkov you have clear and specific expectations for these examples and I no longer believe this repository is the correct place for them. IMO they would be better placed in a separate customer-workshop repository which you can tune to your specific needs. Throwback to https://github.com/iterative/dvcx/issues/1659#issuecomment-2197406631 as I think the sentiment was similar there.

With that said, I fully agree with @skshetry that we should be building something that fits in with the current testing approach instead of something bespoke and different that sits outside of it. Something closer to #188 would be better.

FWIW -

Right now we have 24 scripts and 6 notebooks for examples, and they are (obviously) an integral part of our documentation strategy. We do, however, need to balance a few things: each example should have a decent UX and be real enough to be meaningful, BUT every example comes with a maintenance cost, so each one needs a valid and specific reason to exist.

A good example of where I don't think we currently have this trade-off right is the 3 claude-llm examples. We would get 90% of the same benefit by reducing them to a single example, and when you tie that in with the fact that we already have the examples/llm/llm_chatbot_evaluation.ipynb notebook, there is even less value in maintaining those 3 scripts.

Maybe #146 was the right answer after all.

@mnrozhkov (Contributor, Author) commented Jul 29, 2024

@skshetry @mattseddon thank you for your thoughtful comments and suggestions. They're very helpful!
You're right, using pytest makes sense for consistency; I'll check it out. We don't need snapshot testing for the current examples, but it might be useful in the future.

Your suggestion to move customer-facing examples to a separate repository makes sense, given their unique purpose and details. Let's explore ways to:

  • Automate CI tests with each datachain release
  • Integrate them into datachain documentation under a "Tutorials" section (or something similar)

My concerns for this setup:

  1. The datachain dev team stays outside of the customer-facing tutorials, so breaking changes that affect examples outside the datachain repo may go unnoticed.
  2. Understanding customer workflows (that we intend to implement in those tutorials) is important for datachain development. We need to collaborate on these tutorials together to stay in sync!

@dberenbaum @shcheklein @tibor-mach WDYT?

@shcheklein (Member) commented:

I agree on using regular pytest if possible.

@mnrozhkov how long would it take to run (in parallel) all these tests on each PR / commit as a regular CI? Let's say using some smaller dataset(s)?

If it's doable, I would strongly prefer the same repo and always running them. Everything else quickly leads to "things are broken for a while, and someone (possibly even the author of the code change) is fixing it without fresh context".

Integrate them into datachain documentation under a "Tutorials" section (or something similar)

is it relevant to tests? (just trying to understand)

The datachain dev team stays outside of the customer-facing tutorials, so breaking changes that affect examples outside the datachain repo may go unnoticed.

agreed

Understanding customer workflows (that we intend to implement in those tutorials) is important for datachain development. We need to collaborate on these tutorials together to stay in sync!

also, agreed

A few additional points. Same repo or different repo:

  • As usual, that seems like a secondary question to me. More important is whether we run things on CI every time, how many examples we maintain, etc.
  • 100% agree that each example is a "burden" - we should be careful. We can't, though, become too defensive or conservative. It would feel unhealthy if we couldn't add examples just because we don't have good checks in place.

@mnrozhkov (Contributor, Author) commented Jul 30, 2024

@mnrozhkov how long would it take to run (in parallel) all these tests on each PR / commit as a regular CI? Let's say using some smaller dataset(s)?

For "fashion-product-images" it takes ~4min now. But it's possible to reduce it at least twice.

Integrate them into datachain documentation under a "Tutorials" section (or something similar)
is it relevant to tests? (just trying to understand)

No

@tibor-mach (Contributor) commented:

@mnrozhkov First of all - great kick-off!

My two cents:

  1. I would try to have some tests right now, even if they are not as comprehensive as they could be, and fine-tune and refine them as the examples crystallise. I think the examples themselves should be published quickly and treated as basically WIP while we see what is popular and improve them. In that light, very crude tests that just run the script/notebook end-to-end and maybe check some important parts of the outputs are good enough for the moment.
  2. I would keep everything in the same repo for the same reason you mention - it becomes immediately apparent to everyone that something failed. In some cases we can still allow merging the pull request, so it should not be a huge blocker, but everyone stays aware that some examples are now broken.

@shcheklein (Member) commented:

For "fashion-product-images" it takes ~4min now. But it's possible to reduce it at least twice.

even 4 minutes is totally fine if it runs as a separate step in the CI (similar to our E2E tests in Studio, or the way we split the datachain server tests). @0x2b3bfa0 might help with this, I think.

@mattseddon (Member) commented:

@shcheklein I have a PR open at #199 for testing scripts. Looks like we are going to have issues with anything that isn't a toy example.

@dmpetrov (Member) commented:

@mnrozhkov you have clear and specific expectations for these examples and I no longer believe this repository is the correct place for them.

We should consider moving tutorials and notebooks outside of the repo to prevent it from growing too large. Currently, I hear that it's difficult to fit notebooks into the 0.5 MB file size limit, which suggests we may need to relax this restriction - and grow the repo even more.

From Slack:

hey team, there is an increasing number of Jupyter notebook tutorials in the repository, which is great and we need more! I like the idea of keeping docs in the repo at this stage. However, each notebook can be quite large (0.5MB+), and their frequent changes are causing the repo size to grow significantly.

Does anyone have ideas on how we can manage this?

@dmpetrov (Member) commented:

18 MB and ~1 minute to clone - it's a bit too much for a 1-month-old project 😅

$ time git clone https://github.com/iterative/datachain/
remote: Counting objects: 100% (1584/1584), done.
remote: Compressing objects: 100% (867/867), done.
Receiving objects: 100% (2182/2182), 12.00 MiB | 247.00 KiB/s, done.
Resolving deltas: 100% (1321/1321), done.

real	0m52.035s
user	0m0.533s
sys	0m0.432s
$ du -sh datachain/
 18M	datachain/

@shcheklein (Member) commented:

Discussed with @dmpetrov and the team yesterday. We agreed to move the notebooks to a separate repo, but still run CI on them (ideally on each PR).

@mnrozhkov closed this Aug 2, 2024