
Add CI Testing for Example Notebooks and Python Scripts #190

Closed
wants to merge 6 commits

Conversation

@mnrozhkov (Contributor) commented Jul 26, 2024

Overview:
This PR aims to open a discussion on an approach to automated tests and quality checks for examples (in Jupyter notebooks and Python scripts in the examples/ directory), ensuring compatibility with the latest datachain library.
Approach:

  • use papermill to run examples in Jupyter notebooks (and commit updates - TBD)
  • use datachain query scriptname.py to test scripts designed for Studio
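In practice, these two checks boil down to roughly the following per example (a sketch only; file names are placeholders, and in CI the notebook step is driven by a shell script rather than Python):

    # Illustration of the two checks above for one example directory.
    # File names here are placeholders, not files added by this PR.
    import subprocess

    import papermill as pm

    # 1. Run a notebook end-to-end; papermill raises if any cell errors out.
    pm.execute_notebook("1-explore.ipynb", "1-explore.out.ipynb")

    # 2. Run a Studio-format script through the datachain CLI, not plain python.
    subprocess.run(["datachain", "query", "scripts/1-train.py"], check=True)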

Context:
Right now, most of our effort goes into examples for the open-source library.
However, we also have clients, and we need to keep examples consistent with Studio so they are simple to copy-paste into Studio (along with any additional imported modules). It seems reasonable to keep those scripts close to the notebooks, reuse code, and update both together. Both notebooks and scripts are used for customer workshops.

These considerations were taken into account during the work on examples/computer_vision/fashion_product_images:

  • Scripts in the scripts/ dir are designed specifically for Studio!
  • They are in "Studio format" (no if __name__ ... guard, with a query as the last line)
  • To test them, we need to use the datachain query ... CLI, not python ...
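For illustration, a Studio-format script could look roughly like the sketch below (the dataset path and DataChain calls are placeholders, and the import path may differ between datachain versions):

    # Hypothetical scripts/1-train.py in "Studio format": no __main__ guard,
    # the module runs top to bottom, and the last line is the query that
    # Studio (or `datachain query`) evaluates and displays.
    from datachain.lib.dc import DataChain

    images = DataChain.from_storage("gs://datachain-demo/fashion-product-images")

    # Final expression is a query, not wrapped in `if __name__ == "__main__":`.
    images.limit(10)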

How It Works:
The CI workflow .github/workflows/examples.yaml tests examples as follows:

  1. Triggers on pushes to specified branches and PRs affecting examples/ or the workflow file.
  2. Uses a matrix strategy to test multiple directories concurrently.
  3. For each directory in the matrix:
    • Checks for .ipynb and .py files
    • Sets up a Python environment with required dependencies
    • Executes Jupyter notebooks with papermill via the tests/examples/test_notebooks.sh script
    • Runs Python scripts with datachain query
  4. An example passes if all its scripts and notebooks execute without errors.
  5. Uploads test outputs and logs as artifacts for review.

This approach ensures comprehensive testing of all specified example directories, maintaining their functionality with the latest datachain version.
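Roughly, each matrix job does the equivalent of the following (sketched in Python for readability; the real workflow uses tests/examples/test_notebooks.sh plus datachain query calls, and also handles environment setup and caching):

    # Rough sketch of what one matrix job does for a single example directory.
    # This mirrors the workflow steps; it is not a script added by this PR.
    import subprocess
    from pathlib import Path

    import papermill as pm


    def run_example_dir(example_dir: str) -> None:
        root = Path(example_dir)

        # Execute every notebook in the main example directory with papermill.
        for nb in sorted(root.glob("*.ipynb")):
            pm.execute_notebook(str(nb), str(nb.with_suffix(".out.ipynb")))

        # Run Studio-format scripts via the datachain CLI, in alphabetical order,
        # so artifacts from earlier scripts (e.g. a trained model) are available later.
        for script in sorted((root / "scripts").glob("*.py")):
            subprocess.run(["datachain", "query", str(script)], check=True)


    run_example_dir("examples/computer_vision/fashion_product_images")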

Pros:

  • Ensures examples work with the latest datachain version
  • Uses a separate virtual environment for each example dir to ensure isolation
  • Tests both Jupyter notebooks and Python scripts
  • Stores output logs and artifacts for easy diagnostics
  • Uses caching to speed up dependency installation

Cons:

  • The CI process might be time-consuming for complex examples.
  • Resource-intensive if many tests run in parallel.
  • Requires maintenance of requirements.txt files for each example.

Requirements for Example directory:
To comply with this testing framework, each example must adhere to the following standards:

  1. Directory Structure: Jupyter notebooks live in the main example directory; Studio scripts live in the scripts/ subdirectory.
  2. Dependencies: All dependencies should be specified in a requirements.txt file in the same directory as the notebooks.
  3. Linear Execution: Notebooks and scripts must be able to run from start to finish without manual intervention.
  4. Datachain Compatibility: Python scripts should be compatible with the datachain query command.
  5. Ordering: Scripts within a matrix directory run alphabetically, so use numeric prefixes in notebook and script filenames. This allows artifacts to be shared between scripts (train -> model.pth -> inference); an example layout follows below.
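For illustration, a layout that satisfies these requirements could look like this (file names are hypothetical):

    examples/computer_vision/fashion_product_images/
    ├── 1-explore.ipynb        # notebooks in the main dir, numbered for ordering
    ├── requirements.txt       # dependencies for this example only
    └── scripts/
        ├── 1-train.py         # Studio-format script, run via `datachain query`
        └── 2-inference.py     # consumes model.pth produced by 1-train.py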

I hope this draft makes sense and helps us design an appropriate approach. Looking forward to your feedback and alternatives!
cc @shcheklein @mattseddon @dberenbaum @volkfox @tibor-mach

cloudflare-workers-and-pages bot commented Jul 26, 2024

Deploying datachain-documentation with Cloudflare Pages

Latest commit: d8db0d3
Status: ✅ Deploy successful!
Preview URL: https://91ba8618.datachain-documentation.pages.dev
Branch Preview URL: https://test-examples-ci.datachain-documentation.pages.dev

@skshetry (Member) commented Jul 27, 2024

I have some suggestions regarding the setup:

  1. It might be better to drive all the tests via pytest instead of using custom scripts and YAML files. The datachain query script commands currently executed inside .github/workflows can be incorporated into pytest test cases. Similarly, you can execute notebooks through pytest.
  2. Consider creating a nox session to execute notebooks and save outputs.
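For illustration, the pytest-driven variant could look something like the sketch below (the module path, directory layout, and test names are assumptions, not what this PR or #188 implements):

    # tests/examples/test_examples.py -- hypothetical pytest-driven variant.
    import subprocess
    from pathlib import Path

    import pytest

    EXAMPLES = Path("examples")
    NOTEBOOKS = sorted(EXAMPLES.rglob("*.ipynb"))
    SCRIPTS = sorted(EXAMPLES.rglob("scripts/*.py"))


    @pytest.mark.parametrize("notebook", NOTEBOOKS, ids=str)
    def test_notebook_executes(notebook, tmp_path):
        # papermill raises if any notebook cell fails.
        papermill = pytest.importorskip("papermill")
        papermill.execute_notebook(str(notebook), str(tmp_path / notebook.name))


    @pytest.mark.parametrize("script", SCRIPTS, ids=str)
    def test_studio_script_executes(script):
        # Studio-format scripts go through the CLI, not plain `python`.
        subprocess.run(["datachain", "query", str(script)], check=True)

A nox session could then create the per-example virtual environment from its requirements.txt and invoke this module, keeping the workflow YAML itself thin.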

Additionally, I have a question: Is running the script and relying on the exit code sufficient, or should we implement some snapshot testing as well with syrupy?

@mattseddon (Member) commented:

Context: Right now, most efforts are focused on examples of open source libraries. However, we also have clients and need to keep examples consistent with Studio, which is simple to copy-paste into Studio (along with additional imported modules). It seems reasonable to keep them closer to "notebooks" and reuse code and update altogether. Both notebooks and scripts are used for customer workshops.

These considerations were taken into account during the work on examples/computer_vision/fashion_product_images:

  • Scripts in scripts/ dir are designed specifically for Studio!
  • they are in "Studio format" (without if __name__ .... and with a query at the last line)
  • to test them, we need to use datachain query ... CLI, not python ...

@mnrozhkov you have clear and specific expectations for these examples and I no longer believe this repository is the correct place for them. IMO they would be better placed in a separate customer-workshop repository which you can tune to your specific needs. Throwback to https://github.com/iterative/dvcx/issues/1659#issuecomment-2197406631 as I think the sentiment was similar there.

With that said, I fully agree with @skshetry that we should be building something that fits in with the current testing approach instead of something bespoke and different that sits outside of it. Something closer to #188 would be better.

FWIW -

Right now we have 24 scripts and 6 notebooks for examples, and they are (obviously) an integral part of our documentation strategy. We do, however, need to balance a few things: each example should have a decent UX and be real enough to be meaningful, BUT every example comes with a maintenance cost, so each one needs a valid and specific reason to exist.

A good example of where I don't think we currently have this trade-off right is the 3 claude-llm examples. We would get 90% of the same benefit by reducing them to a single example, and when you tie that in with the fact that we already have the examples/llm/llm_chatbot_evaluation.ipynb notebook, there is even less value in maintaining those 3 scripts.

Maybe #146 was the right answer after all.

@mnrozhkov (Contributor, Author) commented Jul 29, 2024

@skshetry @mattseddon thank you for your thoughtful comments and suggestions. They're very helpful!
You're right, using pytest makes sense for consistency; I'll check it out. We don't need snapshot testing for the current examples, but it might be useful in the future.

Your suggestion to move customer-facing examples to a separate repository makes sense, given their unique purpose and details. Let's explore ways to:

  • Automate CI tests with each datachain release
  • Integrate them into datachain documentation under a "Tutorials" section (or something similar)

My concerns for this setup:

  1. The datachain dev team stays outside of the customer-facing tutorials, so breaking changes that affect examples outside the datachain repo may go unnoticed.
  2. Understanding customer workflows (that we intend to implement in those tutorials) is important for datachain development. We need to collaborate on these tutorials together to stay in sync!

@dberenbaum @shcheklein @tibor-mach WDYT?

@shcheklein (Member) commented:

I agree on using regular pytest if possible.

@mnrozhkov how long would it take to run (in parallel) all these tests on each PR / commit as a regular CI? Let's say using some smaller dataset(s)?

If it's doable, I would strongly prefer the same repo and always running them. Everything else quickly leads to "things are broken for a while, and someone (possibly even the author of the code change) is fixing it without fresh context".

Integrate them into datachain documentation under a "Tutorials" section (or something similar)

is it relevant to tests? (just trying to understand)

The datachain dev team stays outside of the customer-facing tutorials, so breaking changes that affect examples outside the datachain repo may go unnoticed.

agreed

Understanding customer workflows (that we intend to implement in those tutorials) is important for datachain development. We need to collaborate on these tutorials together to stay in sync!

also, agreed

A few additional points. Same repo or different repo:

  • As usual, that seems like a secondary question to me. More important is whether we run things on CI every time, how many examples we maintain, etc.
  • 100% agree that each example is a "burden" - we should be careful. We can't, though, become too defensive or conservative. It would feel unhealthy if we couldn't add examples just because we don't have good checks in place.

@mnrozhkov (Contributor, Author) commented Jul 30, 2024

@mnrozhkov how long would it take to run (in parallel) all these tests on each PR / commit as a regular CI? Let's say using some smaller dataset(s)?

For "fashion-product-images" it takes ~4min now. But it's possible to reduce it at least twice.

Integrate them into datachain documentation under a "Tutorials" section (or something similar)
is it relevant to tests? (just trying to understand)

No

@tibor-mach (Contributor) commented:

@mnrozhkov First of all - great kick-off!

My two cents:

  1. I would try to have some tests right now, even if they are not as comprehensive as they could be, and fine-tune and refine them as the examples crystallise. I think the examples themselves should be published quickly and treated as basically WIP while we see what is popular and improve them. In that light, very crude tests that just run the script/notebook end-to-end and maybe check some important parts of the outputs are good enough for the moment.
  2. I would keep everything in the same repo for the same reason you mention - it becomes immediately apparent to everyone that something failed. In some cases we can still allow merging the pull request, so it should not be a huge blocker, but everyone stays aware that some examples are now broken.

@shcheklein (Member) commented:

For "fashion-product-images" it takes ~4min now. But it's possible to reduce it at least twice.

even 4 minutes is totally fine if it runs as a separate step in the CI (similar to our E2E tests in Studio, or the way we split the datachain server tests). @0x2b3bfa0 might help with this, I think.

@mattseddon (Member) commented:

@shcheklein I have a PR open at #199 for testing scripts. Looks like we are going to have issues with anything that isn't a toy example.

@dmpetrov (Member) commented:

@mnrozhkov you have clear and specific expectations for these examples and I no longer believe this repository is the correct place for them.

We should consider moving tutorials and notebooks outside of the repo to prevent it from growing too large. Currently, I hear that it's difficult to fit notebooks into the 0.5 MB file size limit, which suggests we may need to relax this restriction - and grow the repo even more.

From Slack:

hey team, there is an increasing number of Jupyter notebook tutorials in the repository, which is great and we need more! I like the idea of keeping docs in the repo at this stage. However, each notebook can be quite large (0.5MB+), and their frequent changes are causing the repo size to grow significantly.

Does anyone have ideas on how we can manage this?

@dmpetrov (Member) commented:

18 MB and ~1 minute to clone - it's a bit too much for a 1-month-old project 😅

$ time git clone https://github.com/iterative/datachain/
remote: Counting objects: 100% (1584/1584), done.
remote: Compressing objects: 100% (867/867), done.
Receiving objects: 100% (2182/2182), 12.00 MiB | 247.00 KiB/s, done.
Resolving deltas: 100% (1321/1321), done.

real	0m52.035s
user	0m0.533s
sys	0m0.432s
$ du -sh datachain/
 18M	datachain/

@shcheklein (Member) commented:

Discussed with @dmpetrov and the team yesterday. We agreed to move the notebooks to a separate repo, but still run CI on them (ideally on each PR).

@mnrozhkov closed this Aug 2, 2024