Merge remote-tracking branch 'origin/main' into robin/thesis/experiment-configs
robinholzi committed Dec 28, 2024
2 parents d4e9080 + b04a2e1 commit 1ac3779
Showing 42 changed files with 1,711 additions and 295 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -1,3 +1,7 @@
+# Images
+*.svg
+*.png
+
 # Logging files
 *.log
5 changes: 3 additions & 2 deletions .pylintrc
@@ -186,8 +186,9 @@ disable=raw-checker-failed,
         too-many-arguments, # we can't determine a good limit here. reviews should spot bad cases of this.
         duplicate-code, # Mostly imports and test setup.
         cyclic-import, # We use these inside methods that require models from multiple apps. Tests will catch actual errors.
-        too-many-instance-attributes # We always ignore this anyways
-
+        too-many-instance-attributes, # We always ignore this anyways
+        too-many-positional-arguments, # We do not want to limit the number of positional arguments
+        too-many-locals # We always ignore this anyways
 # Enable the message, report, category or checker with the given id(s). You can
 # either give multiple identifier separated by comma (,) or put this option
 # multiple time (only on the command line, not in the configuration file where
18 changes: 15 additions & 3 deletions README.md
@@ -7,7 +7,7 @@
 [![codecov](https://codecov.io/github/eth-easl/modyn/graph/badge.svg?token=KFDCE03SQ4)](https://codecov.io/github/eth-easl/modyn)
 [![License](https://img.shields.io/github/license/eth-easl/modyn)](https://img.shields.io/github/license/eth-easl/modyn)

-Modyn is an open-source platform for model training on growing datasets, i.e., datasets where points get added over time.
+Modyn is a data-centric machine learning pipeline orchestrator, i.e., a platform for model training on growing datasets where points get added over time. Check out our [blog post](https://systems.ethz.ch/research/blog/modyn.html) for a brief introduction.

 </div>

@@ -55,9 +55,8 @@ For running all integration tests, run
 Checkout our [Example Pipeline](docs/EXAMPLE.md) guide for an example on how to run a Modyn pipeline.
 Checkout our [Technical Guidelines](docs/TECHNICAL.md) for some hints on developing Modyn and how to add new data selection and triggering policies.
 Checkout the [Architecture Documentation](docs/ARCHITECTURE.md) for an overview of Modyn's components.
-Last, checkout our [vision paper on Modyn](https://anakli.inf.ethz.ch/papers/MLonDynamicData_EuroMLSys23.pdf) for an introduction to model training on dynamic datasets.
+Last, checkout our [full paper on Modyn](https://anakli.inf.ethz.ch/papers/modyn_sigmod25.pdf) for more technical background and experiments we ran using Modyn.

-We are actively developing and designing Modyn, including more thorough documentation.
 Please reach out via Github, Twitter, E-Mail, or any other channel of communication if you are interested in collaborating, have any questions, or have any problems running Modyn.

 How to [contribute](docs/CONTRIBUTING.md).
@@ -81,3 +80,16 @@ We welcome input from both research and practice.

Modyn is being developed at the [Efficient Architectures and Systems Lab (EASL)](https://anakli.inf.ethz.ch/#Group) at the [ETH Zurich Systems Group](https://systems.ethz.ch/).
Please reach out to `mboether [at] inf [­dot] ethz [dot] ch` or open an issue on Github if you have any questions or inquiry related to Modyn and its usage.

+### Paper / Citation
+
+If you use Modyn, please cite our SIGMOD'25 paper:
+
+```bibtex
+@inproceedings{Bother2025Modyn,
+author = {B\"{o}ther, Maximilian and Robroek, Ties and Gsteiger, Viktor and Ma, Xianzhe and T\"{o}z\"{u}n, P{\i}nar and Klimovic, Ana},
+title = {Modyn: Data-Centric Machine Learning Pipeline Orchestration},
+booktitle = {Proceedings of the Conference on Management of Data (SIGMOD)},
+year = {2025},
+}
+```
1 change: 1 addition & 0 deletions _typos.toml
@@ -5,3 +5,4 @@ extend-ignore-re = ["(?Rm)^.*# spellchecker:disable-line$"]
 [default.extend-words]
 strat = "strat"
 fpr = "fpr"
+ther = "ther"
8 changes: 4 additions & 4 deletions analytics/app/pages/plots/cost_vs_eval_metric_agg.py
@@ -1,5 +1,5 @@
 from dataclasses import dataclass
-from typing import get_args
+from typing import Any, get_args

 import pandas as pd
 import plotly.express as px
@@ -32,9 +32,9 @@ class _PageState:

 def gen_fig_scatter_num_triggers(
     page: str,
-    eval_handler: str,
-    dataset_id: str,
-    metric: str,
+    eval_handler: str | Any | None,
+    dataset_id: str | Any | None,
+    metric: str | Any | None,
     agg_func_x: AGGREGATION_FUNCTION,
     agg_func_y: EVAL_AGGREGATION_FUNCTION,
     stages: list[str],
7 changes: 4 additions & 3 deletions analytics/app/pages/plots/eval_heatmap.py
@@ -1,4 +1,5 @@
 from dataclasses import dataclass
+from typing import Any

 import pandas as pd
 from dash import Input, Output, callback, dcc, html
@@ -31,9 +32,9 @@ def gen_figure(
     page: str,
     multi_pipeline_mode: bool,
     patch_yearbook: bool,
-    eval_handler: str,
-    dataset_id: str,
-    metric: str,
+    eval_handler: str | Any | None,
+    dataset_id: str | Any | None,
+    metric: str | Any | None,
 ) -> go.Figure:
     """Create the cost over time figure with barplot or histogram. Histogram
     has nice binning while barplot is precise.
7 changes: 4 additions & 3 deletions analytics/app/pages/plots/eval_over_time.py
@@ -1,4 +1,5 @@
 from dataclasses import dataclass
+from typing import Any

 import pandas as pd
 import plotly.express as px
@@ -29,9 +30,9 @@ def gen_figure(
     page: str,
     multi_pipeline_mode: bool,
     patch_yearbook: bool,
-    eval_handler: str,
-    dataset_id: str,
-    metric: str,
+    eval_handler: str | Any | None,
+    dataset_id: str | Any | None,
+    metric: str | Any | None,
 ) -> go.Figure:
     """Create the evaluation over time figure with a line plot.
10 changes: 5 additions & 5 deletions analytics/app/pages/plots/num_samples.py
@@ -34,12 +34,12 @@ class _PageState:
 def gen_figure(
     page: str,
     multi_pipeline_mode: bool,
-    time_metric: str,
-    y_axis: YAxis,
-    use_scatter_size: bool,
+    time_metric: str | None,
+    y_axis: YAxis | None,
+    use_scatter_size: bool | None,
     patch_yearbook: bool,
-    dataset_id: str,
-    eval_handler: str,
+    dataset_id: str | None,
+    eval_handler: str | None,
 ) -> go.Figure:
     """Create the cost over time figure with barplot or histogram. Histogram
     has nice binning while barplot is precise.
6 changes: 3 additions & 3 deletions analytics/app/pages/plots/num_triggers_eval_metric.py
@@ -31,9 +31,9 @@ class _PageState:
 def gen_fig_scatter_num_triggers(
     page: str,
     multi_pipeline_mode: bool,
-    eval_handler: str,
-    dataset_id: str,
-    metric: str,
+    eval_handler: str | None,
+    dataset_id: str | None,
+    metric: str | None,
     aggregate_metric: bool = True,
     time_weighted: bool = True,
     only_active_periods: bool = True,
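
Note on the widened signatures: the plot helpers touched in this commit now accept `None` (and, in some files, `Any`) for `eval_handler`, `dataset_id`, and `metric`, presumably because a Dash dropdown reports `None` until a value is selected. The diff does not show how the function bodies treat that case. Below is a minimal sketch of one way to guard against unset selectors. The `select_rows` helper and the row keys are hypothetical and not taken from Modyn's code.

```python
from __future__ import annotations

from typing import Any


def select_rows(
    rows: list[dict[str, Any]],
    eval_handler: str | Any | None,
    dataset_id: str | Any | None,
    metric: str | Any | None,
) -> list[dict[str, Any]]:
    """Return the rows matching all three selectors.

    Hypothetical helper: if any selector is still unset (None), return an
    empty selection instead of filtering on None.
    """
    if eval_handler is None or dataset_id is None or metric is None:
        return []
    return [
        row
        for row in rows
        if row["eval_handler"] == eval_handler
        and row["dataset_id"] == dataset_id
        and row["metric"] == metric
    ]
```

Returning an empty selection is only one possible policy; whether the Modyn pages do this, raise, or fall back to a default is not visible in this diff.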