WG Data proposal #673

Open
wants to merge 15 commits into
base: master
6 changes: 6 additions & 0 deletions OWNERS_ALIASES
@@ -11,6 +11,12 @@ aliases:
- gaocegege
- johnugeorge
- tenzen-y
wg-data-leads:
- ChenYi015
- andreyvelich
- franciscojavierarceo
- rareddy
- tarilabs
wg-deployment-leads:
- PatrickXYS
- animeshsingh
35 changes: 35 additions & 0 deletions wg-data/README.md
@@ -0,0 +1,35 @@
<!---
This is an autogenerated file!

Please do not edit this file directly, but instead make changes to the
sigs.yaml file in the project root.

To understand how this file is generated, see https://github.com/kubeflow/community/blob/master/generator/README.md
--->
# Data Working Group

The WG "Data" is focused on enhancing the support for data and metadata-related tasks within Kubeflow, with a specific focus on the Spark operator, Feast, and Model Registry. The group aims to simplify and improve data and feature processing between the stages of the ML lifecycle, for example from data preparation to model training and fine-tuning. The group also aims to facilitate management of ML model metadata while ensuring seamless integration with other Kubeflow components. The goal of the Spark on Kubernetes Operator is to make it simpler to run Apache Spark on Kubernetes: it automates deployment and simplifies lifecycle management of Spark jobs. The goal of Model Registry is to gather, analyze, and develop the model registry requirements of Kubeflow community users. The goal of Feast is to provide a customizable, operational data system that enables ML Platform Engineers, Machine Learning Engineers, and Data Scientists to accelerate production machine learning.

The [charter](charter.md) defines the scope and governance of the Data Working Group.

## Meetings
* KF Model Registry community meeting (US/EMEA): [Mondays at 7:00PM-8:00PM Europe/Madrid]() (biweekly - every other Monday of the month). [Convert to your timezone](http://www.thetimezoneconverter.com/?t=7:00PM-8:00PM&tz=Europe%2FMadrid).
  * [Meeting notes and Agenda](https://docs.google.com/document/d/1DmMhcae081SItH19gSqBpFtPfbkr9dFhSMCgs-JKzNo/edit?usp=sharing).
* The Feast community meeting (US/EMEA): [Tuesdays at 8:00PM-9:00PM Europe/Madrid]() (biweekly - every other Tuesday of the month). [Convert to your timezone](http://www.thetimezoneconverter.com/?t=8:00PM-9:00PM&tz=Europe%2FMadrid).
  * [Meeting notes and Agenda](https://docs.google.com/document/d/1MqNNhPCi1vvcYv71j3N0uBb4qIbl2ikaFO8tKohu12I/edit#heading=h.6js8vamr57ls).

## Organizers

* Yi Chen (**[@ChenYi015](https://github.com/ChenYi015)**), Alibaba Cloud
* Andrey Velichkevich (**[@andreyvelich](https://github.com/andreyvelich)**), Apple
* Ramesh Reddy (**[@rareddy](https://github.com/rareddy)**), Red Hat

## Contact
- Slack: [cloud-native.slack.com/archives/C073W572LA2](https://cloud-native.slack.com/archives/C073W572LA2)
- [Mailing list](https://groups.google.com/forum/#!forum/kubeflow-discuss)
- [Open Community Issues/PRs](https://github.com/kubeflow/community/labels/wg%2Farea/wg-data)
- GitHub Teams:
  - [@kubeflow/wg-data-leads](https://github.com/orgs/kubeflow/teams/wg-data-leads) - Team of Data Working Group leads
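
When the `slack` field in `wgs.yaml` holds a full URL rather than a bare channel name, a template that always prefixes the value with `https://kubeflow.slack.com/messages/` (the prefix visible in the other working groups' links) produces a doubled link. The following is a minimal Python sketch of that behaviour and one possible guard. It assumes such a template exists; the function names `render_slack_naive` and `render_slack` are hypothetical and are not taken from the generator's actual code.

```python
from urllib.parse import urlparse

# Prefix seen in the Slack links of the other working groups.
SLACK_BASE = "https://kubeflow.slack.com/messages/"


def render_slack_naive(slack: str) -> str:
    """Treat the value as a channel name, whatever it contains (assumed behaviour)."""
    return f"[#{slack}]({SLACK_BASE}{slack})"


def render_slack(slack: str) -> str:
    """Link full URLs as they are; prefix only bare channel names."""
    if urlparse(slack).scheme in ("http", "https"):
        return f"[Slack]({slack})"
    return f"[#{slack}]({SLACK_BASE}{slack})"


if __name__ == "__main__":
    url = "https://cloud-native.slack.com/archives/C073W572LA2"
    print(render_slack_naive(url))   # doubled link
    print(render_slack(url))         # URL used directly
    print(render_slack("wg-automl")) # channel name gets the prefix
```

Either the template can branch on the value as sketched here, or the `slack` field can be restricted to channel names; which of the two fits the generator is a decision for its maintainers.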
<!-- BEGIN CUSTOM CONTENT -->

<!-- END CUSTOM CONTENT -->
71 changes: 71 additions & 0 deletions wg-data/charter.md
@@ -0,0 +1,71 @@
# WG Data Charter

This charter adheres to the conventions, roles, and organisation management outlined in [wg-governance] for the Working Group "Data".

## Scope

The WG "Data" is focused on enhancing the support for data and metadata-related tasks within Kubeflow, with a specific focus on the [Spark operator](https://github.com/kubeflow/community/pull/672), the [Model Registry](https://github.com/kubeflow/kubeflow/issues/7396), and [Feast](https://github.com/feast-dev/feast).
The group aims to simplify and improve data and feature processing between the stages of the ML lifecycle, for example from data preparation to model training and fine-tuning.
The group also aims to facilitate management of ML model metadata while ensuring seamless integration with other Kubeflow components.

An additional goal of the group is to offer common ground for data and metadata-related topics in the MLOps orbit that do not yet have a more specific working group, so that they can "incubate as one" coherent effort.

For example, Data Preparation, the Feature Store, and Model Registry have recently been discussed in the Kubeflow community. While they are not yet mature enough to have their own working groups, they can be nurtured together as part of this WG.

### In scope

#### Code, Binaries, and Other relevant assets

- Onboarding and maintenance of the Spark operator for scalable and distributed data processing.
  [See also](https://github.com/kubeflow/spark-operator)
- Continued development of the Model Registry to manage and version machine learning models efficiently.
  [See also](https://github.com/kubeflow/model-registry)
  - Model Registry REST server
  - Model Registry Python client
  - Deployment manifests
  - BFF for Model Registry
  - UI front-end for Model Registry
- Onboarding and maintenance of Feast for historical feature extraction and online feature serving.
- SDKs and REST APIs for interacting with Kubeflow APIs related to data processing and ML models metadata management.
- CI/CD pipelines for Kubeflow subproject repositories in the scope of this WG.
- Documentation, in the form of Kubeflow website sections and as necessary in each repository.

#### Cross-cutting and Externally Facing Processes

- Ensuring seamless integration of these WG subprojects with the rest of the Kubeflow platform. For example:
  - Coordinating with WG Pipelines for integrations of Model Registry and Feast with KFP.
  - Coordinating with WG Serving for integrations of Model Registry and Feast with KServe and ModelMesh.
- Coordinating with release teams to ensure that the capabilities and subprojects in scope of this WG can be released properly.
- Offering mentorship to support contributors working on data-centric projects that want to integrate with Kubeflow.

### Out of scope

- APIs and components related to:
  - ML exploration, feature development, and experimentation (covered in Notebooks/Pipelines),
  - ML training (covered in Training),
  - serving ML features and models for model inference (covered in Serving).
- Anything else not explicitly outlined in the scope of this WG.

## Roles and Organization Management

This WG adheres to the Roles and Organization Management outlined in [wg-governance] and opts in to updates and modifications to [wg-governance].

### Additional responsibilities of Chairs

- Coordinating and facilitating discussions on Data-related topics in scope of the WG, within the WG itself and the Kubeflow community.
- Ensuring alignment with overall Kubeflow goals and objectives in the context of data processing and ML model metadata management.

### Additional responsibilities of Tech Leads

- Providing technical guidance and mentorship to contributors working on the Spark operator, Model Registry, and the other projects in scope of this WG.
- Overseeing the technical direction of the subprojects and ensuring consistency with Kubeflow's vision for data processing and metadata management.

### Deviations from [wg-governance]

This WG follows the outlined roles and governance in [wg-governance].

### Subproject Creation

WG Technical Leads

[wg-governance]: ../wgs/wg-governance.md
1 change: 1 addition & 0 deletions wg-list.md
@@ -23,6 +23,7 @@ When the need arises, a [new WG can be created](wgs/wg-lifecycle.md)
| Name | Label | Chairs | Contact | Meetings |
|------|-------|--------|---------|----------|
|[AutoML](wg-automl/README.md)|area/wg-automl|* [Andrey Velichkevich](https://github.com/andreyvelich), Apple<br>* [Ce Gao](https://github.com/gaocegege), Caicloud<br>* [Johnu George](https://github.com/johnugeorge), Nutanix<br>|* [Slack](https://kubeflow.slack.com/messages/wg-automl)<br>* [Mailing List](https://groups.google.com/forum/#!forum/kubeflow-discuss)|* Kubeflow AutoML Working Group Meeting (Asia & Europe friendly): [Wednesdays at 11:00am UTC (Coordinated Universal Time) (every 4 weeks on Wednesday from the 10th of March 2021)](https://calendar.google.com/calendar/u/0/r?cid=ZDQ5bnNpZWZzbmZna2Y5MW8wdThoMmpoazRAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ)<br>* Kubeflow AutoML Working Group Meeting (US friendly): [Wednesdays at 5:00pm UTC (Coordinated Universal Time) (every 4 weeks on Wednesday from the 24th of March 2021)](https://calendar.google.com/calendar/u/0/r?cid=ZDQ5bnNpZWZzbmZna2Y5MW8wdThoMmpoazRAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ)<br>
|[Data](wg-data/README.md)|area/wg-data|* [Yi Chen](https://github.com/ChenYi015), Alibaba Cloud<br>* [Andrey Velichkevich](https://github.com/andreyvelich), Apple<br>* [Ramesh Reddy](https://github.com/rareddy), Red Hat<br>|* [Slack](https://cloud-native.slack.com/archives/C073W572LA2)<br>* [Mailing List](https://groups.google.com/forum/#!forum/kubeflow-discuss)|* KF Model Registry community meeting (US/EMEA): [Mondays at 7:00PM-8:00PM Europe/Madrid (biweekly - every other Monday of the month)]()<br>* The Feast community meeting (US/EMEA): [Tuesdays at 8:00PM-9:00PM Europe/Madrid (biweekly - every other Tuesday of the month)]()<br>
|[Deployment](wg-deployment/README.md)|area/wg-deployment|* [Yao Xiao](https://github.com/PatrickXYS), AWS<br>* [Animesh Singh](https://github.com/animeshsingh), IBM<br>* [Igor Mameshin](https://github.com/mameshini), Agile Stacks<br>* [Vaclav Pavlin](https://github.com/vpavlin), Red Hat<br>* [Yannis Zarkadas](https://github.com/yanniszark), Arrikto<br>|* [Slack](https://kubeflow.slack.com/messages/wg-deployment)<br>* [Mailing List](https://groups.google.com/forum/#!forum/kubeflow-discuss)|* Regular WG Meeting (Pacific PM): [Wednesdays at 17:30 PT (Pacific Time) (biweekly - every other Wednesday)]()<br>
|[Manifests](wg-manifests/README.md)|area/wg-manifests|* [Julius von Kohout](https://github.com/juliusvonkohout), DHL<br>* [Kimonas Sotirchos](https://github.com/kimwnasptd), Canonical<br>|* [Slack](https://kubeflow.slack.com/messages/wg-manifests)<br>* [Mailing List](https://groups.google.com/forum/#!forum/kubeflow-discuss)|* Regular WG Meeting (Pacific AM): [Thursdays at 08:00 PT (Pacific Time) (biweekly - every other Thursday)]()<br>
|[Notebooks](wg-notebooks/README.md)|area/wg-notebooks|* [Stefano Fioravanzo](https://github.com/StefanoFioravanzo), Arrikto<br>* [Ilias Katsakioris](https://github.com/elikatsis), Arrikto<br>* [Kimonas Sotirchos](https://github.com/kimwnasptd), Canonical<br>* [Mathew Wicks](https://github.com/thesuperzapper)<br>* [Yannis Zarkadas](https://github.com/yanniszark), Arrikto<br>|* [Slack](https://kubeflow.slack.com/messages/wg-notebooks)<br>* [Mailing List](https://groups.google.com/forum/#!forum/kubeflow-discuss)|* Regular Notebooks Meeting (Australia & Europe friendly): [Thursdays at 11:00 pm PT (Pacific Time) (weekly)]()<br>
69 changes: 69 additions & 0 deletions wgs.yaml
@@ -135,6 +135,75 @@ workinggroups:
- name: katib
owners:
- https://raw.githubusercontent.com/kubeflow/katib/master/OWNERS
- dir: wg-data
name: Data
mission_statement: >
The WG "Data" is focused on enhancing the support for data and metadata-related
tasks within Kubeflow, with a specific focus on the Spark operator, Feast, and
Model Registry. The group aims to simplify and improve data and feature processing
between the stages of the ML lifecycle, for example from data preparation to
model training and fine-tuning. The group also aims to facilitate management of
ML model metadata while ensuring seamless integration with other Kubeflow components.
The goal of the Spark on Kubernetes Operator is to make it simpler to run
Apache Spark on Kubernetes: it automates deployment and simplifies lifecycle management
of Spark jobs. The goal of Model Registry is to gather, analyze, and
develop the model registry requirements of Kubeflow community users. The goal of Feast
is to provide a customizable, operational data system that enables ML Platform
Engineers, Machine Learning Engineers, and Data Scientists to accelerate production
machine learning.

charter_link: charter.md
label: area/wg-data
leadership:
chairs:
- github: ChenYi015
name: Yi Chen
company: Alibaba Cloud
- github: andreyvelich
name: Andrey Velichkevich
company: Apple
- github: rareddy
name: Ramesh Reddy
company: Red Hat
tech_leads:
- github: ChenYi015
name: Yi Chen
company: Alibaba Cloud
- github: andreyvelich
name: Andrey Velichkevich
company: Apple
- github: franciscojavierarceo
name: Francisco Javier Arceo
company: Red Hat
- github: tarilabs
name: Matteo Mortari
company: Red Hat
meetings:
- description: KF Model Registry community meeting (US/EMEA)
day: Monday
time: 7:00PM-8:00PM
tz: Europe/Madrid
frequency: biweekly - every other Monday of the month
archive_url: https://docs.google.com/document/d/1DmMhcae081SItH19gSqBpFtPfbkr9dFhSMCgs-JKzNo/edit?usp=sharing
- description: The Feast community meeting (US/EMEA)
day: Tuesday
time: 8:00PM-9:00PM
tz: Europe/Madrid
frequency: biweekly - every other Tuesday of the month
archive_url: https://docs.google.com/document/d/1MqNNhPCi1vvcYv71j3N0uBb4qIbl2ikaFO8tKohu12I/edit#heading=h.6js8vamr57ls
contact:
slack: https://cloud-native.slack.com/archives/C073W572LA2
mailing_list: https://groups.google.com/forum/#!forum/kubeflow-discuss
teams:
- name: wg-data-leads
description: Team of Data Working Group leads
subprojects:
- name: model-registry
owners:
- https://raw.githubusercontent.com/kubeflow/model-registry/main/OWNERS
- name: spark-operator

@franciscojavierarceo commented on Aug 13, 2024:

Suggested change
- name: spark-operator
- name: feast
owners:
- https://raw.githubusercontent.com/feast-dev/feast/master/OWNERS
- name: spark-operator

Member Author replied:

I believe I should not add subprojects belonging outside of github.com/kubeflow here. What is the @kubeflow/kubeflow-steering-committee view on this?

owners:
- https://raw.githubusercontent.com/kubeflow/spark-operator/master/OWNERS
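
The `owners` entries point at raw file URLs, which take the form `raw.githubusercontent.com/<org>/<repo>/<ref>/<path>`. A `blob` segment belongs to `github.com` page URLs and is easy to carry over by mistake when copying a link from the browser. A small Python sketch of a shape check follows; the helper name `owners_url_problems` is hypothetical, and the check is a heuristic (it would misfire on a branch that is actually named `blob`), not part of the repository's tooling.

```python
from urllib.parse import urlparse


def owners_url_problems(url: str) -> list[str]:
    """Return a list of shape problems found in a raw OWNERS URL (empty if none)."""
    problems = []
    parts = urlparse(url)
    # Path segments: <org>/<repo>/<ref>/<path...>
    segments = [s for s in parts.path.split("/") if s]

    if parts.netloc != "raw.githubusercontent.com":
        problems.append("host is not raw.githubusercontent.com")
    if len(segments) < 4:
        problems.append("expected /<org>/<repo>/<ref>/<path>")
    elif segments[2] == "blob":
        problems.append("'blob' belongs to github.com page URLs, not raw URLs")
    if not segments or segments[-1] != "OWNERS":
        problems.append("path does not end in OWNERS")
    return problems


if __name__ == "__main__":
    for candidate in (
        "https://raw.githubusercontent.com/kubeflow/model-registry/main/OWNERS",
        "https://raw.githubusercontent.com/kubeflow/spark-operator/blob/master/OWNERS",
    ):
        print(candidate, owners_url_problems(candidate))
```

A check of this kind only looks at the URL's shape; it does not confirm that the file exists on the named branch.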
- dir: wg-deployment
name: Deployment
mission_statement: >