diff --git a/OWNERS_ALIASES b/OWNERS_ALIASES
index 5355c2eee..9dc9400d2 100644
--- a/OWNERS_ALIASES
+++ b/OWNERS_ALIASES
@@ -11,6 +11,12 @@ aliases:
     - gaocegege
     - johnugeorge
     - tenzen-y
+  wg-data-leads:
+    - ChenYi015
+    - andreyvelich
+    - franciscojavierarceo
+    - rareddy
+    - tarilabs
   wg-deployment-leads:
     - PatrickXYS
     - animeshsingh
diff --git a/wg-data/README.md b/wg-data/README.md
new file mode 100644
index 000000000..1bed41764
--- /dev/null
+++ b/wg-data/README.md
@@ -0,0 +1,35 @@
+
+# Data Working Group
+
+The WG "Data" is focused on enhancing the support for data and metadata-related tasks within Kubeflow, with a specific focus on the Spark operator, Feast, and Model Registry. The group aims to simplify and improve data and feature processing between various stages of the ML lifecycle, for example from data preparation to model training and fine-tuning. The group also aims to facilitate the ML model's metadata management, while ensuring seamless integration with other Kubeflow components. The goal of the Spark on Kubernetes Operator is to simplify running Apache Spark on Kubernetes. It automates deployment and simplifies lifecycle management of Spark Jobs on Kubernetes. The goal of Model Registry is to gather, analyze, and develop model registry requirements of Kubeflow community users. The goal of Feast is to provide a customizable, operational data system that enables ML Platform Engineers, Machine Learning Engineers, and Data Scientists to accelerate production machine learning.
+
+The [charter](charter.md) defines the scope and governance of the Data Working Group.
+
+## Meetings
+* KF Model Registry community meeting (US/EMEA): [Mondays at 7:00PM-8:00PM Europe/Madrid]() (biweekly - every other Monday of the month). [Convert to your timezone](http://www.thetimezoneconverter.com/?t=7:00PM-8:00PM&tz=Europe%2FMadrid).
+  * [Meeting notes and Agenda](https://docs.google.com/document/d/1DmMhcae081SItH19gSqBpFtPfbkr9dFhSMCgs-JKzNo/edit?usp=sharing).
+* The Feast community meeting (US/EMEA): [Tuesdays at 8:00PM-9:00PM Europe/Madrid]() (biweekly - every other Tuesday of the month). [Convert to your timezone](http://www.thetimezoneconverter.com/?t=8:00PM-9:00PM&tz=Europe%2FMadrid).
+  * [Meeting notes and Agenda](https://docs.google.com/document/d/1MqNNhPCi1vvcYv71j3N0uBb4qIbl2ikaFO8tKohu12I/edit#heading=h.6js8vamr57ls).
+
+## Organizers
+
+* Yi Chen (**[@ChenYi015](https://github.com/ChenYi015)**), Alibaba Cloud
+* Andrey Velichkevich (**[@andreyvelich](https://github.com/andreyvelich)**), Apple
+* Ramesh Reddy (**[@rareddy](https://github.com/rareddy)**), Red Hat
+
+## Contact
+- Slack: [cloud-native.slack.com/archives/C073W572LA2](https://cloud-native.slack.com/archives/C073W572LA2)
+- [Mailing list](https://groups.google.com/forum/#!forum/kubeflow-discuss)
+- [Open Community Issues/PRs](https://github.com/kubeflow/community/labels/area%2Fwg-data)
+- GitHub Teams:
+  - [@kubeflow/wg-data-leads](https://github.com/orgs/kubeflow/teams/wg-data-leads) - Team of Data Working Group leads
+
+
+
diff --git a/wg-data/charter.md b/wg-data/charter.md
new file mode 100644
index 000000000..f0cdaff69
--- /dev/null
+++ b/wg-data/charter.md
@@ -0,0 +1,71 @@
+# WG Data Charter
+
+This charter adheres to the conventions, roles, and organisation management outlined in [wg-governance] for the Working Group "Data".
+
+## Scope
+
+The WG "Data" is focused on enhancing the support for data and metadata-related tasks within Kubeflow, with a specific focus on the [Spark operator](https://github.com/kubeflow/community/pull/672), the [Model Registry](https://github.com/kubeflow/kubeflow/issues/7396), and [Feast](https://github.com/feast-dev/feast).
+The group aims to simplify and improve data and feature processing between various stages of the ML lifecycle, for example from data preparation to model training and fine-tuning.
+The group also aims to facilitate the ML model's metadata management, while ensuring seamless integration with other Kubeflow components.
+
+An additional goal of the group is to offer a common ground for data and metadata-related topics in the MLOps orbit that do not yet have a more specific working group, so they can "incubate as one" coherent effort.
+
+For example, Data Preparation, the Feature Store, and Model Registry have recently been discussed in the Kubeflow community. While they are not yet mature enough to have their own working groups, they can be nurtured together as part of this WG.
+
+### In scope
+
+#### Code, Binaries, and Other relevant assets
+
+- Onboarding and maintenance of the Spark operator for scalable and distributed data processing.
+[See also](https://github.com/kubeflow/spark-operator)
+- Continued development of the Model Registry to manage and version machine learning models efficiently.
+[See also](https://github.com/kubeflow/model-registry)
+  - Model Registry REST server
+  - Model Registry Python client
+  - Deployment manifests
+  - BFF (backend-for-frontend) for Model Registry
+  - UI front-end for Model Registry
+- Onboarding and maintenance of Feast for historical feature extraction and online feature serving.
+- SDKs and REST APIs for interacting with Kubeflow APIs related to data processing and ML model metadata management.
+- CI/CD pipelines for Kubeflow subproject repositories in the scope of this WG.
+- Documentation, in the form of Kubeflow website sections and as necessary in each repository.
+
+#### Cross-cutting and Externally Facing Processes
+
+- Ensuring seamless integration of these WG subprojects with the rest of the Kubeflow platform. For example:
+  - Coordinating with WG Pipelines for integrations of Model Registry and Feast with KFP.
+  - Coordinating with WG Serving for integrations of Model Registry and Feast with KServe and ModelMesh.
+- Coordinating with release teams to ensure that the capabilities and subprojects in scope of this WG can be released properly.
+- Offering mentorship to support contributors working on data-centric projects that want to integrate with Kubeflow.
+
+### Out of scope
+
+- APIs and components related to:
+  - ML exploration, feature development, and experimentation (covered in Notebooks/Pipelines),
+  - ML training (covered in Training),
+  - serving ML features and models for model inference (covered in Serving)
+- Anything else not explicitly outlined in the scope of this WG.
+
+## Roles and Organization Management
+
+This WG adheres to the Roles and Organization Management outlined in [wg-governance] and opts in to updates and modifications to [wg-governance].
+
+### Additional responsibilities of Chairs
+
+- Coordinating and facilitating discussions on Data-related topics in scope of the WG, within the WG itself and the Kubeflow community.
+- Ensuring alignment with overall Kubeflow goals and objectives in the context of data processing and ML model metadata management.
+
+### Additional responsibilities of Tech Leads
+
+- Providing technical guidance and mentorship to contributors working on Spark operator, Model Registry, and the other projects in scope of this WG.
+- Overseeing the technical direction of the subprojects and ensuring consistency with Kubeflow's vision for data processing and metadata management.
+
+### Deviations from [wg-governance]
+
+This WG follows the outlined roles and governance in [wg-governance].
+
+### Subproject Creation
+
+WG Technical Leads
+
+[wg-governance]: ../wgs/wg-governance.md
diff --git a/wg-list.md b/wg-list.md
index 317e9ab78..c334732f4 100644
--- a/wg-list.md
+++ b/wg-list.md
@@ -23,6 +23,7 @@ When the need arises, a [new WG can be created](wgs/wg-lifecycle.md)
 | Name | Label | Chairs | Contact | Meetings |
 |------|-------|--------|---------|----------|
 |[AutoML](wg-automl/README.md)|area/wg-automl|* [Andrey Velichkevich](https://github.com/andreyvelich), Apple
* [Ce Gao](https://github.com/gaocegege), Caicloud
* [Johnu George](https://github.com/johnugeorge), Nutanix
|* [Slack](https://kubeflow.slack.com/messages/wg-automl)
* [Mailing List](https://groups.google.com/forum/#!forum/kubeflow-discuss)|* Kubeflow AutoML Working Group Meeting (Asia & Europe friendly): [Wednesdays at 11:00am UTC (Coordinated Universal Time) (every 4 weeks on Wednesday from the 10th of March 2021)](https://calendar.google.com/calendar/u/0/r?cid=ZDQ5bnNpZWZzbmZna2Y5MW8wdThoMmpoazRAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ)
* Kubeflow AutoML Working Group Meeting (US friendly): [Wednesdays at 5:00pm UTC (Coordinated Universal Time) (every 4 weeks on Wednesday from the 24th of March 2021)](https://calendar.google.com/calendar/u/0/r?cid=ZDQ5bnNpZWZzbmZna2Y5MW8wdThoMmpoazRAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ)
+|[Data](wg-data/README.md)|area/wg-data|* [Yi Chen](https://github.com/ChenYi015), Alibaba Cloud
* [Andrey Velichkevich](https://github.com/andreyvelich), Apple
* [Ramesh Reddy](https://github.com/rareddy), Red Hat
|* [Slack](https://cloud-native.slack.com/archives/C073W572LA2)
* [Mailing List](https://groups.google.com/forum/#!forum/kubeflow-discuss)|* KF Model Registry community meeting (US/EMEA): [Mondays at 7:00PM-8:00PM Europe/Madrid (biweekly - every other Monday of the month)]()
* The Feast community meeting (US/EMEA): [Tuesdays at 8:00PM-9:00PM Europe/Madrid (biweekly - every other Tuesday of the month)]()
|[Deployment](wg-deployment/README.md)|area/wg-deployment|* [Yao Xiao](https://github.com/PatrickXYS), AWS
* [Animesh Singh](https://github.com/animeshsingh), IBM
* [Igor Mameshin](https://github.com/mameshini), Agile Stacks
* [Vaclav Pavlin](https://github.com/vpavlin), Red Hat
* [Yannis Zarkadas](https://github.com/yanniszark), Arrikto
|* [Slack](https://kubeflow.slack.com/messages/wg-deployment)
* [Mailing List](https://groups.google.com/forum/#!forum/kubeflow-discuss)|* Regular WG Meeting (Pacific PM): [Wednesdays at 17:30 PT (Pacific Time) (biweekly - every other Wednesday)]()
|[Manifests](wg-manifests/README.md)|area/wg-manifests|* [Julius von Kohout](https://github.com/juliusvonkohout), DHL
* [Kimonas Sotirchos](https://github.com/kimwnasptd), Canonical
|* [Slack](https://kubeflow.slack.com/messages/wg-manifests)
* [Mailing List](https://groups.google.com/forum/#!forum/kubeflow-discuss)|* Regular WG Meeting (Pacific AM): [Thursdays at 08:00 PT (Pacific Time) (biweekly - every other Thursday)]()
|[Notebooks](wg-notebooks/README.md)|area/wg-notebooks|* [Stefano Fioravanzo](https://github.com/StefanoFioravanzo), Arrikto
* [Ilias Katsakioris](https://github.com/elikatsis), Arrikto
* [Kimonas Sotirchos](https://github.com/kimwnasptd), Canonical
* [Mathew Wicks](https://github.com/thesuperzapper)
* [Yannis Zarkadas](https://github.com/yanniszark), Arrikto
|* [Slack](https://kubeflow.slack.com/messages/wg-notebooks)
* [Mailing List](https://groups.google.com/forum/#!forum/kubeflow-discuss)|* Regular Notebooks Meeting (Australia & Europe friendly): [Thursdays at 11:00 pm PT (Pacific Time) (weekly)]()
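Review note: the `subprojects` entries in `wgs.yaml` point at raw `OWNERS` files, and a `github.com`-style `/blob/` path segment is easy to paste into a `raw.githubusercontent.com` URL by mistake. The following is a minimal sketch of a check for that, not part of this PR or of any Kubeflow tooling. The function name `raw_url_problems` is hypothetical, and the rule that the third path segment must not be `blob` is a heuristic assumption (a ref could in principle be named `blob`).

```python
from typing import List
from urllib.parse import urlparse


def raw_url_problems(url: str) -> List[str]:
    """Return human-readable problems found in a raw OWNERS URL (hypothetical helper)."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme != "https":
        problems.append("scheme is not https")
    if parsed.netloc != "raw.githubusercontent.com":
        problems.append("host is not raw.githubusercontent.com")
        return problems
    # Assumed path shape for raw URLs: /<org>/<repo>/<ref>/<path to file>
    segments = [s for s in parsed.path.split("/") if s]
    if len(segments) < 4:
        problems.append("path is shorter than /<org>/<repo>/<ref>/<file>")
    elif segments[2] == "blob":
        # Heuristic: 'blob' appears in github.com web URLs, not in raw URLs.
        problems.append("'blob' segment belongs to github.com URLs, not raw URLs")
    return problems


if __name__ == "__main__":
    for candidate in (
        "https://raw.githubusercontent.com/kubeflow/model-registry/main/OWNERS",
        "https://raw.githubusercontent.com/kubeflow/spark-operator/blob/master/OWNERS",
    ):
        print(candidate, raw_url_problems(candidate))
```

An empty list means the URL has the expected shape. The check does not fetch anything, so it cannot confirm that the file actually exists at that ref.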
diff --git a/wgs.yaml b/wgs.yaml
index 863c0cf54..e43320019 100644
--- a/wgs.yaml
+++ b/wgs.yaml
@@ -135,6 +135,75 @@ workinggroups:
   - name: katib
     owners:
     - https://raw.githubusercontent.com/kubeflow/katib/master/OWNERS
+- dir: wg-data
+  name: Data
+  mission_statement: >
+    The WG "Data" is focused on enhancing the support for data and metadata-related
+    tasks within Kubeflow, with a specific focus on the Spark operator, Feast, and
+    Model Registry. The group aims to simplify and improve data and feature processing
+    between various stages of the ML lifecycle, for example from data preparation to
+    model training and fine-tuning. The group also aims to facilitate the ML model's
+    metadata management, while ensuring seamless integration with other Kubeflow components.
+    The goal of the Spark on Kubernetes Operator is to simplify running
+    Apache Spark on Kubernetes. It automates deployment and simplifies lifecycle management
+    of Spark Jobs on Kubernetes. The goal of Model Registry is to gather, analyze, and
+    develop model registry requirements of Kubeflow community users. The goal of Feast
+    is to provide a customizable, operational data system that enables ML Platform
+    Engineers, Machine Learning Engineers, and Data Scientists to accelerate production
+    machine learning.
+
+  charter_link: charter.md
+  label: area/wg-data
+  leadership:
+    chairs:
+    - github: ChenYi015
+      name: Yi Chen
+      company: Alibaba Cloud
+    - github: andreyvelich
+      name: Andrey Velichkevich
+      company: Apple
+    - github: rareddy
+      name: Ramesh Reddy
+      company: Red Hat
+    tech_leads:
+    - github: ChenYi015
+      name: Yi Chen
+      company: Alibaba Cloud
+    - github: andreyvelich
+      name: Andrey Velichkevich
+      company: Apple
+    - github: franciscojavierarceo
+      name: Francisco Javier Arceo
+      company: Red Hat
+    - github: tarilabs
+      name: Matteo Mortari
+      company: Red Hat
+  meetings:
+  - description: KF Model Registry community meeting (US/EMEA)
+    day: Monday
+    time: 7:00PM-8:00PM
+    tz: Europe/Madrid
+    frequency: biweekly - every other Monday of the month
+    archive_url: https://docs.google.com/document/d/1DmMhcae081SItH19gSqBpFtPfbkr9dFhSMCgs-JKzNo/edit?usp=sharing
+  - description: The Feast community meeting (US/EMEA)
+    day: Tuesday
+    time: 8:00PM-9:00PM
+    tz: Europe/Madrid
+    frequency: biweekly - every other Tuesday of the month
+    archive_url: https://docs.google.com/document/d/1MqNNhPCi1vvcYv71j3N0uBb4qIbl2ikaFO8tKohu12I/edit#heading=h.6js8vamr57ls
+  contact:
+    slack: https://cloud-native.slack.com/archives/C073W572LA2
+    mailing_list: https://groups.google.com/forum/#!forum/kubeflow-discuss
+    teams:
+    - name: wg-data-leads
+      description: Team of Data Working Group leads
+  subprojects:
+  - name: model-registry
+    owners:
+    - https://raw.githubusercontent.com/kubeflow/model-registry/main/OWNERS
+  - name: spark-operator
+    owners:
+    - https://raw.githubusercontent.com/kubeflow/spark-operator/master/OWNERS
 - dir: wg-deployment
   name: Deployment
   mission_statement: >