From 6154c5b9eedcd9ed0f0a97ecb0a31602468f9a73 Mon Sep 17 00:00:00 2001 From: Eric Boniface Date: Fri, 5 Mar 2021 17:06:03 +0100 Subject: [PATCH 01/34] Fix broken link --- assessment_framework_eng.md | 4 ++-- referentiel_evaluation.md | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/assessment_framework_eng.md b/assessment_framework_eng.md index 9bb44d0..0701a2b 100644 --- a/assessment_framework_eng.md +++ b/assessment_framework_eng.md @@ -188,7 +188,7 @@ The state of the art in ML security is constantly evolving. While it is impossib
Ressources1.7 : -- (Technical guide) *[Privacy Enhancing Technologies Decision Tree (v2)](https://www.private-ai.ca/PETs_Decision_Tree.png)*, Private AI, 2020 +- (Technical guide) *[Privacy Enhancing Technologies Decision Tree (v2)](http://www.private-ai.ca/PETs_Decision_Tree.svg)*, Private AI, 2020 - (Web article) *[The secret-sharer: evaluating and testing unintended memorization in neural networks](https://blog.acolyer.org/2019/09/23/the-secret-sharer/)*, A. Colyer, 2019 - (Academic paper) *[Membership Inference Attacks against Machine Learning Models](https://arxiv.org/abs/1610.05820)*, R. Shokri, M. Stronati, C. Song, V. Shmatikov, 2017 - (Software & Tools) *[ML Privacy Meter](https://github.com/privacytrustlab/ml_privacy_meter): a tool to quantify the privacy risks of machine learning models with respect to inference attacks*. @@ -229,7 +229,7 @@ Depending on the level of risk and sensitivity of the projects, certain technica
Resources1.8 : -- (Technical guide) *[Privacy Enhancing Technologies Decision Tree (v2)](https://www.private-ai.ca/PETs_Decision_Tree.png)*, Private AI, 2020 +- (Technical guide) *[Privacy Enhancing Technologies Decision Tree (v2)](http://www.private-ai.ca/PETs_Decision_Tree.svg)*, Private AI, 2020 - (Web article) *[The secret-sharer: evaluating and testing unintended memorization in neural networks](https://blog.acolyer.org/2019/09/23/the-secret-sharer/)*, A. Colyer, 2019 - (Academic paper) *[Membership Inference Attacks against Machine Learning Models](https://arxiv.org/abs/1610.05820)*, R. Shokri, M. Stronati, C. Song, V. Shmatikov, 2017 - (Software & Tools) *[ML Privacy Meter](https://github.com/privacytrustlab/ml_privacy_meter): a tool to quantify the privacy risks of machine learning models with respect to inference attacks*. diff --git a/referentiel_evaluation.md b/referentiel_evaluation.md index 0a3632e..0cd3021 100644 --- a/referentiel_evaluation.md +++ b/referentiel_evaluation.md @@ -189,7 +189,7 @@ L'état de l'art de la sécurité du ML est en constante évolution. S'il est im
Ressources1.7 : -- (Technical guide) *[Privacy Enhancing Technologies Decision Tree (v2)](https://www.private-ai.ca/PETs_Decision_Tree.png)*, Private AI, 2020 +- (Technical guide) *[Privacy Enhancing Technologies Decision Tree (v2)](http://www.private-ai.ca/PETs_Decision_Tree.svg)*, Private AI, 2020 - (Web article) *[The secret-sharer: evaluating and testing unintended memorization in neural networks](https://blog.acolyer.org/2019/09/23/the-secret-sharer/)*, A. Colyer, 2019 - (Academic paper) *[Membership Inference Attacks against Machine Learning Models](https://arxiv.org/abs/1610.05820)*, R. Shokri, M. Stronati, C. Song, V. Shmatikov, 2017 - (Software & Tools) *[ML Privacy Meter](https://github.com/privacytrustlab/ml_privacy_meter): a tool to quantify the privacy risks of machine learning models with respect to inference attacks* @@ -230,7 +230,7 @@ Selon les niveaux de risque et de sensibilité des projets, certaines approches
Ressources1.8 : -- (Technical guide) *[Privacy Enhancing Technologies Decision Tree (v2)](https://www.private-ai.ca/PETs_Decision_Tree.png)*, Private AI, 2020 +- (Technical guide) *[Privacy Enhancing Technologies Decision Tree (v2)](http://www.private-ai.ca/PETs_Decision_Tree.svg)*, Private AI, 2020 - (Web article) *[The secret-sharer: evaluating and testing unintended memorization in neural networks](https://blog.acolyer.org/2019/09/23/the-secret-sharer/)*, A. Colyer, 2019 - (Academic paper) *[Membership Inference Attacks against Machine Learning Models](https://arxiv.org/abs/1610.05820)*, R. Shokri, M. Stronati, C. Song, V. Shmatikov, 2017 - (Software & Tools) *[ML Privacy Meter](https://github.com/privacytrustlab/ml_privacy_meter): a tool to quantify the privacy risks of machine learning models with respect to inference attacks* From 8551f7ec0431fac957de5066d7efaff9dab5242d Mon Sep 17 00:00:00 2001 From: Eric Boniface Date: Fri, 5 Mar 2021 17:09:28 +0100 Subject: [PATCH 02/34] Add newlines to fix a formatting issue --- assessment_framework_eng.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/assessment_framework_eng.md b/assessment_framework_eng.md index 0701a2b..076f7fe 100644 --- a/assessment_framework_eng.md +++ b/assessment_framework_eng.md @@ -16,6 +16,7 @@ The evaluation is composed of the following 6 sections: --- ### Section 1 - Protecting personal or confidential data + **[Data privacy]** The use of personal or confidential data carries the risk of exposure of such data, which can have very detrimental consequences for the producers, controllers or subjects of such data. Particularly in data science projects, they must therefore be protected and the risks of their leakage or exposure must be minimised. @@ -268,6 +269,7 @@ In some sectors there are obligations to report safety incidents to the regulato --- ### Section 2 - Preventing bias, developing non-discriminatory models + **[Biases and discrimination]** The use of predictive models learned from historical data can be counterproductive when historical data are contaminated by problematic phenomena (e.g. quality of certain data points, non-comparable data, social phenomena undesirable due to the time period, etc.). A key challenge for responsible and trustworthy data science is to respect the principle of diversity, non-discrimination and equity (described for example in section 1.5 of the EU [Ethics Guidelines for Trustworthy AI](https://ec.europa.eu/newsroom/dae/document.cfm?doc_id=60419)). It is therefore essential to question this risk and to study the nature of the data used, the conditions under which they were produced and collected, and what they represent. From 9b8a5a17c1a15b694c8aaee976995ac53270c8c3 Mon Sep 17 00:00:00 2001 From: Eric Boniface Date: Fri, 5 Mar 2021 17:29:10 +0100 Subject: [PATCH 03/34] Add explanation for element 2.2 --- assessment_framework_eng.md | 7 +++++++ referentiel_evaluation.md | 7 +++++++ 2 files changed, 14 insertions(+) diff --git a/assessment_framework_eng.md b/assessment_framework_eng.md index 076f7fe..50217b7 100644 --- a/assessment_framework_eng.md +++ b/assessment_framework_eng.md @@ -318,6 +318,13 @@ _(Select one answer only, which best corresponds to the level of maturity of the - [ ] 2.2.a Concerned - [ ] 2.2.b Not concerned +
+Expl2.2 :
+
+Configurations with risks of potential discrimination against social groups are particularly sensitive for the organisation and its stakeholders. They require special attention and the use of specific methodologies.
+
+
+ --- _The following items within this section apply only to organisations that have selected the "Concerned" response in R2.2. Organisations not involved are therefore invited to move on to [Section 3](#section-3---assessing-model-performance-rigorously)._ diff --git a/referentiel_evaluation.md b/referentiel_evaluation.md index 0cd3021..8a98d1a 100644 --- a/referentiel_evaluation.md +++ b/referentiel_evaluation.md @@ -318,6 +318,13 @@ _(Sélectionner une seule réponse, correspondant le mieux au niveau de maturit - [ ] 2.2.a Concerné - [ ] 2.2.b Non concerné +
+Expl2.2 : + +Les cas de figure où il existe des risques de discrimination sont particulièrement sensibles pour l'organisation et ses parties prenantes, et requièrent une attention toute particulière. + +
+ --- _Les éléments suivants au sein de cette section ne s'appliquent qu'aux organisations ayant sélectionné la réponse "Concerné" de R2.2. Les organisations non concernées sont donc invitées à passer à la [Section 3](#section-3-evaluer-la-performance-de-manière-rigoureuse-et-expliquer-les-prédictions)._ From ffabb745427ab1a13f288482a8ca72e3b15a7c89 Mon Sep 17 00:00:00 2001 From: Eric Boniface Date: Fri, 5 Mar 2021 17:32:40 +0100 Subject: [PATCH 04/34] Add an intermediate answer item to element 4.3 --- assessment_framework_eng.md | 5 +++-- referentiel_evaluation.md | 5 +++-- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/assessment_framework_eng.md b/assessment_framework_eng.md index 50217b7..12a8d1d 100644 --- a/assessment_framework_eng.md +++ b/assessment_framework_eng.md @@ -655,8 +655,9 @@ _(Type: multiple responses possible)_ _(Select all response items that correspond to practices in your organisation. Please note that some combinations would not be coherent)_ - [ ] 4.3.a At this stage we do not analyse the incidents or unexpected behaviour observed -- [ ] 4.3.b We analyse incidents or unexpected behaviour encountered and publish them when relevant (e.g. article, blog) -- [ ] 4.3.c We get involved in clubs, networks or professional associations in the field of data science, and give feedback on incidents of unexpected behaviour that we observe +- [ ] 4.3.b We analyse incidents or unexpected behaviour encountered, but don't publish or share it +- [ ] 4.3.c We analyse incidents or unexpected behaviour encountered and publish them when relevant (e.g. article, blog) +- [ ] 4.3.d We get involved in clubs, networks or professional associations in the field of data science, and give feedback on incidents of unexpected behaviour that we observe
Expl4.3 : diff --git a/referentiel_evaluation.md b/referentiel_evaluation.md index 8a98d1a..4d3b70a 100644 --- a/referentiel_evaluation.md +++ b/referentiel_evaluation.md @@ -651,8 +651,9 @@ _(Type : réponses multiples possibles)_ _(Sélectionner tous les éléments de réponse correspondant à des pratiques de votre organisation. Attention, certaines combinaisons ne seraient pas cohérentes)_ - [ ] 4.3.a À ce stade nous ne faisons pas d'analyse des incidents ou comportements inattendus observés -- [ ] 4.3.b Nous analysons les incidents ou comportements inattendus rencontrés et les publions lorsque cela est pertinent (e.g. article, blog) -- [ ] 4.3.c Nous nous impliquons dans des clubs, cercles, ou associations professionnelles dans le domaine de la data science, et y faisons des retours d'expérience des incidents comportements inattendus que nous observons +- [ ] 4.3.b Nous analysons les incidents ou comportements inattendus rencontrés, mais ne les publions pas +- [ ] 4.3.c Nous analysons les incidents ou comportements inattendus rencontrés et les publions lorsque cela est pertinent (e.g. article, blog) +- [ ] 4.3.d Nous nous impliquons dans des clubs, cercles, ou associations professionnelles dans le domaine de la data science, et y faisons des retours d'expérience des incidents comportements inattendus que nous observons
Expl4.3 : From 461d6e9b53d59372e967c7a55d5e82c5e4561f0e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Cl=C3=A9ment=20Mayer?= Date: Tue, 9 Mar 2021 15:34:53 +0100 Subject: [PATCH 05/34] Add data for good ressources --- assessment_framework_eng.md | 3 +++ referentiel_evaluation.md | 3 +++ 2 files changed, 6 insertions(+) diff --git a/assessment_framework_eng.md b/assessment_framework_eng.md index 12a8d1d..2af1828 100644 --- a/assessment_framework_eng.md +++ b/assessment_framework_eng.md @@ -365,6 +365,7 @@ Complement on the use of synthetic data and _data augmentation_, _re-weighting_ - (Academic paper) *Fairness metrics* : *[counterfactual fairness](https://papers.nips.cc/paper/6995-counterfactual-fairness)* - (Academic paper) *Fairness metrics* : *[adversarial debiaising](https://arxiv.org/pdf/1801.07593.pdf)* - (Technical guide) Book *Fair ML* : *[Fairness and machine learning - Limitations and opportunities](https://fairmlbook.org/)*, Solon Barocas, Moritz Hardt, Arvind Narayanan, December 2019 +- (Web article) *[Fairness in Machine Learning](https://www.substra.ai/en/blog/fairness-in-machine-learning)*, introduction to Fairness metrics on Substra Foundation's blog, Mickael Fine, 2020
@@ -478,6 +479,7 @@ On robustness, an intuitive definition is that a model is robust when its perfor - (Web article) *[Testing Robustness Against Unforeseen Adversaries](https://openai.com/blog/testing-robustness/)*, Open AI, August 2019 - (Academic paper) *Robustness metrics* : *[noise sensitivity score](https://arxiv.org/abs/1806.01477)*. - (Technical guide) *[Adversarial Robustness - Theory and Practice](https://adversarial-ml-tutorial.org/)*, Z. Kolter and A. Madry +- (Technical guide) *[Understand Robustness](https://github.com/Nathanlauga/understand-robustness/blob/main/notebooks/understand_robustness.ipynb)*, Nathan Lauga, 2020
@@ -612,6 +614,7 @@ This concept of the "end-to-end genealogy" of a learned predictive model can tak - (Software & Tools) [MLflow](https://mlflow.org/): *an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry* - (Software & Tools) [DVC](https://dvc.org/): *an Open-source Version Control System for Machine Learning Projects* - (Software & Tools) [DAGsHub](https://dagshub.com/docs/): *a platform for data version control and collaboration, based on DVC* *a platform for data version control and collaboration, based on DVC* +- (Software & Tools) [End-to-end genealogy template](https://github.com/dataforgoodfr/batch8_substra/blob/master/G%C3%A9n%C3%A9alogie%20de%20bout-en-bout/Genealogie-de-bout-en-bout_template.md): *template for Data Scientists to help collect all the information in order to trace the genealogy from end to end of a model*, 2020, Joséphine Lecoq-Vallon
diff --git a/referentiel_evaluation.md b/referentiel_evaluation.md index 4d3b70a..74a4548 100644 --- a/referentiel_evaluation.md +++ b/referentiel_evaluation.md @@ -365,6 +365,7 @@ Complément sur l'utilisation de données synthétiques et d'approches de _data - (Academic paper) *Fairness metrics* : *[counterfactual fairness](https://papers.nips.cc/paper/6995-counterfactual-fairness)* - (Academic paper) *Fairness metrics* : *[adversarial debiaising](https://arxiv.org/pdf/1801.07593.pdf)* - (Technical guide) Livre *Fair ML* : *[Fairness and machine learning - Limitations and opportunities](https://fairmlbook.org/)*, Solon Barocas, Moritz Hardt, Arvind Narayanan, Décembre 2019 +- (web article) *[L'équité (Fairness) dans le Machine Learning](https://www.substra.ai/fr/blog/fairness-dans-le-machine-learning)*, introduction aux Fairness Metrics sur le blog de Substra Foundation, Mickael Fine, 2020
@@ -477,6 +478,7 @@ Sur la robustesse, une définition intuitive est qu'un modèle est robuste lorsq - (Web article) *[Testing Robustness Against Unforeseen Adversaries](https://openai.com/blog/testing-robustness/)*, Open AI, Août 2019 - (Academic paper) *Robustness metrics* : *[noise sensitivity score](https://arxiv.org/abs/1806.01477)*. - (Technical guide) *[Adversarial Robustness - Theory and Practice](https://adversarial-ml-tutorial.org/)*, Z. Kolter et A. Madry +- (Technical guide) *[Understand Robustness](https://github.com/Nathanlauga/understand-robustness/blob/main/notebooks/understand_robustness.ipynb)*, Nathan Lauga, 2020
@@ -608,6 +610,7 @@ Ce concept de "généalogie de bout-en-bout" d'un modèle prédictif appris peut - (Software & Tools) [MLflow](https://mlflow.org/): *an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry* - (Software & Tools) [DVC](https://dvc.org/): *an Open-source Version Control System for Machine Learning Projects* - (Software & Tools) [DAGsHub](https://dagshub.com/docs/): *a platform for data version control and collaboration, based on DVC* +- (Software & Tools) [Modèle de généalogie de bout en bout](https://github.com/dataforgoodfr/batch8_substra/blob/master/G%C3%A9n%C3%A9alogie%20de%20bout-en-bout/Genealogie-de-bout-en-bout_template.md): *template à destination des Data Scientists pour aider à collecter toutes les informations afin de tracer la généalogie de bout-en-bout d'un modèle*, 2020, Joséphine Lecoq-Vallon
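To make the fairness metrics referenced in the resources added above a little more concrete, the following minimal sketch shows how one of the simplest group fairness metrics, the demographic parity difference, can be computed with pandas. It is illustrative only: the column names and toy data are assumptions made for this example, not part of the assessment framework.

```python
# Illustrative sketch: demographic parity difference, i.e. the gap in
# positive-prediction rates between the most and least favoured groups.
# Column names and toy data are assumptions made for this example.
import pandas as pd

def demographic_parity_difference(df: pd.DataFrame, group_col: str, pred_col: str) -> float:
    """Return the gap in positive-prediction rates across groups."""
    rates = df.groupby(group_col)[pred_col].mean()
    return float(rates.max() - rates.min())

# Toy usage: binary predictions from some model, one sensitive attribute.
toy = pd.DataFrame({
    "group":      ["A", "A", "A", "B", "B", "B"],
    "prediction": [1,   0,   0,   1,   1,   1],
})
print(demographic_parity_difference(toy, "group", "prediction"))  # ~0.67
```

In practice, dedicated libraries such as Fairlearn or AIF360 implement this and many other group fairness metrics and are better suited to real projects than ad-hoc computations.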
From 4850872194d81d616e395e58a2344aeab1bb0a23 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Cl=C3=A9ment=20Mayer?= Date: Tue, 9 Mar 2021 15:47:59 +0100 Subject: [PATCH 06/34] Add berkman klein center meta study --- references.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/references.md b/references.md index c6eba29..bf93e28 100644 --- a/references.md +++ b/references.md @@ -63,6 +63,8 @@ L'*[Institute for Ethical AI & Machine Learning](https://ethical.institute)* mai - Méta-étude *[A Unified Framework of Five Principles for AI in Society](https://hdsr.mitpress.mit.edu/pub/l0jsh9d1/release/6)*, F. Floridi, J. Cowls, Juillet 2019 +- Méta-étude *[Principled Artificial Intelligence](https://cyber.harvard.edu/publication/2020/principled-ai)*, Berkman Klein Center, Février 2020 + ### Guidelines, liste de principes ou de thèmes-clés - [UNESCO - Recommendation on the ethics of artificial intelligence](https://unesdoc.unesco.org/ark:/48223/pf0000373434_fre): From 7e4672372fdd1b6f4ff10ac1c834f87d5961e413 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Cl=C3=A9ment=20Mayer?= Date: Thu, 11 Mar 2021 14:58:57 +0100 Subject: [PATCH 07/34] Add resource on model distillation --- assessment_framework_eng.md | 2 ++ referentiel_evaluation.md | 1 + 2 files changed, 3 insertions(+) diff --git a/assessment_framework_eng.md b/assessment_framework_eng.md index 2af1828..fe6fa74 100644 --- a/assessment_framework_eng.md +++ b/assessment_framework_eng.md @@ -199,6 +199,8 @@ The state of the art in ML security is constantly evolving. While it is impossib - (Software & Tools) Tools for *differential privacy*: Google *[differential privacy library](https://github.com/google/differential-privacy)*, and the Python [PyDP](https://github.com/OpenMined/PyDP) wrapper from OpenMined - (Web article) The *distillation* of a model, in addition to the compression it provides, can be used as a measure to protect the model and the training data used, see for example *[Knowledge Distillation: Simplified](https://towardsdatascience.com/knowledge-distillation-simplified-dd4973dbc764)*, Towards Data Science, 2019. - (Academic paper) *[Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531)*, G. Hinton, O. Vinyals, J. Dean, 2015 +- (Web article) *[Model distillation and privacy](https://www.substra.ai/en/blog/model-distillation)*, Substra Foundation blog post to introduce distillation approaches +, Gigs Barmentlo, 2020 diff --git a/referentiel_evaluation.md b/referentiel_evaluation.md index 74a4548..712dacc 100644 --- a/referentiel_evaluation.md +++ b/referentiel_evaluation.md @@ -199,6 +199,7 @@ L'état de l'art de la sécurité du ML est en constante évolution. S'il est im - (Software & Tools) Outils pour la *differential privacy* : Google *[differential privacy library](https://github.com/google/differential-privacy)*, et le wrapper Python [PyDP](https://github.com/OpenMined/PyDP) d'OpenMined - (Web article) La *distillation* d'un modèle, en plus de la compression qu'elle apporte, peut être utilisée comme une mesure de protection du modèle et des données d'entraînement utilisées, voir par exemple *[Knowledge Distillation: Simplified](https://towardsdatascience.com/knowledge-distillation-simplified-dd4973dbc764)*, Towards Data Science, 2019 - (Academic paper) *[Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531)*, G. Hinton, O. Vinyals, J. 
Dean, 2015 +- (Web article) *[Model distillation and privacy](https://www.substra.ai/en/blog/model-distillation)*, article de blog Substra Foundation pour présenter les approches de distillation, Gigs Barmentlo, 2020 From c31294d438e8f2d80ad5277eda30bb7781b5624d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Cl=C3=A9ment=20Mayer?= Date: Thu, 11 Mar 2021 15:00:16 +0100 Subject: [PATCH 08/34] Correct typo --- assessment_framework_eng.md | 2 +- referentiel_evaluation.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/assessment_framework_eng.md b/assessment_framework_eng.md index fe6fa74..161e80e 100644 --- a/assessment_framework_eng.md +++ b/assessment_framework_eng.md @@ -200,7 +200,7 @@ The state of the art in ML security is constantly evolving. While it is impossib - (Web article) The *distillation* of a model, in addition to the compression it provides, can be used as a measure to protect the model and the training data used, see for example *[Knowledge Distillation: Simplified](https://towardsdatascience.com/knowledge-distillation-simplified-dd4973dbc764)*, Towards Data Science, 2019. - (Academic paper) *[Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531)*, G. Hinton, O. Vinyals, J. Dean, 2015 - (Web article) *[Model distillation and privacy](https://www.substra.ai/en/blog/model-distillation)*, Substra Foundation blog post to introduce distillation approaches -, Gigs Barmentlo, 2020 +, Gijs Barmentlo, 2020 diff --git a/referentiel_evaluation.md b/referentiel_evaluation.md index 712dacc..3e09450 100644 --- a/referentiel_evaluation.md +++ b/referentiel_evaluation.md @@ -199,7 +199,7 @@ L'état de l'art de la sécurité du ML est en constante évolution. S'il est im - (Software & Tools) Outils pour la *differential privacy* : Google *[differential privacy library](https://github.com/google/differential-privacy)*, et le wrapper Python [PyDP](https://github.com/OpenMined/PyDP) d'OpenMined - (Web article) La *distillation* d'un modèle, en plus de la compression qu'elle apporte, peut être utilisée comme une mesure de protection du modèle et des données d'entraînement utilisées, voir par exemple *[Knowledge Distillation: Simplified](https://towardsdatascience.com/knowledge-distillation-simplified-dd4973dbc764)*, Towards Data Science, 2019 - (Academic paper) *[Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531)*, G. Hinton, O. Vinyals, J. Dean, 2015 -- (Web article) *[Model distillation and privacy](https://www.substra.ai/en/blog/model-distillation)*, article de blog Substra Foundation pour présenter les approches de distillation, Gigs Barmentlo, 2020 +- (Web article) *[Model distillation and privacy](https://www.substra.ai/en/blog/model-distillation)*, article de blog Substra Foundation pour présenter les approches de distillation, Gijs Barmentlo, 2020 From 029d60809c3c5b00886b9835877839b53602ed6e Mon Sep 17 00:00:00 2001 From: Eric Boniface Date: Tue, 23 Mar 2021 18:06:11 +0100 Subject: [PATCH 09/34] Fix formatting typo --- assessment_framework_eng.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/assessment_framework_eng.md b/assessment_framework_eng.md index 161e80e..ee0114e 100644 --- a/assessment_framework_eng.md +++ b/assessment_framework_eng.md @@ -199,8 +199,7 @@ The state of the art in ML security is constantly evolving. 
While it is impossib - (Software & Tools) Tools for *differential privacy*: Google *[differential privacy library](https://github.com/google/differential-privacy)*, and the Python [PyDP](https://github.com/OpenMined/PyDP) wrapper from OpenMined - (Web article) The *distillation* of a model, in addition to the compression it provides, can be used as a measure to protect the model and the training data used, see for example *[Knowledge Distillation: Simplified](https://towardsdatascience.com/knowledge-distillation-simplified-dd4973dbc764)*, Towards Data Science, 2019. - (Academic paper) *[Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531)*, G. Hinton, O. Vinyals, J. Dean, 2015 -- (Web article) *[Model distillation and privacy](https://www.substra.ai/en/blog/model-distillation)*, Substra Foundation blog post to introduce distillation approaches -, Gijs Barmentlo, 2020 +- (Web article) *[Model distillation and privacy](https://www.substra.ai/en/blog/model-distillation)*, Substra Foundation blog post to introduce distillation approaches, Gijs Barmentlo, 2020 From 65dd2967857be5f6787931ee39e12efcb330228b Mon Sep 17 00:00:00 2001 From: Eric Boniface Date: Wed, 24 Mar 2021 15:46:34 +0100 Subject: [PATCH 10/34] Add CodeCarbon as a reference to element 6.1 --- assessment_framework_eng.md | 1 + referentiel_evaluation.md | 1 + 2 files changed, 2 insertions(+) diff --git a/assessment_framework_eng.md b/assessment_framework_eng.md index ee0114e..fec24ab 100644 --- a/assessment_framework_eng.md +++ b/assessment_framework_eng.md @@ -942,6 +942,7 @@ It is important to question and raise awareness of environmental costs. Ressources6.1 : - (Software & Tools) *[ML Impact Calculator](https://mlco2.github.io/impact/)* +- (Software & Tools) *[Code Carbon](https://codecarbon.io/)*: a Python library to estimate the amount of CO2 produced by computing resources used to execute code diff --git a/referentiel_evaluation.md b/referentiel_evaluation.md index 3e09450..b6b3e79 100644 --- a/referentiel_evaluation.md +++ b/referentiel_evaluation.md @@ -938,6 +938,7 @@ Il est important de s'interroger et de conscientiser les coûts environnementaux Ressources6.1 : - (Software & Tools) *[ML Impact Calculator](https://mlco2.github.io/impact/)* +- (Software & Tools) *[Code Carbon](https://codecarbon.io/)*: librairie Python permettant d'évaluer le coût carbone de l'exécution d'un script From 805506015d5684badc30c55b35bda594316fee45 Mon Sep 17 00:00:00 2001 From: Eric Boniface Date: Wed, 24 Mar 2021 15:50:43 +0100 Subject: [PATCH 11/34] Add Shapash as resources to element 5.4 --- assessment_framework_eng.md | 1 + referentiel_evaluation.md | 1 + 2 files changed, 2 insertions(+) diff --git a/assessment_framework_eng.md b/assessment_framework_eng.md index fec24ab..42b07bb 100644 --- a/assessment_framework_eng.md +++ b/assessment_framework_eng.md @@ -872,6 +872,7 @@ Technical resources such as SHAP or LIME provide a first-hand introduction to th - (Technical guide) *[Interpretable Machine Learning, A Guide for Making Black Box Models Explainable](https://christophm.github.io/interpretable-ml-book/)*, Christoph Molnar - (Web article) *[Understanding model predictions with LIME](https://towardsdatascience.com/understanding-model-predictions-with-lime-a582fdff3a3b)*, blog L. Hulstaert, 2018 - (Software & Tools) *[SHAP](https://github.com/slundberg/shap): A game theoretic approach to explain the output of any machine learning model*. 
+- (Software & Tools) *[Shapash](https://github.com/MAIF/shapash): a MAIF Datalab project which aims to make machine learning interpretable and understandable by everyone. It provides several types of visualization that display explicit labels that everyone can understand - (Web article) In some cases, regulations impose being able to explain how an automated system came to a certain outcome (see for example [article 22 of the GDPR in the European Union](https://www.cnil.fr/fr/reglement-europeen-protection-donnees/chapitre3#Article22), [article 10 of the "Informatique & Libertés" law in France](https://www.legifrance.gouv.fr/affichTexteArticle.do;?idArticle=LEGIARTI000037090394&cidTexte=LEGITEXT000006068624&dateTexte=20180624), cited in particular in the [Hippocratic Oath for data scientist](https://hippocrate.tech/). diff --git a/referentiel_evaluation.md b/referentiel_evaluation.md index b6b3e79..3d794f2 100644 --- a/referentiel_evaluation.md +++ b/referentiel_evaluation.md @@ -868,6 +868,7 @@ Des ressources techniques comme SHAP ou LIME permettent d'entrer de plain-pied d - (Technical guide) *[Interpretable Machine Learning, A Guide for Making Black Box Models Explainable](https://christophm.github.io/interpretable-ml-book/)*, Christoph Molnar - (Web article) *[Understanding model predictions with LIME](https://towardsdatascience.com/understanding-model-predictions-with-lime-a582fdff3a3b)*, blog L. Hulstaert, 2018 - (Software & Tools) *[SHAP](https://github.com/slundberg/shap): A game theoretic approach to explain the output of any machine learning model* +- (Software & Tools) *[Shapash](https://github.com/MAIF/shapash): un projet open source de MAIF Datalab facilitant la prise en main et permettant de visualiser les analyses d'explicabilité et d'interprétabilité des modèles - (Web article) Dans certains cas la réglementation impose de pouvoir expliquer aux personnes concernées comment fonctionne un algorithme (voir par exemple [l'article 22 du RGPD](https://www.cnil.fr/fr/reglement-europeen-protection-donnees/chapitre3#Article22), [l'article 10 de la loi Informatique et libertés](https://www.legifrance.gouv.fr/affichTexteArticle.do;?idArticle=LEGIARTI000037090394&cidTexte=LEGITEXT000006068624&dateTexte=20180624), cités notamment dans le [Serment d'Hippocrate pour data scientist](https://hippocrate.tech/)) From 6645fc63e040e76d97794fa80af28ab8a36ef31f Mon Sep 17 00:00:00 2001 From: Eric Boniface Date: Wed, 24 Mar 2021 15:53:28 +0100 Subject: [PATCH 12/34] Add FACET as resource to element 5.4 --- assessment_framework_eng.md | 1 + referentiel_evaluation.md | 1 + 2 files changed, 2 insertions(+) diff --git a/assessment_framework_eng.md b/assessment_framework_eng.md index 42b07bb..efc1533 100644 --- a/assessment_framework_eng.md +++ b/assessment_framework_eng.md @@ -873,6 +873,7 @@ Technical resources such as SHAP or LIME provide a first-hand introduction to th - (Web article) *[Understanding model predictions with LIME](https://towardsdatascience.com/understanding-model-predictions-with-lime-a582fdff3a3b)*, blog L. Hulstaert, 2018 - (Software & Tools) *[SHAP](https://github.com/slundberg/shap): A game theoretic approach to explain the output of any machine learning model*. - (Software & Tools) *[Shapash](https://github.com/MAIF/shapash): a MAIF Datalab project which aims to make machine learning interpretable and understandable by everyone. 
It provides several types of visualization that display explicit labels that everyone can understand +- (Software & Tools) *[FACET](https://github.com/BCG-Gamma/facet)*: a BCG Gamma project of an open source library for human-explainable AI. It combines sophisticated model inspection and model-based simulation to enable better explanations of supervised machine learning models - (Web article) In some cases, regulations impose being able to explain how an automated system came to a certain outcome (see for example [article 22 of the GDPR in the European Union](https://www.cnil.fr/fr/reglement-europeen-protection-donnees/chapitre3#Article22), [article 10 of the "Informatique & Libertés" law in France](https://www.legifrance.gouv.fr/affichTexteArticle.do;?idArticle=LEGIARTI000037090394&cidTexte=LEGITEXT000006068624&dateTexte=20180624), cited in particular in the [Hippocratic Oath for data scientist](https://hippocrate.tech/). diff --git a/referentiel_evaluation.md b/referentiel_evaluation.md index 3d794f2..cdccb6a 100644 --- a/referentiel_evaluation.md +++ b/referentiel_evaluation.md @@ -869,6 +869,7 @@ Des ressources techniques comme SHAP ou LIME permettent d'entrer de plain-pied d - (Web article) *[Understanding model predictions with LIME](https://towardsdatascience.com/understanding-model-predictions-with-lime-a582fdff3a3b)*, blog L. Hulstaert, 2018 - (Software & Tools) *[SHAP](https://github.com/slundberg/shap): A game theoretic approach to explain the output of any machine learning model* - (Software & Tools) *[Shapash](https://github.com/MAIF/shapash): un projet open source de MAIF Datalab facilitant la prise en main et permettant de visualiser les analyses d'explicabilité et d'interprétabilité des modèles +- (Software & Tools) *[FACET](https://github.com/BCG-Gamma/facet)*: un projet open source du BCG Gamma, *FACET is an open source library for human-explainable AI. It combines sophisticated model inspection and model-based simulation to enable better explanations of supervised machine learning models* - (Web article) Dans certains cas la réglementation impose de pouvoir expliquer aux personnes concernées comment fonctionne un algorithme (voir par exemple [l'article 22 du RGPD](https://www.cnil.fr/fr/reglement-europeen-protection-donnees/chapitre3#Article22), [l'article 10 de la loi Informatique et libertés](https://www.legifrance.gouv.fr/affichTexteArticle.do;?idArticle=LEGIARTI000037090394&cidTexte=LEGITEXT000006068624&dateTexte=20180624), cités notamment dans le [Serment d'Hippocrate pour data scientist](https://hippocrate.tech/)) From 17d5902eac0dd862c09e8ee143ef2cba053be4df Mon Sep 17 00:00:00 2001 From: Eric Boniface Date: Wed, 24 Mar 2021 15:59:31 +0100 Subject: [PATCH 13/34] Add FactSheets360 as a resource to element 4.2 --- assessment_framework_eng.md | 1 + referentiel_evaluation.md | 1 + 2 files changed, 2 insertions(+) diff --git a/assessment_framework_eng.md b/assessment_framework_eng.md index efc1533..0244979 100644 --- a/assessment_framework_eng.md +++ b/assessment_framework_eng.md @@ -646,6 +646,7 @@ The aim is to make explicit and add to the model the description of the context - (Academic paper) [Model Cards for Model Reporting](https://arxiv.org/abs/1810.03993), M. Mitchell, S. Wu, A. Zaldivar, P. Barnes, L. Vasserman, B. Hutchinson, E. Spitzer, I. D. Raji, T. 
Gebru, January 2019 - (Web article) [Model Cards](https://modelcards.withgoogle.com/about) from Google is an open and scalable framework, and offers 2 examples: *To explore the possibilities of model cards in the real world, we've designed examples for two features of our Cloud Vision API, Face Detection and Object Detection. They provide simple overviews of both models' ideal forms of input, visualize some of their key limitations, and present basic performance metrics.* +- (Software & Tools) *[AI FactSheets 360](https://aifs360.mybluemix.net/)*, an IBM Research project to foster trust in AI by increasing transparency and enabling governance: *Increased transparency provides information for AI consumers to better understand how the AI model or service was created. This allows a consumer of the model to determine if it is appropriate for their situation. AI Governance enables an enterprise to specify and enforce policies describing how an AI model or service should be constructed and deployed.* diff --git a/referentiel_evaluation.md b/referentiel_evaluation.md index cdccb6a..85d65a4 100644 --- a/referentiel_evaluation.md +++ b/referentiel_evaluation.md @@ -642,6 +642,7 @@ Il s'agit d'expliciter et d'adjoindre au modèle la description du contexte d'ut - (Academic paper) [Model Cards for Model Reporting](https://arxiv.org/abs/1810.03993), M. Mitchell, S. Wu, A. Zaldivar, P. Barnes, L. Vasserman, B. Hutchinson, E. Spitzer, I. D. Raji, T. Gebru, Janvier 2019 - (Web article) [Model Cards](https://modelcards.withgoogle.com/about) de Google est un framework ouvert et évolutif, et propose 2 exemples : *To explore the possibilities of model cards in the real world, we've designed examples for two features of our Cloud Vision API, Face Detection and Object Detection. They provide simple overviews of both models' ideal forms of input, visualize some of their key limitations, and present basic performance metrics.* +- (Software & Tools) *[AI FactSheets 360](https://aifs360.mybluemix.net/)* d'IBM Research est un projet visant à définir une méthodologie et des exemples pour cartographier et décrire un modèle et son cycle de vie. From 80e7309d15300876e1c6cce633f0c51afa32c721 Mon Sep 17 00:00:00 2001 From: Eric Boniface Date: Wed, 24 Mar 2021 16:03:57 +0100 Subject: [PATCH 14/34] Add 2 articles illustrating issues with facial recognition --- references.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/references.md b/references.md index bf93e28..7fd6740 100644 --- a/references.md +++ b/references.md @@ -41,7 +41,7 @@ - *[Google’s medical AI was super accurate in a lab. 
Real life was a different story](https://www.technologyreview.com/2020/04/27/1000658/google-medical-ai-accurate-lab-real-life-clinic-covid-diabetes-retina-disease/)*, MIT Technology Review -- Various controversies: +- Various scandals and or controversies: - [Awful AI](https://github.com/daviddao/awful-ai): a curated list to track current scary usages of AI - hoping to raise awareness to its misuses in society, David Dao @@ -51,6 +51,10 @@ - [Faulty Facial Recognition Led to His Arrest—Now He’s Suing](https://www.vice.com/en_us/article/bv8k8a/faulty-facial-recognition-led-to-his-arrestnow-hes-suing), Septembre 2020, vice.com + - [Argentina: Child Suspects’ Private Data Published Online](https://www.hrw.org/news/2020/10/09/argentina-child-suspects-private-data-published-online) - Facial Recognition System Uses Flawed Data, Poses Further Risks to Children + + - [Minneapolis prohibits use of facial recognition software by its police department](https://www.theverge.com/2021/2/13/22281523/minneapolis-prohibits-facial-recognition-software-police-privacy) + ## Travaux dans ce domaine L'*[Institute for Ethical AI & Machine Learning](https://ethical.institute)* maintient un panorama très complet des inititives réglementaires, rapports, guidelines, frameworks divers et variés en lien avec la pratique et l'usage de l'IA et la data science : voir leur repository [Awesome AI Guidelines](https://github.com/EthicalML/awesome-artificial-intelligence-guidelines#online-courses-and-learning-resources) sur Github. From b892768a8aea792480a551498375e96d57c2bd51 Mon Sep 17 00:00:00 2001 From: Eric Boniface Date: Wed, 24 Mar 2021 16:11:34 +0100 Subject: [PATCH 15/34] Add OpenDP and Opacus as differential privacy resources to elements 1.7 and 1.8 --- assessment_framework_eng.md | 4 ++++ referentiel_evaluation.md | 4 ++++ 2 files changed, 8 insertions(+) diff --git a/assessment_framework_eng.md b/assessment_framework_eng.md index 0244979..072fb3a 100644 --- a/assessment_framework_eng.md +++ b/assessment_framework_eng.md @@ -197,6 +197,8 @@ The state of the art in ML security is constantly evolving. While it is impossib - (Academic paper) *[Inverting Gradients - How easy is it to break privacy in federated learning](https://arxiv.org/abs/2003.14053)*, J. Geiping, H. Bauermeister, H. Dröge, M. Moeller, 2020 - (Web article) *[Top Five ML risks](https://github.com/OWASP/Top-5-Machine-Learning-Risks/blob/master/Top%205%20Machine%20Learning%20Risks.md)*, OWASP - (Software & Tools) Tools for *differential privacy*: Google *[differential privacy library](https://github.com/google/differential-privacy)*, and the Python [PyDP](https://github.com/OpenMined/PyDP) wrapper from OpenMined +- (Software & Tools) *[OpenDP](https://opendp.org)*: *a community effort to build trustworthy, open-source software tools for statistical analysis of sensitive private data. Offers the rigorous protections of differential privacy for the individuals who may be represented in confidential data and statistically valid methods of analysis for researchers who study the data* +- (Software & Tools) *[Opacus](https://opacus.ai/)*: *a Facebook Open Source project, to enable training PyTorch models with Differential Privacy* - (Web article) The *distillation* of a model, in addition to the compression it provides, can be used as a measure to protect the model and the training data used, see for example *[Knowledge Distillation: Simplified](https://towardsdatascience.com/knowledge-distillation-simplified-dd4973dbc764)*, Towards Data Science, 2019. 
- (Academic paper) *[Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531)*, G. Hinton, O. Vinyals, J. Dean, 2015 - (Web article) *[Model distillation and privacy](https://www.substra.ai/en/blog/model-distillation)*, Substra Foundation blog post to introduce distillation approaches, Gijs Barmentlo, 2020 @@ -239,6 +241,8 @@ Depending on the level of risk and sensitivity of the projects, certain technica - (Academic paper) *[Inverting Gradients - How easy is it to break privacy in federated learning](https://arxiv.org/abs/2003.14053)*, J. Geiping, H. Bauermeister, H. Dröge, M. Moeller, 2020 - (Web article) *[Top Five ML risks](https://github.com/OWASP/Top-5-Machine-Learning-Risks/blob/master/Top%205%20Machine%20Learning%20Risks.md)*, OWASP - (Software & Tools) Tools for *differential privacy*: Google *[differential privacy library](https://github.com/google/differential-privacy)*, and the Python [PyDP](https://github.com/OpenMined/PyDP) wrapper from OpenMined +- (Software & Tools) *[OpenDP](https://opendp.org)*: *a community effort to build trustworthy, open-source software tools for statistical analysis of sensitive private data. Offers the rigorous protections of differential privacy for the individuals who may be represented in confidential data and statistically valid methods of analysis for researchers who study the data* +- (Software & Tools) *[Opacus](https://opacus.ai/)*: *a Facebook Open Source project, to enable training PyTorch models with Differential Privacy* - (Web article) The *distillation* of a model, in addition to the compression it provides, can be used as a measure to protect the model and the training data used, see for example *[Knowledge Distillation: Simplified](https://towardsdatascience.com/knowledge-distillation-simplified-dd4973dbc764)*, Towards Data Science, 2019. - (Academic paper) *[Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531)*, G. Hinton, O. Vinyals, J. Dean, 2015 diff --git a/referentiel_evaluation.md b/referentiel_evaluation.md index 85d65a4..05bc2cf 100644 --- a/referentiel_evaluation.md +++ b/referentiel_evaluation.md @@ -197,6 +197,8 @@ L'état de l'art de la sécurité du ML est en constante évolution. S'il est im - (Academic paper) *[Inverting Gradients - How easy is it to break privacy in federated learning?](https://arxiv.org/abs/2003.14053)*, J. Geiping, H. Bauermeister, H. Dröge, M. Moeller, 2020 - (Web article) *[Top Five ML risks](https://github.com/OWASP/Top-5-Machine-Learning-Risks/blob/master/Top%205%20Machine%20Learning%20Risks.md)*, OWASP - (Software & Tools) Outils pour la *differential privacy* : Google *[differential privacy library](https://github.com/google/differential-privacy)*, et le wrapper Python [PyDP](https://github.com/OpenMined/PyDP) d'OpenMined +- (Software & Tools) *[OpenDP](https://opendp.org)*: *a community effort to build trustworthy, open-source software tools for statistical analysis of sensitive private data. 
Offers the rigorous protections of differential privacy for the individuals who may be represented in confidential data and statistically valid methods of analysis for researchers who study the data* +- (Software & Tools) *[Opacus](https://opacus.ai/)*: *a Facebook Open Source project, to enable training PyTorch models with Differential Privacy* - (Web article) La *distillation* d'un modèle, en plus de la compression qu'elle apporte, peut être utilisée comme une mesure de protection du modèle et des données d'entraînement utilisées, voir par exemple *[Knowledge Distillation: Simplified](https://towardsdatascience.com/knowledge-distillation-simplified-dd4973dbc764)*, Towards Data Science, 2019 - (Academic paper) *[Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531)*, G. Hinton, O. Vinyals, J. Dean, 2015 - (Web article) *[Model distillation and privacy](https://www.substra.ai/en/blog/model-distillation)*, article de blog Substra Foundation pour présenter les approches de distillation, Gijs Barmentlo, 2020 @@ -239,6 +241,8 @@ Selon les niveaux de risque et de sensibilité des projets, certaines approches - (Academic paper) *[Inverting Gradients - How easy is it to break privacy in federated learning?](https://arxiv.org/abs/2003.14053)*, J. Geiping, H. Bauermeister, H. Dröge, M. Moeller, 2020 - (Web article) *[Top Five ML risks](https://github.com/OWASP/Top-5-Machine-Learning-Risks/blob/master/Top%205%20Machine%20Learning%20Risks.md)*, OWASP - (Software & Tools) Outils pour la *differential privacy* : Google *[differential privacy library](https://github.com/google/differential-privacy)*, et le wrapper Python [PyDP](https://github.com/OpenMined/PyDP) d'OpenMined +- (Software & Tools) *[OpenDP](https://opendp.org)*: *a community effort to build trustworthy, open-source software tools for statistical analysis of sensitive private data. Offers the rigorous protections of differential privacy for the individuals who may be represented in confidential data and statistically valid methods of analysis for researchers who study the data* +- (Software & Tools) *[Opacus](https://opacus.ai/)*: *a Facebook Open Source project, to enable training PyTorch models with Differential Privacy* - (Web article) La *distillation* d'un modèle, en plus de la compression qu'elle apporte, peut être utilisée comme une mesure de protection du modèle et des données d'entraînement utilisées, voir par exemple *[Knowledge Distillation: Simplified](https://towardsdatascience.com/knowledge-distillation-simplified-dd4973dbc764)*, Towards Data Science, 2019 - (Academic paper) *[Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531)*, G. Hinton, O. Vinyals, J. 
Dean, 2015 From ec254db6ba08a067f308f320de4dbe4708431130 Mon Sep 17 00:00:00 2001 From: Eric Boniface Date: Wed, 24 Mar 2021 16:20:25 +0100 Subject: [PATCH 16/34] Add Quantmetry blog article on ML lifecycle (FR only) --- references.md | 4 ++++ referentiel_evaluation.md | 1 + 2 files changed, 5 insertions(+) diff --git a/references.md b/references.md index 7fd6740..af2f8ff 100644 --- a/references.md +++ b/references.md @@ -37,6 +37,10 @@ - La *distillation* d'un modèle, en plus de la compression qu'elle apporte, peut être utilisée comme une mesure de protection du modèle et des données d'entraînement utilisées, voir par exemple *[Knowledge Distillation : Simplified](https://towardsdatascience.com/knowledge-distillation-simplified-dd4973dbc764)*, Towards Data Science, 2019, et *[Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531)*, G. Hinton, O. Vinyals, J. Dean, 2015 +- Cycle de vie complet : + + - **[En route vers le cycle de vie des modèles !](https://www.quantmetry.com/blog/premier-etape-cycle-vie-modeles/)*, G. Martinon, Janvier 2020 + - "Performance is not outcome", erreurs, crises : - *[Google’s medical AI was super accurate in a lab. Real life was a different story](https://www.technologyreview.com/2020/04/27/1000658/google-medical-ai-accurate-lab-real-life-clinic-covid-diabetes-retina-disease/)*, MIT Technology Review diff --git a/referentiel_evaluation.md b/referentiel_evaluation.md index 05bc2cf..1052269 100644 --- a/referentiel_evaluation.md +++ b/referentiel_evaluation.md @@ -517,6 +517,7 @@ Suivre l'évolution de la performance des modèles dans le temps est également - (Technical guide) *[Continuous delivery for machine learning](https://martinfowler.com/articles/cd4ml.html)*, D. Sato, A. Wider, C. Windheuser, Septembre 2019 - (Technical guide) *[Monitoring Machine Learning Models in Production - A comprehensive guide](https://christophergs.com/machine%20learning/2020/03/14/how-to-monitor-machine-learning-models/)*, Christopher Samiullah, Mars 2020 - (Web article) *[Google’s medical AI was super accurate in a lab. Real life was a different story](https://www.technologyreview.com/2020/04/27/1000658/google-medical-ai-accurate-lab-real-life-clinic-covid-diabetes-retina-disease/)*, MIT Technology Review +- (Web article) *[En route vers le cycle de vie des modèles !](https://www.quantmetry.com/blog/premier-etape-cycle-vie-modeles/)*, G. Martinon, Janvier 2020 From 3e429170d5a695010b6012e6c34162fee2a4c522 Mon Sep 17 00:00:00 2001 From: Eric Boniface Date: Wed, 24 Mar 2021 16:23:12 +0100 Subject: [PATCH 17/34] Add Pandas Profiling as resource to element 2.1 --- assessment_framework_eng.md | 1 + referentiel_evaluation.md | 1 + 2 files changed, 2 insertions(+) diff --git a/assessment_framework_eng.md b/assessment_framework_eng.md index 072fb3a..0eb1296 100644 --- a/assessment_framework_eng.md +++ b/assessment_framework_eng.md @@ -308,6 +308,7 @@ It is a question of ensuring that oneself considers these subjects and therefore - (Web article) *[Hidden Bias](https://pair.withgoogle.com/explorables/hidden-bias/)* explorable from [PAIR](https://pair.withgoogle.com/) - (Technical guide) *[Tour of Data Sampling Methods for Imbalanced Classification](https://machinelearningmastery.com/data-sampling-methods-for-imbalanced-classification/)* +- (Software & Tools) *[Pandas Profiling](https://github.com/pandas-profiling/pandas-profiling)*: *Create HTML profiling reports from pandas `DataFrame` objects. 
The pandas `df.describe()` function is great but a little basic for serious exploratory data analysis. `pandas_profiling` extends the pandas `DataFrame` with `df.profile_report()` for quick data analysis diff --git a/referentiel_evaluation.md b/referentiel_evaluation.md index 1052269..1b42dab 100644 --- a/referentiel_evaluation.md +++ b/referentiel_evaluation.md @@ -308,6 +308,7 @@ Il s'agit de s'obliger à s'interroger sur ces sujets et donc à réfléchir aux - (Web article) *[Hidden Bias](https://pair.withgoogle.com/explorables/hidden-bias/)* explorable from [PAIR](https://pair.withgoogle.com/) - (Technical guide) *[Tour of Data Sampling Methods for Imbalanced Classification](https://machinelearningmastery.com/data-sampling-methods-for-imbalanced-classification/)* +- (Software & Tools) *[Pandas Profiling](https://github.com/pandas-profiling/pandas-profiling)*: *Create HTML profiling reports from pandas `DataFrame` objects. The pandas `df.describe()` function is great but a little basic for serious exploratory data analysis. `pandas_profiling` extends the pandas `DataFrame` with `df.profile_report()` for quick data analysis From 1b35d4b1e1b4768738147616b77e341b1e075ce5 Mon Sep 17 00:00:00 2001 From: Eric Boniface Date: Wed, 21 Apr 2021 17:44:20 +0200 Subject: [PATCH 18/34] Init 2021H1 release candidate branch --- assessment_framework_eng.md | 2 ++ referentiel_evaluation.md | 2 ++ 2 files changed, 4 insertions(+) diff --git a/assessment_framework_eng.md b/assessment_framework_eng.md index ac901ea..c20a38a 100644 --- a/assessment_framework_eng.md +++ b/assessment_framework_eng.md @@ -2,6 +2,8 @@ The [evaluation framework](#evaluation-framework-to-assess-the-maturity-of-an-organisation) below is the result of the participatory work initiated in the spring of 2019 by Substra Foundation and ongoing since then. It is based on the identification of the risks that we are trying to prevent by aiming for a responsible and trustworthy practice of data science, and best practices to mitigate them. It also brings together for each topic technical resources that can be good entry points for interested organisations. +Last update: 1st semester 2021. + ## Evaluation framework to assess the maturity of an organisation The evaluation is composed of the following 6 sections: diff --git a/referentiel_evaluation.md b/referentiel_evaluation.md index f10e578..92a51a0 100644 --- a/referentiel_evaluation.md +++ b/referentiel_evaluation.md @@ -2,6 +2,8 @@ Le [référentiel d'évaluation](#référentiel-dévaluation-de-la-maturité-dune-organisation) ci-dessous est le fruit travail participatif initié au printemps 2019 par Substra Foundation et en cours depuis. Il procède de l'identification des [risques](#risques) que l'on cherche à prévenir en visant une pratique responsable et de confiance de la data science, et des bonnes pratiques qui permettent d'y faire face. Il regroupe également pour chaque sujet des ressources techniques qui peuvent être de bons points d'entrée pour les organisations intéressées. +Dernière mise à jour : 1er semestre 2021. 
+ ## Référentiel d'évaluation de la maturité d'une organisation L'évaluation est composée des 6 sections suivantes : From 110340c1aba018cde7f905b7af417b5493714643 Mon Sep 17 00:00:00 2001 From: Eric Boniface Date: Mon, 10 May 2021 16:00:55 +0200 Subject: [PATCH 19/34] Add an intermediate answer item for orgs planning to audit or certify their compliance --- assessment_framework_eng.md | 3 ++- referentiel_evaluation.md | 5 +++-- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/assessment_framework_eng.md b/assessment_framework_eng.md index d9fb4a1..07b980b 100644 --- a/assessment_framework_eng.md +++ b/assessment_framework_eng.md @@ -110,7 +110,8 @@ _(Type: single answer)_ _(Select one answer only, which best corresponds to the level of maturity of the organisation on this topic)_ - [ ] 1.4.a Yes -- [ ] 1.4.b No +- [ ] 1.4.b We are currently preparing an upcoming audit or certification of our organisation's compliance with personal and confidential data requirements +- [ ] 1.4.c No
Expl1.4 : diff --git a/referentiel_evaluation.md b/referentiel_evaluation.md index f90a153..6a1800b 100644 --- a/referentiel_evaluation.md +++ b/referentiel_evaluation.md @@ -110,12 +110,13 @@ _(Type : réponse unique)_ _(Sélectionner une seule réponse, correspondant le mieux au niveau de maturité de l'organisation sur ce sujet)_ - [ ] 1.4.a Oui -- [ ] 1.4.b Non +- [ ] 1.4.b Nous préparons actuellement l'audit ou la certification de la conformité de notre organisation aux exigences relatives aux données personnelles et confidentielles +- [ ] 1.4.c Non
Expl1.4 : -Dans de nombreux secteurs il existe des exigences de conformité spécifiques. Il est généralement possible de formaliser la conformité d'une organisation par une certification ou un audit spécialisé, l'obtention d'un label. +Dans de nombreux secteurs il existe des exigences de conformité spécifiques. Il est généralement possible de formaliser la conformité d'une organisation par une certification, un audit spécialisé ou l'obtention d'un label.
From dd6940dc5d6ee59e4c693e72810fc444f4bbd16b Mon Sep 17 00:00:00 2001 From: Eric Boniface Date: Mon, 10 May 2021 16:05:52 +0200 Subject: [PATCH 20/34] Fix typos --- references.md | 2 +- referentiel_evaluation.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/references.md b/references.md index af2f8ff..1b578f9 100644 --- a/references.md +++ b/references.md @@ -61,7 +61,7 @@ ## Travaux dans ce domaine -L'*[Institute for Ethical AI & Machine Learning](https://ethical.institute)* maintient un panorama très complet des inititives réglementaires, rapports, guidelines, frameworks divers et variés en lien avec la pratique et l'usage de l'IA et la data science : voir leur repository [Awesome AI Guidelines](https://github.com/EthicalML/awesome-artificial-intelligence-guidelines#online-courses-and-learning-resources) sur Github. +L'*[Institute for Ethical AI & Machine Learning](https://ethical.institute)* maintient un panorama très complet des initiatives réglementaires, rapports, guidelines, frameworks divers et variés en lien avec la pratique et l'usage de l'IA et la data science : voir leur repository [Awesome AI Guidelines](https://github.com/EthicalML/awesome-artificial-intelligence-guidelines#online-courses-and-learning-resources) sur Github. ### Méta-études diff --git a/referentiel_evaluation.md b/referentiel_evaluation.md index f90a153..71cfe9a 100644 --- a/referentiel_evaluation.md +++ b/referentiel_evaluation.md @@ -1,6 +1,6 @@ # Data science responsable et de confiance - Référentiel d'évaluation -Le [référentiel d'évaluation](#référentiel-dévaluation-de-la-maturité-dune-organisation) ci-dessous est le fruit travail participatif initié au printemps 2019 par Substra Foundation et en cours depuis. Il procède de l'identification des [risques](#risques) que l'on cherche à prévenir en visant une pratique responsable et de confiance de la data science, et des bonnes pratiques qui permettent d'y faire face. Il regroupe également pour chaque sujet des ressources techniques qui peuvent être de bons points d'entrée pour les organisations intéressées. +Le [référentiel d'évaluation](#référentiel-dévaluation-de-la-maturité-dune-organisation) ci-dessous est le fruit du travail participatif initié au printemps 2019 par Substra Foundation et en cours depuis. Il procède de l'identification des [risques](#risques) que l'on cherche à prévenir en visant une pratique responsable et de confiance de la data science, et des bonnes pratiques qui permettent d'y faire face. Il regroupe également pour chaque sujet des ressources techniques qui peuvent être de bons points d'entrée pour les organisations intéressées. Dernière mise à jour : 1er semestre 2021. From 5ef2490133a91e6a3a9db4b377b88b0594b06793 Mon Sep 17 00:00:00 2001 From: Eric Boniface Date: Mon, 10 May 2021 16:22:22 +0200 Subject: [PATCH 21/34] Clarify the global context and output of the initiative --- README.md | 8 ++++++-- references.md | 2 +- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 645c521..31d94f7 100644 --- a/README.md +++ b/README.md @@ -2,6 +2,10 @@ *Note: although kickstarted in French, this work has been translated in English and will be updated in both languages from January 2021 onwards. 
Follow [this link](./assessment_framework_eng.md) to access the assessment in English.* +## Résumé rapide + +Ce dépôt de fichiers héberge le référentiel cadre de la data science responsable et de confiance (aussi dit *assessment*), élaboré de manière participative dans le cadre de l'initiative du même nom initiée par Substra Foundation en 2019. Il regroupe également les notes des [ateliers-meetups](https://www.meetup.com/fr-FR/data-science-responsable-et-de-confiance/) qui jalonnent cette initiative, co-animés par Substra Foundation et Dataforgood. + ## Navigation dans le repository `/` @@ -27,9 +31,9 @@ En s'appuyant sur les travaux, cadres et corpus existants, **nous travaillons de ### Une initiative de plus ? -Pourquoi cette initiative, dans un univers qui voit déjà émerger un certain nombre de travaux ? Nous tenons à jour [la liste des travaux](./references.md#travaux-dans-ce-domaine) que nous avons identifiés. Ils sont tous intéressants, inspirants, utiles. Beaucoup proposent des _guidelines_, des engagements à prendre, traitent de l'éthique de l'usage de technologies d'IA. Certains explorent des voies nouvelles : licences spécifiques aux modèles prédictifs, plateforme d'analyse de risque... Mais à ce stade aucun ne nous a semblé couvrir complètement les points suivants : +Pourquoi cette initiative, dans un univers qui voyait déjà en 2019, et voit encore plus aujourd'hui, émerger un certain nombre de travaux ? Nous tenons à jour [une liste de travaux](./references.md#travaux-dans-ce-domaine) que nous avons identifiés. Ils sont tous intéressants, inspirants, utiles. Beaucoup proposent des _guidelines_, des chartes, des engagements à prendre, traitent de l'éthique de l'usage de technologies d'IA. Certains explorent des voies nouvelles : licences spécifiques aux modèles prédictifs, plateforme d'analyse de risque... Mais à ce stade aucun ne nous a semblé répondre aux deux exigences suivantes : -1. s'intéresser à **l'activité data science d'une organisation** (comme ensemble de pratiques, de processus, de méthodes...), au cycle de vie complet d'un modèle ; +1. porter sur toute **l'activité data science d'une organisation** (comme ensemble de pratiques, de processus, de méthodes...), par opposition à porter sur l'élaboration d'un modèle/système d'IA ou le pilotage d'un projet ; 1. être fait **pour être utilisé comme un outil concret d'évaluation** de la maturité de l'organisation. 
diff --git a/references.md b/references.md index 1b578f9..422841a 100644 --- a/references.md +++ b/references.md @@ -158,6 +158,6 @@ Management: Towards an Open-Access Standard Protocol](https://aiforsocialgood.gi ## Notes et observations - Beaucoup de travaux s'intéressent à l'éthique par les usages et par la non-reproduction de discrimination -- Il y a cependant très peu de choses sur comment un modèle est élaboré (voir le [papier de Quantum Black](https://aiforsocialgood.github.io/icml2019/accepted/track2/pdfs/32_aisg_icml2019.pdf)) +- On trouve en revanche peu de choses sur le cycle de vie de l'élaboration d'un modèle (voir par exemple le [papier de Quantum Black](https://aiforsocialgood.github.io/icml2019/accepted/track2/pdfs/32_aisg_icml2019.pdf)) - Le plus complet est peut-être le questionnaire d'évaluation de l'UE, mais il est loin d'être actionnable, opérationnel (63 questions dont de nombreuses sont des questions très ouvertes), et son processus d'élaboration et d'évolution est relativement fermé - Des référentiels de la sécurité des systèmes d'information, bien plus généraux, pourraient être utilisés comme références pour éviter d'être redondant sur certains points. Par exemple le [guide de la sécurité des données personnelles](https://www.cnil.fr/fr/principes-cles/guide-de-la-securite-des-donnees-personnelles) de la CNIL. From 4206f71b20bfb4aed751693f48d30a3867c04a9f Mon Sep 17 00:00:00 2001 From: Eric Boniface Date: Mon, 10 May 2021 16:37:31 +0200 Subject: [PATCH 22/34] Enhance explanation of eval elements 1.7 and 1.8 --- assessment_framework_eng.md | 4 ++-- referentiel_evaluation.md | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/assessment_framework_eng.md b/assessment_framework_eng.md index d9fb4a1..e940a04 100644 --- a/assessment_framework_eng.md +++ b/assessment_framework_eng.md @@ -185,7 +185,7 @@ _(Select one answer only, which best corresponds to the level of maturity of the
Expl1.7 :

-The state of the art in ML security is constantly evolving. While it is impossible to guard against all vulnerabilities at all times, it is crucial to be aware of them and to keep a watch on them. The article [Demystifying the Membership Inference Attack](https://medium.com/disaitek/demystifying-the-membership-inference-attack-e33e510a0c39) is for example an interesting entry point in the context of sensitive data.
+The state of the art in ML security is constantly evolving. Although data scientists are now generally familiar with the membership inference attack (see the proposed resources), new attacks are published regularly. While it is impossible to guard against all vulnerabilities at all times, it is crucial to be aware of them and to keep a watch on them. The article [Demystifying the Membership Inference Attack](https://medium.com/disaitek/demystifying-the-membership-inference-attack-e33e510a0c39) is for example an interesting entry point in the context of sensitive data.
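The membership inference attack mentioned above can be illustrated with a very simple confidence-thresholding test. The sketch below is only a minimal illustration on synthetic data, not the shadow-model attack from the academic paper referenced in the resources; all function and variable names are illustrative assumptions.

```python
# Minimal illustration of a confidence-based membership inference test:
# an overfitted model is typically more confident on its training points,
# which an attacker can exploit to guess whether a record was in the training set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Deliberately prone to memorisation: deep, unpruned trees.
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

def max_confidence(clf, data):
    """Highest predicted class probability for each sample."""
    return clf.predict_proba(data).max(axis=1)

conf_members = max_confidence(model, X_train)      # records seen during training
conf_non_members = max_confidence(model, X_test)   # records never seen

# Naive attack: predict "member of the training set" when confidence exceeds a threshold.
threshold = 0.9
guesses = np.concatenate([conf_members, conf_non_members]) > threshold
truth = np.concatenate([np.ones(len(conf_members)), np.zeros(len(conf_non_members))])
attack_accuracy = (guesses == truth).mean()

print(f"Mean confidence on members:     {conf_members.mean():.3f}")
print(f"Mean confidence on non-members: {conf_non_members.mean():.3f}")
print(f"Threshold attack accuracy:      {attack_accuracy:.3f}  (0.5 = no leakage)")
```

A wide gap between the two mean confidences, or an attack accuracy well above 0.5, is a warning sign that the model memorises its training data; tools such as Counterfit, added to the resources in a later patch below, automate this kind of testing.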
@@ -227,7 +227,7 @@ _(Select all the answer items that correspond to practices in your organisation)
Expl1.8 : -The state of the art in ML security is constantly evolving. While it is impossible to guard against all vulnerabilities at all times, it is crucial to be aware of them and to keep a watch on them. The article [Demystifying the Membership Inference Attack](https://medium.com/disaitek/demystifying-the-membership-inference-attack-e33e510a0c39) is for example an interesting entry point in the context of sensitive data. +TThe state of the art in ML security is constantly evolving. If data scientists are now familiar in general with the membership inference attack (see proposed resources), new ones are being published regularly. While it is impossible to guard against all vulnerabilities at all times, it is crucial to be aware of them and to keep a watch on them. The article [Demystifying the Membership Inference Attack](https://medium.com/disaitek/demystifying-the-membership-inference-attack-e33e510a0c39) is for example an interesting entry point in the context of sensitive data. Depending on the level of risk and sensitivity of the projects, certain technical approaches to guard against them will be selected and implemented. It is important to follow the evolution of research and state-of-the-art practices, and to document the choices made. The notion of "end-to-end genealogy" is introduced here. diff --git a/referentiel_evaluation.md b/referentiel_evaluation.md index 71cfe9a..4096c93 100644 --- a/referentiel_evaluation.md +++ b/referentiel_evaluation.md @@ -185,7 +185,7 @@ _(Sélectionner une seule réponse, correspondant le mieux au niveau de maturit
Expl1.7 : -L'état de l'art de la sécurité du ML est en constante évolution. S'il est impossible de se prémunir contre toutes les vulnérabilités à tout instant, il est crucial de s'en préoccuper et d'organiser une veille. L'article [Demystifying the Membership Inference Attack](https://medium.com/disaitek/demystifying-the-membership-inference-attack-e33e510a0c39) est par exemple un point d'entrée intéressant dans un contexte de données sensibles. +L'état de l'art de la sécurité du ML est en constante évolution, et si la *membership inference attack* est maintenant relativement connue (voir ressources proposées), d'autres sont publiées régulièrement. S'il est impossible de se prémunir contre toutes les vulnérabilités à tout instant, il est crucial de s'en préoccuper et d'organiser une veille. L'article [Demystifying the Membership Inference Attack](https://medium.com/disaitek/demystifying-the-membership-inference-attack-e33e510a0c39) est par exemple un point d'entrée intéressant dans un contexte de données sensibles.
@@ -227,7 +227,7 @@ _(Sélectionner tous les éléments de réponse correspondant à des pratiques d
Expl1.8 : -L'état de l'art de la sécurité du ML est en constante évolution. S'il est impossible de se prémunir contre toutes les vulnérabilités à tout instant, il est crucial de s'en préoccuper et d'organiser une veille. L'article [Demystifying the Membership Inference Attack](https://medium.com/disaitek/demystifying-the-membership-inference-attack-e33e510a0c39) est par exemple un point d'entrée intéressant dans un contexte de données sensibles. +L'état de l'art de la sécurité du ML est en constante évolution, et si la *membership inference attack* est maintenant relativement connue (voir ressources proposées), d'autres sont publiées régulièrement. S'il est impossible de se prémunir contre toutes les vulnérabilités à tout instant, il est crucial de s'en préoccuper et d'organiser une veille. L'article [Demystifying the Membership Inference Attack](https://medium.com/disaitek/demystifying-the-membership-inference-attack-e33e510a0c39) est par exemple un point d'entrée intéressant dans un contexte de données sensibles. Selon les niveaux de risque et de sensibilité des projets, certaines approches techniques pour s'en prémunir seront sélectionnées et implémentées. Il est important de suivre l'évolution de l'état de l'art et des pratiques, et de documenter les choix réalisés. On introduit ici la notion de "généalogie de bout-en-bout". From 315437577fb143cfd7b5d5b1fe28cd92c3830b36 Mon Sep 17 00:00:00 2001 From: Eric Boniface Date: Mon, 10 May 2021 17:53:55 +0200 Subject: [PATCH 23/34] Create a new eval element 2.4 on links between modelisation choices and bias --- assessment_framework_eng.md | 37 +++++++++++++++++++++++++++++++++++++ referentiel_evaluation.md | 37 +++++++++++++++++++++++++++++++++++++ 2 files changed, 74 insertions(+) diff --git a/assessment_framework_eng.md b/assessment_framework_eng.md index d9fb4a1..6a81fbf 100644 --- a/assessment_framework_eng.md +++ b/assessment_framework_eng.md @@ -379,6 +379,43 @@ Complement on the use of synthetic data and _data augmentation_, _re-weighting_
+--- + +Q2.4 : **Links between modelisation choices and bias** +_(Condition : R2.2 <> 2.2.b)_ +Recent work has shown the role that modeling and learning choices can play in the formation of discriminatory bias. Differential privacy, compression, the choice of the learning rate, early stopping mechanisms for example can have disproportionate impacts on certain subgroups. Within your organisation, the general level of knowledge of collaborators working on data science projects on this topic is: + +R2.4 : +_(Type: single answer)_ +_(Select one answer only, which best corresponds to the level of maturity of the organisation on this topic)_ + +- [ ] 2.4.a Complete beginner +- [ ] 2.4.b Basic +- [ ] 2.4.c Confirmed +- [ ] 2.4.d Expert + +
+Expl2.4 : + +If datasets used to train and evaluate a model require a particular attention to prevent discriminatory biases, recent work show that modeling choices have to be taken into account too. The article *"Moving beyond “algorithmic bias is a data problem”"* suggested in resources synthesizes very well how the learning algorithm, the model structure, adding or not differential privacy, compression, etc. can have consequences on the fairness of a model. Extracts: + +> - *A key reason why model design choices amplify algorithmic bias is because notions of fairness often coincide with how underrepresented protected features are treated by the model* +> - [...] *design choices to optimize for either privacy guarantees or compression amplify the disparate impact between minority and majority data subgroups* +> - [...] *the impact of popular compression techniques like quantization and pruning on low-frequency protected attributes such as gender and age and finds that these subgroups are systematically and disproportionately impacted in order to preserve performance on the most frequent features* +> - [...] *learning rate and length of training can also disproportionately impact error rates on the long-tail of the dataset. Work on memorization properties of deep neural networks shows that challenging and underrepresented features are learnt later in the training process and that the learning rate impacts what is learnt. Thus, early stopping and similar hyper-parameter choices disproportionately and systematically impact a subset of the data distribution.* + +These topics require a strong expertise and few practitioners are familiar with them yet. In the context of this evaluation element, the recommendation is to learn about them and become aware of the complex trade-offs it implies, consider them during concrete projects rather than hiding them away, and follow how the state-of-the-art evolves and what best practices emerge. + +
+ +
+Resources2.4 : + +- (Academic paper) *[Moving beyond “algorithmic bias is a data problem”](https://www.cell.com/patterns/fulltext/S2666-3899(21)00061-1)*, Sara Hooker, Opinion, April 2021 +- (Academic paper) *[Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings](https://arxiv.org/abs/1607.06520)*, T. Bolukbasi, K.-W. Chang, J. Zou, V. Saligrama, A. Kalai, 2016 + +
+ --- --- diff --git a/referentiel_evaluation.md b/referentiel_evaluation.md index f90a153..c31dbcb 100644 --- a/referentiel_evaluation.md +++ b/referentiel_evaluation.md @@ -379,6 +379,43 @@ Complément sur l'utilisation de données synthétiques et d'approches de _data
+--- + +Q2.4 : **Liens entre les choix de modélisation et les biais** +_(Condition : R2.2 <> 2.2.b)_ +Des travaux récents mettent en évidence le rôle que peuvent jouer les choix de modélisation et d'apprentissage dans la formation de biais discriminatoires. Les techniques de renforcement de la confidentialité, la compression, le choix du *learning rate* ou les mécanismes d'*early stopping* par exemple peuvent contribuer à défavoriser certains sous-groupes de manière disproportionnée. Prévenir ces derniers n'est donc pas qu'une question de jeu de données. Au sein de votre organisation, sur ce sujet le niveau de connaissance générale des collaborateurs intervenant sur les projets de data science est : + +R2.4 : +_(Type : réponse unique)_ +_(Sélectionner une seule réponse, correspondant le mieux au niveau de maturité de l'organisation sur ce sujet)_ + +- [ ] 2.4.a Complètement débutant +- [ ] 2.4.b Basique +- [ ] 2.4.c Confirmé +- [ ] 2.4.d Expert + +
+Expl2.4 : + +Si les jeux de données utilisés pour entraîner et évaluer un modèle requièrent une réflexion particulière pour prévenir les biais discriminatoires, des travaux récents montrent qu'il en va de même pour les choix de modélisation. Comme le synthétise très bien l'article *Moving beyond “algorithmic bias is a data problem”* proposé dans les ressources, les paramètres de l'algorithme d'apprentissage, la structure du modèle, l'adjonction ou non de confidentialité différentielle, la compression éventuelle, etc. peuvent avoir des conséquences sur la *fairness* d'un modèle. Extraits : + +> - *A key reason why model design choices amplify algorithmic bias is because notions of fairness often coincide with how underrepresented protected features are treated by the model* +> - [...] *design choices to optimize for either privacy guarantees or compression amplify the disparate impact between minority and majority data subgroups* +> - [...] *the impact of popular compression techniques like quantization and pruning on low-frequency protected attributes such as gender and age and finds that these subgroups are systematically and disproportionately impacted in order to preserve performance on the most frequent features* +> - [...] *learning rate and length of training can also disproportionately impact error rates on the long-tail of the dataset. Work on memorization properties of deep neural networks shows that challenging and underrepresented features are learnt later in the training process and that the learning rate impacts what is learnt. Thus, early stopping and similar hyper-parameter choices disproportionately and systematically impact a subset of the data distribution.* + +Ces sujets étant très techniques, encore peu diffusés et connus des praticiens, il s'agit dans le cadre de cet élément d'évaluation de s'y acculturer, s'en préoccuper dans les projets et ne pas occulter le sujet, et suivre l'état de l'art et les bonnes pratiques qui émergeront. + +
+ +
+Ressources2.4 : + +- (Academic paper) *[Moving beyond “algorithmic bias is a data problem”](https://www.cell.com/patterns/fulltext/S2666-3899(21)00061-1)*, Sara Hooker, Opinion, Avril 2021 +- (Academic paper) *[Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings](https://arxiv.org/abs/1607.06520)*, T. Bolukbasi, K.-W. Chang, J. Zou, V. Saligrama, A. Kalai, 2016 + +
+ --- --- From aacf66fd93ae49dd49eb349a6058f689865f149d Mon Sep 17 00:00:00 2001 From: Eric Boniface Date: Tue, 25 May 2021 17:06:09 +0200 Subject: [PATCH 24/34] Add another answer item to 1.4, on having at least 1 project certified --- assessment_framework_eng.md | 5 +++-- referentiel_evaluation.md | 5 +++-- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/assessment_framework_eng.md b/assessment_framework_eng.md index 07b980b..78741dd 100644 --- a/assessment_framework_eng.md +++ b/assessment_framework_eng.md @@ -110,8 +110,9 @@ _(Type: single answer)_ _(Select one answer only, which best corresponds to the level of maturity of the organisation on this topic)_ - [ ] 1.4.a Yes -- [ ] 1.4.b We are currently preparing an upcoming audit or certification of our organisation's compliance with personal and confidential data requirements -- [ ] 1.4.c No +- [ ] 1.4.b No +- [ ] 1.4.c We are currently preparing an upcoming audit or certification of our organisation's compliance with personal and confidential data requirements +- [ ] 1.4.d Not the the organization level, but it is the case for at least one project
Expl1.4 : diff --git a/referentiel_evaluation.md b/referentiel_evaluation.md index 6a1800b..13f2c33 100644 --- a/referentiel_evaluation.md +++ b/referentiel_evaluation.md @@ -110,8 +110,9 @@ _(Type : réponse unique)_ _(Sélectionner une seule réponse, correspondant le mieux au niveau de maturité de l'organisation sur ce sujet)_ - [ ] 1.4.a Oui -- [ ] 1.4.b Nous préparons actuellement l'audit ou la certification de la conformité de notre organisation aux exigences relatives aux données personnelles et confidentielles -- [ ] 1.4.c Non +- [ ] 1.4.b Non +- [ ] 1.4.c Pas encore, nous préparons actuellement l'audit ou la certification de la conformité de notre organisation aux exigences relatives aux données personnelles et confidentielles +- [ ] 1.4.d Pas au niveau de l'organisation, mais c'est en revanche le cas pour un projet au moins
Expl1.4 :

From acf2dd7d425a3abe089dae9b6cf00123138d46ff Mon Sep 17 00:00:00 2001
From: Eric Boniface
Date: Tue, 25 May 2021 17:40:07 +0200
Subject: [PATCH 25/34] Add an answer item to 1.5 on pseudonymising some features with identifiers and/or splitting datasets in multiple tables

---
 assessment_framework_eng.md | 1 +
 referentiel_evaluation.md   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/assessment_framework_eng.md b/assessment_framework_eng.md
index e940a04..a0c3ba4 100644
--- a/assessment_framework_eng.md
+++ b/assessment_framework_eng.md
@@ -133,6 +133,7 @@ _(Specific risk domain: use of personal or confidential data)_
- [ ] 1.5.b We need to use personal or confidential data in certain projects and the data minimisation principle is then systematically applied
- [ ] 1.5.c Employees are aware of the data minimisation principle and generally apply it
- [ ] 1.5.d The "who can do the most can do the least" reflex with regard to data still exists here and there within our organisation. In some projects, we keep datasets that are much richer in personal and confidential data than what is strictly useful to the project
+- [ ] 1.5.e Employees are aware of the data minimisation principle, but it is not applied as a general standard. However, we pay particular attention to implementing personal data-related risk mitigation measures (e.g. pseudonymising some features with identifiers and a separate correspondence table, or splitting datasets into multiple tables kept apart)
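As an illustrative aside on the new answer item 1.5.e above, here is a minimal pandas sketch of pseudonymising a direct identifier with a separate correspondence table. Column names and the UUID-based pseudonyms are assumptions made for the example; this is not a complete pseudonymisation procedure (in particular it says nothing about indirect identifiers).

```python
# Replace a direct identifier by a pseudonym and keep the correspondence
# table apart from the working dataset, as described in answer item 1.5.e.
import uuid
import pandas as pd

raw = pd.DataFrame({
    "email": ["alice@example.com", "bob@example.com", "carol@example.com"],
    "age": [34, 51, 28],
    "outcome": [1, 0, 1],
})

# One random pseudonym per distinct identifier.
mapping = {email: uuid.uuid4().hex for email in raw["email"].unique()}

# Correspondence table: stored separately, with restricted access.
correspondence = pd.DataFrame({
    "pseudo_id": list(mapping.values()),
    "email": list(mapping.keys()),
})

# Working dataset handed over for analysis: no direct identifier left.
working = raw.assign(pseudo_id=raw["email"].map(mapping)).drop(columns=["email"])

print(working)
print(correspondence)
```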
Expl1.5 : diff --git a/referentiel_evaluation.md b/referentiel_evaluation.md index 4096c93..71c62e8 100644 --- a/referentiel_evaluation.md +++ b/referentiel_evaluation.md @@ -133,6 +133,7 @@ _(Domaine de risque spécifique : utilisation de données personnelles ou confid - [ ] 1.5.b Nous avons besoin d'en utiliser dans certains projets et le principe de minimisation est alors systématiquement appliqué - [ ] 1.5.c Le principe de minimisation est connu des collaborateurs, qui l'appliquent en général - [ ] 1.5.d Le réflexe "qui peut le plus peut le moins" vis-à-vis des données existe encore ici et là au sein de notre organisation. Dans certains projets, nous conservons des jeux de données beaucoup plus riches en données personnelles et confidentielles que ce qui est strictement utile au projet +- [ ] 1.5.e Le principe de minimisation est connu des collaborateurs, mais son application n'est pas la norme. En revanche, nous apportons une attention particulière à mettre en oeuvre des mesures de limitation des risques pour les données à caractère personnel (par exemple : pseudonymiser certaines features par des identifiants avec une table de correspondance séparée, éclater les données en plusieurs bases ou tables réparties)
Expl1.5 :

From 5ba6a04584b5ee59482959aa572163a6f3b75a37 Mon Sep 17 00:00:00 2001
From: Eric Boniface
Date: Tue, 25 May 2021 17:50:28 +0200
Subject: [PATCH 26/34] Add a reference on an ML vulnerability linked to pickle files

---
 assessment_framework_eng.md | 1 +
 referentiel_evaluation.md   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/assessment_framework_eng.md b/assessment_framework_eng.md
index d8290b6..237261a 100644
--- a/assessment_framework_eng.md
+++ b/assessment_framework_eng.md
@@ -207,6 +207,7 @@ The state of the art in ML security is constantly evolving. While it is impossib
- (Web article) The *distillation* of a model, in addition to the compression it provides, can be used as a measure to protect the model and the training data used, see for example *[Knowledge Distillation: Simplified](https://towardsdatascience.com/knowledge-distillation-simplified-dd4973dbc764)*, Towards Data Science, 2019.
- (Academic paper) *[Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531)*, G. Hinton, O. Vinyals, J. Dean, 2015
- (Web article) *[Model distillation and privacy](https://www.substra.ai/en/blog/model-distillation)*, Substra Foundation blog post to introduce distillation approaches, Gijs Barmentlo, 2020
+- (Web article) *[Never a dill moment: Exploiting machine learning pickle files](https://blog.trailofbits.com/2021/03/15/never-a-dill-moment-exploiting-machine-learning-pickle-files/)*, Trail of Bits, March 2021: presentation of a vulnerability affecting ML models that use pickle for object storage
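To make concrete the class of vulnerability behind the Trail of Bits reference added above: loading an untrusted pickle file (a very common format for ML model artefacts) amounts to executing arbitrary code. The snippet below is a deliberately harmless, generic illustration, not the specific exploit from the article.

```python
# pickle.load() executes whatever a malicious file tells it to: any object can
# define __reduce__ to return a callable that is run at deserialisation time.
import pickle

class NotReallyAModel:
    def __reduce__(self):
        # A real payload would call something like os.system instead of print.
        return (print, ("Arbitrary code executed while 'loading the model'!",))

malicious_bytes = pickle.dumps(NotReallyAModel())

# The victim only sees an opaque "model file"...
pickle.loads(malicious_bytes)  # ...and the payload runs here.
```

Mitigations include only unpickling artefacts from trusted sources, verifying their integrity (signatures, checksums), or preferring serialisation formats that do not execute code on load where the tooling allows it.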
diff --git a/referentiel_evaluation.md b/referentiel_evaluation.md index 8c83a86..6a13dcc 100644 --- a/referentiel_evaluation.md +++ b/referentiel_evaluation.md @@ -207,6 +207,7 @@ L'état de l'art de la sécurité du ML est en constante évolution. S'il est im - (Web article) La *distillation* d'un modèle, en plus de la compression qu'elle apporte, peut être utilisée comme une mesure de protection du modèle et des données d'entraînement utilisées, voir par exemple *[Knowledge Distillation: Simplified](https://towardsdatascience.com/knowledge-distillation-simplified-dd4973dbc764)*, Towards Data Science, 2019 - (Academic paper) *[Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531)*, G. Hinton, O. Vinyals, J. Dean, 2015 - (Web article) *[Model distillation and privacy](https://www.substra.ai/en/blog/model-distillation)*, article de blog Substra Foundation pour présenter les approches de distillation, Gijs Barmentlo, 2020 +- (Web article) *[Never a dill moment: Exploiting machine learning pickle files](https://blog.trailofbits.com/2021/03/15/never-a-dill-moment-exploiting-machine-learning-pickle-files/)*, Trail of Bits, Mars 2021 : exposition d'une vulnérabilité des modèles de ML utilisant *pickle* pour le stockage d'objets
From 9898651e0c2a2098f4a482e7d9704569a1ab29c2 Mon Sep 17 00:00:00 2001 From: Eric Boniface Date: Tue, 25 May 2021 17:54:36 +0200 Subject: [PATCH 27/34] Add Salesforce's Model Cards published examples as a reference to element 4.2 --- assessment_framework_eng.md | 1 + referentiel_evaluation.md | 1 + 2 files changed, 2 insertions(+) diff --git a/assessment_framework_eng.md b/assessment_framework_eng.md index 237261a..6366738 100644 --- a/assessment_framework_eng.md +++ b/assessment_framework_eng.md @@ -698,6 +698,7 @@ The aim is to make explicit and add to the model the description of the context - (Academic paper) [Model Cards for Model Reporting](https://arxiv.org/abs/1810.03993), M. Mitchell, S. Wu, A. Zaldivar, P. Barnes, L. Vasserman, B. Hutchinson, E. Spitzer, I. D. Raji, T. Gebru, January 2019 - (Web article) [Model Cards](https://modelcards.withgoogle.com/about) from Google is an open and scalable framework, and offers 2 examples: *To explore the possibilities of model cards in the real world, we've designed examples for two features of our Cloud Vision API, Face Detection and Object Detection. They provide simple overviews of both models' ideal forms of input, visualize some of their key limitations, and present basic performance metrics.* +- (Web article) *[Model Cards for AI Model Transparency](https://blog.einstein.ai/model-cards-for-ai-model-transparency/)*, Salesforce: examples of *Model Cards* used and published by Salesforce - (Software & Tools) *[AI FactSheets 360](https://aifs360.mybluemix.net/)*, an IBM Research project to foster trust in AI by increasing transparency and enabling governance: *Increased transparency provides information for AI consumers to better understand how the AI model or service was created. This allows a consumer of the model to determine if it is appropriate for their situation. AI Governance enables an enterprise to specify and enforce policies describing how an AI model or service should be constructed and deployed.*
diff --git a/referentiel_evaluation.md b/referentiel_evaluation.md index 6a13dcc..96cf955 100644 --- a/referentiel_evaluation.md +++ b/referentiel_evaluation.md @@ -695,6 +695,7 @@ Il s'agit d'expliciter et d'adjoindre au modèle la description du contexte d'ut - (Academic paper) [Model Cards for Model Reporting](https://arxiv.org/abs/1810.03993), M. Mitchell, S. Wu, A. Zaldivar, P. Barnes, L. Vasserman, B. Hutchinson, E. Spitzer, I. D. Raji, T. Gebru, Janvier 2019 - (Web article) [Model Cards](https://modelcards.withgoogle.com/about) de Google est un framework ouvert et évolutif, et propose 2 exemples : *To explore the possibilities of model cards in the real world, we've designed examples for two features of our Cloud Vision API, Face Detection and Object Detection. They provide simple overviews of both models' ideal forms of input, visualize some of their key limitations, and present basic performance metrics.* +- (Web article) *[Model Cards for AI Model Transparency](https://blog.einstein.ai/model-cards-for-ai-model-transparency/)*, Salesforce : exemples de *Model Cards* utilisées et publiées par Salesforce - (Software & Tools) *[AI FactSheets 360](https://aifs360.mybluemix.net/)* d'IBM Research est un projet visant à définir une méthodologie et des exemples pour cartographier et décrire un modèle et son cycle de vie.
From c5ddf493c5147d3d4625d428950f500abfb7f836 Mon Sep 17 00:00:00 2001 From: Eric Boniface Date: Tue, 25 May 2021 18:11:16 +0200 Subject: [PATCH 28/34] Add an answer item to 5.5 on public AI registers --- assessment_framework_eng.md | 3 +++ referentiel_evaluation.md | 3 +++ 2 files changed, 6 insertions(+) diff --git a/assessment_framework_eng.md b/assessment_framework_eng.md index 6366738..74fea03 100644 --- a/assessment_framework_eng.md +++ b/assessment_framework_eng.md @@ -953,6 +953,7 @@ _(Specific risk domain: use of predictive models, provision or operation of pred - [ ] 5.5.c An information notice is made available in the terms and conditions of the system or an equivalent document, freely accessible - [ ] 5.5.d The system or service is explicit to the user that a predictive model is being used - [ ] 5.5.e The system or service provides the user with additional information on the results it would have provided in slightly different scenarios (e.g. "counterfactual explanations" such as the smallest change in input data that would have resulted in a given different output) +- [ ] 5.5.f We are pionneers in using public AI registers, enabling us to provide transparency to our stakeholders and to capture user feedbacks
Expl5.5 : @@ -966,6 +967,8 @@ Using automatic systems based on models whose rules have been "learned" (and not - (Academic paper) *[Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR](https://arxiv.org/abs/1711.00399)*, S. Wachter, B. Mittelstadt, C. Russell, 2018 - (Technical guide) *[Interpretable Machine Learning - Counterfactual explanations](https://christophm.github.io/interpretable-ml-book/counterfactual.html)*, C. Molnar, 2020 +- (Web article) *[AI registers: finally, a tool to increase transparency in AI/ML](https://towardsdatascience.com/ai-registers-finally-a-tool-to-increase-transparency-in-ai-ml-f5694b1e317d)*, Natalia Modjeska, December 2020 +- (Whitepaper) *[Public AI Registers: Realising AI transparency and civic participation in government use of AI](https://uploads-ssl.webflow.com/5c8abedb10ed656ecfb65fd9/5f6f334b49d5444079726a79_AI%20Registers%20-%20White%20paper%201.0.pdf)*, Saidot, Septembre 2020
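The "smallest change in input data that would have resulted in a different output" mentioned in answer item 5.5.e can be sketched very naively as below, with a brute-force search over single-feature changes on synthetic data. Real counterfactual methods (see the Wachter et al. reference) solve an optimisation problem instead; all names and parameters here are illustrative assumptions.

```python
# Naive counterfactual search: the smallest change to a single feature
# that flips the model's decision for one individual.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=5, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X, y)

def single_feature_counterfactual(clf, x, step=0.05, max_delta=3.0):
    """Return (feature index, delta) of the smallest single-feature change flipping the prediction."""
    original = clf.predict(x.reshape(1, -1))[0]
    best = None
    for j in range(x.shape[0]):
        found = False
        for delta in np.arange(step, max_delta + step, step):
            for signed in (delta, -delta):
                candidate = x.copy()
                candidate[j] += signed
                if clf.predict(candidate.reshape(1, -1))[0] != original:
                    if best is None or abs(signed) < abs(best[1]):
                        best = (j, signed)
                    found = True
                    break
            if found:
                break  # smallest |delta| for this feature found, move on to the next one
    return best

individual = X[0]
print("Current decision:", model.predict(individual.reshape(1, -1))[0])
print("Smallest single-feature change that flips it:", single_feature_counterfactual(model, individual))
```

Returning such a counterfactual alongside a decision is one practical way to provide the additional information that answer item 5.5.e describes.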
diff --git a/referentiel_evaluation.md b/referentiel_evaluation.md index 96cf955..a529dee 100644 --- a/referentiel_evaluation.md +++ b/referentiel_evaluation.md @@ -950,6 +950,7 @@ _(Domaine de risque spécifique : utilisation de modèles prédictifs pour son p - [ ] 5.5.c Une notice d'information est mise à disposition dans les conditions générales d'utilisation du système ou un document équivalent, en libre accès - [ ] 5.5.d Le système ou le service est explicite vis-à-vis de l'utilisateur quant au fait qu'un modèle prédictif est utilisé - [ ] 5.5.e Le système ou le service propose à l'utilisateur des informations supplémentaires sur les résultats qu'il aurait fourni dans des cas de figure légèrement différents (par exemple des "explications contrefactuelles" comme le plus petit changement dans les données d'entrée qui aurait permis d'arriver à une sortie donnée) +- [ ] 5.5.f Nous sommes pionniers dans l'utilisation de registres publics pour les modèles d'IA, qui nous permettent de fournir de la transparence à nos parties prenantes et également de capter des retours utilisateurs
Expl5.5 : @@ -963,6 +964,8 @@ Utiliser des systèmes automatiques basés sur des modèles dont les règles ont - (Academic paper) *[Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR](https://arxiv.org/abs/1711.00399)*, S. Wachter, B. Mittelstadt, C. Russell, 2018 - (Technical guide) *[Interpretable Machine Learning - Counterfactual explanations](https://christophm.github.io/interpretable-ml-book/counterfactual.html)*, C. Molnar, 2020 +- (Web article) *[AI registers: finally, a tool to increase transparency in AI/ML](https://towardsdatascience.com/ai-registers-finally-a-tool-to-increase-transparency-in-ai-ml-f5694b1e317d)*, Natalia Modjeska, Décembre 2020 +- (Whitepaper) *[Public AI Registers: Realising AI transparency and civic participation in government use of AI](https://uploads-ssl.webflow.com/5c8abedb10ed656ecfb65fd9/5f6f334b49d5444079726a79_AI%20Registers%20-%20White%20paper%201.0.pdf)*, Saidot, Septembre 2020
From f38191032db0e9b28fa076439d0dbb45c2ed7b88 Mon Sep 17 00:00:00 2001 From: Eric Boniface Date: Tue, 25 May 2021 18:17:41 +0200 Subject: [PATCH 29/34] Add Counterfit as a key reference to elements 1.7 and 1.8 --- assessment_framework_eng.md | 2 ++ referentiel_evaluation.md | 2 ++ 2 files changed, 4 insertions(+) diff --git a/assessment_framework_eng.md b/assessment_framework_eng.md index 74fea03..c63561e 100644 --- a/assessment_framework_eng.md +++ b/assessment_framework_eng.md @@ -194,6 +194,7 @@ The state of the art in ML security is constantly evolving. While it is impossib
Ressources1.7 : +- (Software & Tools) *[AI security risk assessment using Counterfit](https://www.microsoft.com/security/blog/2021/05/03/ai-security-risk-assessment-using-counterfit/)*, Microsoft, May 2021 : Counterfit is an open source tool enabling testing different attacks on ML models to identify their vulnerabilities. [Link](https://github.com/Azure/counterfit/) to GitHub repo - (Technical guide) *[Privacy Enhancing Technologies Decision Tree (v2)](http://www.private-ai.ca/PETs_Decision_Tree.svg)*, Private AI, 2020 - (Web article) *[The secret-sharer: evaluating and testing unintended memorization in neural networks](https://blog.acolyer.org/2019/09/23/the-secret-sharer/)*, A. Colyer, 2019 - (Academic paper) *[Membership Inference Attacks against Machine Learning Models](https://arxiv.org/abs/1610.05820)*, R. Shokri, M. Stronati, C. Song, V. Shmatikov, 2017 @@ -239,6 +240,7 @@ Depending on the level of risk and sensitivity of the projects, certain technica
Resources1.8 : +- (Software & Tools) *[AI security risk assessment using Counterfit](https://www.microsoft.com/security/blog/2021/05/03/ai-security-risk-assessment-using-counterfit/)*, Microsoft, May 2021 : Counterfit is an open source tool enabling testing different attacks on ML models to identify their vulnerabilities. [Link](https://github.com/Azure/counterfit/) to GitHub repo - (Technical guide) *[Privacy Enhancing Technologies Decision Tree (v2)](http://www.private-ai.ca/PETs_Decision_Tree.svg)*, Private AI, 2020 - (Web article) *[The secret-sharer: evaluating and testing unintended memorization in neural networks](https://blog.acolyer.org/2019/09/23/the-secret-sharer/)*, A. Colyer, 2019 - (Academic paper) *[Membership Inference Attacks against Machine Learning Models](https://arxiv.org/abs/1610.05820)*, R. Shokri, M. Stronati, C. Song, V. Shmatikov, 2017 diff --git a/referentiel_evaluation.md b/referentiel_evaluation.md index a529dee..ad90926 100644 --- a/referentiel_evaluation.md +++ b/referentiel_evaluation.md @@ -194,6 +194,7 @@ L'état de l'art de la sécurité du ML est en constante évolution. S'il est im
Ressources1.7 : +- (Software & Tools) *[AI security risk assessment using Counterfit](https://www.microsoft.com/security/blog/2021/05/03/ai-security-risk-assessment-using-counterfit/)*, Microsoft, Mai 2021 : l'outil open source Counterfit permet de tester différentes attaques sur un modèle de ML pour identifier ses éventuelles vulnérabilités. [Lien](https://github.com/Azure/counterfit/) vers le dépôt GitHub - (Technical guide) *[Privacy Enhancing Technologies Decision Tree (v2)](http://www.private-ai.ca/PETs_Decision_Tree.svg)*, Private AI, 2020 - (Web article) *[The secret-sharer: evaluating and testing unintended memorization in neural networks](https://blog.acolyer.org/2019/09/23/the-secret-sharer/)*, A. Colyer, 2019 - (Academic paper) *[Membership Inference Attacks against Machine Learning Models](https://arxiv.org/abs/1610.05820)*, R. Shokri, M. Stronati, C. Song, V. Shmatikov, 2017 @@ -239,6 +240,7 @@ Selon les niveaux de risque et de sensibilité des projets, certaines approches
Ressources1.8 : +- (Software & Tools) *[AI security risk assessment using Counterfit](https://www.microsoft.com/security/blog/2021/05/03/ai-security-risk-assessment-using-counterfit/)*, Microsoft, Mai 2021 : l'outil open source Counterfit permet de tester différentes attaques sur un modèle de ML pour identifier ses éventuelles vulnérabilités. [Lien](https://github.com/Azure/counterfit/) vers le dépôt GitHub - (Technical guide) *[Privacy Enhancing Technologies Decision Tree (v2)](http://www.private-ai.ca/PETs_Decision_Tree.svg)*, Private AI, 2020 - (Web article) *[The secret-sharer: evaluating and testing unintended memorization in neural networks](https://blog.acolyer.org/2019/09/23/the-secret-sharer/)*, A. Colyer, 2019 - (Academic paper) *[Membership Inference Attacks against Machine Learning Models](https://arxiv.org/abs/1610.05820)*, R. Shokri, M. Stronati, C. Song, V. Shmatikov, 2017 From 290578983d389354d5f970a5548d33823846d7fb Mon Sep 17 00:00:00 2001 From: Eric Boniface Date: Fri, 18 Jun 2021 17:43:14 +0200 Subject: [PATCH 30/34] Fix a syntax typo --- assessment_framework_eng.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/assessment_framework_eng.md b/assessment_framework_eng.md index 774af98..7ebb697 100644 --- a/assessment_framework_eng.md +++ b/assessment_framework_eng.md @@ -112,7 +112,7 @@ _(Select one answer only, which best corresponds to the level of maturity of the - [ ] 1.4.a Yes - [ ] 1.4.b No - [ ] 1.4.c We are currently preparing an upcoming audit or certification of our organisation's compliance with personal and confidential data requirements -- [ ] 1.4.d Not the the organization level, but it is the case for at least one project +- [ ] 1.4.d Not at the organization level, but it is the case for at least one project
Expl1.4 : From d4cb31691d69e4a6ca2c6655f073374094df23bb Mon Sep 17 00:00:00 2001 From: Eric Boniface Date: Fri, 18 Jun 2021 17:51:12 +0200 Subject: [PATCH 31/34] Add a key reference to 2.4 --- assessment_framework_eng.md | 1 + referentiel_evaluation.md | 1 + 2 files changed, 2 insertions(+) diff --git a/assessment_framework_eng.md b/assessment_framework_eng.md index 7ebb697..ef5c6aa 100644 --- a/assessment_framework_eng.md +++ b/assessment_framework_eng.md @@ -418,6 +418,7 @@ These topics require a strong expertise and few practitioners are familiar with Resources2.4 : - (Academic paper) *[Moving beyond “algorithmic bias is a data problem”](https://www.cell.com/patterns/fulltext/S2666-3899(21)00061-1)*, Sara Hooker, Opinion, April 2021 +- (Academic paper) *[Algorithmic Factors Influencing Bias in Machine Learning](https://arxiv.org/abs/2104.14014)*, W. Blanzeisky, P. Cunningham, April 2021: The authors defines 4 types of algorithmic choices : Data description (for the first version on the model, and feature engineering), Irreductible Errors, Impact of regularization (present in DL or more classical ML), Impact of class & feature imbalance. Those 4 types of choices will generate what they call underestimation bias, opposed to negative latency, bias due to data (that can be due to an under-representative dataset, or other reasons). They also propose some mitigation process. - (Academic paper) *[Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings](https://arxiv.org/abs/1607.06520)*, T. Bolukbasi, K.-W. Chang, J. Zou, V. Saligrama, A. Kalai, 2016
diff --git a/referentiel_evaluation.md b/referentiel_evaluation.md index bbeeda6..571c6db 100644 --- a/referentiel_evaluation.md +++ b/referentiel_evaluation.md @@ -418,6 +418,7 @@ Ces sujets étant très techniques, encore peu diffusés et connus des praticien Ressources2.4 : - (Academic paper) *[Moving beyond “algorithmic bias is a data problem”](https://www.cell.com/patterns/fulltext/S2666-3899(21)00061-1)*, Sara Hooker, Opinion, Avril 2021 +- (Academic paper) *[Algorithmic Factors Influencing Bias in Machine Learning](https://arxiv.org/abs/2104.14014)*, W. Blanzeisky, P. Cunningham, April 2021: les auteurs définissent 4 types de choix algorithmiques pouvant être à l'origine de biais : *Data description (for the first version on the model, and feature engineering), Irreductible Errors, Impact of regularization (present in DL or more classical ML), Impact of class & feature imbalance*. Ces 4 types de choix peuvent générer ce qu'ils appellent un biais de sous-estimation (*underestimation bias*), qu'ils opposent à la *negative latency*, biais dûs aux données. Ils proposent des mesures de mitigation. - (Academic paper) *[Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings](https://arxiv.org/abs/1607.06520)*, T. Bolukbasi, K.-W. Chang, J. Zou, V. Saligrama, A. Kalai, 2016
From a64fc8671ed266507ffdad088fe7ffca2ec0b861 Mon Sep 17 00:00:00 2001 From: Eric Boniface Date: Thu, 24 Jun 2021 13:30:33 +0200 Subject: [PATCH 32/34] Fix formatting typos and a missing resource in the EN version --- assessment_framework_eng.md | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/assessment_framework_eng.md b/assessment_framework_eng.md index ef5c6aa..fc1e25a 100644 --- a/assessment_framework_eng.md +++ b/assessment_framework_eng.md @@ -506,7 +506,6 @@ The use of predictive models that have been validated and tested on historical d --- - Q3.4 : **Performance validation** Does your organisation implement the following approaches: @@ -569,12 +568,10 @@ Monitoring the performance of models over time is also particularly important in - (Technical guide) *[Continuous delivery for machine learning](https://martinfowler.com/articles/cd4ml.html)*, D. Sato, A. Wider, C. Windheuser, September 2019 - (Technical guide) *[Monitoring Machine Learning Models in Production - A comprehensive guide](https://christophergs.com/machine%20learning/2020/03/14/how-to-monitor-machine-learning-models/)*, Christopher Samiullah, March 2020 - (Web article) *[Google's medical AI was super accurate in a lab. Real life was a different story](https://www.technologyreview.com/2020/04/27/1000658/google-medical-ai-accurate-lab-real-life-clinic-covid-diabetes-retina-disease/)*, MIT Technology Review +- (Web article) (In French) *[En route vers le cycle de vie des modèles !](https://www.quantmetry.com/blog/premier-etape-cycle-vie-modeles/)*, G. Martinon, Janvier 2020
---- - - --- Q3.6 : **Decision making and ranges of indecision** From 589d504e852dff43fb138d6e494ab948213886ff Mon Sep 17 00:00:00 2001 From: Eric Boniface Date: Thu, 1 Jul 2021 11:13:49 +0200 Subject: [PATCH 33/34] Fix typos and formatting glitches --- assessment_framework_eng.md | 6 +++--- references.md | 2 +- referentiel_evaluation.md | 4 ++-- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/assessment_framework_eng.md b/assessment_framework_eng.md index fc1e25a..32c3ace 100644 --- a/assessment_framework_eng.md +++ b/assessment_framework_eng.md @@ -232,7 +232,7 @@ _(Select all the answer items that correspond to practices in your organisation)
Expl1.8 : -TThe state of the art in ML security is constantly evolving. If data scientists are now familiar in general with the membership inference attack (see proposed resources), new ones are being published regularly. While it is impossible to guard against all vulnerabilities at all times, it is crucial to be aware of them and to keep a watch on them. The article [Demystifying the Membership Inference Attack](https://medium.com/disaitek/demystifying-the-membership-inference-attack-e33e510a0c39) is for example an interesting entry point in the context of sensitive data. +The state of the art in ML security is constantly evolving. If data scientists are now familiar in general with the membership inference attack (see proposed resources), new ones are being published regularly. While it is impossible to guard against all vulnerabilities at all times, it is crucial to be aware of them and to keep a watch on them. The article [Demystifying the Membership Inference Attack](https://medium.com/disaitek/demystifying-the-membership-inference-attack-e33e510a0c39) is for example an interesting entry point in the context of sensitive data. Depending on the level of risk and sensitivity of the projects, certain technical approaches to guard against them will be selected and implemented. It is important to follow the evolution of research and state-of-the-art practices, and to document the choices made. The notion of "end-to-end genealogy" is introduced here. @@ -317,7 +317,7 @@ It is a question of ensuring that oneself considers these subjects and therefore - (Web article) *[Hidden Bias](https://pair.withgoogle.com/explorables/hidden-bias/)* explorable from [PAIR](https://pair.withgoogle.com/) - (Technical guide) *[Tour of Data Sampling Methods for Imbalanced Classification](https://machinelearningmastery.com/data-sampling-methods-for-imbalanced-classification/)* -- (Software & Tools) *[Pandas Profiling](https://github.com/pandas-profiling/pandas-profiling)*: *Create HTML profiling reports from pandas `DataFrame` objects. The pandas `df.describe()` function is great but a little basic for serious exploratory data analysis. `pandas_profiling` extends the pandas `DataFrame` with `df.profile_report()` for quick data analysis +- (Software & Tools) *[Pandas Profiling](https://github.com/pandas-profiling/pandas-profiling)*: Create HTML profiling reports from pandas `DataFrame` objects. The pandas `df.describe()` function is great but a little basic for serious exploratory data analysis. `pandas_profiling` extends the pandas `DataFrame` with `df.profile_report()` for quick data analysis
@@ -933,7 +933,7 @@ Technical resources such as SHAP or LIME provide a first-hand introduction to th - (Technical guide) *[Interpretable Machine Learning, A Guide for Making Black Box Models Explainable](https://christophm.github.io/interpretable-ml-book/)*, Christoph Molnar - (Web article) *[Understanding model predictions with LIME](https://towardsdatascience.com/understanding-model-predictions-with-lime-a582fdff3a3b)*, blog L. Hulstaert, 2018 - (Software & Tools) *[SHAP](https://github.com/slundberg/shap): A game theoretic approach to explain the output of any machine learning model*. -- (Software & Tools) *[Shapash](https://github.com/MAIF/shapash): a MAIF Datalab project which aims to make machine learning interpretable and understandable by everyone. It provides several types of visualization that display explicit labels that everyone can understand +- (Software & Tools) *[Shapash](https://github.com/MAIF/shapash)*: a MAIF Datalab project which aims to make machine learning interpretable and understandable by everyone. It provides several types of visualization that display explicit labels that everyone can understand - (Software & Tools) *[FACET](https://github.com/BCG-Gamma/facet)*: a BCG Gamma project of an open source library for human-explainable AI. It combines sophisticated model inspection and model-based simulation to enable better explanations of supervised machine learning models - (Web article) In some cases, regulations impose being able to explain how an automated system came to a certain outcome (see for example [article 22 of the GDPR in the European Union](https://www.cnil.fr/fr/reglement-europeen-protection-donnees/chapitre3#Article22), [article 10 of the "Informatique & Libertés" law in France](https://www.legifrance.gouv.fr/affichTexteArticle.do;?idArticle=LEGIARTI000037090394&cidTexte=LEGITEXT000006068624&dateTexte=20180624), cited in particular in the [Hippocratic Oath for data scientist](https://hippocrate.tech/). diff --git a/references.md b/references.md index 422841a..4f4baa2 100644 --- a/references.md +++ b/references.md @@ -39,7 +39,7 @@ - Cycle de vie complet : - - **[En route vers le cycle de vie des modèles !](https://www.quantmetry.com/blog/premier-etape-cycle-vie-modeles/)*, G. Martinon, Janvier 2020 + - *[En route vers le cycle de vie des modèles !](https://www.quantmetry.com/blog/premier-etape-cycle-vie-modeles/)*, G. Martinon, Janvier 2020 - "Performance is not outcome", erreurs, crises : diff --git a/referentiel_evaluation.md b/referentiel_evaluation.md index 571c6db..421f649 100644 --- a/referentiel_evaluation.md +++ b/referentiel_evaluation.md @@ -317,7 +317,7 @@ Il s'agit de s'obliger à s'interroger sur ces sujets et donc à réfléchir aux - (Web article) *[Hidden Bias](https://pair.withgoogle.com/explorables/hidden-bias/)* explorable from [PAIR](https://pair.withgoogle.com/) - (Technical guide) *[Tour of Data Sampling Methods for Imbalanced Classification](https://machinelearningmastery.com/data-sampling-methods-for-imbalanced-classification/)* -- (Software & Tools) *[Pandas Profiling](https://github.com/pandas-profiling/pandas-profiling)*: *Create HTML profiling reports from pandas `DataFrame` objects. The pandas `df.describe()` function is great but a little basic for serious exploratory data analysis. 
`pandas_profiling` extends the pandas `DataFrame` with `df.profile_report()` for quick data analysis +- (Software & Tools) *[Pandas Profiling](https://github.com/pandas-profiling/pandas-profiling): Create HTML profiling reports from pandas `DataFrame` objects. The pandas `df.describe()` function is great but a little basic for serious exploratory data analysis. `pandas_profiling` extends the pandas `DataFrame` with `df.profile_report()` for quick data analysis*
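Since the Pandas Profiling resource is reformatted just above, a minimal usage sketch may help. It follows the `ProfileReport` / `df.profile_report()` usage quoted in the resource description; note that the project has been renamed (ydata-profiling) in more recent releases, so the exact import may differ in your environment.

```python
# Quick exploratory data analysis report with pandas-profiling,
# as described in the resource above (illustrative data).
import pandas as pd
from pandas_profiling import ProfileReport

df = pd.DataFrame({
    "age": [34, 51, 28, 44, 39],
    "income": [32000, 54000, 28000, None, 41000],
    "defaulted": [0, 1, 0, 0, 1],
})

# Equivalent shortcut once pandas_profiling is imported: df.profile_report()
profile = ProfileReport(df, title="Minimal profiling example", minimal=True)
profile.to_file("profiling_report.html")  # distributions, missing values, correlations
```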
@@ -933,7 +933,7 @@ Des ressources techniques comme SHAP ou LIME permettent d'entrer de plain-pied d - (Technical guide) *[Interpretable Machine Learning, A Guide for Making Black Box Models Explainable](https://christophm.github.io/interpretable-ml-book/)*, Christoph Molnar - (Web article) *[Understanding model predictions with LIME](https://towardsdatascience.com/understanding-model-predictions-with-lime-a582fdff3a3b)*, blog L. Hulstaert, 2018 - (Software & Tools) *[SHAP](https://github.com/slundberg/shap): A game theoretic approach to explain the output of any machine learning model* -- (Software & Tools) *[Shapash](https://github.com/MAIF/shapash): un projet open source de MAIF Datalab facilitant la prise en main et permettant de visualiser les analyses d'explicabilité et d'interprétabilité des modèles +- (Software & Tools) *[Shapash](https://github.com/MAIF/shapash)*: un projet open source de MAIF Datalab facilitant la prise en main et permettant de visualiser les analyses d'explicabilité et d'interprétabilité des modèles - (Software & Tools) *[FACET](https://github.com/BCG-Gamma/facet)*: un projet open source du BCG Gamma, *FACET is an open source library for human-explainable AI. It combines sophisticated model inspection and model-based simulation to enable better explanations of supervised machine learning models* - (Web article) Dans certains cas la réglementation impose de pouvoir expliquer aux personnes concernées comment fonctionne un algorithme (voir par exemple [l'article 22 du RGPD](https://www.cnil.fr/fr/reglement-europeen-protection-donnees/chapitre3#Article22), [l'article 10 de la loi Informatique et libertés](https://www.legifrance.gouv.fr/affichTexteArticle.do;?idArticle=LEGIARTI000037090394&cidTexte=LEGITEXT000006068624&dateTexte=20180624), cités notamment dans le [Serment d'Hippocrate pour data scientist](https://hippocrate.tech/)) From d07fb0c127a6033d7155810faa1b441f957022b2 Mon Sep 17 00:00:00 2001 From: Eric Boniface Date: Thu, 1 Jul 2021 11:34:08 +0200 Subject: [PATCH 34/34] Fix a typo in the EN version --- assessment_framework_eng.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/assessment_framework_eng.md b/assessment_framework_eng.md index 32c3ace..db3b0a9 100644 --- a/assessment_framework_eng.md +++ b/assessment_framework_eng.md @@ -403,7 +403,7 @@ _(Select one answer only, which best corresponds to the level of maturity of the
Expl2.4 :

-If datasets used to train and evaluate a model require a particular attention to prevent discriminatory biases, recent work show that modeling choices have to be taken into account too. The article *"Moving beyond “algorithmic bias is a data problem”"* suggested in resources synthesizes very well how the learning algorithm, the model structure, adding or not differential privacy, compression, etc. can have consequences on the fairness of a model. Extracts:
+While the datasets used to train and evaluate a model require particular attention to prevent discriminatory biases, recent work shows that modeling choices have to be taken into account too. The article *"Moving beyond “algorithmic bias is a data problem”"* suggested in the resources synthesizes very well how the learning algorithm, the model structure, the addition (or not) of differential privacy, compression, etc. can have consequences on the fairness of a model. Extracts:

> - *A key reason why model design choices amplify algorithmic bias is because notions of fairness often coincide with how underrepresented protected features are treated by the model*
> - [...] *design choices to optimize for either privacy guarantees or compression amplify the disparate impact between minority and majority data subgroups*
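Finally, connecting the explainability resources touched by the formatting fixes above (SHAP, Shapash) with the fairness concern of element 2.4: mean absolute SHAP values can be compared between subgroups to check whether the model relies on different features for an under-represented group. This is a hedged sketch on synthetic data: the return shape of `shap_values` varies across SHAP versions, hence the defensive handling, and the "group" attribute is an assumption of the example.

```python
# Per-subgroup feature attributions with SHAP (resource referenced in the patches above).
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=6, random_state=0)
group = (X[:, 1] > 0.5).astype(int)   # assumed subgroup attribute, for the sketch only
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
if isinstance(shap_values, list):        # older SHAP releases: one array per class
    shap_values = shap_values[1]
elif np.ndim(shap_values) == 3:          # newer releases: (samples, features, classes)
    shap_values = shap_values[:, :, 1]

for g in (0, 1):
    mean_abs = np.abs(shap_values[group == g]).mean(axis=0)
    print(f"Subgroup {g}: mean |SHAP| per feature = {np.round(mean_abs, 3)}")
```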