From 9dc69c417306ea0cea14b247535496e2b3775799 Mon Sep 17 00:00:00 2001 From: Lodewijk <44677545+lodewijk81@users.noreply.github.com> Date: Fri, 8 Dec 2023 15:52:16 +0100 Subject: [PATCH] Added first draft of Docs page and Ethics Policy --- README.md | 7 +- docs/ethics/policy.md | 144 ++++++++++++++++++++ docs/ethics/workflow-review.md | 4 + docs/experiments/htr-viewer.md | 16 --- docs/experiments/places-visualization.md | 7 - docs/experiments/skosmos-concept-browser.md | 9 -- docs/experiments/word-embeddings.md | 0 docs/index.md | 15 +- mkdocs.yml | 20 ++- 9 files changed, 164 insertions(+), 58 deletions(-) create mode 100644 docs/ethics/policy.md create mode 100644 docs/ethics/workflow-review.md delete mode 100644 docs/experiments/htr-viewer.md delete mode 100644 docs/experiments/places-visualization.md delete mode 100644 docs/experiments/skosmos-concept-browser.md delete mode 100644 docs/experiments/word-embeddings.md diff --git a/README.md b/README.md index 4badef4c..83b12538 100644 --- a/README.md +++ b/README.md @@ -1,12 +1,9 @@ # lab.globalise.huygens.knaw.nl -

-GLOBALISE -

-Static website for our GLOBALISE Lab at [http://lab.globalise.huygens.knaw.nl/](http://lab.globalise.huygens.knaw.nl/). +Static website for GLOBALISE Docs at [https://docs.globalise.huygens.knaw.nl/](https://docs.globalise.huygens.knaw.nl/). ## Development -These static pages are generated with [Material for MkDocs](https://squidfunk.github.io/mkdocs-material/) using a GitHub Action on every push (see the [`gh-pages`](https://github.com/globalise-huygens/lab.globalise.huygens.knaw.nl/tree/gh-pages) branch). For local development, follow the instructions below. +These static pages are generated with [Material for MkDocs](https://squidfunk.github.io/mkdocs-material/) using a GitHub Action on every push (see the [`gh-pages`](https://github.com/globalise-huygens/docs.globalise.huygens.knaw.nl/tree/gh-pages) branch). For local development, follow the instructions below. ### Local development diff --git a/docs/ethics/policy.md b/docs/ethics/policy.md new file mode 100644 index 00000000..3991cbac --- /dev/null +++ b/docs/ethics/policy.md @@ -0,0 +1,144 @@ +# GLOBALISE Ethics Policy + +**Date:** December 18, 2023 +**Version:** 1.0 + +[TOC] + + + +## I. Introduction + +This policy document governs and guides GLOBALISE’s work and ethos. We created it because we believe that articulating our values and obligations to one another reinforces the respect and care among the team and in our work. Having a policy also provides us with clear avenues to correct our culture should it ever stray from course. + +We openly share this policy document to contribute to the ongoing conversation about inclusion in academia and the ethics of artificial intelligence. We encourage others to adapt and utilise it. + + +This is a living document which we will revisit and revise. We encourage anyone with inquiries or a desire to discuss it to [reach out to us](https://globalise.huygens.knaw.nl/contact-us/). Your input and engagement are welcomed and valued. + + +## II. 
Mission Statement + +GLOBALISE is an infrastructure project dedicated to enhancing the way we access and understand the archives of the Dutch East India Company (VOC). This infrastructural endeavour aims to unveil interactions between European stakeholders and, more crucially, non-European entities, operating within and around the VOC’s ‘empire’. By doing so, GLOBALISE will shed new light on the mechanisms of early globalisation, colonialism, and their formative and enduring impact on regions stretching from Europe, especially the Netherlands, to the vast expanses of the Indian Ocean and Indonesian Archipelago. + +Our mission can be encapsulated in the following core commitments: + + + + +1. **Enhancing Accessibility** At its core, GLOBALISE is committed to making a substantial segment of the VOC archives not just accessible, but contextually relevant to audiences worldwide. Through this project, history enthusiasts, researchers, and the global community will gain an unprecedented level of insight into the VOC's expansive influence and interactions. This objective will be achieved by transforming the archives into digitised and searchable text using handwritten text recognition. Additionally, GLOBALISE will employ historic and semantic contextualisation to enhance research possibilities and allow for richer representations of history. And finally it will develop versatile interfaces, designed to cater to a diverse range of users. + +2. **Linked Open Data** We will model, publish, and host our data as Linked Open Data (LOD) to promote a holistic comprehension of the materials, link them to external datasets and thesauri, and allow for easy re-use of our data. Our approach connects well with other LOD initiatives in the GLAM sector. + +3. **Tools** We will provide a suite of open source tools designed for the creation, querying, filtering, visualisation, sharing, annotation, and refinement of project data. + +4. 
**Addressing Power Imbalances and Biases** We acknowledge power imbalances and historical injustices recorded in and accompanying the creation of the VOC archives and actively work towards amplifying marginalised perspectives, supplementing the VOC archives with non-European perspectives to challenge dominant narratives and foster a more comprehensive understanding of (colonial) history. + +5. **Transparency** We believe in being transparent about the origins and frameworks behind data. This means being clear about how data is constructed, understanding its context, and recognizing its limitations. We also invite public participation, embracing approaches like citizen science and feedback. + +6. **Free and Open Access** We champion free and unrestricted access. Adhering to the [FAIR](http://go-fair.org) principles, all our resources, software, and data are licensed under [open and permissive licences](https://opensource.org/). + +7. **Diversity, Inclusion, Equity, and Decolonisation** This commitment extends across all aspects of the project, from the selection of datasets to resource allocation, community engagement, and team composition. + + +## III. Ethics Guidelines + +In the GLOBALISE initiative, ethical adherence is a cornerstone throughout the project lifecycle, encompassing work packages, future plans, and governance. Should any aspect fall short of these ethical standards during periodic evaluations, it is imperative to restructure it to conform to the following core principles. + + + + +1. **Diversity, Equity, and Inclusion (DEI)** GLOBALISE is dedicated to advancing DEI in every part of the project: people, governance, perspectives, datasets, algorithms, and interfaces. + + **Diversity** encompasses a wide range of differences and variations within any given environment or system. 
Diversity includes not only variations in individual characteristics like ethnicity, age, gender identity, religion, physical abilities and disabilities, cultural background, and education, but also differences in ideas, perspectives, datasets, algorithms, infrastructural elements, and any other factors that contribute to the overall complexity and richness of the system in question. Embracing diversity means recognizing, appreciating, and harnessing the breadth and depth of distinctions. + + **Equity** strives to rectify disparities and create a level playing field for all elements within a system. + + + **Inclusion** refers to the behaviours, attitudes, and social norms within our project that ensure that there is space for multiple identities, groups and expressions. + +The way in which we advance DEI is visible in the following points: + + **Countering Bias** We pay attention to situations involving vulnerable groups and those that have been historically disadvantaged or are at risk of exclusion, and to situations characterised by asymmetries of power. + + **Education and Training** We provide regular training and workshops to increase awareness and sensitivity about diversity, equity, and inclusion both in our project and in our work. + + **Accessibility and Design** Our infrastructure design approach acknowledges and addresses advantages and challenges faced by different social groups. + + **Stakeholder Participation** GLOBALISE finds it critical to work with stakeholders who may directly or indirectly be affected by the infrastructure. We do so by having a large steering team and external advisors, as well as organising regular workshops to share our work, collaborate with and get feedback from stakeholders. Furthermore, we are working on setting up longer term mechanisms for stakeholder participation. + + **Documentation** Finally, we record all our interventions and strategies to promote DEI in extensive documentation and reports. 
+ +2. **Transparency** means open disclosure about our project’s data sources, algorithms, decisions, and governance structures. This entails: + + + **Documentation** of different parts of the GLOBALISE infrastructure to aid transparency and explainability. This includes data cards/sheets for datasets, model cards for NLP models, thesaurus for terminology, reports on stakeholder participation, etc. + +**Communication** of the characteristics, limitations, and potential shortcomings of the system to users and stakeholders, through interface design and user guides. + + +3. **Accountability** encapsulates the project’s ownership of its decisions and outcomes, adherence to laws and policies, and its obligation to address consequences. This includes: + + + **Auditability** It is important to establish mechanisms that facilitate the infrastructure’s auditability. This will include providing extensive provenance on the data produced and provided by GLOBALISE and any other authoritative layers that we add, creating meticulous data sheets for every dataset produced and publishing datasets and research in peer-reviewed journals. + +**Training and Education** to help develop accountability practices. + +**Redress Mechanisms** Establishing systems to inform and provide recourse to users and third parties. + +4. **Societal and Environmental Wellbeing** GLOBALISE should benefit society and ensure that it is sustainable and minimises environmental impact. + +**Acknowledging Impact** involves recognizing how the project can affect various communities. This entails understanding the social and cultural dimensions of the content within the archive. + + + **Dealing with Offensive Language** Archives often contain historical materials that use language or express views that are now considered offensive or inappropriate. It’s important to address this issue sensitively. 
+ + + + **Enhancing Cultural Safety** Ensuring that communities interacting with sensitive collections feel respected and safe is essential. This involves being mindful of how language and content are presented. + + + **Historical Contextualization** Recognizing and contextualising records of contentious or violent past events with sensitivity toward affected groups. + +**Reducing Life Cycle Impact** It’s important to consider the environmental footprint of the GLOBALISE project throughout its entire lifecycle, from development to implementation and beyond. + + + +The last two principles apply to AI systems: + + + + +5. **Robustness** Technical robustness focuses on the stability and reliability of the AI systems in the GLOBALISE infrastructure. Additionally, these systems should be socially robust, meaning they should account for potential unintended consequences and harms that may arise from their use. This includes addressing questions such as: + +**Accuracy** Ensuring system reliability in unforeseen circumstances and minimising potential harms from inaccuracies. + +6. **Privacy and Data Governance** We prioritise privacy and data protection, ensuring the quality and integrity of data, controlling data access, etc. + + **Oversight mechanisms** for data collection, storage, processing and use. GLOBALISE will store its data with institutes such as the IISG which have acquired the [Core Trust Seal](https://www.coretrustseal.org/), making them reliable and sustainable repositories for digital materials. + + +**Privacy** Assessing who can access users’ data, and under what circumstances. + + +## IV. References + + +Chilcott, Alicia. "Towards protocols for describing racially offensive language in UK public archives." In _Archives in a Changing Climate-Part I & Part II_, pp. 151-168. Cham: Springer Nature Switzerland, 2022. + +Colored Conventions Project, [https://coloredconventions.org](https://coloredconventions.org) + +D’Ignazio, Catherine, and Lauren F. Klein. Data Feminism. 
Cambridge, MA: The MIT Press, 2020. + +EU Commission. “[Ethics guidelines for trustworthy AI](https://digital-strategy.ec.europa.eu/en/library/ethics-guidelines-trustworthy-ai).” 2019. + +Indigenous Archives Collective Position Statement on the Right of Reply to Indigenous Knowledges and Information Held in Archives, [https://indigenousarchives.net/indigenous-archives-collective-position-statement-on-the-right-of-reply-to-indigenous-knowledges-and-information-held-in-archives/](https://indigenousarchives.net/indigenous-archives-collective-position-statement-on-the-right-of-reply-to-indigenous-knowledges-and-information-held-in-archives/) + +[LINCS](https://lincsproject.ca/docs/about-lincs/policies), [https://lincsproject.ca/](https://lincsproject.ca/) + +[VOX Media](https://code-of-conduct.voxmedia.com/) + + +## V. Licence + +This document has been made available under a [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) licence. We extend an invitation to customise and utilise this material for your own project. We recommend modifying it to align with your project’s mission and identity, and we appreciate credit for building on our work. \ No newline at end of file diff --git a/docs/ethics/workflow-review.md b/docs/ethics/workflow-review.md new file mode 100644 index 00000000..5d6e42c1 --- /dev/null +++ b/docs/ethics/workflow-review.md @@ -0,0 +1,4 @@ +# GLOBALISE Ethical Review of Work Processes + +TBA + diff --git a/docs/experiments/htr-viewer.md b/docs/experiments/htr-viewer.md deleted file mode 100644 index 86025315..00000000 --- a/docs/experiments/htr-viewer.md +++ /dev/null @@ -1,16 +0,0 @@ -# GLOBALISE Transcriptions Viewer - -**Date:** October 4, 2023 -**URL:** https://transcriptions.globalise.huygens.knaw.nl/ -**Status:** Prototype - -The aim of the GLOBALISE project is to facilitate research with the _Overgekomen Brieven en Papieren_ series of documents from the VOC archives. 
As a first step to reaching this goal, we generate transcriptions of the c. 5 million of handwritten pages made available by the Dutch National Archives using [automatic transcription software](https://github.com/knaw-huc/loghi). - -While we publish text files of the transcriptions on the [GLOBALISE Dataverse](https://datasets.iisg.amsterdam/dataverse/globalise), we also experiment with building an interface for easy searching and exploring the material. A first prototype can be accessed through the link below. Please share your feedback through our [contact form](https://globalise.huygens.knaw.nl/contact-us/). In the future, improved versions will be made available. - -

GLOBALISE Transcriptions Viewer
-https://transcriptions.globalise.huygens.knaw.nl/

- -The collection of archival documents made available in the viewer comprises inventory numbers 1053-4454 and 7527-11024 from the VOC archives, National Archives, The Hague. The scans of the original documents (n=4,802,212) from the period 1610-1796 are available on the [website of the National Archives](https://www.nationaalarchief.nl/onderzoeken/archief/1.04.02/). - -Please note that the transcriptions will contain errors. They have not been manually checked for accuracy or completeness. Some labels, characterizations and information about persons, actions and events may be offensive and troubling to individuals and communities. Be careful when relying on these transcriptions and be aware of their limitations. \ No newline at end of file diff --git a/docs/experiments/places-visualization.md b/docs/experiments/places-visualization.md deleted file mode 100644 index cd3d17d8..00000000 --- a/docs/experiments/places-visualization.md +++ /dev/null @@ -1,7 +0,0 @@ -# GLOBALISE Places Visualization - -Initially as an intern at the GLOBALISE project and now as a student assistant, Ruben Land is working on a dataset of places that occur in the _Overgekomen Brieven en Papieren_ series of VOC documents. He uses R Shiny to create interactive visualizations of his work. These can be accessed by clicking the image below. - -

GLOBALISE Transcriptions Viewer
-https://globalise.shinyapps.io/mapping_places/

- diff --git a/docs/experiments/skosmos-concept-browser.md b/docs/experiments/skosmos-concept-browser.md deleted file mode 100644 index 5368c544..00000000 --- a/docs/experiments/skosmos-concept-browser.md +++ /dev/null @@ -1,9 +0,0 @@ -# Thesaurus concepts browser - -We're working on developing a GLOBALISE thesaurus with definitions of concepts that occur in the _Overgekomen Brieven en Papieren_ series of VOC documents. A preliminary version of the thesaurus can be explored in our [SKOSMOS environment](https://vocabulary.globalise.dev.diginfra.net/).[^1] - -Please note that the thesaurus is constantly being improved and extended, and the the URIs in the current version are not stable. - -

GLOBALISE Transcriptions Viewer

- -[^1]: This demo is running the [SKOSMOS software](https://skosmos.org/), developed by the National Library of Finland, to provide a user-friendly interface to our thesaurus. The SKOSMOS software is open source and available on [GitHub](https://github.com/NatLibFi/Skosmos). diff --git a/docs/experiments/word-embeddings.md b/docs/experiments/word-embeddings.md deleted file mode 100644 index e69de29b..00000000 diff --git a/docs/index.md b/docs/index.md index 8f8e976a..6b14670e 100644 --- a/docs/index.md +++ b/docs/index.md @@ -8,25 +8,20 @@ hide:

GLOBALISE Logo
-GLOBALISE Lab logo

-The aim of the [GLOBALISE project](https://globalise.huygens.knaw.nl/) is to develop an online infrastructure that unlocks the key series of VOC documents and reports for advanced new research methods. On this Lab page, we share our ongoing and completed experiments and prototypes. We welcome your feedback through our [contact form](https://globalise.huygens.knaw.nl/contact-us/). +The aim of the [GLOBALISE project](https://globalise.huygens.knaw.nl/) is to develop an online infrastructure that unlocks the key series of VOC documents and reports for advanced new research methods. On this Docs page, we provide background documentation about the project. The documentation is currently limited to the ethics policy; we will soon extend it with documentation about the GLOBALISE ontology, source corpus, and guiding principles for the design and development of the online interfaces, among others. We welcome your feedback through our [contact form](https://globalise.huygens.knaw.nl/contact-us/). GLOBALISE is funded by the [The Netherlands Organization for Scientific Research (NWO)](https://www.nwo.nl/en). -## Experiments +## Table of Contents -!!! warning "This is a work in progress" - This website is still under construction. Please check back later for more information. +### GLOBALISE Ethics Guidelines -### Ongoing - - -- [Viewer for transcriptions](experiments/htr-viewer.md) of the c. 5 million pages of VOC documents that comprise the GLOBALISE corpus. -- [Visualization of places](experiments/places-visualization.md) occurring in the c. 5 million pages of VOC documents that comprise the GLOBALISE corpus. +- [Ethics Policy](ethics/policy.md) +- [Ethical Review of Work Processes](ethics/workflow-review.md) 
diff --git a/mkdocs.yml b/mkdocs.yml index 6368f386..3cb659a0 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -1,13 +1,13 @@ -site_name: GLOBALISE Lab -repo_name: "globalise-huygens/lab.globalise.huygens.knaw.nl/" -repo_url: "https://github.com/globalise-huygens/lab.globalise.huygens.knaw.nl/" +site_name: GLOBALISE Docs +repo_name: "globalise-huygens/docs.globalise.huygens.knaw.nl/" +repo_url: "https://github.com/globalise-huygens/docs.globalise.huygens.knaw.nl/" theme: name: material # logo: static/img/logo/globalise_g.svg favicon: static/img/logo/globalise_g.svg - icon: - logo: material/flask + #icon: + # logo: material/flask features: # - navigation.instant - navigation.tracking @@ -18,11 +18,9 @@ theme: nav: - Home: index.md #- About: about.md - - "Ongoing experiments": - - "Transcriptions viewer": experiments/htr-viewer.md - - "Places visualization": experiments/places-visualization.md - # - "Completed experiments": - # - "Word Embeddings": experiments/word-embeddings.md + - "GLOBALISE Ethics Guidelines": + - "Ethics Policy": ethics/policy.md + - "Ethical Review of Work Processes": ethics/workflow-review.md markdown_extensions: - toc: permalink: True @@ -47,7 +45,7 @@ extra: copyright:

Creative Commons License
- GLOBALISE Lab is the experimental playground of the GLOBALISE project. + GLOBALISE Docs provides background documentation to the GLOBALISE project.