Merge pull request #43 from yarikoptic/enh-codespell
codespell: config, workflow + typos fixed
jsheunis authored Dec 6, 2024
2 parents 71d83d8 + 06300ed commit 4447bb7
Showing 10 changed files with 41 additions and 14 deletions.
5 changes: 5 additions & 0 deletions .codespellrc
@@ -0,0 +1,5 @@
[codespell]
skip = .git,*.pdf,*.svg,*.min.js,*.map,*.scss,*.css
ignore-regex = (Nat\.? Commun.?|highlighter: rouge)
#
# ignore-words-list =
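The commit leaves `ignore-words-list` commented out as a placeholder for suppressing future false positives. A hypothetical sketch of how it could later be filled in (the example words below are illustrative and not part of this commit):

```ini
[codespell]
skip = .git,*.pdf,*.svg,*.min.js,*.map,*.scss,*.css
ignore-regex = (Nat\.? Commun.?|highlighter: rouge)
# comma-separated lowercase tokens codespell should never flag,
# e.g. intentional spellings or domain-specific terms (illustrative values):
ignore-words-list = nd,te
```

With the key populated, both the CI workflow and local `codespell` runs would skip those tokens while still checking everything else.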
22 changes: 22 additions & 0 deletions .github/workflows/codespell.yml
@@ -0,0 +1,22 @@
---
name: Codespell

on:
  push:
    branches: [gh-pages]
  pull_request:
    branches: [gh-pages]

permissions:
  contents: read

jobs:
  codespell:
    name: Check for spelling errors
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Codespell
        uses: codespell-project/actions-codespell@v2
8 changes: 4 additions & 4 deletions _episodes/03-remote-collaboration.md
@@ -63,7 +63,7 @@ locations. Although each scenario will be slightly different, the
setup steps that we will cover with GIN will look similar on other
git-based hosting solutions.

-## Prelude: file availability, getting and droping content
+## Prelude: file availability, getting and dropping content

Before we proceed to data publishing let's first take a look at the
dataset we created during the first module. We used two ways of adding
@@ -122,7 +122,7 @@ datalad drop inputs/images/chinstrap_01.jpg
drop(error): /home/alice/Documents/rdm-workshop/example-dataset/inputs/images/chinstrap_01.jpg (file)
[unsafe; Could only verify the existence of 0 out of 1 necessary copy; (Use --reckless availability to override this check, or adjust numcopies.)]
-# If you were to run this with DataLad version < 0.16.0, the safety check would be overriden with --nocheck instead of --reckless availablity)
+# If you were to run this with DataLad version < 0.16.0, the safety check would be overridden with --nocheck instead of --reckless availability)
```
{: .output}

@@ -716,7 +716,7 @@ With a cloned dataset, you can do the following:

- Change a (text) file. For example, in the
`inputs/images/chinstrap_02.yaml` file we entered `penguin_count:
-2`, but if you look closely at the related fotograph, there are
+2`, but if you look closely at the related photograph, there are
actually three penguins (one is partially behind a rock). Edit the
file and save the changes with an informative commit message, such
as "Include penguins occluded by rocks in the count" or something
@@ -738,7 +738,7 @@ With a cloned dataset, you can do the following:
location on the website.

#### Contributing back
-When ready, you can contribute back wih `datalad push`. If the other
+When ready, you can contribute back with `datalad push`. If the other
person has granted you access to their repository (as should be the
case during the workshop), you can do it right away. Note that in this
case you are pushing to `origin` - this is a default name given to a
4 changes: 2 additions & 2 deletions _episodes/04-dataset-management.md
@@ -363,7 +363,7 @@ This is important: a *superdataset* does not record individual changes within th
In other words, it points to the subdataset location and a point in its life (indicated by a specific commit).

Let's acknowledge that we want our superdataset to point to the updated version of the subdataset (ie. that which has all three tabular files) by saving this change in the superdataset's history.
-In other words, while the subdataset progressed by three comits, in the parent dataset we can record it as a single change (from empty to populated subdataset):
+In other words, while the subdataset progressed by three commits, in the parent dataset we can record it as a single change (from empty to populated subdataset):

~~~
datalad save -d . -m "Progress the subdataset version"
@@ -566,6 +566,6 @@ save(ok): . (dataset)
~~~
{: .output}

-The end! We have produced a nested datset:
+The end! We have produced a nested dataset:
- the superdataset (penguin-report) directly contains our code, figures, and report (tracking their history), and includes inputs as a subdatset.
- the subdataset (inputs) tracks the history of the raw data files.
4 changes: 2 additions & 2 deletions _episodes/91-branching.md
@@ -263,7 +263,7 @@ $ git merge preproc

#### And... what now?

-Branching opens up the possibility to keep parallel developments neat and orderly next to eachother, hidden away in branches. A `checkout` of your favourite branch lets you travel to its timeline and view all of the changes it contains, and a `merge` combines one or more timelines into another one.
+Branching opens up the possibility to keep parallel developments neat and orderly next to each other, hidden away in branches. A `checkout` of your favourite branch lets you travel to its timeline and view all of the changes it contains, and a `merge` combines one or more timelines into another one.

> ## Exercise
>
@@ -383,7 +383,7 @@ $ datalad save -m "Fix: Change absolute to relative paths</code></td>

In order to propose the fix to the central dataset as an addition, the collaborator pushes their branch to the central sibling.
When the central sibling is on GitHub or a similar hosting service, the hosting service assists with merging `fix-paths` to `main` with a **pull request** - a browser-based description and overview of the changes a branch carries.
-Collaborators can conviently take a look and decide whether they accept the pull request and thereby merge the `fix-paths` into `upstream`'s `main`.
+Collaborators can conveniently take a look and decide whether they accept the pull request and thereby merge the `fix-paths` into `upstream`'s `main`.
You can see how opening and merging PRs look like in GitHub's interface in the expandable box below.

> ## Creating a PR on GitHub
2 changes: 1 addition & 1 deletion _episodes/92-filesystem-operations.md
@@ -231,7 +231,7 @@ datalad remove -d local-dataset
uninstall(error): . (dataset) [to-be-dropped dataset has revisions that are not available at any known sibling. Use `datalad push --to ...` to push these before dropping the local dataset, or ignore via `--reckless availability`. Unique revisions: ['main']]
~~~

-``remove`` advises to either ``push`` the "unique revisions" to a different place (i.e., creating a sibling to host your pristine, version-controlled changes), or, similarily to how it was done for ``drop``, to disable the availability check with ``--reckless availability``.
+``remove`` advises to either ``push`` the "unique revisions" to a different place (i.e., creating a sibling to host your pristine, version-controlled changes), or, similarly to how it was done for ``drop``, to disable the availability check with ``--reckless availability``.

~~~
datalad remove -d local-dataset --reckless availability
4 changes: 2 additions & 2 deletions _extras/for_instructors.md
@@ -154,7 +154,7 @@ This can be done with [Let's Encrypt](https://letsencrypt.org/) by following ins
~~~
sudo tljh-config set https.enabled true
~~~
-2. Set your email addres for Let's Encrypt:
+2. Set your email address for Let's Encrypt:
~~~
sudo tljh-config set https.letsencrypt.email <you@example.com>
~~~
@@ -326,7 +326,7 @@ Different authentication options are possible (e.g. admin can also authenticate
2022/12/01 12:40:30Z: OsProductName: Ubuntu
2022/12/01 12:40:30Z: OsVersion: 22.04
~~~
-- If the JupyterHub bootsrap script succeeded, within the last 30 lines you will find:
+- If the JupyterHub bootstrap script succeeded, within the last 30 lines you will find:
~~~
[ 210.143720] cloud-init[1233]: Waiting for JupyterHub to come up (1/20 tries)
[ 210.147437] cloud-init[1233]: Done!
2 changes: 1 addition & 1 deletion bin/dependencies.R
@@ -79,7 +79,7 @@ create_description <- function(required_pkgs) {
# We have to write the description twice to get the hidden dependencies
# because renv only considers explicit dependencies.
#
-# This is needed because some of the hidden dependencis will require system
+# This is needed because some of the hidden dependencies will require system
# libraries to be configured.
suppressMessages(repo <- BiocManager::repositories())
deps <- remotes::dev_package_deps(dependencies = TRUE, repos = repo)
2 changes: 1 addition & 1 deletion bin/repo_check.py
@@ -146,7 +146,7 @@ def check_labels(reporter, repo_url):
for name in sorted(overlap):
reporter.check(EXPECTED[name].lower() == actual[name].lower(),
None,
-'Color mis-match for label {0} in {1}: expected {2}, found {3}',
+'Color mismatch for label {0} in {1}: expected {2}, found {3}',
name, repo_url, EXPECTED[name], actual[name])


2 changes: 1 addition & 1 deletion setup.md
@@ -29,7 +29,7 @@ about collaboration you won't be able to publish all example data).

## Participate with own computer: install software

-If you want to follow the exaples on your own machine, you will need
+If you want to follow the examples on your own machine, you will need
to install DataLad and some additional software which we will use
during the walkthrough. Note that Linux or MacOS are strongly
recommended for this workshop; although DataLad works on all main
