-
Notifications
You must be signed in to change notification settings - Fork 0
/
02-vc.Rmd
345 lines (292 loc) · 17.8 KB
/
02-vc.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
# Version Control Systems
*Version Control Systems* @wiki_version_control (VCS - also named as _Revision Control_,
_Source Code Control_, _Source Code Management_) are Computer Systems
traditionally used to manage the complexities of software development, majorly
Computer Systems involving multiple software modules, and large development teams.
Such systems help to organize such development, especially when collaboration of
multiple developers is done, and several times people are doing code changes in
a same, or close, part of the code, which may break it, or even generate
unexpected results that would not be traceable.
<!-- ## Why using it (out of CS)? -->
<!-- Science is becoming more collaborative everyday, with groups of people contributing within a -->
<!-- same research group, across groups and across institutions. At the same time, efforts on Open Science, and Reproducible Science are demanded by the scientific community as a way to measure progress in transparent and objective criteria, as well as providing accountability on such developments. Since -->
<!-- the interactions between academics and industries are becoming more frequent, scientists are also called -->
<!-- to bring a same accountability level as if they were stakeholders of such companies. -->
<!-- Given this scenario, CVS systems are often used in a way to provide towards the scientific method (when using computational resources): -->
<!-- - Open Science -->
<!-- -- Open data: -->
<!-- -- Open source-code: -->
<!-- -- Open access: -->
<!-- -- Open peer-review: -->
<!-- -- Open educational resources: -->
<!-- - FAIR data -->
<!-- -- Data is Findable: -->
<!-- -- Data is accessible: -->
<!-- -- Data (and code) are interoperable: -->
<!-- -- Data (and experiments) are reusable: -->
<!-- - Reproducible experiments: -->
## VCS: Local repository vs. Remote repository
Common VCS's (ex.: [CVS](https://en.wikipedia.org/wiki/Concurrent_Versions_System),
[SVN](https://en.wikipedia.org/wiki/Concurrent_Versions_System) and
[Git](https://en.wikipedia.org/wiki/Git)) always are client-server systems.
The VCS client is not merely a means to access a central repository hosted
at the VCS server, but actually is part of running checks and balances that are
necessary when adding code to the central repository. A situation that demands that
is when multiple users are doing a change in a same piece of code, and checking
it only at the central repository may complicate more a situation already
complicated (simultaneous contribution). Therefore, it hosts, locally
a local copy of the codebase, which reflects a view of that codebase in time
(more preciselly when that local user retrieved a copy from the central repository
for editing).
Once the changes are done, the user saves a local copy of these at the local repository,
and if these meet the criteria for changes (and this may vary a lot among systems),
the user is able to persist this change at the local repository, as a new version of
the codebase. This process is known as _commit_ (in this case, a local one).
## Git
[Git](https://git-scm.com/) is a contemporary VCS that is used as the backbone
of operations run on GitHub. So, when planning to operate with GitHub, it is
almost certain that you will need to install a Git client on your machine.
We can say _almost_ because GitHub provides an own client,
[GitHub Desktop](https://desktop.github.com/) which provides a Graphical User
Interface(GUI) to a git client.
We will not cover such GUI usage since on Compute Canada you will not have access
to GitHub Desktop on their machines, but only to
a regular Git client (that you can use to access either GitHub, or another Git
compliant server such as a private repository built with
[GitLab](https://about.gitlab.com/)).
### Installing a git client on your machine
#### Ubuntu (or any other Debian-based environment)
First update your Operating System (OS) repository indexes:
```{bash, eval=FALSE}
sudo apt-get update
```
Then proceed to install:
```{bash, eval=FALSE}
sudo apt-get install git
```
Note: If you use Ubuntu, git is already provided as a base repository on this
distribution. If you use another linux, and this does not work for you, please
[open an issue](https://github.com/ricardobarroslourenco/CCprimer/issues) on
this project and use the label _requests_, and I will try to solve and include
here for future reference.
#### Apple
Apple is an OS that not use repositories by default, and git is not provided in
the main set of software. So you would have three main options to install git:
- Install Xcode (*Preferred*): Apple has a software development kit (SDK) named
[XCode](https://developer.apple.com/xcode/) which provides git among several other
software for MacOS and iOS software development. The advantage of using XCode´s
git distribution is that it comes adjusted for your OS and is supported by Apple,
whcih avoids conflicts and security issues.
- To install XCode (and git by consequence), open the App Store, and search
XCode as a software developed by Apple, and install it. To test, open the
Terminal App, and run _git_. You should receive an output of basic
description of commands available on git.
- Install Homebrew, and then install Git (if you are a Homebrew user, or uses a
lot of *.nix software not supported by Apple it is useful):
- [Download and install homebrew](https://brew.sh/) from their website;
- Install git as:
```{bash, eval=FALSE}
brew install git
```
- In the terminal, run _git_ and you should receive an output of basic
description of commands available on git.
- Install a standalone version of git (*not recommended*): You can download and
install from git´s website a standalone client (I will not be covering this
approach, because git will not be updated via this kind a install, posing a
security risk and install this at your own risk - and effort!)
#### Windows
Since Windows works with standalone installations, the install follows as this:
- Go to the Git website and [download](https://git-scm.com/download/win) the
latest client (download the standalone version, unless you have severe storage
limitations on your machine);
- Once downloaded, run the installer with admin permissions and follow the
default installation;
- When installed, open PowerShell (PS) (or the command prompt, if you do not
have PS installed), and run _git_ and hit enter. You should receive an output of basic
description of commands available on git.
### Setting up your local git client
Once you have your git client up and running, you need to setup the access
credentials to connect to a git repository, such as GitHub or GitLab for example.
To do so, run on your bash terminal (or command-line if on windows):
```{bash, eval=FALSE}
git config -- global user.name "Your full name"
```
Note: On quotes you need to specify a name (depending on the environment,
it should be your full name, or an alias, such as employee number, for example). This
command has a global scope, so all users on your machine would have the same setup.
Then you need to specify an e-mail address (if on GitHub, it is preferred that
is linked to your account):
```{bash, eval=FALSE}
git config -- global user.email "Your e-mail address"
```
Then check, if all values were set:
```{bash, eval=FALSE}
git config --list
```
You should see an output that reflects those previously set name and e-mail values.
<!-- https://docs.github.com/pt/get-started/getting-started-with-git/setting-your-username-in-git -->
<!-- https://www.edgoad.com/2021/02/using-personal-access-tokens-with-git-and-github.html -->
## GitHub
<!-- GitHub is a web.... --- Describe -->
GitHub is a Web collaborative Version Control service based on a Git platform. It was
created in 2008, and as of today is the largest repository of source code in the World.
Aside of VCS capabilities, it provides other computational services, such as
Continuous Integration (CI) and Continuous Deployment (CD), Web Page Hosting on GitHub
Pages, security solutions, a software marketplace, among many others. It was recently
acquired by Microsoft, as one of the largest transactions in the tech domain.
GitHub has several different [tiers of access](https://docs.github.com/en/get-started/learning-about-github/githubs-products#about-githubs-products),
from basic free accounts up to corporate ones.
A main advantage for academic users would be using the [GitHub Education](https://education.github.com) which includes a free [GitHub PRO](https://docs.github.com/en/get-started/learning-about-github/githubs-products#github-pro)
account and several software for free (or services credit hours) while you are
enrolled in an academic institution. GitHub Education also have [different tiers of users](https://education.github.com/benefits)
aiming students, teachers, and the academic institutions itselves. Once you
create a simple free GitHub account, you can return back to GitHub Education,
and request to be enrolled in the program.
### Creating an account
To create an account is simple:
- Go to [GitHub pricing page](https://github.com/pricing), select the Free tier,
and click _Join for Free_;
- Just follow the prompts to create your personal account.
- You should receive an e-mail from them on the registered e-mail account, to
verify your identity. If you fail to do so, your account would be basically [useless](https://docs.github.com/en/get-started/signing-up-for-github/verifying-your-email-address#about-email-verification).
Note: Since I have created my account some years ago, I am just using GitHub's
user documentation as reference. If things get complicated in this step, please
let me know, and I will expand here.
### Insert GitHub credentials on your local Git
Once you have your GitHub account created, and verified, it is time to setup these
credentials on your local Git client. While logged on your GitHub account you should:
* Click on your _name/avatar/photo at the upper right corner of the screen_, and
then click on _Settings_;
* Then click on _Developer Settings_ (it is the last item on the bottom of
items at the left panel);
* Now click on the bottom item, _Personal access tokens_;
* Then, click on _Generate new token_, you will be requested again for your
password on this step;
* Now you will be required to fill in some info:
+ _Note_: This will be a name of this Token. I often create a Token per
software+device I use, then someone can write "Rstudio on Laptop", to
differentiate from "Rstudio on laboratory desktop".
- Isolating tokens on different (software,device) pairs helps isolating
accesses to your GitHub account, and is important to contain a security
breach. If someone is able to fetch one of your tokens, you will know from
which machine and which software on it came from.
+ _Expiration_: You should define a expiration date for your token. Choosing
a date is an open ended question, but ideally systems exposed to the Web, or
multiple users such an HPC cluster, should be changed frequently.
- _Avoid at all means to use the No expiration_ mode, because you never
want to forget unattended keys of your GitHub (ex. You graduate, and your
key is left on a machine that another student will use in the lab).
+ _Select scopes_: Perhaps this is the trickiest setup. The definition of a
Token is actually a definition of a [OAuth](https://en.wikipedia.org/wiki/OAuth)
Token. On GitHub, this implies on being able to select all options that a user
has in terms of operations on the platform, and actually, narrowing down to
what a user wants to grant permission for in such Token.
- Note: It is tempting to grant _all permissions_ to a single Token, but the user
should ask it that is really necessary. In one end, granting all permissions
is too permissive, and being as problematic as having a token with no
expiration data.
- To look into what each scope covers, please look into this
[page](https://docs.github.com/en/developers/apps/building-oauth-apps/scopes-for-oauth-apps#available-scopes).
* Once you have generated your token, treat is as a password (even in terms of
security/sharing/etc.). You should not persist a copy of it, but only use for your
local git setup. Even if you loose access to it, you can always revoke that
token, and create a new one.
* To set your local git to access GitHub (remind that you need first to set [global variables](https://ricardobarroslourenco.github.io/CCprimer/version-control-systems.html#setting-up-your-local-git-client)),
you just need to access GitHub with it. A simple way, that will be further
explained in more detail would be making a local copy of a repository using _git clone_:
```{bash, eval=FALSE}
git clone https://github.com/a_very_weird_user_name/a_more_weird_user_project_name.git
```
A concrete example would be the source-code of this document:
```{bash, eval=FALSE}
git clone https://github.com/ricardobarroslourenco/CCprimer.git
```
* Once done, your local git should request:
+ User-name: The one you have set for your GitHub account;
+ Password: Your token.
* After this, it should make a local copy of that cloned repository (more
especifically this copy will be stored at the directory path you have run the
clone command - in bash / terminal / command line).
+ Note: If you have not been requested a username/password, maybe you have already
a GitHub account already set on your machine (so please note if any errors occur
when trying to access GitHub with git, no username + password is required -
if these occur, they will be output on your bash / terminal / command line)
* Finally, to persist the token on your local install, run:
```{bash, eval=FALSE}
git config --global credential.helper cache
```
+ If you need to clean-up the token(s) installed:
```{bash, eval=FALSE}
$ git config --global --unset credential.helper
```
<!-- https://www.edgoad.com/2021/02/using-personal-access-tokens-with-git-and-github.html -->
## A combined usage of Git-GitHub
This section describes a usage of Git-GitHub intended for scientific usage. Therefore,
is important to remark that there is still no consolidated practice on such application,
considering that these tools were not meant for academic usage, but rather for a software
engineering one. Such application varies across research groups and scientific domains, and on a best effort, we will summarize some good practices and update as needed.
### Basic commands to update a repository
After cloning a public repository, a user may want to update it in several ways.
A very common HPC scenario is when the user is testing code in production,
in a cluster environment, while debugging. Other scenarios are related to persist
small data outputs - considering the repository storage limitations.
Given that, users should do, once saved their changes to the local git is to
_add_, _commit_ and then _push_ changes.
#### Git Add
When the user create or delete file(s) it is necessary to inform `git` that such
addition/deletion happened via:
```{bash, eval=FALSE}
git add folder_url
```
If you want to update the entire project, you need to go via bash to the root of
the git project, and then:
```{bash, eval=FALSE}
git add .
```
#### Git commit
After eventual addition/deletion(s) are informed to git, then, it is necessary
to inform updates and the nature of these. This can be done via:
```{bash, eval=FALSE}
git commit -m "message describing the commit"
```
Ideally the message would describe what was done. Important to note that this
could be done multiple times, because this is only persisting local changes (on
the machine which the client is installed), not global changes. It is good practice
to group "same topic" commits, as it is easier to do a rollback of such changes,
avoiding to impact changes not related to a found issue.
#### Git push
Once the local repository changes are finalized, the user needs to synchronize
these local changes with the main git(like)/GitHub repository. Only after this,
these changes would become available to all users.
Often the repository on github has a `main` branch, therefore, the command is:
```{bash, eval=FALSE}
git push -u origin main
```
In which `origin` is the local repository, and `main` is the branch you want to
update in the git server. These may be different, depending on your project
characteristics.
<!-- ### Creating a repository -->
<!-- A _repository_ is a collection of code, stored on a git server (in this case, -->
<!-- a GitHub Repository). As any intellectual task, you may split your information/code -->
<!-- as desired. -->
<!-- In CS, people often keep a same service (a part of a system) at a -->
<!-- same repository. -->
<!-- In academia, is a consensus that researchers tend to keep all the codebase used -->
<!-- in a publication at a same repository. Then some variation occurs across communities. -->
<!-- The GIScience community, more especifically the R GIScience community, developed a -->
<!-- Depending on how you set your environment, there are some options to crea -->
<!-- <!-- https://www.atlassian.com/git/tutorials/setting-up-a-repository/git-init -->
<!-- ### Retrieving code from a repository -->
<!-- <!-- https://www.atlassian.com/git/tutorials/setting-up-a-repository/git-clone -->
<!-- <!-- https://www.atlassian.com/git/tutorials/syncing/git-pull -->
<!-- ### Submitting code to a repository -->
<!-- <!-- https://www.atlassian.com/git/tutorials/saving-changes/git-commit -->
<!-- <!-- https://www.atlassian.com/git/tutorials/syncing/git-push -->
<!-- ### Conciliating _"conflicts"_ -->
<!-- #### Git Merge -->
<!-- <!-- https://www.atlassian.com/git/tutorials/using-branches/git-merge -->
<!-- #### Git Rebase -->
<!-- <!-- https://www.atlassian.com/git/tutorials/rewriting-history/git-rebase -->
<!-- ### Contributing to open-source projects -->
<!-- #### Creating a pull request -->
<!-- <!-- https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests -->