This repository contains the material for exercise 2 (to be completed by 14 February 2024). It is part of the course 'Computational Text Analysis' taught at the University of Edinburgh by Dr. Marion Lieutaud. There is no data file in this repository: the code in the .Rmd script shows you how to download the data directly.
This is a group exercise. If you attended the class on 7 February, you know who your group members are and you will be working together on this exercise. If you missed the class, you will be doing this exercise alone.
Start by forking this repository as shown in class (click on the 'fork' tab towards the top right). If you're part of a group, decide who in the group will do this (it's best if you don't all fork it). Once one of you has created a forked repository, that person can add the others as 'collaborators': click on the 'settings' tab towards the top right, then 'collaborators', then 'invite collaborators'. You will need your group members' GitHub usernames to invite them to become collaborators on the forked repository.
Once you're all collaborators on the forked repository, you can start the exciting work of collaborative computing! Each of you can clone the forked repository (using GitHub Desktop as we did last week) so you can work on the R code on your own laptops. You may want to distribute the exercise questions between group members. This helps avoid the headache of code clashes, which can happen if several group members work on the same questions at the same time. You will also want to regularly 'pull' commits (again, using GitHub Desktop) from the online forked repository: this updates your local copy with the changes your group members have pushed. It's important to do this regularly, or you might not realise that other group members have already edited content you're working on.
Once you're happy with your code in R, save your edits, then go to GitHub Desktop, click 'commit to main' and then 'push origin'. This pushes your commits to the online forked repository that your collaborators have access to. Don't forget to click 'push origin': until you do, your commits stay on your own laptop and your collaborators cannot see them.
The exercise questions are at the end of the .Rmd file. You can either work in separate .Rmd files saved under a different name for each group member (e.g. exercise2_Alex.Rmd; exercise2_Jess.Rmd) and combine your code into one final document before next class, or all work on different sections of the same .Rmd document. The first method is a little safer (lower risk of severe clashes) but more tedious. By next class, you will need to have a single .Rmd document, and you will need to knit it to an HTML output (click 'Knit' in RStudio and let the magic happen).
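If you would rather knit from the R console than from the button, the sketch below shows one way to do it. It assumes the rmarkdown package is installed and that your combined document is called exercise2.Rmd, which is a hypothetical name: use whatever name your group chose.

```r
# Hypothetical file name: replace with the name of your group's combined document.
rmd_file <- "exercise2.Rmd"

if (file.exists(rmd_file)) {
  # render() knits the document and returns the path of the output file.
  html_path <- rmarkdown::render(rmd_file, output_format = "html_document")
  message("Knitted to: ", html_path)
} else {
  # Most often this means the working directory is not the repository folder.
  message("Could not find ", rmd_file, " in ", getwd())
}
```

If the file is not found, check that your working directory is the cloned repository folder.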
To do this exercise, you should draw on the code presented in the exercise, the demos, previous exercises and the livecoding scripts that we have developed in the tutorials (these are available on Learn, in the tutorial material section for each week). You can also look for help and inspiration in other online resources, for example the quanteda tutorial page [https://quanteda.io], the quanteda quick start guide [https://quanteda.io/articles/quickstart], and Stack Overflow [https://stackoverflow.com]. You may find regex cheatsheets (e.g. [https://hypebright.nl/index.php/en/2020/05/25/ultimate-cheatsheet-for-regex-in-r-2/]) and the stringr package cheatsheet (for string detection) [https://raw.githubusercontent.com/rstudio/cheatsheets/main/strings.pdf] particularly useful.
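As a small illustration of the kind of string detection those cheatsheets cover, here is a minimal sketch using stringr. The example sentences are made up for illustration and are not taken from the exercise data.

```r
library(stringr)

# Made-up example texts (not the exercise data).
texts <- c("The cat sat on the mat.",
           "Dogs bark loudly!",
           "A category of its own.")

# Detect the whole word "cat", ignoring case.
# \\b marks a word boundary, so "category" does not count as a match.
has_cat <- str_detect(texts, regex("\\bcat\\b", ignore_case = TRUE))
print(has_cat)  # TRUE FALSE FALSE

# Extract every word that starts with a capital letter, one list element per text.
capitalised <- str_extract_all(texts, "\\b[A-Z][a-z]*\\b")
print(capitalised)
```

Note the doubled backslashes: inside an R string, a regex escape such as \b has to be written as \\b.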
You could also use ChatGPT, but be aware that the R code it produces is often quite outdated; ChatGPT is (for now anyway) not great at coding, especially in R, and you will not learn much by copy-pasting from it. You may also be asked to explain your code in next week's class. Feel free to get help whichever way you want, but make sure you understand the code you're writing!