AI has developed rapidly in recent years. The release of GPT-3 demonstrated a remarkable leap in AI text generation compared to previous attempts. In 2021, Microsoft introduced a new tool, Github Copilot, an AI trained on publicly available code from Github that generates code suggestions for developers. As this and similar tools become more prevalent, teachers and others in academia will need to prepare themselves, because the more widespread such technologies become, the more likely students are to use them to cheat.
When professors give their students assignments, dishonest students can already search the internet and plagiarize code from sources like StackOverflow. The problem with this approach is that the code is often copied verbatim, which means professors can use automated systems to determine whether a student's code is found online. Github Copilot, on the other hand, procedurally generates its own code. While this is useful in corporate environments, it makes detecting plagiarism nearly impossible for professors.
This brings up an important question: is using Github Copilot actually cheating? Is using Copilot plagiarism, or are you allowed to claim its work as your own? In addition, what makes Copilot different from code snippets or autocomplete? My goal in this paper is to answer these questions from my perspective as a student and to examine one example submission generated by the AI.
The first question that must be addressed is that of legality. Does using Github Copilot violate any copyright or any other laws?
Specifically, let's look at plagiarism. If a student were to copy a philosophy paper from online, for example, this would be academic dishonesty, as the student would be claiming someone else's work as their own. Even if they were to "cite their sources" by putting a quotation mark at the beginning and end of the paper and indicating that the entire paper was one long quote from another source, most professors would reject the paper and give the student no credit for it. And if a student copies a paper without citing the source at all, there could also be legal issues related to stealing another's intellectual property.
Whether Github Copilot itself violates copyright law is a different issue. There is a question as to whether Github has the legal right to train the tool on code released under open-source licenses, but that is a topic for another report. This paper proceeds on the assumption that Github is not in violation of any copyright laws.
Just as lifting entire paragraphs from the internet is disallowed, if a student were to generate a majority of their submission via Github Copilot, a teacher would most likely reject the assignment. After all, the student didn't do the work and didn't demonstrate that they learned whatever topic the professor intended, so they don't deserve a good grade.
However, while teachers might not accept assignments generated by an AI, they would first need to know that the assignment was generated by one. The problem is that, from a legal/copyright standpoint, students do not need to give credit to Github Copilot when they use it.
According to Github Copilot's website, "GitHub Copilot is a tool, like a compiler or a pen. The suggestions GitHub Copilot generates, and the code you write with its help, belong to you, and you are responsible for it." Copilot's creators claim that Copilot, like a code editor, is merely a tool and that you have complete rights to all the code it generates. If I write code using Visual Studio Code, that doesn't mean that Microsoft has full rights to my code! The code I write is still mine, and I hold all the rights to it. It's just like with paper and pencils: if I use a Pilot pen to sketch out an idea on a napkin, I still maintain full legal rights to the idea. Neither the pen manufacturer nor the napkin manufacturer can take my idea; I simply used their tools to communicate it.
In addition, unlike papers found online, you do not need to credit Github Copilot in your code. As they say on their website, "the code you create with GitHub Copilot’s help belongs to you. While every friendly robot likes the occasional word of thanks, you are in no way obligated to credit GitHub Copilot. Just like with a compiler, the output of your use of GitHub Copilot belongs to you."
Thus, from a legal/copyright standpoint, students are within their rights to use code generated by Github Copilot. If a student were to use Copilot without mentioning it to a professor, that student could not be legally prosecuted.
This, of course, assumes that Copilot acts like a compiler or a code editor. Let's compare how Copilot behaves relative to other tools that can be integrated into an editor.
Over time, programmers have worked to make their lives easier by building tools that help them write programs, and even tools that write code for them. While the levels of assistance are much more granular than discussed here, we will limit our discussion to a few levels of tools that can help developers.
The first level is to use no suggestions at all. This would be like using an app like Notepad or TextEdit to write code. There would be absolutely no help whatsoever for the developer. While this is certainly the hardest way to write code, it also forces students to remember every rule of a given programming language, from closing parentheses to ending every line with a semicolon.
The next level of assistance would be a simple editor with a few more tools: syntax highlighting, parenthesis matching, auto-indent, and a few other basics to help users on their way. Apps like Vim and Gedit fall into this category. They make it easier to find and fix mistakes, but you still need to remember every rule of your language. The editor helps, but you still need to know how Java works, for example, in order to write Java programs.
The next level would be to use an editor with syntax highlighting and intelligent code completion. This gives you suggestions for function and class names as you type. This encompasses most standard code editors used in programming classes, like Eclipse or Visual Studio Code with IntelliSense. You still need to know how to code, but you don't need to remember the exact name of every class or method. Most people using these tools will be familiar enough with a language that they know the name of a function they've begun typing. If you begin typing rand in Python, you most likely know that you're trying to call the randint() function. On the other hand, you may not remember that the function is called randint and may be trying to see which random functions are provided. Slightly less knowledge of the language is required, but you still have to know it.
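As a small illustration, here is the kind of call such completion helps finish; the module usage and numbers are my own example, not part of any assignment:

import random

# An editor with intelligent completion will suggest randint()
# (among the module's other functions) as soon as "rand" is typed.
value = random.randint(1, 10)  # a random integer from 1 to 10, inclusive
print(value)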
The next level is to use an editor with snippets. Snippets are standard coding patterns that the editor will automatically complete for you. For example, if you're creating a function called functionname in Java, it can autocomplete the header public static void functionname(){}. If you have private variables in a Java class, snippets can be used to automatically generate getter and setter methods for them. Snippets remove the monotony of rewriting the same code over and over again, but they still require you to know most of the language. They let a user get by with knowing a bit less about a language, but fairly extensive knowledge of it is still required.
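Editor snippet syntax varies, so here is an analogous sketch in Python of what a getter/setter snippet might expand to; the class and attribute names are hypothetical:

class Student:
    def __init__(self, name):
        self._name = name

    # A getter/setter snippet expands boilerplate like the
    # following from a single stub:
    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        self._name = value

The point is that the editor fills in the repetitive structure; the programmer still has to know what a property is and why the class needs one.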
Finally, Github Copilot is on a completely different scale. With Copilot, you need to know next to nothing about how a language works in order to complete an assignment with it. If the user is lucky, they won't even need to understand what their code is doing, depending on how well the outputted code works. As long as the function works, it works, so some students would have no problem plugging an assignment into Copilot, generating and testing ten different responses, and determining whether any of them pass all the given test cases.
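As a minimal sketch of that workflow, with hypothetical candidate functions standing in for separate Copilot generations (none of this is actual Copilot output):

# Each candidate stands in for one generated attempt at a function
# that should return the sum of two ints.
candidates = [
    lambda a, b: a + b,  # a correct generation
    lambda a, b: a * b,  # a plausible-looking miss
    lambda a, b: a - b,  # another miss
]

# The assignment's test cases: (arguments, expected result)
test_cases = [((2, 3), 5), ((-1, 4), 3), ((0, 0), 0)]

for i, candidate in enumerate(candidates):
    if all(candidate(*args) == expected for args, expected in test_cases):
        print("Candidate", i, "passes every test case")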
Ultimately, then, the main difference between Copilot and the other tools is that Copilot doesn't necessarily require you to know the language you're using, while all of the other tools do. From a learning perspective, the other tools can be allowed because they still force you to learn the language; only Copilot lets students pretend they know a given language.
It's like using a calculator. In introductory math classes, students are forbidden from using a calculator on their assignments. Once they reach high school, however, most students are allowed (if not encouraged) to use one. Copilot is a similar tool: while it shouldn't be used by someone learning a language, it can be used once a language is known. Of course, just as in a math course, students should ask their professors whether such tools are allowed. Some college math courses focus on teaching a specific method of computation, so they may forbid a standard calculator and require some other method of calculating a result.
Now that we've gone over some background on how Github Copilot compares to other tools, let's examine how good Copilot is at generating responses to college assignments.
In my initial test, I received three assignments from Dr. Edward Kovach, a computer science professor at Franciscan University of Steubenville. I placed each assignment's body in a comment at the beginning of a file named FMCLprog<number>.<extension> and had Copilot generate responses based on these. I won't detail all three responses here, but they can be found on Github. All three were generated in the same way as the response analyzed below.
In this paper, we will analyze the program generated for the first homework assignment. The file was named FMCLprog1.py:
# Homework 1
# Design a program with the class FMCLprog1. This class will prompt the user for two ints and display those numbers with their sum. 5 points. Due Friday, 9/3/21. FMCL = First, Middle, Confirmation (if any), Last initials in your name.
# Done in python
class FMCLprog1:
    def __init__(self):
        self.num1 = int(input("Enter a number: "))
        self.num2 = int(input("Enter another number: "))
        self.sum = self.num1 + self.num2
        print("The sum of your numbers is", self.sum)

FMCLprog1()
We will now go through the lines of the program and analyze Copilot's response.
# Homework 1
# Design a program with the class FMCLprog1. This class will prompt the user for two ints and display those numbers with their sum. 5 points. Due Friday, 9/3/21. FMCL = First, Middle, Confirmation (if any), Last initials in your name.
# Done in python
I entered the first three lines myself. These were copied verbatim from the professor's assignment. I simply added the #s so that these lines would be treated as comments by the Python interpreter and by Copilot.
class FMCLprog1:
    def __init__(self):
        self.num1 = int(input("Enter a number: "))
        self.num2 = int(input("Enter another number: "))
        self.sum = self.num1 + self.num2
        print("The sum of your numbers is", self.sum)
After I entered three blank lines, Copilot generated the FMCLprog1 class all at once. Copilot will often wait for three new lines before generating a new part of a program, so this was expected.
The program is a near-perfect interpretation of the professor's instructions. It creates a class FMCLprog1 which asks for two inputs. It then displays the sum of those inputs.
Then, after another three lines, Copilot instantiated the class, which invokes its __init__ method and causes the program to run:
FMCLprog1()
One thing to note about Copilot's response is that it does not completely follow the instructions. The instructions say the program should "display those numbers with their sum", but Copilot only outputs the sum. The program adds the two numbers correctly, but it never displays the numbers themselves.
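For comparison, here is a minimally corrected version that displays both numbers alongside their sum; the exact output format is my own guess at what the assignment intends:

class FMCLprog1:
    def __init__(self):
        self.num1 = int(input("Enter a number: "))
        self.num2 = int(input("Enter another number: "))
        self.sum = self.num1 + self.num2
        # Show the numbers themselves as well as their sum
        print(self.num1, "+", self.num2, "=", self.sum)

FMCLprog1()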
One other thing to note is that there is no indication that the program was written by an AI. A teacher could count the number of blank lines between the different sections of the program, but this wouldn't be a reliable indicator of AI authorship, and a pattern like that would be simple for a student to fake.
This and my other past experimentation show that Copilot is imperfect, yet still able to produce very nearly perfect submissions. Using it isn't plagiarism, but using it without informing your professor could be considered academic dishonesty.
In his article "Your Wish Is My CMD", Neil Savage points out that very little code on sources like Github is labeled with its intention FOOTNOTE. This experiment shows that, even when it generates runnable code, Copilot is not always able to produce code that perfectly follows the given instructions. It can generally follow them, but it often needs some human correction to complete a task properly. This lines up with my experience testing Copilot: it can generate functions, classes, and other code incredibly quickly and accurately; the problem comes when it tries to generate code based on a very specific prompt. It sometimes gets it right, but more often than not, it ends up generating a related response that accomplishes part of the task, but not all of it.
Submissions generated via Copilot and other such tools would be nearly indistinguishable from normal submissions, especially in a large classroom. In addition, from my other (undocumented) experience working with Copilot, a basic knowledge of a language and some light manipulation of Copilot are more than enough to make the bot generate perfect, working code. Copilot may not be a perfect student to cheat with yet, but give it a bit more time, or give it to a student with a cursory understanding of programming, and it becomes a formidable weapon against professors.
In the next year or so, I am planning a much larger experiment based on Copilot's capabilities. While I am still hashing out an exact methodology on Github, the plan is for participating professors to each grade one extra assignment, either generated by the AI or taken from the internet. As it will be a blind study, professors will not be told the source of the assignment and will be asked to grade it alongside their students' submissions. They will then compare its grade to the other students' grades to see whether Copilot does better, worse, or about average in their class. I am still working on the exact details, but if you are interested in helping with this process, please contact me via the email listed on my Github profile and let me know! Thank you for your interest in my project!