Added doc about inclusion of packages mentioned in the student's subm… #22
…ission in the test environment
There's a small issue here. Technically "[t]herefore, any libraries included in the student submissions must also be included in the test environment" is not correct. Strictly speaking, it would be unfortunate if that were true: the instructor would have to match and use the exact same libraries that every student is using, which would be especially difficult in a big class. So the modularity is a perk. For instance, if students are using However, extra In the example code you emailed me, your test files ultimately required a few extra Also, it should be mentioned that instead of using
Happy to pull after some modifications have been made.
On 11/3/24 9:22 AM, Taylor R. Brown wrote:
Happy to pull after some modifications have been made
The issue is a bit more complex, it seems.
Let's work with a small example that uses the dplyr package.
Let's call this "Case 1". Assume a student submits the following code:
library(dplyr)
data("iris")
df <- iris
tmp_df <- filter(df, Species == "setosa")
Assume that the auto-grading program is checking this as follows:
library(testthat)
test_that("1(a)", {
  expect_equal(dim(tmp_df)[1], 50)
  expect_equal(dim(tmp_df)[2], 5)
})
Then everything works as expected.
Let's call this "Case 2". Now, consider that the student modularizes
his or her code, creates a function and filters in the function to
return a data.frame object, as follows:
library(dplyr)
data("iris")
q_1_a <- function() {
  df <- iris
  tmp_df <- filter(df, Species == "setosa")
  tmp_df
}
Assume that the auto-grading program is now structured as follows:
library(testthat)
test_that("1(a)", {
  tmp_df <- q_1_a()
  expect_equal(dim(tmp_df)[1], 50)
  expect_equal(dim(tmp_df)[2], 5)
})
Now suddenly, the test fails because dplyr is not available in the
environment of the auto-grader.
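A minimal sketch of why Case 2 can fail, assuming the tests run in an R session where the student's library(dplyr) call is no longer in effect (this is an assumption about the grading setup, not a description of gradeR internals). R resolves the name filter when q_1_a() is called, not when it is defined, so without dplyr attached the call falls through to stats::filter, which cannot handle these arguments. The sketch uses base R only; `dplyr_attached` and `result` are illustrative names.

```r
# Student-style function with an unqualified call to filter().
# The name `filter` is resolved when q_1_a() runs, not when it is defined.
q_1_a <- function() {
  df <- iris
  filter(df, Species == "setosa")
}

# Assumption: dplyr is NOT attached in this session, as in a fresh test
# environment. The unqualified filter() then resolves to stats::filter.
dplyr_attached <- "package:dplyr" %in% search()

# Capture the outcome instead of stopping the script.
result <- tryCatch(q_1_a(), error = function(e) e)

if (inherits(result, "error")) {
  message("q_1_a() failed: ", conditionMessage(result))
} else {
  message("q_1_a() returned ", nrow(result), " rows")
}
```

In Case 1 the same filter() call runs while the student's script is being evaluated, with dplyr still attached, and the test only inspects the finished tmp_df object. That would explain why Case 1 passes and Case 2 does not.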
There are two options to fix the above issue: (1) force the student to
use namespaces, i.e., instead of filter(...), use dplyr::filter(...) in
q_1_a(); or (2) import package dplyr in the auto-grading program.
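The two options can be sketched side by side. This is an illustration under stated assumptions, not the gradeR-recommended layout: it assumes dplyr is installed (the block is skipped otherwise), and `q_1_a_qualified`, `res_option1`, and `res_option2` are hypothetical names.

```r
# Skip the whole sketch if dplyr is not installed on this machine.
if (requireNamespace("dplyr", quietly = TRUE)) {

  # Option 1: the student qualifies the call with the package namespace,
  # so the function works whether or not dplyr is attached.
  q_1_a_qualified <- function() {
    dplyr::filter(iris, Species == "setosa")
  }
  res_option1 <- q_1_a_qualified()

  # Option 2: the student's function stays unqualified ...
  q_1_a <- function() {
    filter(iris, Species == "setosa")
  }
  # ... and the test file attaches dplyr before calling it.
  suppressPackageStartupMessages(library(dplyr))
  res_option2 <- q_1_a()

  # Both routes should give the 50 setosa rows with all 5 columns.
  stopifnot(
    identical(dim(res_option1), c(50L, 5L)),
    identical(dim(res_option2), c(50L, 5L))
  )
}
```

Option 1 puts the burden on the student's code, and Option 2 puts it on the test file, which then has to attach every package whose functions the students call unqualified.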
Option 1 (using namespaces) is a non-starter for me for a variety of
reasons, primary among them being that the concept of namespaces may not
be familiar to all students. The course I teach is a cross-sectional
course in data mining and machine learning that attracts students from
other disciplines besides CS. So while students may be familiar with
some block-structured programming language, functions, and variable
scope, and can pick up R quickly, they may not be familiar with the more
CS-oriented concepts such as namespaces and OOP.
Option 2 works for me, but I understand that the instructor would have
to match and use the exact same libraries that every student is using.
To me, this is not a problem as I introduce the libraries to be used for
their modeling and expect the students to only use those libraries.
In the end, it is a matter of style. Case 1 encourages free use of
global variables with the stipulation that the names of the global
variables are pre-determined and shared by the instructor with the
students. Case 2 encourages modularization through functional scope so
the student can write small functions that perform the task, and the
instructor simply calls those functions.
Question is, where is the happy medium such that the gradeR package can
support both cases? I do not know yet.
Apologies for a long response. Let me know if something workable
strikes you; I will be doing the same.
Cheers,
- vijay
---
Vijay K. Gurbani, Ph.D.
Research Associate Professor of Computer Science,
Illinois Institute of Technology
Chicago, Illinois ***@***.***
http://www.cs.iit.edu/~vgurbani
Option 2 is better, but with regard to your example, the problem is that your test code assumes things about the student's implementation, which is not a great idea. The student can surprise you by either not defining it, or changing it to something unexpected. At the very least, you should re-define this function you're calling in your own test code. Then, when you redefine it, you'll see you're clearly using third party libraries, and you can deal with them either by using Even better, check the output without using the function at all. "Option 2 works for me, but I understand that the instructor would have to match and use the exact same libraries that every student is using." Not so. The instructor can test student output without using all of those libraries. The student is limited to using libraries that are already installed on whatever machine the code is being run on. This is a feature as well. It forces students to minimize dependencies.
The same problem --- i.e., the student surprising you --- exists even in the case when pre-determined names of global variables are used; I don't see much of a difference. When I give the homeworks, I am meticulous to state the expected return value, for example: "(v) [0.33 points] How many frequent itemsets are there with a support of 0.10? Your function should return an object of class itemset."
I am not sure what you mean by "re-define this function you're calling in your test code". Perhaps a quick example using the code I provided would explain your thought better. Thanks,
Yes, at the very least, you need to request specific variable names. If students, say, spell them wrong, that's on them. Requesting types is more strict. R is dynamically typed, so usually a few types can be returned and still pass my checks (e.g. tibbles and data frames are both fine). If you asked them to define a function called If Anything you use in your test file, if it is dependent on a third party library, I don't see how you can expect to not have to use
Right, in my case, q_1_a() is NOT demo code; it is the name of a function that they are instructed to write as part of their homework. The homework writeup stipulates (1) the name of the function, (2) any parameters it takes, and (3) the return value.
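A test can check that contract before relying on it, which limits how much a submission can surprise the grader. The following is a hedged sketch, not the actual homework test: the q_1_a() defined here is a base-R stand-in for a student submission, and the check on the return value is deliberately loose, since tibbles and plain data frames both satisfy is.data.frame().

```r
library(testthat)

# Stand-in for a student submission (hypothetical). In real use this
# function would come from the student's script, not the test file.
q_1_a <- function() {
  iris[iris$Species == "setosa", ]
}

test_that("1(a)", {
  # The function must exist under the stipulated name and be callable.
  expect_true(exists("q_1_a"))
  expect_true(is.function(q_1_a))

  tmp_df <- q_1_a()

  # Loose type check: passes for data frames and for tibbles alike.
  expect_true(is.data.frame(tmp_df))

  # Same shape checks as in the original test.
  expect_equal(dim(tmp_df)[1], 50)
  expect_equal(dim(tmp_df)[2], 5)
})
```

If the function is missing or misnamed, the first expectations fail with a readable message instead of an error from deep inside the call.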
Agreed; the intent is not to have the student code dictate libraries to be used to reduce the complexity on the auto-grader. So, by explicitly importing the library in the auto-grader code, I can continue using it and allow the students to modularize their code. Question is, how should we update the gradeR document to support such a use case, assuming we want to support it. The only use case documented in the vignette right now is the use of pre-determined global variables. I am trying to stay away from such global variables due to unintended consequences and side effects that may creep into all but the most simple of programs. Thanks,
@tbrown122387, good morning. Any more thoughts on the above chain? One way to proceed is to expand the gradeR vignette so it supports multiple ways to structure homeworks for auto-grading. Currently there is only one way to do so --- using pre-determined and agreed-upon names of variables available in the global namespace. This may not work for all cases, so we could present a second way to use the package as I have done. WDYT? Thanks,
I’m thinking more of a minimal edit. Maybe an additional sentence or two.
I think it will be hard to fit it in a sentence or two. The issue is
complex enough.
The way to use the auto-grader as currently shown is good and it works.
However, there are other ways to approach the problem. If interested, I
don't mind writing a vignette on my approach if you think that is
warranted.
Cheers,
- vijay
…On Mon, Nov 11, 2024 at 9:10 AM Taylor R. Brown ***@***.***> wrote:
I’m thinking more of a minimal edit. Maybe an additional sentence or two
A whole vignette? Is your approach that distinct from mine?
@tbrown122387, Good morning. Well, the three sentences I suggested a couple of weeks ago in the pull request are: "Note that the student submissions are run in a new, clean environment. The tests are run in a different environment than the student submissions. Therefore, any libraries included in the student submissions must also be included in the test environment." However, these did garner some pushback [1], which is why we are having the current conversation. If you think we can wordsmith the above sentences, that would be one way to proceed. Any suggestions on the wording? Thanks,
[1] #22 (comment)