Ramsay t/translation efficiency 2 #6746
Conversation
…suggestion that fixes the line explosion
@ramsay-t let me know when this is ready for review again.
@ana-pantilie @Unisay I've done some fairly horrible things with our golden testing functions, but it seems to work :) There are two examples - one currently certifying and one not - with a simple process to add more. Yes, technically a test of the certifier ought not to invoke the optimiser and should use the AST stacks as input, rather than UPLC and calling the optimiser, but that would be really annoying to serialise and store. I would argue that if we change the optimiser enough to change the code it produces, then we probably should re-read the certifier tests at least quickly to make sure we haven't changed anything the certifier should care about.
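For concreteness, a minimal sketch of how such a golden test could be wired up with tasty-golden. The helper names, the uplc invocation and flags, the directory layout, and the example names below are illustrative assumptions, not the code in this PR:

    import qualified Data.ByteString.Lazy.Char8 as LBS
    import           System.Process             (readProcess)
    import           Test.Tasty                 (TestTree, testGroup)
    import           Test.Tasty.Golden          (goldenVsString)

    -- Hypothetical helper: run the optimise-and-certify pipeline on an
    -- unoptimised UPLC file and capture its textual output.  The executable
    -- name and flags are placeholders.
    runCertifier :: FilePath -> IO LBS.ByteString
    runCertifier uplcFile =
      LBS.pack <$> readProcess "uplc" ["optimise", "--certify", uplcFile] ""

    -- One golden test per example: the certifier's output is compared against
    -- a stored .golden file, so adding a case means adding a .uplc file and
    -- accepting its output once.
    goldenCertTest :: String -> TestTree
    goldenCertTest name =
      goldenVsString name ("test/certifier/" ++ name ++ ".golden")
        (runCertifier ("test/certifier/" ++ name ++ ".uplc"))

    certifierGoldenTests :: TestTree
    certifierGoldenTests =
      testGroup "certifier golden tests" (map goldenCertTest ["simple", "len"])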
Added
-----

- CI Tests for the certifier, using some simple unoptimised UPLC

Changed
-------

- Complete reordering of decisions in UntypedTranslation.lagda.md
Usually we add user-facing changes to the changelog, so I'd remove these two.
data Translation (R : Relation) { X : Set } {{_ : DecEq X}} : (X ⊢) → (X ⊢) → Set₁ where
  istranslation : {ast ast' : X ⊢} → R ast ast' → Translation R ast ast'
  var : {x : X} → Translation R (` x) (` x) -- We assume we won't want to translate variables individually?
data Translation (R : Relation) { X : Set } {{_ : DecEq X}} : (X ⊢) → (X ⊢) → Set₁
Why does this have no constructors now?
It does, they've just moved to line 68 because it is co-inductive with the TransMatch definition, so the type signature has to be up here.
[ testGroup "simple certification" $ makeTestTree simpleTests
, testGroup "failing certification" $ makeTestTree failingTests
It's good that we can run tests on CI now, but an important aspect of testing is communicating intent: why do we expect the "len" test to fail?
As the comment above the definition on line 48 says: they "fail" in the sense that the certifier says "no" (currently). We will need to look into why that is, but you explicitly asked me to include this so we know the semantics aren't changed (until we do change them).
Usually you want to add tests which actually behave the way you expect them to. So this failing test should be transformed into a succeeding test, and commented out until it's fixed.
Perhaps "failing" is the wrong name - we want some tests that don't certify. Otherwise we are just demonstrating that the certifier says "yes", and not that it actually checks things. I'm not sure what English word you would prefer to "failing" in that case? Rejected, or Negative?
Yes, I know really we should have tests that we intend to be rejected, but I was hoping to avoid turning this into a solid week of test engineering ;) I shall do that though...
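For illustration, if the "failing" cases were rewritten to expect successful certification (as suggested above), one alternative to commenting them out would be the tasty-expected-failure package, which keeps them running and visible without breaking CI. This is only a sketch of that option, not what the PR does; the grouping function and its parameters are hypothetical stand-ins for the PR's makeTestTree simpleTests / makeTestTree failingTests:

    import Test.Tasty                 (TestTree, testGroup)
    import Test.Tasty.ExpectedFailure (expectFailBecause)

    -- Hypothetical grouping: expectFailBecause records why a group is expected
    -- to fail and makes the runner report it if it unexpectedly starts passing.
    certifierTests :: [TestTree] -> [TestTree] -> TestTree
    certifierTests certifying notYetCertifying =
      testGroup "certification"
        [ testGroup "simple certification" certifying
        , expectFailBecause "certifier currently rejects these cases; see discussion above"
            (testGroup "not yet certifying" notYetCertifying)
        ]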
Please re-format the
@@ -0,0 +1,62 @@
{-# LANGUAGE OverloadedStrings #-}
This pragma is likely not needed as the language extension is already switched on by default in the cabal file.
@@ -0,0 +1,62 @@
{-# LANGUAGE OverloadedStrings #-}

{- | The tests in this file run tests of the uplc certifier. Various unoptimised UPLC is
Please let's keep lines within the 100-character width.
...this is 89?
I'm all for the classic 80-character limit if you'd prefer, but then let's set the GitHub commit hook to enforce that.
Can you add the rest of the tests we were using to check the certifier manually? Like the ones generated by the
Fine by me, but to repeat my previous point, I currently do not believe this is the right solution long-term, despite it making things faster short-term. More on that here.
The problem is they are randomly generated, so we can't do them as "golden" tests. Then we would have to decide what we are testing: do we assume they all certify? Are we defining that as the last line of the response being "The compilation was successfully certified." - or do we actually set the exit code properly in the executable when certification fails?
How are they randomly generated? They're just UPLC programs, we can add the UPLC code as you have with the other tests you added here.
It doesn't make sense to have certifier integration tests which are expected to fail. That would mean that either the compiler or the certifier is broken in some way.
I think just checking the text is fine for now, we can improve that in the future.
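A rough sketch of that "check the text for now" approach, with the exit code also available if the executable starts setting it on failure. The function name and the uplc invocation are assumptions; the success line is the one quoted above:

    import System.Exit      (ExitCode (..))
    import System.Process   (readProcessWithExitCode)
    import Test.Tasty.HUnit (Assertion, assertBool, assertFailure)

    -- Hypothetical check for one example: decide success by the final line of
    -- the certifier's output.  If the executable later sets a non-zero exit
    -- code when certification fails, the ExitCode below can be checked instead
    -- of (or as well as) the text.
    assertCertified :: FilePath -> Assertion
    assertCertified uplcFile = do
      (code, out, err) <-
        readProcessWithExitCode "uplc" ["optimise", "--certify", uplcFile] ""
      case code of
        ExitFailure n ->
          assertFailure ("certifier exited with code " ++ show n ++ ": " ++ err)
        ExitSuccess ->
          assertBool "certifier did not report successful certification"
            (not (null (lines out)) &&
             last (lines out) == "The compilation was successfully certified.")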
@effectfully this appears to have changed nothing in the CI :( perhaps you meant something different?
I think you need to import it, like it's done in this file with os-support.
Well, that fails fast at least :)
…C 8.10 because they depend on the Agda.
-- These run but the certifier says "no"
rejectedTests :: [String]
rejectedTests =
  [
  ]
It doesn't make sense to keep this. It's dead code.
No it isn't, it's called on line 104 😸
srcTests :: [ String ]
srcTests =
  [ "inc"
  -- , len
At least add a comment describing why this test is not enabled.
Can you please add the other tests as well, as I asked? That should be very quick. I would normally be ok with you adding these in a separate PR, but since you decided to combine the testing framework with changes to the decision procedures as well, I will have to insist that you show that the changes to the decision procedures do not break any of the tests we have been running manually until now.
They are in the makeExample list...? Since they are generated from the executable, it made more sense to let it generate fresh versions each time and certify those. We could include hardcoded instances of each of them, but the fresh ones are maybe more interesting?
I missed that, sorry. Thanks for making the changes!
Except: why do these never trigger the bug that the short “len” example triggers? They are probably already optimised and so are just returning an ID change all the time… If we can make them unoptimised then they will be more useful, but will currently fail (possibly randomly?)
I remember that the uplc examples are unoptimised. If I printed the factorial or whatever script, I remember seeing force ... delays that could definitely be reduced.
Looking at the generated certificates, there do seem to be some changes happening, so these are meaningful tests - although they might be more meaningful if we used the plc examples and then fed those in so there was more work happening...
I think we were typing at the same time :) These are fine for now, although I could spend ages improving them!
I broke the other PR :( sorry. This one should be properly signed and mergeable?