How to debug a flaky test failure? #11931
Replies: 1 comment
-
After two days of debugging, now I figured out the root cause and proposed a fix #11960. What's learned is that when there is non-determinism, concurrency in the code should be always be examined carefully. Of course, the first step is always to reliably reproduce the problem. Luckily, the following changes is enough to reproduce the problem reliably (within 1 minute) with diff --git a/compiler/test/dotty/tools/dotc/IdempotencyTests.scala b/compiler/test/dotty/tools/dotc/IdempotencyTests.scala
index 30dbb47475..187d382666 100644
--- a/compiler/test/dotty/tools/dotc/IdempotencyTests.scala
+++ b/compiler/test/dotty/tools/dotc/IdempotencyTests.scala
@@ -24,8 +24,8 @@ class IdempotencyTests {
val opt = defaultOptions
val posIdempotency = aggregateTests(
- compileFilesInDir("tests/pos", opt)(TestGroup("idempotency/posIdempotency1")),
- compileFilesInDir("tests/pos", opt)(TestGroup("idempotency/posIdempotency2")),
+ compileFile("tests/pos/i6507b.scala", opt)(TestGroup("idempotency/posIdempotency1")),
+ compileFile("tests/pos/i6507b.scala", opt)(TestGroup("idempotency/posIdempotency2")),
)
val orderIdempotency = {
@@ -55,9 +55,12 @@ class IdempotencyTests {
val allTests = aggregateTests(orderIdempotency, posIdempotency)
- val tests = allTests.keepOutput.checkCompile()
- allChecks.checkRuns()
- tests.delete()
+ while (true) do {
+ val tests = allTests.keepOutput.checkCompile()
+ allChecks.keepOutput.checkRuns()
+ }
+ // tests.delete()
+
}
} We can use the following script to compare why it failed:
The files can be found here, the diff is the following: --- i6057b-1.txt 2021-03-30 16:49:13.000000000 +0200
+++ i6057b-2.txt 2021-03-30 16:49:19.000000000 +0200
@@ -256,7 +256,7 @@
406: STRINGconst 56 [tests/pos/i6507b.scala]
408:
- 292 position bytes:
+ 286 position bytes:
lines: 11
line sizes: 26, 0, 13, 79, 36, 56, 3, 0, 28, 1, 0
positions:
@@ -329,14 +329,14 @@
319: 197 .. 215
321: 193 .. 193
327: 183 .. 196
- 330: 200 .. 215
+ 330: 183 .. 183
333: 183 .. 196
337: 200 .. 215
341: 197 .. 215
344: 197 .. 215
346: 193 .. 193
352: 183 .. 196
- 355: 200 .. 215
+ 355: 183 .. 183
358: 183 .. 196
362: 200 .. 215
366: 197 .. 215 The next step is to understand which tree gets the wrong position by reading the print closely. In the output, they are all related to the type of
Compile the file with
We will get output like the following: val x$1#8163: (96 : Int#40)#768 = 96#798#799
{
val x$1#8172: (97 : Int#40)#860 = 97#879#880
(
{
val x$1#8180: (98 : Int#40)#941 = 98#960#961
(
EmptyTuple#6744#61#998.*:#6759#1001[(98 : Int#40)#951,
Tuple#406#906
]#1002(x$1#8180#953)#1003
:Any#444 *: Tuple#406#696)#1017
}#1018#1019.*:#6759#1022[(97 : Int#40)#870, Tuple#406#825]#1023(
x$1#8172#872
)#1024
:Any#444 *: Tuple#406#696)#1025
}#1026#1027.*:#6759#1030[(96 : Int#40)#778, Tuple#406#733]#1031(
x$1#8163#780
)#1032:Any#444 *: Tuple#406#696#1033 The tree with id --- a/compiler/src/dotty/tools/dotc/ast/Trees.scala
+++ b/compiler/src/dotty/tools/dotc/ast/Trees.scala
@@ -128,6 +128,10 @@ object Trees {
(if (myTpe == null ||
(myTpe.asInstanceOf[AnyRef] eq tpe.asInstanceOf[AnyRef])) this
else cloneIn(source)).asInstanceOf[Tree[Type]]
+ if tree.uniqueId == 941 then
+ Thread.dumpStack()
+ println("span = " + span)
+ println("tpe = " + tpe)
+ println("myTpe = " + myTpe)
+ println("this.uniqueId = " + this.uniqueId)
tree overwriteType tpe
tree.asInstanceOf[ThisTree[Type]]
}
--- a/compiler/src/dotty/tools/dotc/ast/Positioned.scala
+++ b/compiler/src/dotty/tools/dotc/ast/Positioned.scala
@@ -44,6 +44,9 @@ abstract class Positioned(implicit @constructorOnly src: SourceFile) extends Src
def span: Span = mySpan
def span_=(span: Span): Unit =
+ if uniqueId == 941 then
+ println("setting position from")
+ Thread.dumpStack() By tracking where tree 941 comes from, we will notice that the span of the tree is pretty stable during But the phase val pickledF = inContext(ctx.fresh) {
Future { ... }
def force(): Array[Byte] =
val result = Await.result(pickledF, Duration.Inf)
positionWarnings.foreach(report.warning(_))
result
if !Pickler.ParallelPickling || ctx.settings.YtestPickler.value then force()
unit.pickled += (cls -> force) The code enables parallel pickling by default without waiting for pickling to finish. It means, if the future runs a little slowly, it may observe the mutation in the |
Beta Was this translation helpful? Give feedback.
-
We have a flaky test failure on Windows:
#11885
I'm wondering what are the good tricks/practices for debugging such flaky failures?
(In this case, we ignore flaky failures related to concurrency, as it seems it's not related to concurrency).
Beta Was this translation helpful? Give feedback.
All reactions