From 9313215deb054f07c31fd5b17563b9a6fb918310 Mon Sep 17 00:00:00 2001
From: Lukas Gebhard
Date: Sun, 9 Aug 2020 13:29:14 +0200
Subject: [PATCH] Clarify color coding in section 2.2

---
 content/post/project-polusa-dataset.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/post/project-polusa-dataset.md b/content/post/project-polusa-dataset.md
index 5db890f..24fff2f 100644
--- a/content/post/project-polusa-dataset.md
+++ b/content/post/project-polusa-dataset.md
@@ -75,7 +75,7 @@ After having experimented with threshold \\(k\\), we set it to \\(k := 9\\). Fin
 
 This way, we remove 5 % of articles from the base selection, mostly consisting of outdated versions that resulted from minor article revisions, e.g., word insertions or corrections of numbers.
 
-As an example, here are two versions of an article. Our procedure correctly identifies the first one as a near duplicate of the second one. Differences are highlighted in red; skipped passages are identical.
+As an example, here are two versions of an article. Our procedure correctly identifies the first one as a near duplicate of the second one. Passages that only occur in the respective document but not the other are highlighted in red; skipped passages are identical.