Clarify color coding in section 2.2
Lukas Gebhard committed Aug 9, 2020
1 parent fbf3eea commit 9313215
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion content/post/project-polusa-dataset.md
@@ -75,7 +75,7 @@ After having experimented with threshold \\(k\\), we set it to \\(k := 9\\). Fin

This way, we remove 5 % of articles from the base selection, mostly consisting of outdated versions that resulted from minor article revisions, e.g., word insertions or corrections of numbers.

- As an example, here are two versions of an article. Our procedure correctly identifies the first one as a near duplicate of the second one. Differences are highlighted in red; skipped passages are identical.
+ As an example, here are two versions of an article. Our procedure correctly identifies the first one as a near duplicate of the second one. Passages that only occur in the respective document but not the other are highlighted in red; skipped passages are identical.

<table>
<tr>
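The reworded sentence describes the color coding as marking passages that occur in one version but not in the other. As a rough illustration only, such passages could be found with a word-level diff. The sketch below uses Python's standard `difflib`; the function name `unique_passages` and the sample sentences are hypothetical, and nothing here is taken from the post's actual tooling.

```python
import difflib


def unique_passages(a: str, b: str):
    """Return (only_in_a, only_in_b): word runs present in one text but not the other.

    Hypothetical helper for illustration; not the procedure used in the post.
    """
    a_words, b_words = a.split(), b.split()
    matcher = difflib.SequenceMatcher(None, a_words, b_words, autojunk=False)
    only_a, only_b = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "delete" and "replace" cover words that appear only in the first text.
        if tag in ("delete", "replace"):
            only_a.append(" ".join(a_words[i1:i2]))
        # "insert" and "replace" cover words that appear only in the second text.
        if tag in ("insert", "replace"):
            only_b.append(" ".join(b_words[j1:j2]))
    return only_a, only_b


# Made-up example sentences, mimicking a minor article revision.
old = "The company reported 12 new cases on Monday"
new = "The company reported 15 new cases on Monday morning"
only_old, only_new = unique_passages(old, new)
print(only_old)  # passages to highlight in the first version
print(only_new)  # passages to highlight in the second version
```

Each returned list holds the passages that would be highlighted in the corresponding version; everything else is shared between the two texts.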
