Home

What is DIRT?

DIRT (Dynamic Identification of Reused Text) aims to allow users (primarily academics) to find passages that are shared by pairs of documents within a corpus. It will allow them to view pairs of documents and their common passages, as well as show which documents within a corpus have common passages with one particular document within the same corpus, known as the focus document.

DIRT also aims to be extensible to support other languages, although ancient Chinese will be the focus for the prototype. DIRT should be able to find matches in a UTF-8 encoded corpus in any language, with a language specific module improving the permissiveness of matching.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Home

What is DIRT?

Table of Content

Clone this wiki locally