HamleDT (HArmonized Multi-LanguagE Dependency Treebank) is a compilation of
existing dependency treebanks (or dependency conversions of other treebanks),
transformed so that they all conform to the same annotation style. For more
information please see the project website at
http://ufal.mff.cuni.cz/hamledt
This repository contains the makefiles and support scripts needed for HamleDT
development. You also need Treex and Interset, which are in separate
repositories. In particular, the tree transformation and harmonization code
is part of Treex (implemented as Treex blocks); see the ufal/treex GitHub
repository.
History:
These files were originally stored in the TectoMT Subversion repository
(https://svn.ms.mff.cuni.cz/svn/tectomt_devel/trunk/treex/devel/hamledt).
Some important points in time:
r5974 (2011-06-27 zabokrtsky) ... created treex/devel/normalize_treebanks
r7684 (2011-12-31) .............. approximate date of HamleDT 0.9 or 1.0 (not fixed and archived)
r8819 (2012-06-11 popel) ........ normalize_treebanks renamed to hamledt
r11004 (2013-08-28 rosa) ......... hamledt copied to hamledt2
r11606 (2014-02-15 zeman) ........ HamleDT release 1.5 (Prague, article in LRE)
r11870 (2014-03-14 zeman) ........ removed old hamledt (after checking all languages for HamleDT release 2.0)
r11991 (2014-03-23 zeman) ........ hamledt2 renamed to hamledt
r12700 (2014-05-24 zeman) ........ HamleDT release 2.0 (Prague + Stanford)
r14841 (2015-04-23 zeman) ........ pruned large generatable files, hamledt with history copied to GitHub ufal/hamledt
r14847 (2015-04-24 zeman) ........ hamledt removed from the Subversion repository
commit 19f47665fed00b9defe5119b557ca950384db0ba (2015-08-18 zeman) ..... HamleDT release 3.0 (UD)
See also
https://svn.ms.mff.cuni.cz/trac/tectomt_devel/ (password-protected access, only for ÚFAL members)
https://github.com/ufal/hamledt
Notes on migration to GitHub:
Created a users.txt file following the instructions in
http://git-scm.com/book/es/v2/Git-and-Other-Systems-Migrating-to-Git
git svn clone https://svn.ms.mff.cuni.cz/svn/tectomt_devel --authors-file=users.txt --no-metadata --trunk=trunk/treex/devel/hamledt --prefix=svn/
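The users.txt file passed to --authors-file maps each Subversion user name to
a Git author, one "svnuser = Full Name <email>" entry per line. The following
is only a sketch of how such a file could be drafted, not the procedure that
was actually used here: the helper name make_authors_file and the example.com
e-mail domain are placeholders, and the generated names and addresses would
still have to be edited by hand.

```shell
# Hypothetical helper: reads `svn log --xml --quiet` output on stdin and
# prints one "svnuser = svnuser <svnuser@example.com>" line per distinct author.
# The e-mail domain is a placeholder, not taken from the original migration.
make_authors_file() {
    sed -n 's:.*<author>\(.*\)</author>.*:\1:p' \
        | sort -u \
        | while IFS= read -r user; do
              printf '%s = %s <%s@example.com>\n' "$user" "$user" "$user"
          done
}

# Usage (requires read access to the Subversion repository):
#   svn log --xml --quiet https://svn.ms.mff.cuni.cz/svn/tectomt_devel | make_authors_file > users.txt
```

git svn clone stops with an error if it meets an author who is not listed in
the file, so the list has to cover every committer in the imported history.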
Tag statistics and similar files that were comparatively large and that could be
generated again if necessary were removed from the repository. The history was
then pruned using the BFG repo-cleaner (https://rtyley.github.io/bfg-repo-cleaner/),
with the blob size limit set to 400K. Git garbage collection was then run
as recommended in the BFG documentation:
java -jar bfg-1.12.3.jar --private -b 400K hamledt
cd hamledt
git reflog expire --expire=now --all && git gc --prune=now --aggressive
git remote add origin https://github.com/ufal/hamledt.git
git push -u origin master