forked from joshua-decoder/joshua
Joshua Statistical Machine Translation Toolkit
License: Unknown (LICENSE), LGPL-2.1 (COPYING)
afader/joshua
Running the Joshua Decoder:
---------------------------

If you wish to run the complete machine translation pipeline, Joshua
includes a black-box implementation that runs the entire pipeline from
a single restartable command. See the documentation for a walkthrough
and more information about the many options available to the pipeline.

- web: http://joshua-decoder.org/5.0/pipeline.html
- local mirror: ./joshua-decoder.org/5.0/pipeline.html

Manually Running the Joshua Decoder:
------------------------------------

To run the decoder, first set these environment variables:

    export JAVA_HOME=/path/to/java  # maybe /usr/java/home
    export JOSHUA=/path/to/joshua

You might also find it helpful to set these:

    export LC_ALL=en_US.UTF-8
    export LANG=en_US.UTF-8

Then, compile Joshua by typing:

    cd $JOSHUA
    ant

The basic method for invoking the decoder looks like this:

    cat SOURCE | $JOSHUA/bin/decoder -c CONFIG > OUTPUT

You can test this using the sample configuration file and input found
in the examples/example/ directory. For example, type:

    cat examples/example/test.in | $JOSHUA/bin/decoder -c examples/example/joshua.config

The decoder will load the language model and translation models
defined in the configuration file, and will then decode the five
sentences in the example file.

There are a variety of command-line options that you can pass to
Joshua. For example, you can enable multithreaded decoding with the
-threads N flag:

    cat examples/example/test.in | $JOSHUA/bin/decoder -c examples/example/joshua.config -threads 5

The configuration file defines many additional parameters, all of
which can be overridden on the command line using the format
-PARAMETER value.
For example, to output the top 10 hypotheses instead of just the top 1
specified in the configuration file, use -top-n N:

    cat examples/example/test.in | $JOSHUA/bin/decoder -c examples/example/joshua.config -top-n 10

Parameters, whether in the configuration file or on the command line,
are converted to a canonical internal representation that ignores
hyphens, underscores, and case. So, for example, the parameters within
each of the following groups are all equivalent:

    {top-n, topN, top_n, TOP_N, t-o-p-N}
    {poplimit, pop-limit, pop_limit, popLimit}

and so on. For an example of parameters, see the Joshua configuration
file template in

    $JOSHUA/scripts/training/templates/tune/joshua.config

or the online documentation at joshua-decoder.org/4.0/decoder.html.
There is a wealth of information in the online documentation.

After you have successfully run the decoding example above, we
recommend that you take a look at the Joshua pipeline script, which
allows you to do full end-to-end training of a translation model. It
is stored in $JOSHUA/examples
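
The parameter-name rule described above (hyphens, underscores, and case
are ignored) can be illustrated with a small shell sketch. This is only
an approximation of the stated rule, not Joshua's actual code; the
function name canonicalize_param is hypothetical and exists purely for
illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Joshua): reduce a parameter name to a
# canonical form by deleting hyphens and underscores and lowercasing it.
canonicalize_param() {
    # The hyphen is placed last in the tr set so it is taken literally.
    printf '%s' "$1" | tr -d '_-' | tr '[:upper:]' '[:lower:]'
}

# Every spelling from the first group above maps to the same string.
for name in top-n topN top_n TOP_N t-o-p-N; do
    printf '%s -> %s\n' "$name" "$(canonicalize_param "$name")"
done
```

Under this sketch, two spellings name the same parameter exactly when
their canonical forms are equal, which is a quick way to check whether
a command-line flag would override a given configuration-file entry.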
Languages
- Java 48.5%
- C++ 35.6%
- Shell 5.9%
- C 4.6%
- Perl 4.5%
- Python 0.9%