Legacy DoMT engines use the open source component IRSTLM, which does not run on Windows. To run your engines on Windows, you must therefore generate a new engine with Slate Desktop. This article explains how to convert your legacy DoMT corpus and generate a new Slate Desktop engine. As an added benefit, the underlying open source components in Slate Desktop are faster.

This article includes the instructions you need to

  1. download and install the Slate Desktop 1.1.8 update or newer,
  2. copy the old training corpora from the DoMT workspace to Slate Desktop's workspace,
  3. rename and prepare the copied files,
  4. configure Slate Desktop's script to generate the engine with only the transferred data, and
  5. run Slate Desktop with the configured script.

We tested these procedures with legacy bitext BUILD sets from DoMT. If you run into problems, we will be available to help fix things.

1. copy old BUILD sets (training data files)

Copy these files from your legacy DoMT BUILDS workspace into your new Slate Desktop workspace. In most cases $LMBUILDNAME = $TMBUILDNAME.



2. rename and prepare the copied files

Rename these files into new BUILDS folders in your Slate Desktop workspace (by default, $WORKSPACE = C:\PTTools or /opt/PTTools).
  • The value of $ABS_PAIR is your two language codes, sorted alphabetically and separated by a dash. For example, if $SRC=fr_fr and $TGT=en_gb, then $ABS_PAIR=en_gb-fr_fr. There's a dash after the $ABS_PAIR value.
  • All values for $SRC and $TGT should be lowercase.
  • The eval1.$SRC and eval1.$TGT files are missing from Slate Desktop's workspace because Slate Desktop uses a different evaluation system.
  • You should rename the build folders so that $NEW_LMBUILDNAME = $NEW_TMBUILDNAME.
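
The $ABS_PAIR rule above can be sketched as a small shell function. This is only an illustration: the function name abs_pair is hypothetical and is not part of Slate Desktop or DoMT. It lowercases both codes, sorts them alphabetically, and joins them with a dash.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Slate Desktop): build $ABS_PAIR
# from two language codes, following the rules in the list above.
abs_pair() {
  local a b
  # Lowercase both codes, since $SRC and $TGT should be lowercase.
  a=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  b=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  # Sort the two codes alphabetically, then join them with a dash.
  printf '%s\n%s\n' "$a" "$b" | LC_ALL=C sort | paste -sd- -
}

abs_pair fr_fr en_gb   # prints en_gb-fr_fr
```

The argument order does not matter, so abs_pair en_gb fr_fr gives the same result.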


You can use your favorite text tool to combine the eval1.$SRC and eval1.$TGT files with your bitext.$SRC and bitext.$TGT files.
  • On Windows, you can use this Command Prompt command (be sure to use the proper paths):
C:> copy /b $BUILDNAME\bitext_orig.$TGT + $BUILDNAME\eval1.$TGT $NEW_TMBUILDNAME\bitext.$TGT
C:> copy /b $BUILDNAME\bitext_orig.$SRC + $BUILDNAME\eval1.$SRC $NEW_TMBUILDNAME\bitext.$SRC
  • On Linux, you can use this Terminal command:
~$ cat $BUILDNAME/bitext_orig.$TGT $BUILDNAME/eval1.$TGT > $NEW_TMBUILDNAME/bitext.$TGT
~$ cat $BUILDNAME/bitext_orig.$SRC $BUILDNAME/eval1.$SRC > $NEW_TMBUILDNAME/bitext.$SRC
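
After combining the files, it can be worth confirming that the source and target sides still line up. The sketch below assumes the bitext files hold one segment per line, so both sides of a pair should have the same line count. The function name check_bitext is hypothetical and is not a Slate Desktop command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Slate Desktop): compare the line
# counts of the two sides of a merged bitext pair.
# Assumption: one segment per line in each file.
check_bitext() {
  local src_file=$1 tgt_file=$2 src_lines tgt_lines
  # Stop early if either file is missing or unreadable.
  [ -r "$src_file" ] || { echo "Cannot read: $src_file" >&2; return 2; }
  [ -r "$tgt_file" ] || { echo "Cannot read: $tgt_file" >&2; return 2; }
  # Arithmetic expansion strips the padding some wc versions add.
  src_lines=$(( $(wc -l < "$src_file") ))
  tgt_lines=$(( $(wc -l < "$tgt_file") ))
  if [ "$src_lines" -eq "$tgt_lines" ]; then
    echo "OK: $src_lines lines on each side"
  else
    echo "MISMATCH: $src_lines source lines, $tgt_lines target lines" >&2
    return 1
  fi
}

# Example call, using the placeholders from this article:
# check_bitext "$NEW_TMBUILDNAME/bitext.$SRC" "$NEW_TMBUILDNAME/bitext.$TGT"
```

A mismatch usually means one of the input files was skipped or combined in a different order on one side.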


3. configure Slate Desktop's script to generate the engine from only the transferred data

Edit the Slate Desktop script file, C:\PTTools\Users\scripts\quick-create-engine.cfgscript, and save it. If the lmgrams or tmgrams values in your original engine were different, set them here to match your original values.




4. run Slate Desktop with the configured script

On Windows, open a Command Prompt and type the command:
C:> slate-cli -t quick-create-engine
On Linux, open a Terminal and type the command:
~$ slate-cli -t quick-create-engine
Sit back and wait. The terminal displays a series of progress bars similar to DoMT's.

When finished, the engine name will be: $SRC-$TGT-$NEW_TMBUILDNAME

Final note: We don't have an easy way to add new TMs to this training corpus. If you want to do that, your best bet is to copy the files from your old /opt/domy/CORPORA/sa/* folder tree to the new $WORKSPACE/CORPORA/sa/*. This adds them to your inventory. Then, from the Slate Desktop UI, add them and any other TMs to your new engine. Slate Desktop's new preparation tools will treat them like all other TMs added to the inventory.
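
On Linux, the folder copy described in the final note could look like the sketch below. The function name copy_corpora is hypothetical, and the paths in the example call are simply the ones given above, so adjust them to your own system.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Slate Desktop): copy a legacy
# corpus folder tree into the Slate Desktop workspace.
copy_corpora() {
  local old_dir=$1 new_dir=$2
  [ -d "$old_dir" ] || { echo "Not a directory: $old_dir" >&2; return 1; }
  # Create the destination folder if it does not exist yet.
  mkdir -p "$new_dir" || return 1
  # "old_dir/." copies the folder's contents, subfolders included,
  # without nesting an extra folder; -p preserves timestamps.
  cp -Rp "$old_dir/." "$new_dir/"
}

# Example call, using the paths from the final note:
# copy_corpora /opt/domy/CORPORA/sa "$WORKSPACE/CORPORA/sa"
```

Files that already exist in the destination under the same name are overwritten, so you may want to back up the destination first.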