In linguatools' words:


Are you looking for parallel texts to train your statistical translation engines? Do you want to find domain-relevant terminology? Do you want to boost matches in your TMs?


We can offer you 10 million German-English parallel sentences:

  • parallel sentence pairs crawled from the internet
  • elaborate multi-step quality filtering, including language identification filter, machine translation filter, grammaticality filter etc.
  • no duplicate sentence pairs
  • no overlap with existing publicly available corpora like europarl, DGT-TM, etc. (see full list)
  • web pages have been categorized for subject area (see distribution of subject areas)
  • crawled between 10/2013 and 05/2015 – includes up-to-date terminology
  • available in TMX and Moses format


http://linguatools.org/tools/corpora/webcrawl-parallel-corpus-german-english-2015/ 
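
For readers unfamiliar with the Moses format mentioned in the list: it is a pair of plain-text files, one per language, aligned line by line, so that line N of the German file corresponds to line N of the English file. A minimal Python sketch for reading such a pair might look like the following. The function name and file names are hypothetical, and UTF-8 encoding is an assumption rather than something the announcement states.

```python
from itertools import zip_longest
from typing import Iterator, Tuple


def read_moses_pairs(src_path: str, tgt_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from two line-aligned files.

    Assumes UTF-8 text with one sentence per line (an assumption, not a
    documented property of the linguatools corpus).
    """
    with open(src_path, encoding="utf-8") as src, \
         open(tgt_path, encoding="utf-8") as tgt:
        # zip_longest pads the shorter file with None, which lets us detect
        # files that are not the same length instead of silently truncating.
        for line_no, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            if s is None or t is None:
                raise ValueError(f"files differ in length at line {line_no}")
            yield s.rstrip("\n"), t.rstrip("\n")
```

Called as `read_moses_pairs("corpus.de", "corpus.en")` (hypothetical file names), it streams pairs one at a time, so a 10-million-sentence corpus does not need to fit in memory.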


Thanks to Patrick Roye