Amazon Machine Images (AMIs)

Moses Statistical Machine Translation

This image includes everything you need to train a Moses statistical machine translation (SMT) system on a corpus of parallel text in two languages. Once an SMT system is trained, the Moses decoder can be used to translate from one language to the other. Running Moses on EC2 allows for quick experimentation with a range of data and parameter combinations to build the optimal SMT system for a translation project.

Submitted By: achimruo
US East AMI ID: ami-c2987bab
Europe AMI ID: ami-f6517a82
AMI Manifest: dsr-images/moses.x64.20091006.manifest.xml
License: Public
Operating System: Linux/Unix

Digital Silk Road built the Moses on EC2™ image on top of the Ubuntu 8.04 LTS image from Canonical and added the following open-source software packages for statistical machine translation:
Component                                                    Version
Moses Machine Translation System                             moses-2009-04-13.tgz
GIZA++ Statistical Translation Model Toolkit                 v1.0.3
IRSTLM Language Modeling Toolkit                             v5.22.01
Miscellaneous tools for corpus preparation and evaluation    -
Amazon S3 Authentication Tool for curl                       -

Because the system can be used for a broad range of language combinations, parallel text corpora are not included in the machine image. Once the image is instantiated, corpora can be copied to it via scp or from Amazon S3. The EC2 large instance type has only a 10 GB root partition, which is usually not enough for corpora and trained machine translation systems, so it is advisable to store the corpora and systems on one of the two available 420 GB partitions.
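
The storage advice above can be checked before copying a corpus. The following is a minimal sketch, not part of the image: the helper name pick_storage_dir, the /mnt mount point, and the 20 GiB figure are all assumptions for illustration (run df -h on the instance to see the actual mount points). It also assumes a Python 3.3+ interpreter, which the stock image may not provide.

```python
import shutil


def pick_storage_dir(candidates, required_bytes):
    """Return the first directory in `candidates` with at least
    `required_bytes` free, or None if none qualifies."""
    for path in candidates:
        try:
            free = shutil.disk_usage(path).free
        except OSError:
            # Path does not exist or is not accessible; skip it.
            continue
        if free >= required_bytes:
            return path
    return None


if __name__ == "__main__":
    # Assumed mount points; adjust to what `df -h` reports on the instance.
    candidates = ["/mnt", "/"]
    # Hypothetical requirement: 20 GiB for a corpus plus trained models.
    needed = 20 * 1024**3
    target = pick_storage_dir(candidates, needed)
    print(target if target else "No partition with enough free space")
```

Listing the large partition first means the small root partition is chosen only as a fallback when it still has enough room.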

Once an instance of the image is running, it is also advisable to update the Ubuntu packages to their latest versions.
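
The package update can be scripted. The sketch below is one possible approach, not a procedure documented for this image: it uses the standard apt-get update and apt-get upgrade commands, the function names are hypothetical, and it assumes Python 3.5+ (for subprocess.run). If only an older Python is installed on the instance, the printed commands can be typed into the shell directly.

```python
import subprocess


def build_update_commands(use_sudo=True):
    """Return the apt-get commands that refresh the package index and
    then upgrade the installed packages."""
    prefix = ["sudo"] if use_sudo else []
    return [
        prefix + ["apt-get", "update"],
        prefix + ["apt-get", "-y", "upgrade"],
    ]


def run_update(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_update_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if apt-get fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default: shows the commands without changing the system.
    run_update(dry_run=True)
```

The dry-run default prints the commands without changing the system, which makes it easy to review them before running the upgrade for real.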

A walkthrough with step-by-step instructions on how to use Moses on EC2™ is available at http://www.digitalsilkroad.net/software.html.

Digital Silk Road provides support for the Moses on EC2™ image, ranging from email support to consulting on how to best customize and integrate Moses on EC2™ into a complete translation workflow.

Thanks to the Canonical team for providing Ubuntu on EC2 and to the Moses, GIZA++ and IRSTLM teams for providing great components for SMT!

Known Issues

  1. Sometimes a newly started Amazon EC2 instance refuses to accept the SSH key from the SSH client (e.g. PuTTY). In this case it helps to terminate the instance and start a new one. This is due to a bug in the Ubuntu AMIs from Canonical (see http://developer.amazonwebservices.com/connect/message.jspa?messageID=131088 and https://bugs.launchpad.net/ubuntu-on-ec2/+bug/308530). The problem should go away once Moses on EC2 is rebuilt on a fixed Canonical image.

Disclaimer and Limitation of Liability

You agree that use of this Amazon Elastic Compute Cloud Machine Image ("AMI") and all information, software, products and services on it are at your sole risk. This AMI is provided on an "as is," "with all faults" and "as available" basis. Digital Silk Road expressly disclaims all warranties of any kind, whether express or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose and non-infringement. Digital Silk Road makes no warranty that the AMI or its services will meet your requirements, or that the AMI or its services will be uninterrupted, timely, secure, or error free; nor does Digital Silk Road make any warranty as to the results that may be obtained from the use of the AMI or as to the accuracy or reliability of any information obtained through the AMI or its services. You agree that any material and/or data downloaded or otherwise obtained through the use of the AMI and its services is done at your own discretion and risk and that you will be solely responsible for any damage to your computer system or loss of data that results from the download of such material and/or data.

Digital Silk Road will not be liable to you or any of your employees, agents, customers or any third parties for any damages arising from use of the AMI or its services, including without limitation: punitive, exemplary, incidental, treble, special or consequential damages; loss of privacy or security damages; personal injury or property damages; copyright, trademark, patent, trade secret or other intellectual property damages; or any damages whatsoever resulting from interruption or failure of service, lost profits, loss of business, loss of data, loss due to unauthorized access or due to viruses or other harmful components, cost of replacement products and services, suspension, termination, or the inability to use the AMI or its services, the content of any data transmission, communication or message transmitted or received, or losses resulting from any messages received or transaction entered into through the AMI or its services.

You acknowledge that you bear sole responsibility for adequate security, protection and backup of your data, content and applications. Digital Silk Road strongly encourages you, where available and appropriate, to (a) use encryption technology to protect your data or content from unauthorized access, (b) routinely archive your data or content, and (c) keep your applications or any software that you use or run with the AMI current with the latest security patches or updates. Digital Silk Road will have no liability to you for any unauthorized access or use, corruption, deletion, destruction or loss of any of your data, content or applications.

