
Facebook AI Model Translates Between 100 Languages Without English Data

Facebook AI this week announced it is open-sourcing M2M-100, a deep learning model that can translate directly between any pair of 100 languages without relying on English data. For example, to translate from Chinese to French, previous models would pivot through English, training on Chinese-to-English and English-to-French data; M2M-100 trains directly on Chinese-to-French data to better preserve meaning.
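To illustrate what direct many-to-many translation looks like in practice, here is a minimal sketch using the Hugging Face transformers port of M2M-100 (the transformers API and the facebook/m2m100_418M checkpoint name are assumptions beyond this article, which covers the fairseq release):

```python
# Minimal sketch: direct Chinese -> French translation with the
# Hugging Face port of M2M-100 (the smaller 418M-parameter checkpoint).
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

tokenizer.src_lang = "zh"  # source language: Chinese
encoded = tokenizer("生活就像一盒巧克力。", return_tensors="pt")

# Forcing the decoder to begin with the French language token makes the
# model translate zh -> fr directly, with no English pivot in between.
generated = model.generate(
    **encoded, forced_bos_token_id=tokenizer.get_lang_id("fr")
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```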

“Deploying M2M-100 will improve the quality of translations for billions of people, especially those who speak low-resource languages,” the Facebook researchers stated in a blog post.

The model was trained on 7.5 billion sentences spanning 100 languages and comprises 15 billion parameters. It covers a total of 2,200 language directions, 10x more than previous multilingual models.
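For a sense of scale: 100 languages admit 100 × 99 = 9,900 ordered translation directions, while an English-centric system trains only the 198 directions into and out of English. A quick back-of-the-envelope check (plain Python, purely illustrative):

```python
languages = 100

# Every ordered (source, target) pair among the 100 languages.
all_directions = languages * (languages - 1)   # 9,900

# An English-centric model trains only X->English and English->X.
english_centric = 2 * (languages - 1)          # 198

print(all_directions, english_centric, 2200 / english_centric)
# 9900 198 11.11... -> 2,200 trained directions is roughly 10x the
# English-centric coverage, consistent with the figure above.
```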

Training was performed on a cluster of NVIDIA V100 GPUs with PyTorch. The model is also the first to use Fairscale, a PyTorch extension that supports pipeline and tensor parallelism. 

“We built this general infrastructure to accommodate large-scale models that don’t fit on a single GPU through model parallelism into Fairscale. We built on top of the ZeRO optimizer, intra-layer model parallelism, and pipeline model parallelism to train large-scale models,” the researchers stated. 
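To make the pipeline-parallelism idea concrete, here is a minimal sketch of Fairscale's Pipe wrapper on a toy two-stage model (an illustration of the technique, not Facebook's actual training code; it assumes a machine with at least two CUDA devices):

```python
import torch
import torch.nn as nn
from fairscale.nn import Pipe  # pipeline model parallelism

# A toy model split across two pipeline stages; the real M2M-100 setup
# shards a 15-billion-parameter Transformer across many V100 GPUs.
model = nn.Sequential(
    nn.Linear(1024, 4096), nn.ReLU(),  # stage 0 (first two layers)
    nn.Linear(4096, 1024),             # stage 1 (last layer)
)

# `balance` assigns layers to stages (two on the first GPU, one on the
# second); `chunks` splits each batch into micro-batches that flow
# through the stages concurrently to keep both GPUs busy.
pipe = Pipe(model, balance=[2, 1], chunks=4)

x = torch.randn(8, 1024).to(torch.device("cuda", 0))  # input on stage 0
out = pipe(x)  # output lands on the last stage's device
```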

The researchers have released the model, training, and evaluation setup to help others reproduce and advance multilingual models.

The model is currently for research purposes only; Facebook does not yet have plans to use it in its products.

“We’ll continue to improve our model by incorporating such cutting-edge research, exploring ways to deploy MT systems responsibly, and creating the more specialized computation architectures necessary to bring this to production,” said Angela Fan, the lead researcher on the project. 
