
APPM Department Colloquium - Yanping Huang

Yanping Huang, Staff Software Engineer, Google Brain

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

Abstract: Neural network scaling has been critical for improving model quality in many real-world machine learning applications with vast amounts of training data and computation. Although this trend of scaling is a proven approach to better model quality, there are challenges on the path, such as computation cost, ease of programming, and efficient implementation on parallel devices. GShard is a module composed of a set of lightweight annotation APIs and an extension to the XLA compiler. It provides an elegant way to express a wide range of parallel computation patterns with minimal changes to existing model code. GShard enabled us to scale up multilingual neural machine translation Transformer models with Sparsely-Gated Mixture-of-Experts beyond 600 billion parameters using automatic sharding. We demonstrate that such a giant model can be trained efficiently on 2048 TPU v3 accelerators in 4 days, achieving far superior quality for translation from 100 languages to English compared to the prior art.
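To make the two ideas in the abstract concrete, here is a minimal sketch in JAX of a sparsely gated Mixture-of-Experts layer with top-2 gating, annotated for automatic sharding via XLA's GSPMD pass (the compiler extension that grew out of GShard). This is illustrative rather than GShard's original API; the tensor sizes, the mesh axis name "expert", and the dense dispatch (real GShard uses a capacity-limited scatter) are assumptions for the example.

```python
# Sketch only: top-2 gated Mixture-of-Experts with sharding annotations.
# Uses jax.sharding (XLA's GSPMD pass); sizes and axis names are made up.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

num_experts, d_model, d_hidden = 8, 64, 256  # hypothetical sizes

# One mesh axis spanning all available devices; experts live along it.
mesh = Mesh(np.array(jax.devices()), ("expert",))
shard_experts = NamedSharding(mesh, P("expert", None, None))

k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
w_gate = jax.random.normal(k1, (d_model, num_experts)) * 0.02
# Per-expert feed-forward weights: the annotation tells the compiler to
# partition the leading (expert) dimension across devices and replicate
# the rest; the compiler propagates the sharding through the whole graph.
w_in = jax.device_put(
    jax.random.normal(k2, (num_experts, d_model, d_hidden)) * 0.02,
    shard_experts)
w_out = jax.device_put(
    jax.random.normal(k3, (num_experts, d_hidden, d_model)) * 0.02,
    shard_experts)

@jax.jit
def moe_layer(x):                                    # x: (tokens, d_model)
    # Sparse gating: each token is routed to its 2 highest-scoring experts.
    probs = jax.nn.softmax(x @ w_gate, axis=-1)      # (tokens, E)
    top2_idx = jax.lax.top_k(probs, 2)[1]            # (tokens, 2)
    mask = jax.nn.one_hot(top2_idx, num_experts).sum(axis=1)  # (tokens, E)
    gates = probs * mask
    gates = gates / (gates.sum(-1, keepdims=True) + 1e-9)
    # Dense dispatch for clarity: every expert sees every token, zeroed
    # where not routed. GShard instead scatters into fixed-capacity
    # per-expert buffers so cost stays proportional to the token count.
    x_e = jnp.einsum("tm,te->etm", x, mask)          # (E, tokens, d_model)
    # Annotation: keep the expert dimension sharded inside the computation.
    x_e = jax.lax.with_sharding_constraint(x_e, shard_experts)
    h = jax.nn.relu(jnp.einsum("etm,emh->eth", x_e, w_in))
    y_e = jnp.einsum("eth,ehm->etm", h, w_out)       # per-expert outputs
    # Combine: weight each expert's output by its gate value and sum.
    return jnp.einsum("etm,te->tm", y_e, gates)

y = moe_layer(jnp.ones((16, d_model)))  # runs on 1 device; shards on many
```

The appeal of the annotation style is visible here: the model code is ordinary einsums, and only the sharding hints change when moving from one device to thousands.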

Bio: Yanping Huang is a software engineer at Google Brain. He received his PhD from the University of Washington, working on reinforcement learning and computational neuroscience. His main research interests include neural architecture search and large-scale machine learning infrastructure. His work has been published in top-tier machine learning and computer vision conferences and journals, including NeurIPS, ICLR, CVPR, AAAI, and Neural Computation. He has also served as a program committee member for several top machine learning venues, including NeurIPS, KDD, AAAI, ICML, WWW, and Neural Computation.