The 1st Workshop on Automatic Simultaneous Translation

Challenges, Recent Advances, and Future Directions

Workshop at ACL 2020, Seattle, July 10, 2020
Contact: or

Conference Program

(Friday, July 10, or +1: Saturday, July 11)

Pacific Time | Eastern Time | Central European Time | Beijing Time | Event
07:20-07:30 10:20-10:30 16:20-16:30 22:20-22:30 Opening Remarks
07:30-09:30 10:30-12:30 16:30-18:30 22:30-00:30 Session 1 (chair: Liang Huang)
07:30-08:00 10:30-11:00 16:30-17:00 22:30-23:00 Invited Talk 1: Colin Cherry [video]
08:00-08:30 11:00-11:30 17:00-17:30 23:00-23:30 Invited Talk 2: Barry Slaughter Olsen [video]
08:30-09:00 11:30-12:00 17:30-18:00 23:30-00:00 Invited Talk 3: Jordan Boyd-Graber [video]
09:00-09:30 12:00-12:30 18:00-18:30 00:00-00:30 +1 Q&A
09:30-15:00 12:30-18:00 18:30-00:00 00:30-06:00 +1 Break
15:00-16:10 18:00-19:10 00:00-01:10 +1 06:00-07:10 +1 Session 2: Research Paper and System Description (chair: Zhongjun He)
15:00-15:10 18:00-18:10 00:00-00:10 +1 06:00-06:10 +1 Dynamic Sentence Boundary Detection for Simultaneous Translation
Ruiqing Zhang and Chuanqiang Zhang
15:10-15:20 18:10-18:20 00:10-00:20 +1 06:10-06:20 +1 End-to-End Speech Translation with Adversarial Training
Xuancai Li, Kehai Chen, Tiejun Zhao and Muyun Yang
15:20-15:30 18:20-18:30 00:20-00:30 +1 06:20-06:30 +1 Robust Neural Machine Translation with ASR Errors
Haiyang Xue, Yang Feng, Shuhao Gu and Wei Chen
15:30-15:40 18:30-18:40 00:30-00:40 +1 06:30-06:40 +1 Improving Autoregressive NMT with Non-Autoregressive Model
Long Zhou, Jiajun Zhang and Chengqing Zong
15:40-15:50 18:40-18:50 00:40-00:50 +1 06:40-06:50 +1 Modeling Discourse Structure for Document-level Neural Machine Translation
Junxuan Chen, Xiang Li, Jiarui Zhang, Chulun Zhou, Jianwei Cui, Bin Wang and Jinsong Su
15:50-16:00 18:50-19:00 00:50-01:00 +1 06:50-07:00 +1 BIT’s system for the AutoSimTrans 2020
Minqin Li, Haodong Cheng, Yuanjie Wang, Sijia Zhang, Liting Wu and Yuhang Guo
16:00-16:10 19:00-19:10 01:00-01:10 +1 07:00-07:10 +1 Q&A
16:10-16:20 19:10-19:20 01:10-01:20 +1 07:10-07:20 +1 Break
16:20-18:20 19:20-21:20 01:20-03:20 +1 07:20-09:20 +1 Session 3 (chair: Colin Cherry)
16:20-16:50 19:20-19:50 01:20-01:50 +1 07:20-07:50 +1 Invited Talk 4: Hua Wu [video]
16:50-17:20 19:50-20:20 01:50-02:20 +1 07:50-08:20 +1 Invited Talk 5: Kay-Fan Cheung [video]
17:20-17:50 20:20-20:50 02:20-02:50 +1 08:20-08:50 +1 Invited Talk 6: Qun Liu [video]
17:50-18:20 20:50-21:20 02:50-03:20 +1 08:50-09:20 +1 Q&A
18:20-18:30 21:20-21:30 03:20-03:30 +1 09:20-09:30 +1 Closing Remarks

Invited Talk 1 by Colin Cherry [video]

Title: Research stories from Google Translate’s Transcribe Mode
Abstract: Google Translate recently launched a Transcribe Mode feature for simultaneous translation of long-form speech. This required a lot of interesting work from many teams, but I’ll use this presentation to describe some of the more research-oriented subprojects. This will include the work that validated and improved upon the use of re-translation for simultaneous translation, as well as some work on adapting latency metrics to the long-form transcription scenario. Through these stories, I’ll try to offer some perspective on how research and production impact one another when a launch is looming.

Colin Cherry is a Research Scientist at Google Montreal, working with Translate. Previously, he was a Senior Research Officer at Canada’s National Research Council. His primary research area is machine translation, but he has also been known to venture into parsing, morphology and information extraction. He is currently chair of the executive board of the North American Chapter of the Association for Computational Linguistics (NAACL), an action editor at the Transactions of the Association for Computational Linguistics (TACL), and recently served as research track chair for the meeting of the Association for Machine Translation in the Americas (AMTA 2018).

Invited Talk 2 by Barry Slaughter Olsen [video]

Title: Human Interpreter Training and Practice: Insights for Simultaneous Machine Translation Research
Abstract: Interpreter training and machine translation research are two radically different worlds. Neither understands the other well. Even so, knowing the basic techniques employed by trained simultaneous interpreters to practice their craft can help researchers better comprehend the task of simultaneous machine translation, determine new approaches to that task, and have a clearer understanding of what the potential of the technology may be. In his address, Professor Olsen will provide an overview of the skills and techniques taught in a simultaneous interpreter training program and suggest possible parallels and limitations in their application to simultaneous machine translation.

Barry Slaughter Olsen is a veteran conference interpreter and technophile with over twenty-five years of experience interpreting, training interpreters and organizing language services. He is a professor at the Middlebury Institute of International Studies at Monterey (MIIS) and the Vice-President of Client Success at KUDO, a multilingual web conferencing platform. He was co-president of InterpretAmerica from 2009 to 2020. He is a member of the International Association of Conference Interpreters (AIIC). Barry has been interviewed frequently by international media (CNN, CBC, MSNBC, NPR and PBS) about interpreting and translation. For updates on interpreting, technology and training, follow him on Twitter @ProfessorOlsen.

Invited Talk 3 by Jordan Boyd-Graber [video]

Title: Evaluating Human-Computer Simultaneous Interpretation
Abstract: Human simultaneous interpretation is an amazing feat requiring skill and extensive training. Computers are simply nowhere close to expert interpreters, but perhaps they can help humans do a task with unique cognitive burdens more effectively. In this talk, I discuss previous work on computer assistance for human simultaneous interpreters and how it reveals the differences between humans' and computers' comparative skills. To focus on where computers can best help interpreters, we pilot an evaluation framework to prototype assistance for interpreters with proxy users. By breaking up interpretation into its constituent pieces, we can both test with a larger user population and pinpoint which assistance techniques are effective when.

Jordan Boyd-Graber is an associate professor in the University of Maryland’s Computer Science Department, iSchool, UMIACS, and Language Science Center. Jordan’s research focuses on applying machine learning and Bayesian probabilistic models to problems that help us better understand social interaction or the human cognitive process. He and his students have won “best of” awards at NIPS (2009, 2015), NAACL (2016), and CoNLL (2015), and Jordan won the British Computer Society’s 2015 Karen Spärck Jones Award and a 2017 NSF CAREER award.

Invited Talk 4 by Hua Wu [video]

Title: Baidu Simultaneous Translation: Research and Applications
Abstract: Simultaneous translation has been widely studied and used in recent years. In this talk, I will introduce the main challenges of simultaneous translation and our solutions. We proposed methods to balance translation quality and latency, such as segmentation models that split ASR output into information units and incremental TTS that reduces latency. We also proposed end-to-end models that jointly learn ASR and speech-to-text translation. To facilitate research on simultaneous translation, we released BSTC, a Chinese-English simultaneous translation data set containing about 70 hours of Chinese speech audio, human transcripts, ASR results and English translations. In the last part of this talk, I will also introduce the applications of our simultaneous translation system, such as online meetings, lectures, and plugins for video translation.

Hua Wu is the Chief Scientist of Baidu NLP. Her research interests span a wide range of topics including machine translation, dialogue systems and knowledge graphs. She was a leading member of the machine translation project that won the second prize of the State Preeminent Science and Technology Award of China. She was Program Co-Chair of ACL (the Association for Computational Linguistics) in 2014 and of AACL (the Asia-Pacific Chapter of ACL) in 2020.

Invited Talk 5 by Kay-Fan Cheung [video]

Title: Machine-aided simultaneous interpreting: An experiment
Abstract: The talk will report the results of an experiment investigating whether technology can improve the efficiency and quality of simultaneous interpreting (SI) by human interpreters. Unfamiliar accents are one factor that can negatively affect SI performance. The real-time transcription of accented speech by automatic speech recognition (ASR) technology may aid interpreters. However, SI performance may suffer because of the additional effort needed to read the ASR transcription while juggling the multiple sub-tasks of SI.
Twenty-four native Mandarin-speaking participants performed SI of a speech in English by a non-native speaker into Mandarin Chinese. Parts of the speech were subtitled by ASR technology while other parts were not. Raters scored the Mandarin SI renditions on two parameters: accuracy and fluency. Quantitative analysis of the scores indicated that the raters scored the subtitled parts higher for accuracy but lower for fluency than the un-subtitled parts. Qualitative analysis of post-test interviews with the participants suggested that correct ASR-generated subtitles can improve SI performance. However, having to read the subtitles, especially when incorrect, was perceived as a hindrance to the SI process.
The data suggest that correct subtitles generated by ASR technology may improve the performance of human interpreters. The SI curriculum should incorporate training on how to use subtitles generated by ASR technology.

Andrew K.F. Cheung is Associate Professor at the Department of Chinese and Bilingual Studies of the Hong Kong Polytechnic University. He completed his MA in Conference Interpreting and Translation at the Graduate Institute of Translation and Interpreting Studies of Fu-jen Catholic University and did his Ph.D. at the University of East Anglia. His research interests include cognitive aspects of multilingual and multimodal processing, corpus-based interpreting studies, quality perception of interpreting services and pedagogy of interpreting. He is also a member of AIIC.

Invited Talk 6 by Qun Liu [video]

Title: Research and Practice of Simultaneous Machine Translation in Huawei Noah's Ark Lab
Abstract: In this talk, I will introduce our research efforts and the development of Huawei simultaneous translation systems, both in the cloud and on mobile phones. To get a good balance between translation quality and latency, we proposed a general framework for adapting neural machine translation to translate simultaneously. To enhance the robustness of the system with regard to speech-style input and ASR errors, we introduced various data augmentation techniques including GPT-based pretraining models for paraphrasing. We further conducted optimizations to improve the run-time performance on terminal devices. We finally obtained satisfactory performance on both platforms in the given scenarios.

Prof. Dr. Qun Liu is the Chief Scientist of Speech and Language Computing in Huawei Noah’s Ark Lab. He was a Full Professor at Dublin City University and the Theme Leader of the ADAPT Centre, Ireland, from July 2012 to June 2018. Before that, he was a Professor in the Institute of Computing Technology (ICT), Chinese Academy of Sciences for 20 years, where he founded and led the ICT NLP Research Group. He obtained his B.Sc., M.Sc. and Ph.D. in computer science from the University of Science and Technology of China, the Chinese Academy of Sciences, and Peking University, respectively. His research interests lie in the areas of Natural Language Processing and Machine Translation. His main academic contributions are on Chinese language processing, syntax-based statistical machine translation and neural methods for natural language processing. He has authored or co-authored more than 300 peer-reviewed research publications, which have been cited more than 7000 times. He has supervised more than 40 students to the completion of their M.Sc. or Ph.D. degrees.