Workshop at ACL 2020, Seattle, July 10, 2020
Contact: firstname.lastname@example.org or twitter.com/autosimtrans
Please register your team through this Platform.
Our challenge includes 4 tasks on Chinese-to-English translation (Zh->En) and English-to-Spanish translation (En->Es). Participants can choose to join one or more tasks.
There are three types of inputs involved:
An example of the three types of input is illustrated in Table 1. We process input data into streaming format to evaluate the system delay (refer to Evaluation).
|Streaming Transcript||Streaming ASR||Audio|
You need to train a baseline MT model using the text parallel corpus specified in the table below (CWMT19 for Zh->En and UN for En->Es, respectively), because the amount of speech translation data we provide is insufficient to support the training of a large translation model.
|En->Es||UN Parallel Corpus|
For Zh->En translation, our training set contains about 70 hours of Chinese speech audio, human transcripts, ASR results and English translations. To evaluate your system, we provide 16 talks with their corresponding streaming ASR and streaming transcripts as the development set.
For En->Es translation, we do not provide an additional speech translation dataset. You are required to train your MT model using only the UN dataset. To evaluate your system, we will provide the streaming transcripts as the development set.
As shown in Table 2, we provide 7 parts of speech translation data, of which the 5 highlighted parts will be sent to participants by email.
Our test set will not be released. You are required to upload your whole system and environment. Within 24 hours after you submit your system, we will publish the evaluation results on the competition website.
At test time, the systems are expected to receive different file formats based on the system types. For text-to-text translation systems, the inputs are streaming source text (including gold transcripts and ASR results) and the outputs are corresponding simultaneous translation results. For speech-to-text translation systems, the inputs are speech audio files and the outputs are corresponding simultaneous translation results.
For all four tasks, your system outputs a single text file containing source sentences and translations. Table 3 shows an example with streaming ASR input. Your system needs to decide when to translate given the input, and to write each translation after its corresponding source. Your translations will be concatenated with SPACEs to evaluate BLEU. Note that the left part (streaming source) should NOT be modified in the streaming ASR and streaming transcript tasks.
For Zh->En translation with audio input, you also have to output this source-translation file, with the left part being your recognized source and the right part the corresponding translation.
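As a rough illustration of the concatenation step described above, the following sketch joins the non-empty translations of a source-translation sequence with single spaces. The function name `join_translations`, the in-memory `(source, translation)` pair representation, and the sample data are assumptions made for this example only; the actual on-disk layout is the one shown in Table 3, and the organizers' scoring script may differ.

```python
from typing import Iterable, Tuple


def join_translations(pairs: Iterable[Tuple[str, str]]) -> str:
    """Concatenate translations with single spaces, in output order.

    `pairs` is a hypothetical in-memory view of the output file: one
    (streaming_source, translation) tuple per line. Lines on which the
    system chose not to translate yet carry an empty translation and
    are skipped.
    """
    pieces = [t.strip() for _, t in pairs if t and t.strip()]
    return " ".join(pieces)


if __name__ == "__main__":
    # Invented sample: the system waits on the first two source prefixes,
    # then emits translations on the following lines.
    sample = [
        ("src prefix 1", ""),
        ("src prefix 2", ""),
        ("src prefix 3", "Hello everyone ,"),
        ("src prefix 4", "welcome to the talk ."),
    ]
    print(join_translations(sample))
```

The joined string is what would be compared against the reference when computing BLEU, which is why a system should not repeat an already emitted translation on later lines.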
You have two ways to submit your system:
There are some requirements for your uploaded system:
script run.sh: the interface to execute your translate program.
For text-to-text tasks, use
sh run.sh < streaming_asr.txt > output_asr/source_translation.txt or
sh run.sh < streaming_transcription.txt > output_transcript/source_translation.txt;
For audio-to-text task, use
sh run.sh < audioid.wav > output_audio/source_translation.txt
output_audio, depending on the task you participate in. You can also prepare multiple output directories if you participate in multiple challenge tasks.
output_xxx/dev_translation.txt. This is very helpful for us to eliminate the execution problems of your system.
Unless a system execution error occurs, each participant has only one chance to submit for each task.
Scientific and system description papers will be submitted through this Link by May 6, 2020 11:59 pm [UTC-12h]. Papers should be formatted according to the ACL 2020 format guidelines and contain 4 to 8 pages of content plus additional pages of references. Accepted papers will be published online in the ACL 2020 proceedings and will be presented at the conference either orally or as a poster.
Following previous work, we evaluate simultaneous translation results based on BLEU and AL (average lagging). BLEU measures translation quality and AL measures system delay. The evaluation results of all teams will be plotted on two-dimensional BLEU-AL coordinates.
We use multieval to calculate BLEU.
To measure system delays, we use python gen_rw.py < output_xxx/source_translation.txt > sample_rw.txt && python metricAL.py.
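For intuition about the delay metric, here is a minimal sketch of average lagging as defined in the STACL paper: AL = (1/τ) Σ_{t=1..τ} [g(t) − (t−1)/r], where g(t) is the number of source tokens read before target token t is written, r = |y|/|x| is the target-to-source length ratio, and τ is the first target step at which the whole source has been read. The function name and input representation below are assumptions for illustration; this is not the organizers' metricAL.py, which may handle tokenization and the read/write log differently.

```python
from typing import Sequence


def average_lagging(g: Sequence[int], src_len: int) -> float:
    """Average lagging for one sentence.

    g[t-1] is the number of source tokens read before emitting target
    token t (1-indexed t). src_len is the source length |x|; the target
    length |y| is len(g).
    """
    if not g or src_len <= 0:
        raise ValueError("need a non-empty g and a positive source length")
    tgt_len = len(g)
    r = tgt_len / src_len  # target-to-source length ratio
    # tau: first target step at which the full source has been consumed;
    # fall back to the last step if that never happens.
    tau = next(
        (t for t, read in enumerate(g, start=1) if read >= src_len),
        tgt_len,
    )
    return sum(g[t - 1] - (t - 1) / r for t in range(1, tau + 1)) / tau


if __name__ == "__main__":
    # A wait-3 schedule on a 5-token source and 5-token target.
    print(average_lagging([3, 4, 5, 5, 5], src_len=5))  # 3.0
    # Full-sentence translation: everything is read before writing.
    print(average_lagging([5, 5, 5, 5, 5], src_len=5))  # 5.0
```

With equal source and target lengths, a wait-k schedule yields an AL of k, while a full-sentence system yields an AL equal to the source length, which matches the intuition that AL counts how many source tokens the output lags behind.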
Here is a baseline system for simultaneous machine translation based on PaddlePaddle 1.7 and STACL:
All submission deadlines are 11:59 PM GMT-12 (anywhere in the world) unless otherwise noted.
For any questions regarding our shared task, please use our Twitter page, GitHub issues, or email email@example.com. We are here to answer your questions and look forward to your submissions!