Workshop at NAACL 2021, Mexico City, June 10, 2021
Contact:
autosimtrans.workshop@gmail.com or
twitter.com/autosimtrans
Please register your team through this AI Studio Platform.
Time | Schedule |
---|---|
2020/12/28 00:00:00 | Registration opens; data released |
2021/01/31 23:59:59 | Registration closes |
2021/02/20 00:00:00 | System submission opens |
2021/03/07 23:59:59 | System submission closes |
2021/03/15 00:00:00 | System descriptions due |
2021/04/15 00:00:00 | Notification of acceptance |
2021/04/26 00:00:00 | Camera-ready papers due |
All submission deadlines are 11:59 PM GMT-12 (anywhere in the world) unless otherwise noted.
This competition is open to everyone, with no restrictions on age, identity, or nationality. Individuals, universities, research institutions, enterprises, and start-up teams in related fields may register. Those with advance access to the task and data cannot participate. Other employees of the organizers may appear in the rankings but are not eligible for awards.
Both individual and team participation are supported, with at most 5 members per team. Teams may span multiple organizations, but each person may join only one team.
This competition has 3 tracks, and each track will award one first, one second, and one third prize as follows:
Award | Quantity | Bonus |
---|---|---|
First prize | 1 | $1,000 |
Second prize | 1 | $800 |
Third prize | 1 | $500 |
Fair competition: competitors may not copy others' work, exchange answers, or use multiple accounts in the competition. Violations will lead to cancellation of results and serious handling.
Organization statement: the organizing committee reserves the right to adjust and modify the rules and arrangements of the competition, to judge and handle cheating, and to withdraw or refuse awards from teams that compromise the organization or fairness of the competition.
Baseline model: the baseline model is provided for competitors' reference and may be improved upon. Competitors may not directly submit the baseline model's predictions; if a submission is highly similar to the baseline's predictions, the result will be cancelled.
Intellectual property: the intellectual property rights in entries (including but not limited to algorithms and models) belong to the competitors. The organizing committee reserves the right to use entries, related work, and team information in promotional materials, publications, authorized media releases, the official website (for browsing and download), and exhibition activities (including tours), and has priority in cooperating with the participating teams.
For Chinese-to-English translation, we provide Baidu’s BSTC data set:
For English-to-Spanish translation, we provide the following data sets:
Our test set will not be released. You are required to upload your whole system and environment. Within 24 hours after you submit your system, we will publish the evaluation results on the competition website.
At test time, the systems are expected to receive different file formats based on the system types. For text-to-text translation systems, the inputs are streaming source text of gold transcripts and the outputs are corresponding simultaneous translation results. For speech-to-text translation systems, the inputs are speech audio files and the outputs are corresponding simultaneous translation results.
Our challenge includes 3 tasks on Chinese-to-English translation (Zh->En) and English-to-Spanish translation (En->Es). Participants can choose to join one or more tasks.
There are three types of inputs involved: streaming ASR output, streaming gold transcripts, and audio. An example of these inputs is illustrated in Table 1. We process input data into streaming format to evaluate system delay (refer to Evaluation).
Streaming Transcript | Audio |
---|---|
大 | |
大家 | |
大家好 | |
大家好! | |
欢 | |
欢迎 | |
欢迎大 | |
欢迎大家 | |
欢迎大家关 | |
欢迎大家关注 | |
欢迎大家关注UNIT | |
欢迎大家关注UNIT对 | |
欢迎大家关注UNIT对话 | |
欢迎大家关注UNIT对话系 | |
欢迎大家关注UNIT对话系统 | |
欢迎大家关注UNIT对话系统的 | |
欢迎大家关注UNIT对话系统的高 | |
欢迎大家关注UNIT对话系统的高级 | |
欢迎大家关注UNIT对话系统的高级课 | |
欢迎大家关注UNIT对话系统的高级课程 | |
欢迎大家关注UNIT对话系统的高级课程。 |
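Parsing such streaming input is straightforward: each line extends the previous prefix, and a new sentence begins when a line no longer extends it (e.g. "欢" after "大家好!"). A minimal sketch in Python; the function name is ours, not part of the task kit:

```python
import sys

def read_streaming(lines):
    """Group incremental prefixes into sentences.

    A new sentence starts whenever a line no longer extends
    the previous prefix (e.g. after '大家好!' comes '欢').
    Yields the list of growing prefixes for each sentence.
    """
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if sentence and not line.startswith(sentence[-1]):
            yield sentence
            sentence = []
        sentence.append(line)
    if sentence:
        yield sentence

if __name__ == "__main__":
    # Print only the final (complete) form of each sentence.
    for prefixes in read_streaming(sys.stdin):
        print(prefixes[-1])
```

A real system would feed each prefix to its translation policy as it arrives rather than waiting for the complete sentence.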
For all three tasks, your system outputs a single text file containing source sentences and translations. Table 2 shows an example with streaming-transcript input. Your system must decide when to translate given the input so far, writing each translation after the corresponding source prefix. Your translations will be concatenated with spaces to compute BLEU.
Note that:
Streaming Transcript | Translation |
---|---|
大家 | |
大家好 | |
大家好! | Hello everyone! |
欢 | |
欢迎 | |
欢迎大 | |
欢迎大家 | Welcome |
欢迎大家关 | |
欢迎大家关注 | to |
欢迎大家关注UNIT | |
欢迎大家关注UNIT对 | unit |
欢迎大家关注UNIT对话 | |
欢迎大家关注UNIT对话系 | |
欢迎大家关注UNIT对话系统 | |
欢迎大家关注UNIT对话系统的 | dialog system |
欢迎大家关注UNIT对话系统的高 | |
欢迎大家关注UNIT对话系统的高级 | |
欢迎大家关注UNIT对话系统的高级课 | |
欢迎大家关注UNIT对话系统的高级课程 | |
欢迎大家关注UNIT对话系统的高级课程。 | advanced courses. |
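The emission pattern in Table 2 can be produced by any read/write policy. The sketch below shows only the plumbing; `policy` and `translate` are placeholder callables that you would replace with your own model:

```python
def simultaneous_output(prefixes, policy, translate):
    """Produce (prefix, partial_translation) rows in the style of Table 2.

    `policy(prefix)` returns True when the system chooses to emit;
    `translate(prefix, emitted)` returns the newly emitted target words.
    Both are placeholders for an actual simultaneous MT model.
    Returns the rows plus the space-joined full translation used for BLEU.
    """
    rows, emitted = [], []
    for prefix in prefixes:
        if policy(prefix):
            new_words = translate(prefix, emitted)
            emitted.append(new_words)
            rows.append((prefix, new_words))
        else:
            rows.append((prefix, ""))   # read more source, write nothing
    return rows, " ".join(emitted)
```

With a trivial policy that only fires on end-of-sentence punctuation, this degenerates to full-sentence (non-simultaneous) translation; a lower-latency policy such as wait-k fires much earlier.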
You have two ways to submit your system:
There are some requirements for your uploaded system:
Script run.sh: the interface for executing your translation program.
For the text-to-text tasks, use
sh run.sh < streaming_asr.txt > output_asr/source_translation.txt
or
sh run.sh < streaming_transcription.txt > output_transcript/source_translation.txt
For the audio-to-text task, use
sh run.sh < audioid.wav > output_audio/source_translation.txt
Name your output directory output_asr, output_transcript, or output_audio, depending on the task you participate in; prepare multiple output directories if you join multiple tasks. Please also include your dev-set translations as output_xxx/dev_translation.txt. This is very helpful for us to eliminate execution problems in your system. Unless a system execution error occurs, each participant has only one chance to submit on each task.
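As a concrete illustration, a minimal run.sh need only read the streaming input from stdin and write translations to stdout; the loop below is a placeholder showing the plumbing, not an actual decoder:

```shell
#!/bin/sh
# run.sh -- the interface the evaluation server calls, e.g.:
#   sh run.sh < streaming_transcription.txt > output_transcript/source_translation.txt
# Replace the echo loop below with a call to your own decoder;
# it only demonstrates the stdin-to-stdout contract.
while IFS= read -r line; do
    printf '%s\n' "$line"   # your decoder: consume prefix, maybe emit translation
done
```

Your bundled environment (model weights, virtualenv activation, etc.) should be set up inside this script so the single command above runs end to end.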
Scientific and system description papers will be submitted through this Link by Monday, March 15, 2021, 11:59pm [UTC-12h]. Papers should be formatted according to the NAACL 2021 format guidelines and be 4 to 8 pages of content plus additional pages of references. Accepted papers will be published online in the NAACL 2021 proceedings and will be presented at the conference either orally or as a poster.
Each team has only one chance to submit in this contest, so please submit your predictions carefully.
Following previous work, we evaluate simultaneous translation based on BLEU and AL (Average Lagging): BLEU measures translation quality, and AL measures system delay. The evaluation results of all teams will be plotted on two-dimensional BLEU-AL coordinates.
We use multieval to calculate BLEU, and
python gen_rw.py < output_xxx/source_translation.txt > sample_rw.txt && python metricAL.py
to measure system delays.
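Average Lagging, introduced with STACL, measures how far the system's writes lag behind an ideal fully-simultaneous translator. A minimal sketch of the definition, assuming g[t] is the number of source tokens read before emitting target token t+1 (this helper is illustrative, not the official metricAL.py):

```python
def average_lagging(g, src_len, tgt_len):
    """Average Lagging (Ma et al., 2019):
        AL = (1/tau) * sum_{t=1..tau} [ g(t) - (t-1)/lam ],
    where lam = tgt_len / src_len and tau is the first target step
    at which the full source has been read.

    g: list of length tgt_len; g[t] = source tokens read before
       emitting target token t+1 (so g[-1] should equal src_len).
    """
    lam = tgt_len / src_len
    # tau: first (1-indexed) target step where all source tokens are read
    tau = next(t + 1 for t, g_t in enumerate(g) if g_t >= src_len)
    return sum(g[t] - t / lam for t in range(tau)) / tau
```

For a fully offline system (read everything, then write), AL equals the source length; for a wait-k policy it is close to k, which is why lower AL at comparable BLEU indicates a better simultaneous system.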
Here is a baseline system for simultaneous machine translation based on PaddlePaddle and STACL:
For any questions regarding our shared task, please use our Twitter page, GitHub issues, or email autosimtrans.workshop@gmail.com. We are happy to answer your questions and look forward to your submissions!