Workshop at NAACL 2022, Seattle, Friday July 15, 2022
Contact: autosimtrans.workshop@gmail.com or twitter.com/autosimtrans
`conda_specfile.txt` and `pip_requirement.txt` will be used in the Dockerfile:
conda activate <myenv> # activate your virtual env
conda list --explicit > conda_specfile.txt # export packages installed via conda
pip freeze > pip_requirement.txt # export packages installed via pip
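Inside the Dockerfile, these two exported files are typically consumed along the following lines (a sketch only, assuming conda and pip are available in the base image; the exact commands in `Dockerfile_pt13.gpu` may differ):
# illustrative sketch: recreate the environment from the exported files
COPY conda_specfile.txt pip_requirement.txt /tmp/
RUN conda install --file /tmp/conda_specfile.txt && \
    pip install -r /tmp/pip_requirement.txt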
`cd` to your working dir and download this Dockerfile (named `Dockerfile_pt13.gpu`):
wget https://raw.githubusercontent.com/autosimtrans/autosimtrans.github.io/master/sp/Dockerfile_pt13.gpu
Copy your own code and model into your working dir as `<code folder>`. Open the Dockerfile `Dockerfile_pt13.gpu` and replace `OpenNMT-py` with the `<code folder>` of your own code and model (the reason for copying code+model into the working dir is that `COPY` in a Dockerfile only supports relative paths). Update your own CUDA/cuDNN version in the Dockerfile if necessary. Then build the image:
docker build -t <ImageName:Tag> -f Dockerfile_pt13.gpu .
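For reference, the copy step inside the edited Dockerfile might look roughly like this (a sketch only; the destination path is a placeholder, and the actual lines in `Dockerfile_pt13.gpu` may differ):
# copy your code and model into the image; COPY only accepts paths relative
# to the build context, so <code folder> must sit next to the Dockerfile
COPY <code folder> /workspace/<code folder>
WORKDIR /workspace/<code folder>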
mkdir decode && chmod 777 decode
nvidia-docker run -it -v <abs_path_to_your_data_dir>:/data -v <abs_path_to_your_local_decode_result_dir>:/decode <ImageName:Tag> bash
python translate.py -model multi30k_model_step_100000.pt -src /data/wmt16-multi30k/test2016.en.atok -tgt /data/wmt16-multi30k/test2016.de.atok -replace_unk -verbose -output /decode/multi30k.test.pred.atok
The decoding output is written to `<abs_path_to_your_local_decode_result_dir>` (local) or `/decode` (in the Docker container). Test your BLEU:
# in docker container
perl tools/multi-bleu.perl /data/wmt16-multi30k/test2016.de.atok < /decode/multi30k.test.pred.atok
# result:
# BLEU = 35.50, 65.9/41.8/28.8/20.0 (BP=1.000, ratio=1.007, hyp_len=12323, ref_len=12242)
There are two ways to share your image: push it to Docker Hub, or save it as a `tar` file. To push to Docker Hub:
docker tag <local-image:local-tagname> <dockerhub_username/new-repo:repo-tagname>
docker login
docker push <dockerhub_username/new-repo:repo-tagname>
Or save your image as a `tar` file, which you can then send to anyone:
# save image to `tar` file
docker save -o <imageFile>.tar <local-image:local-tagname>
# load `tar` file to image
docker load < <imageFile>.tar
`cd` to your working dir and download the data (for the case of a blind test):
wget https://github.com/autosimtrans/autosimtrans.github.io/raw/master/sp/wmtdata.tar.gz
tar -zxvf wmtdata.tar.gz
mkdir decode && chmod 777 decode
Run the Docker image (`nvidia-docker` is used for the GPU version):
sudo nvidia-docker run -it -v <abs_path_to_your_data_dir>:/data -v <abs_path_to_your_local_decode_result_dir>:/decode kaiboliu/onmt-py_en2de:torch1.3-gpu bash
python translate.py -model multi30k_model_step_100000.pt -src /data/wmt16-multi30k/test2016.en.atok -replace_unk -verbose -output /decode/multi30k.test.pred.atok
The decoding output is written to `<abs_path_to_your_local_decode_result_dir>` (local) or `/decode` (in the Docker container). Test your BLEU:
# in docker container
perl tools/multi-bleu.perl /data/wmt16-multi30k/test2016.de.atok < /decode/multi30k.test.pred.atok
# result:
# BLEU = 35.50, 65.9/41.8/28.8/20.0 (BP=1.000, ratio=1.007, hyp_len=12323, ref_len=12242)
docker cp <local_data_path> [containerID]:<docker_data_path>
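For example (hypothetical paths; find the container ID with `docker ps`):
# copy a local data folder into a running container
docker cp ./wmtdata <containerID>:/data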
We will dive head-first into training a transformer model from scratch using a TensorFlow GPU Docker image.
Using Docker allows us to spin up a fully contained environment for our training needs. We always recommend using Docker, as it allows ultimate flexibility (and forgiveness) in our training environment. To begin, we will open a terminal window and enter the following command to launch our NVIDIA CUDA-powered container.
nvidia-docker run -it -p 6007:6006 -v /data:/datasets tensorflow/tensorflow:1.15.0-gpu-py3 bash
Note: a quick description of the key parameters of the above command (if you're unfamiliar with Docker):
Docker Syntax | Description |
---|---|
nvidia-docker run | The `docker run` command specifies the image to run the container from. Note that in this case we use `nvidia-docker run` to utilize CUDA-powered NVIDIA GPUs. |
-p 6007:6006 | Exposes port 6007 [HOST:CONTAINER] for TensorBoard; type localhost:6007 in your browser to view TensorBoard. |
-v /data:/datasets | Volume flag; this shares the folder `/data` on the host machine as the `/datasets` folder in the container. Also written as: -v /[host folder]:/[container folder]. |
tensorflow/tensorflow:1.15.0-gpu-py3 | The Docker image that will run. The Docker Hub format is [PUBLISHER]/[IMAGE REPO]:[IMAGE TAG]. |
You can also use Deepo, a Docker image that bundles many deep learning frameworks in one environment.
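A typical invocation might look like this (assuming the `ufoym/deepo` image name on Docker Hub, which may have changed):
# hypothetical example: launch an all-in-one Deepo container with GPU support
nvidia-docker run -it -v /data:/datasets ufoym/deepo bash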
Install git (this may be necessary if you are running a fresh Docker container):
apt-get update && apt-get install -y git
Next, clone the TensorFlow models repository in case you do not have the latest codebase; the transformer network model is included there, and the devs tend to update it quite frequently.
mkdir /tf-1.5; cd /tf-1.5
git clone https://github.com/tensorflow/models.git
As a necessary step, this installs the Python package requirements for training TensorFlow models:
# cd to the models dir
pip install --user -r official/requirements.txt
Export PYTHONPATH to point at the folder where the models repository is located on your machine. The command below references where the models are located on our system; be sure to replace the /tf-1.5/models path with the path to the folder where you stored/downloaded the models.
export PYTHONPATH="$PYTHONPATH:/tf-1.5/models"
The `data_download.py` script will download and preprocess the training and evaluation WMT datasets. Upon download and extraction, the training data is used to generate the vocabulary file that the `VOCAB_FILE` variable will point to. Effectively, the eval and training strings are tokenized, and the results are processed and saved as TFRecords.
NOTE: (per the official requirements): 1.75GB of compressed data will be downloaded. In total, the raw files (compressed, extracted, and combined files) take up 8.4GB of disk space. The resulting TFRecord and vocabulary files are 722MB. The script takes around 40 minutes to run, with the bulk of the time spent downloading and ~15 minutes spent on preprocessing.
cd /tf-1.5/models/official/transformer
python data_download.py --data_dir=/datasets/transformer
Note: you can skip Steps 7-9 if you have already finished training your model; just test it in the Docker container from Step 10.
This specifies which model to train: `big` or `base`.
PARAM_SET=base
This variable should be set to where the training data is located.
DATA_DIR=/datasets/transformer
This variable specifies the model location, based on the model named in the `PARAM_SET` variable.
MODEL_DIR=/transformer/model_$PARAM_SET
This variable specifies where the preprocessed vocab files are located.
VOCAB_FILE=$DATA_DIR/vocab.ende.32768
This specifies where to export the model in TensorFlow SavedModel format. It is used via the `export_dir` flag when training in Step 8.
EXPORT_DIR=/transformer/saved_model
PARAM_SET=base
DATA_DIR=/datasets/transformer
MODEL_DIR=/transformer/model_$PARAM_SET
VOCAB_FILE=$DATA_DIR/vocab.ende.32768
EXPORT_DIR=/transformer/saved_model
The following command, `python transformer_main.py`, will train the transformer for a total of 260,000 steps. Note how the flags reference the variables you set in the previous steps. You can train for fewer than 260,000 steps; it's up to you.
NOTE: This will take a long time to train depending on your GPU resources. The official TensorFlow transformer model is under constant development, so be sure to check periodically on their GitHub for the latest optimizations and techniques to reduce training times.
python transformer_main.py --data_dir=$DATA_DIR --model_dir=$MODEL_DIR --vocab_file=$VOCAB_FILE --param_set=$PARAM_SET --bleu_source=$DATA_DIR/newstest2014.en --bleu_ref=$DATA_DIR/newstest2014.de --train_steps=260000 --steps_between_evals=1000 --export_dir=$EXPORT_DIR
As we noted earlier, we can check the status of training in the Tensorboard GUI. To check in real time, run the following command in a separate terminal (or TensorFlow container), and type localhost:6007 in your browser to view Tensorboard. You can also wait until training is complete to use the current container.
tensorboard --logdir=$MODEL_DIR
Now we’ve trained our transformer network, let’s enjoy the fruits of our labor using translate.py! In the command below, replace the text “hello world” with desired text to translate
python translate.py --model_dir=$MODEL_DIR --vocab_file=$VOCAB_FILE \
--param_set=$PARAM_SET --text="hello world"
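Finally, to save and share the environment you just used, find the container ID, commit the container to a new image, verify that the image exists, and push it to Docker Hub: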
docker ps -l
docker commit [containerID] [UserID]/[Repo]
docker images
docker push [UserID]/[Repo]
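Anyone can then retrieve the published image with:
docker pull [UserID]/[Repo]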