
Hot Papers Bring You AI Machine Translation: Library Frontier Literature Thematic Recommendation Service (11)

2020-05-17

 

 
       In the previous issue on artificial intelligence and natural language processing, we recommended one of the generative tasks of natural language processing: automatic text summarization. In this issue, we continue with another generative NLP task: machine translation.
       Neural machine translation (NMT) draws on long short-term memory, attention mechanisms, and related methods to address the problem of capturing long-distance dependencies. The rapid progress of NMT technology has steadily raised the quality of machine-translated output, giving communication entirely new forms of expression and making machine translation highly promising in application areas such as wearable technology, search, and chatbots.
       This issue selects four papers introducing the latest developments in machine translation, covering synchronous bidirectional inference for neural sequence generation, RNN-based asynchronous bidirectional decoding, a sign language production model that combines an NMT network with a motion graph, and machine translation based on a deep attention model, recommended to researchers in the relevant fields.
 
Synchronous bidirectional inference for neural sequence generation
Zhang, Jiajun, et al.
ARTIFICIAL INTELLIGENCE, 2020, 281
In sequence to sequence generation tasks (e.g. machine translation and abstractive summarization), inference is generally performed in a left-to-right manner to produce the result token by token. The neural approaches, such as LSTM and self-attention networks, are now able to make full use of all the predicted history hypotheses from the left side during inference, but cannot meanwhile access any future (right side) information and usually generate unbalanced outputs (e.g. left parts are much more accurate than right ones in Chinese-English translation). In this work, we propose a synchronous bidirectional inference model to generate outputs using both left-to-right and right-to-left decoding simultaneously and interactively. First, we introduce a novel beam search algorithm that facilitates synchronous bidirectional decoding. Then, we present the core approach which enables left-to-right and right-to-left decoding to interact with each other, so as to utilize both the history and future predictions simultaneously during inference. We apply the proposed model to both LSTM and self-attention networks. Furthermore, we propose a novel fine-tuning based parameter optimization algorithm in addition to the simple two-pass strategy. The extensive experiments on machine translation and abstractive summarization demonstrate that our synchronous bidirectional inference model can achieve remarkable improvements over the strong baselines.
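For readers who want a concrete picture of the decoding scheme above, the following toy Python sketch shows the general idea of synchronous bidirectional beam search: a left-to-right and a right-to-left beam are expanded in lock-step, and each direction scores its next token while looking at the other direction's current best hypothesis. The vocabulary, scoring function, and interaction bonus are hypothetical stand-ins, not the authors' actual model.

```python
# Toy sketch of synchronous bidirectional beam search (illustrative only).
VOCAB = ["<eos>", "a", "b", "c"]

def step_scores(prefix, other_prefix):
    """Hypothetical per-token log-probabilities for `prefix`, conditioned on
    the opposite-direction hypothesis `other_prefix` (the interaction term)."""
    scores = {}
    for tok in VOCAB:
        base = -len(prefix) * 0.1 - (0.5 if tok == "<eos>" else 0.0)
        # Toy interaction: mildly prefer tokens already proposed by the
        # other direction, mimicking access to "future" information.
        bonus = 0.2 if tok in other_prefix else 0.0
        scores[tok] = base + bonus
    return scores

def expand(beams, other_best, beam):
    candidates = []
    for tokens, score in beams:
        if tokens and tokens[-1] == "<eos>":
            candidates.append((tokens, score))   # finished hypothesis
            continue
        for tok, s in step_scores(tokens, other_best).items():
            candidates.append((tokens + [tok], score + s))
    candidates.sort(key=lambda x: x[1], reverse=True)
    return candidates[:beam]

def sync_bidi_beam_search(max_len=5, beam=2):
    l2r = [([], 0.0)]   # (tokens, cumulative log-prob), left-to-right
    r2l = [([], 0.0)]   # right-to-left (tokens stored in generation order)
    for _ in range(max_len):
        best_r2l = r2l[0][0]          # other direction's current best
        best_l2r = l2r[0][0]
        l2r = expand(l2r, best_r2l, beam)   # both beams advance in lock-step
        r2l = expand(r2l, best_l2r, beam)
    return l2r[0][0], list(reversed(r2l[0][0]))

if __name__ == "__main__":
    forward, backward = sync_bidi_beam_search()
    print("L2R hypothesis:", forward)
    print("R2L hypothesis (read left to right):", backward)
```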
 
Exploiting reverse target-side contexts for neural machine translation via asynchronous bidirectional decoding
Su, Jinsong, et al.
ARTIFICIAL INTELLIGENCE, 2019, 277
Based on a unified encoder-decoder framework with attentional mechanism, neural machine translation (NMT) models have attracted much attention and become the mainstream in the community of machine translation. Generally, the NMT decoders produce translation in a left-to-right way. As a result, only left-to-right target-side contexts from the generated translations are exploited, while the right-to-left target-side contexts are completely unexploited for translation. In this paper, we extend the conventional attentional encoder-decoder NMT framework by introducing a backward decoder, in order to explore asynchronous bidirectional decoding for NMT. In the first step after encoding, our backward decoder learns to generate the target-side hidden states in a right-to-left manner. Next, in each timestep of translation prediction, our forward decoder concurrently considers both the source-side and the reverse target-side hidden states via two attention models. Compared with previous models, the innovation in this architecture enables our model to fully exploit contexts from both source side and target side, which improves translation quality altogether. We conducted experiments on NIST Chinese-English, WMT English-German and Finnish-English translation tasks to investigate the effectiveness of our model. Experimental results show that (1) our improved RNN-based NMT model achieves significant improvements over the conventional RNNSearch by 1.44/-3.02, 1.11/-1.01, and 1.23/-1.27 average BLEU and TER points, respectively; and (2) our enhanced Transformer outperforms the standard Transformer by 1.56/-1.49, 1.76/-2.49, and 1.29/-1.33 average BLEU and TER points, respectively.
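As a rough illustration of the two-attention design described in this abstract, the numpy sketch below first runs a backward pass that produces right-to-left target hidden states, then lets a forward pass attend both to the source states and to those backward states at each step. The dot-product attention, the simple tanh recurrent cell, the dummy embeddings, and all dimensions are illustrative assumptions, not the paper's exact parameterization.

```python
# Sketch of asynchronous bidirectional decoding with two attention models.
import numpy as np

rng = np.random.default_rng(0)
d, src_len, tgt_len = 8, 6, 5

def attention(query, keys):
    """Plain dot-product attention returning a context vector."""
    scores = keys @ query                      # (len,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ keys                      # (d,)

def rnn_cell(x, h, W, U):
    return np.tanh(W @ x + U @ h)

src_states = rng.standard_normal((src_len, d))   # stand-in encoder states
tgt_emb = rng.standard_normal((tgt_len, d))      # dummy target embeddings

# Step 1: backward decoder produces reverse target-side hidden states.
Wb, Ub = rng.standard_normal((d, d)), rng.standard_normal((d, d))
h = np.zeros(d)
backward_states = [None] * tgt_len
for t in reversed(range(tgt_len)):               # right-to-left pass
    ctx = attention(h, src_states)
    h = rnn_cell(tgt_emb[t] + ctx, h, Wb, Ub)
    backward_states[t] = h
backward_states = np.stack(backward_states)

# Step 2: forward decoder attends to BOTH source and backward states.
Wf, Uf = rng.standard_normal((d, d)), rng.standard_normal((d, d))
h = np.zeros(d)
for t in range(tgt_len):                         # left-to-right pass
    src_ctx = attention(h, src_states)           # source-side attention
    rev_ctx = attention(h, backward_states)      # reverse target-side attention
    h = rnn_cell(tgt_emb[t] + src_ctx + rev_ctx, h, Wf, Uf)
    # `h` would feed a softmax over the target vocabulary here.
```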
 
Text2Sign: Towards Sign Language Production Using Neural Machine Translation and Generative Adversarial Networks
Stoll, Stephanie, et al.
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2020, 128(4) SI: 891-908
We present a novel approach to automatic Sign Language Production using recent developments in Neural Machine Translation (NMT), Generative Adversarial Networks, and motion generation. Our system is capable of producing sign videos from spoken language sentences. Contrary to current approaches that are dependent on heavily annotated data, our approach requires minimal gloss and skeletal level annotations for training. We achieve this by breaking down the task into dedicated sub-processes. We first translate spoken language sentences into sign pose sequences by combining an NMT network with a Motion Graph. The resulting pose information is then used to condition a generative model that produces photo realistic sign language video sequences. This is the first approach to continuous sign video generation that does not use a classical graphical avatar. We evaluate the translation abilities of our approach on the PHOENIX14T Sign Language Translation dataset. We set a baseline for text-to-gloss translation, reporting a BLEU-4 score of 16.34/15.26 on dev/test sets. We further demonstrate the video generation capabilities of our approach for both multi-signer and high-definition settings qualitatively and quantitatively using broadcast quality assessment metrics.
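The abstract describes a three-stage pipeline (text to gloss via NMT, gloss to pose via a Motion Graph, pose to video via a conditional generative model). The Python skeleton below only mirrors how those sub-processes chain together; every function body is a hypothetical placeholder rather than the authors' implementation.

```python
# Schematic skeleton of the Text2Sign pipeline (placeholders only).
from typing import List

def translate_to_gloss(sentence: str) -> List[str]:
    # Placeholder for the NMT network (text-to-gloss translation).
    return sentence.upper().split()

def glosses_to_poses(glosses: List[str]) -> List[List[float]]:
    # Placeholder for the Motion Graph lookup: one dummy skeletal pose per gloss.
    return [[0.0] * 10 for _ in glosses]

def poses_to_video(poses: List[List[float]]) -> List[str]:
    # Placeholder for the pose-conditioned generative model that renders
    # sign language video frames from skeletal pose information.
    return [f"frame_{i}" for i, _ in enumerate(poses)]

if __name__ == "__main__":
    frames = poses_to_video(glosses_to_poses(translate_to_gloss("hello world")))
    print(frames)
```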
 
Neural Machine Translation with Deep Attention
Zhang, Biao, et al.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2020, 42(1): 154-163
Deepening neural models has been proven very successful in improving the model's capacity when solving complex learning tasks, such as the machine translation task. Previous efforts on deep neural machine translation mainly focus on the encoder and the decoder, while little on the attention mechanism. However, the attention mechanism is of vital importance to induce the translation correspondence between different languages where shallow neural networks are relatively insufficient, especially when the encoder and decoder are deep. In this paper, we propose a deep attention model (DeepAtt). Based on the low-level attention information, DeepAtt is capable of automatically determining what should be passed or suppressed from the corresponding encoder layer so as to make the distributed representation appropriate for high-level attention and translation. We conduct experiments on NIST Chinese-English, WMT English-German, and WMT English-French translation tasks, where, with five attention layers, DeepAtt yields very competitive performance against the state-of-the-art results. We empirically find that with an adequate increase of attention layers, DeepAtt tends to produce more accurate attention weights. An in-depth analysis on the translation of important context words further reveals that DeepAtt significantly improves the faithfulness of system translations.
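To make the idea of stacking attention layers more tangible, the numpy sketch below re-attends to a gated view of the encoder states at each layer, with the gate conditioned on the context produced by the layer below, so that information can be passed or suppressed before the next attention round. The sigmoid gate, dot-product attention, and dimensions are assumptions for illustration, not the exact DeepAtt formulation.

```python
# Sketch of stacked, gated attention over encoder states (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
d, src_len, n_layers = 8, 6, 5

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

encoder_states = rng.standard_normal((src_len, d))
query = rng.standard_normal(d)          # decoder state at the current step
context = np.zeros(d)                   # low-level attention information

for layer in range(n_layers):
    W_g = rng.standard_normal((d, 2 * d))          # per-layer gate parameters
    # Gate decides what to pass or suppress from the encoder states,
    # conditioned on the context coming from the layer below.
    gate = 1.0 / (1.0 + np.exp(-(W_g @ np.concatenate([query, context]))))
    gated_states = encoder_states * gate           # broadcast over positions
    weights = softmax(gated_states @ (query + context))
    context = weights @ gated_states               # feeds the next layer

# After the top layer, `context` would drive the translation prediction.
```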