Query Mask Generation for Multi-Head Attention in TensorFlow
This code segment creates a mask for the query input to the multi-head attention layer.
The first line sums each embedding along its feature axis, takes the absolute value, and applies the sign function, yielding a binary mask in which 0 marks a padded (all-zero) position and 1 marks a real token.
query_masks = tf.sign(tf.abs(tf.reduce_sum(emb, axis=-1))) # (N, T_q)
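To see what this mask looks like, here is a small NumPy sketch of the same operation (NumPy stands in for TensorFlow so the example is self-contained; the toy values are made up):

```python
import numpy as np

# Toy batch: N=2 sequences of length T_q=3, embedding dim C=4.
# The last position of the second sequence is zero-padded.
emb = np.array([
    [[0.5, -1.0, 0.2, 0.4],
     [1.1,  0.4, -0.2, 0.0],
     [0.3,  0.3, 0.1, -0.5]],
    [[0.9, -0.1, 0.4, 0.2],
     [-0.6, 0.7, 0.0, 0.1],
     [0.0,  0.0, 0.0, 0.0]],   # padded position
])

# Same trick as tf.sign(tf.abs(tf.reduce_sum(emb, axis=-1))):
# a position whose embedding sums to zero is treated as padding.
query_masks = np.sign(np.abs(emb.sum(axis=-1)))  # (N, T_q)
print(query_masks)  # [[1. 1. 1.] [1. 1. 0.]]
```

Note that this heuristic assumes only padded positions sum to exactly zero; a real embedding whose components happened to cancel would be masked too.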
The second line stacks one copy of the mask per attention head along the batch axis, matching the layout of the multi-head outputs.
query_masks = tf.tile(query_masks, [num_heads, 1]) # (h*N, T_q)
The third line adds a trailing axis and repeats the mask T_k times, so each query position's mask value spans every key position in the sequence.
query_masks = tf.tile(tf.expand_dims(query_masks, -1), [1, 1, tf.shape(keys)[1]]) # (h*N, T_q, T_k)
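The two tiling steps can be traced shape-by-shape with NumPy (again a stand-in for the TF calls; num_heads and T_k values are illustrative):

```python
import numpy as np

num_heads = 2
T_k = 4
query_masks = np.array([[1., 1., 0.]])  # (N=1, T_q=3); last query is padding

# Step 1: stack one copy per head along the batch axis,
# as tf.tile(query_masks, [num_heads, 1]) does -> (h*N, T_q)
query_masks = np.tile(query_masks, (num_heads, 1))

# Step 2: add a trailing axis and repeat across all T_k key positions,
# as tf.tile(tf.expand_dims(query_masks, -1), [1, 1, T_k]) does
query_masks = np.tile(query_masks[..., None], (1, 1, T_k))
print(query_masks.shape)  # (2, 3, 4), i.e. (h*N, T_q, T_k)
```

Each row of the final mask is either all ones (real query) or all zeros (padded query), replicated identically for every head.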
Finally, the outputs tensor is multiplied element-wise by the mask, zeroing every row that corresponds to a padded query position so padding contributes nothing downstream.
outputs *= query_masks # broadcasting. (N, T_q, C)
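Putting the last step together, this NumPy sketch (random data, illustrative shapes) shows the element-wise multiply wiping out the rows for padded queries:

```python
import numpy as np

rng = np.random.default_rng(0)
h_N, T_q, T_k = 2, 3, 4

# Stand-in for the attention outputs at this point in the pipeline.
outputs = rng.normal(size=(h_N, T_q, T_k))

# Mask with the last query position of every sequence marked as padding.
query_masks = np.ones((h_N, T_q, T_k))
query_masks[:, -1, :] = 0.0

# Equivalent of `outputs *= query_masks` in the TF code:
outputs *= query_masks
print(outputs[:, -1, :])  # rows for padded queries are now all zeros
```

Real query rows keep their original values; only the padded rows are zeroed.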