query_masks = tf.sign(tf.abs(tf.reduce_sum(emb, axis=-1)))  # (N, T_q)
query_masks = tf.tile(query_masks, [num_heads, 1])  # (h*N, T_q)
query_masks = tf.tile(tf.expand_dims(query_masks, -1), [1, 1, tf.shape(keys)[1]])  # (h*N, T_q, T_…)

Date: 2028-12-11

Tags: Education

This code segment creates a mask for the query input to the multi-head attention layer.

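The first line sums each embedding vector along its last axis, takes the absolute value of that sum, and applies the sign function. The result is a binary mask of shape (N, T_q) in which 0 marks a padded position (assuming padded positions carry all-zero embeddings) and 1 marks every other position.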
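The mask computation can be sketched on a toy batch. This is a minimal illustration, not the original code: NumPy stands in for TensorFlow (np.sum, np.abs and np.sign play the roles of tf.reduce_sum, tf.abs and tf.sign), and the values in emb are made up for the example.

```python
import numpy as np

# Hypothetical toy batch: N=2 sequences, T_q=3 positions, embedding size 4.
# All-zero rows stand in for padded positions (an assumption of this scheme).
emb = np.array([
    [[0.5, -1.0, 2.0, 0.1], [1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]],
    [[-2.0, 0.5, 0.5, 0.5], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
])

# Sum over the embedding axis, take the absolute value, then the sign.
query_masks = np.sign(np.abs(np.sum(emb, axis=-1)))  # shape (N, T_q)

# The first sequence keeps two positions, the second keeps one.
print(query_masks)
```

One caveat follows from summing before taking the absolute value: a real embedding whose components happen to sum to exactly zero, such as [1, -1, 0, 0], would also be masked out.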

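The second line tiles this mask num_heads times along the batch axis, giving one copy per attention head and a shape of (h*N, T_q).

The third line adds a trailing axis and tiles it tf.shape(keys)[1] times, so that each query position's mask value is repeated across every key position.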
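The two tiling steps are easiest to follow by watching the shapes. The sketch below again uses NumPy in place of TensorFlow (np.tile and np.expand_dims mirror tf.tile and tf.expand_dims); num_heads, key_len and the mask values are hypothetical, with key_len standing in for tf.shape(keys)[1].

```python
import numpy as np

num_heads = 2   # hypothetical number of attention heads
key_len = 4     # hypothetical key length, stands in for tf.shape(keys)[1]

# A small (N=2, T_q=3) padding mask, made up for this example.
query_masks = np.array([[1.0, 1.0, 0.0],
                        [1.0, 0.0, 0.0]])

# Repeat the whole batch once per head along axis 0.
tiled = np.tile(query_masks, [num_heads, 1])                    # (h*N, T_q)

# Add a trailing axis, then repeat each value across every key position.
expanded = np.tile(np.expand_dims(tiled, -1), [1, 1, key_len])  # (h*N, T_q, key_len)

print(tiled.shape, expanded.shape)
```

Because the tiling stacks complete copies of the batch, row i of the tiled mask belongs to sample i % N. The mask therefore only lines up with the attention tensor if the heads were stacked along the batch axis in that same order.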

Finally, the output tensor is multiplied elementwise by this mask, which sets the attention scores to 0 in every row that belongs to a padded query position.
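The multiplication itself is not part of the snippet, so the following is only a sketch of what that step might look like. It uses NumPy, and the attention tensor named outputs, its uniform values and the mask are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical attention weights for one (head, sample) slice:
# 3 query positions attending uniformly over 4 key positions.
outputs = np.full((1, 3, 4), 0.25)

# Padding mask for the same slice; the last query position is padding.
mask = np.array([[1.0, 1.0, 0.0]])                                  # (1, T_q)
mask = np.tile(np.expand_dims(mask, -1), [1, 1, outputs.shape[2]])  # (1, T_q, 4)

# Elementwise product: rows of padded queries become all zeros.
masked = outputs * mask

print(masked[0])
```

Rows for real query positions are left unchanged, while the row for the padded query is zeroed. In NumPy the explicit tiling could be skipped, since outputs * mask[..., None] on the 2-D mask gives the same result through broadcasting.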


Original source: http://www.cveoy.top/t/topic/fkx6 Copyright belongs to the author. Please do not repost or scrape!