In the SelfAttnMatch layer, you manually modify the score tensor (line 362), as shown below. May I ask what the rationale behind this is? Is this step important to model performance?
```python
if not self.diag:
    x_len = x.size(1)
    for i in range(x_len):
        scores[:, i, i] = 0
```
Hi shawei3000, we do not want a word to attend to itself, so the diagonal of the score tensor is masked to 0. You can see the details and Eq. (5) in https://arxiv.org/pdf/1705.02798v3.pdf
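For what it's worth, the per-index loop can also be written without a Python loop. This is just a sketch of an equivalent vectorized masking (the shapes and the `scores` name are assumptions based on the snippet above, where `scores` is a `[batch, x_len, x_len]` self-attention score tensor):

```python
import torch

# Assumed shapes for illustration: scores is [batch, x_len, x_len].
batch, x_len = 2, 5
scores = torch.randn(batch, x_len, x_len)

# Zero the diagonal so no word attends to itself (cf. Eq. (5) in the paper).
# (1 - I) is 0 on the diagonal and 1 elsewhere; it broadcasts over the batch dim.
mask = 1.0 - torch.eye(x_len)
scores = scores * mask

# Every diagonal entry of every batch matrix is now exactly 0.
assert torch.all(scores.diagonal(dim1=1, dim2=2) == 0)
```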
Got it, thank you, seanliu96! I will close this question shortly... "the diagonal of the self-coattention matrix is set to be zero in case of the word being aligned with itself"
Updating the diagonal of a matrix/tensor in TensorFlow (a slice update) does not appear to be as easy (or even possible) as it is in PyTorch. In case someone has experience with this: any suggestion on how to accomplish it on a TensorFlow tensor?
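Not an authoritative answer, but here is one possible sketch: since TensorFlow tensors are immutable, you can build a new tensor instead of assigning into the old one, either by rewriting the diagonal with `tf.linalg.set_diag` or by multiplying with a `(1 - I)` mask (the `scores` name and `[batch, x_len, x_len]` shape are assumptions matching the PyTorch snippet above):

```python
import tensorflow as tf

# Assumed shapes for illustration: scores is [batch, x_len, x_len].
batch, x_len = 2, 5
scores = tf.random.normal([batch, x_len, x_len])

# Option 1: replace the diagonal of each batch matrix with zeros.
scores_masked = tf.linalg.set_diag(scores, tf.zeros([batch, x_len]))

# Option 2: multiply by (1 - I); the identity broadcasts over the batch dim.
scores_masked2 = scores * (1.0 - tf.eye(x_len))
```

Both options return a new tensor rather than updating `scores` in place, which is the usual pattern for this kind of "slice update" in TensorFlow.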