In the SelfAttnMatch layer, you manually modify the score tensor (line 362), as shown below. May I ask what the rationale behind this is? Is this step important to model performance?
```python
if not self.diag:
    x_len = x.size(1)
    for i in range(x_len):
        scores[:, i, i] = 0
```
Hi shawei3000, we do not want a word to attend to itself, so the diagonal of the score tensor is masked to 0. You can see the details and Eq. (5) in https://arxiv.org/pdf/1705.02798v3.pdf
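For what it's worth, the per-index loop can also be written without a Python loop. This is just a sketch of an equivalent vectorized masking (the shapes and the `scores` name are assumptions based on the snippet above, where `scores` is a `[batch, x_len, x_len]` self-attention score tensor):

```python
import torch

# Assumed shapes for illustration: scores is [batch, x_len, x_len].
batch, x_len = 2, 5
scores = torch.randn(batch, x_len, x_len)

# Zero the diagonal so no word attends to itself (cf. Eq. (5) in the paper).
# (1 - I) is 0 on the diagonal and 1 elsewhere; it broadcasts over the batch dim.
mask = 1.0 - torch.eye(x_len)
scores = scores * mask

# Every diagonal entry of every batch matrix is now exactly 0.
assert torch.all(scores.diagonal(dim1=1, dim2=2) == 0)
```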
Got it, thank you, seanliu96! I will close this question shortly... "the diagonal of the self-coattention matrix is set to be zero in case of the word being aligned with itself"
Updating the diagonal of a matrix/tensor in TensorFlow (a slice update) does not appear to be as easy (or even possible) as it is in PyTorch. In case someone has experience with this: any suggestion on how to accomplish it on a TensorFlow tensor?
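Not an authoritative answer, but here is one possible sketch: since TensorFlow tensors are immutable, you can build a new tensor instead of assigning into the old one, either by rewriting the diagonal with `tf.linalg.set_diag` or by multiplying with a `(1 - I)` mask (the `scores` name and `[batch, x_len, x_len]` shape are assumptions matching the PyTorch snippet above):

```python
import tensorflow as tf

# Assumed shapes for illustration: scores is [batch, x_len, x_len].
batch, x_len = 2, 5
scores = tf.random.normal([batch, x_len, x_len])

# Option 1: replace the diagonal of each batch matrix with zeros.
scores_masked = tf.linalg.set_diag(scores, tf.zeros([batch, x_len]))

# Option 2: multiply by (1 - I); the identity broadcasts over the batch dim.
scores_masked2 = scores * (1.0 - tf.eye(x_len))
```

Both options return a new tensor rather than updating `scores` in place, which is the usual pattern for this kind of "slice update" in TensorFlow.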