Deep Learning

Attention

Naranjito 2023. 3. 31. 16:18

The idea behind the attention mechanism is that the decoder refers to the entire input sequence of the encoder at every decoding step.

It focuses on the encoder words that are most closely related to the word the decoder is about to predict.

The result from the Softmax helps the Decoder predict the output word. The size of the red rectangle represents how much each input word contributes to the prediction: the larger the rectangle, the more helpful that word is. Once each input word has been quantified in this way, the result is sent to the Decoder as a single piece of information (the green triangle). As a result, the Decoder can predict the output word more accurately.

 

Steps

 

1. Attention Score

In order to get the attention value, the attention score has to be computed first.

It represents how similar each of the Encoder's hidden states is to the Decoder's hidden state at the current time step (s_t).
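As an illustration, here is a minimal NumPy sketch of the dot-product scoring function, one common choice for this step; the shapes and variable names are assumptions made for the example, not taken from the post.

```python
import numpy as np

# Illustrative shapes (assumed for the example):
# encoder_hidden: (T, d) - one hidden state h_i per input time step
# decoder_hidden: (d,)   - the Decoder's hidden state s_t at the current step
encoder_hidden = np.random.randn(5, 4)   # T = 5 input words, d = 4
decoder_hidden = np.random.randn(4)      # s_t

# Dot-product score: e_i = s_t . h_i, one value per encoder time step
attention_score = encoder_hidden @ decoder_hidden   # shape (T,)
```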

 

2. Attention Distribution

The result obtained by applying Softmax to the attention scores for all time steps of the Encoder, so that every value lies between 0 and 1. This yields a probability distribution.
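A minimal sketch of this step, assuming the scores from the previous step are already in a NumPy array (the values are made up for illustration):

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability before exponentiating
    e = np.exp(x - np.max(x))
    return e / e.sum()

attention_score = np.array([2.0, 0.5, -1.0, 0.3, 1.2])  # example scores e_i
attention_distribution = softmax(attention_score)        # values in (0, 1), summing to 1
```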

 

3. Attention Weight

Each individual value of the Attention Distribution is an attention weight.

 

4. Attention Value

The value obtained by multiplying each of the Encoder's hidden states by its Attention Weight and then summing them all.

It is also called the Context Vector as it contains the context of the Encoder.
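A minimal sketch of the weighted sum that produces the context vector; the shapes and the weight values are illustrative assumptions:

```python
import numpy as np

encoder_hidden = np.random.randn(5, 4)                   # h_1 ... h_T, shape (T, d)
attention_weight = np.array([0.5, 0.2, 0.1, 0.1, 0.1])   # alpha_i from the distribution

# Context vector (attention value): a_t = sum_i alpha_i * h_i  -> shape (d,)
attention_value = attention_weight @ encoder_hidden
```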

 

5. Concatenate

Concatenate the Attention Value with the Decoder's hidden state at the current time step (s_t) to form a single vector.
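A sketch of the concatenation, assuming (for illustration only) that both vectors have dimension d = 4:

```python
import numpy as np

attention_value = np.random.randn(4)   # a_t, the context vector, shape (d,)
decoder_hidden = np.random.randn(4)    # s_t, the Decoder's hidden state, shape (d,)

# v_t = [a_t ; s_t]  -> a single vector of shape (2d,)
concatenated = np.concatenate([attention_value, decoder_hidden])
```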

 

6. Hyperbolic tangent function

After the concatenation, multiply by a weight matrix and pass the result through the Hyperbolic Tangent Function to obtain a new vector.

ŷ, the predicted output word, is computed from this new vector.
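A sketch of this final step, assuming a Luong-style output layer in which the concatenated vector is multiplied by a weight matrix, passed through tanh, and then projected to vocabulary probabilities; all shapes, matrices, and the vocabulary size here are illustrative assumptions, not details from the post.

```python
import numpy as np

concatenated = np.random.randn(8)   # v_t = [a_t ; s_t], shape (2d,)
W_c = np.random.randn(4, 8)         # weight matrix applied before tanh (illustrative)
W_y = np.random.randn(10, 4)        # output projection to an assumed vocabulary of 10 words

# s~_t = tanh(W_c v_t), then y_hat = softmax(W_y s~_t)
s_tilde = np.tanh(W_c @ concatenated)
logits = W_y @ s_tilde
y_hat = np.exp(logits - logits.max()) / np.exp(logits - logits.max()).sum()
print(y_hat.argmax())   # index of the predicted output word
```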
