The idea behind the attention mechanism is that the decoder refers back to the entire input sequence from the encoder at every decoding step. It focuses on the encoder words that are most related to the word the decoder is about to predict. The weights produced by the softmax tell the decoder how much each input word contributes to the prediction. In the figure, the size of each red rectangle represents how much the corresponding input word helps: the larger the rectangle, the more helpful that word is to the prediction.
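
To make this concrete, here is a minimal sketch of dot-product attention in NumPy. The function and variable names (`attention`, `decoder_hidden`, `encoder_outputs`) are illustrative assumptions, not part of any particular library; the point is just that the softmax weights are exactly the "red rectangle" sizes described above.

```python
import numpy as np

def softmax(x):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention(decoder_hidden, encoder_outputs):
    """Dot-product attention over all encoder states.

    decoder_hidden:  (hidden_dim,)         current decoder state
    encoder_outputs: (src_len, hidden_dim) one vector per input word
    """
    # Score each encoder state against the current decoder state.
    scores = encoder_outputs @ decoder_hidden   # shape: (src_len,)
    # Softmax turns scores into weights summing to 1 --
    # these correspond to the sizes of the red rectangles.
    weights = softmax(scores)                   # shape: (src_len,)
    # Weighted sum of encoder states: the context vector
    # the decoder uses when predicting the next word.
    context = weights @ encoder_outputs         # shape: (hidden_dim,)
    return context, weights

# Toy example: an input sentence of 4 words, hidden size 3.
enc = np.random.randn(4, 3)
dec = np.random.randn(3)
ctx, w = attention(dec, enc)
print("attention weights:", w)  # larger weight = more helpful input word
```

At each decoding step the decoder recomputes these weights, so the distribution (and thus the rectangles) shifts depending on which output word is being predicted.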