- Activation function
It takes a neuron's output (the weighted sum of its inputs) and decides whether that neuron is going to fire or not; in other words, it answers the question "should this neuron 'fire' or not?"
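For instance, a single neuron applies the activation to the weighted sum of its inputs. A toy sketch in PyTorch (random numbers, sigmoid chosen arbitrarily as the activation):

```python
import torch

# One neuron: weighted sum of the inputs, then an activation on top.
x = torch.randn(4)                      # inputs to the neuron
w, b = torch.randn(4), torch.randn(1)   # weights and bias (random, just for the sketch)
z = w @ x + b                           # pre-activation value
a = torch.sigmoid(z)                    # the activation decides how strongly the neuron "fires"
print(f"pre-activation: {z.item():.3f}, activation output: {a.item():.3f}")
```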
| Activation function | Formula | Description | Typical usage |
|---|---|---|---|
| Step function | f(x) = 1 if x >= 0, else 0 | Hardly used in practice. | |
| Sigmoid function | sigmoid(x) = 1 / (1 + e^(-x)) | Output lies between 0 and 1. In the flat (saturated) parts of the curve the slope is very close to zero, so when these near-zero slopes are multiplied repeatedly during backpropagation, almost no gradient reaches the front layers of the network (vanishing gradient). The parameters W of those layers are then barely updated and learning becomes impossible. | Output layer for binary classification, with nn.BCELoss() |
| Hyperbolic tangent function | tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)) | Output lies between -1 and 1. Unlike the sigmoid, it is zero-centered and has a wider output range, so the vanishing-gradient problem is milder than with the sigmoid. | Hidden layers |
| ReLU (Rectified Linear Unit) | f(x) = max(0, x) | Negative input -> output 0; positive input -> output equals the input. Because the slope is zero for negative inputs, a neuron can become permanently inactive (dying ReLU). | Hidden layers |
| Leaky ReLU | f(x) = max(ax, x), with a small coefficient a (e.g. 0.01) | Negative input -> a very small output (the input scaled by a); positive input -> output equals the input. Unlike ReLU, the slope for negative inputs is not zero. | Hidden layers |
| Softmax function | softmax(x)_i = e^(x_i) / Σ_j e^(x_j) | Turns a vector of scores into a probability distribution over the classes. | Output layer for multi-class classification, with nn.CrossEntropyLoss() |
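The "Typical usage" column can be illustrated with a small PyTorch sketch (layer sizes and data here are made up for the example): ReLU or Leaky ReLU in the hidden layers, a Sigmoid output paired with nn.BCELoss() for binary classification, and raw logits paired with nn.CrossEntropyLoss() (which applies log-softmax internally) for multi-class classification.

```python
import torch
import torch.nn as nn

# Binary classification: Sigmoid in the output layer + nn.BCELoss().
binary_model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),               # hidden layer activation
    nn.Linear(32, 1),
    nn.Sigmoid(),            # output in (0, 1), interpreted as a probability
)
binary_loss = nn.BCELoss()

# Multi-class classification: no Softmax in the model, because
# nn.CrossEntropyLoss() applies LogSoftmax to the raw logits internally.
multi_model = nn.Sequential(
    nn.Linear(16, 32),
    nn.LeakyReLU(0.01),      # hidden layer activation
    nn.Linear(32, 5),        # 5-class logits
)
multi_loss = nn.CrossEntropyLoss()

x = torch.randn(8, 16)                          # a dummy batch of 8 samples
y_binary = torch.randint(0, 2, (8, 1)).float()  # 0/1 targets for BCELoss
y_multi = torch.randint(0, 5, (8,))             # class indices for CrossEntropyLoss
print(binary_loss(binary_model(x), y_binary))
print(multi_loss(multi_model(x), y_multi))
```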
- Sigmoid
- It limits the output to a bounded range, so a neuron's output cannot reach very high values that could create problems for the optimizer.
- It has a steep slope roughly between x = -2 and x = 2, so small changes of x in this region lead to significant changes in y.

- Outside this region the gradient is very small, which can cause issues when the Sigmoid is used, because the updates of the model weights depend on the value of the gradient (see the gradient check after this list).

- ReLU (Rectified Linear Unit)
- Dying ReLU (neurons permanently deactivated): when x is negative, ReLU outputs zero, which deactivates neurons whose inputs are negative. Gradient descent will then no longer update those neurons' weights (demonstrated in the sketch after this list).
- When x is positive, it acts as a linear function of x, so it can produce very large values.
- It is nonlinear and differentiable everywhere except at zero.
- Leaky ReLU
- It behaves linearly for negative values of x, but with a very small slope coefficient. This way the neuron is not fully deactivated and keeps the opportunity to be reactivated, as the gradient check below shows.
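To make the three points above concrete, here is a small autograd check (not from the original post; the inputs are chosen arbitrarily) comparing how much gradient sigmoid, ReLU, and Leaky ReLU pass back for the same inputs:

```python
import torch
import torch.nn.functional as F

# Compare the gradient each activation passes back for the same inputs.
inputs = [-10.0, -2.0, -0.5, 0.5, 2.0, 10.0]

activations = [
    ("sigmoid", torch.sigmoid),
    ("relu", F.relu),
    ("leaky_relu", lambda t: F.leaky_relu(t, negative_slope=0.01)),
]

for name, fn in activations:
    x = torch.tensor(inputs, requires_grad=True)
    fn(x).sum().backward()   # gradient of each output w.r.t. its own input
    print(f"{name:>10}: {[round(g, 4) for g in x.grad.tolist()]}")

# sigmoid:    gradients shrink toward 0 as |x| grows (vanishing gradient)
# relu:       gradient is exactly 0 for every negative input (dying ReLU)
# leaky_relu: negative inputs still receive a small gradient of 0.01
```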