Deep Learning

Activation function

Naranjito 2022. 3. 17. 13:33
  • Activation function

It takes the output of a neuron and decides whether that neuron is going to fire or not; in other words, it answers the question "should this neuron 'fire' or not?"
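As a minimal sketch of the idea (assuming PyTorch; the layer sizes here are made up for illustration), an activation function is applied to the raw output of a layer to decide how strongly each neuron fires:

import torch
import torch.nn as nn

linear = nn.Linear(4, 3)      # raw neuron outputs (pre-activations), unbounded
activation = nn.Sigmoid()     # decides how strongly each neuron "fires"

x = torch.randn(1, 4)         # one sample with 4 features
z = linear(x)                 # pre-activation values
a = activation(z)             # squashed into (0, 1)
print(z)
print(a)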

Step function
Hardly used.

Sigmoid function





Used in binary classification.

When the slope is calculated on the flat ends of the curve (the orange part of the graph), a very small value close to zero comes out. Therefore, when this near-zero slope is multiplied during backpropagation, the gradient is not transmitted well to the front layers.



The vanishing gradient problem occurs.



In other words, if a slope close to zero keeps being multiplied, the gradient can hardly be propagated back to the front layers. Therefore, the parameter W is not updated and learning is not possible.
Used in the output layer

nn.BCELoss()
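A minimal sketch of this combination (assuming PyTorch; the tensor values are made up): a sigmoid output for binary classification with nn.BCELoss, which expects probabilities in (0, 1).

import torch
import torch.nn as nn

logits = torch.tensor([1.5, -0.3, 2.0])    # raw outputs of the last layer
probs = torch.sigmoid(logits)              # probabilities in (0, 1)
targets = torch.tensor([1.0, 0.0, 1.0])    # binary labels

loss = nn.BCELoss()(probs, targets)        # BCELoss expects the sigmoid output
print(loss)

nn.BCEWithLogitsLoss applies the sigmoid internally to the raw logits and is the numerically safer variant.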
Hyperbolic tangent function
output : between -1 and 1

Unlike the sigmoid function, it is centered on zero, so its output spans a wider range than the sigmoid function's, and the vanishing gradient problem is less severe than with the sigmoid function.
Used in the hidden layer
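A small sketch (assuming PyTorch) of the zero-centered output: tanh returns values in (-1, 1), while sigmoid stays in (0, 1).

import torch

x = torch.linspace(-3, 3, 7)     # -3, -2, ..., 3
print(torch.tanh(x))             # values in (-1, 1), centered on zero
print(torch.sigmoid(x))          # values in (0, 1), centered on 0.5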
ReLU(Rectified Linear Unit) function


f(x) = max(0, x)
If
input : negative -> output : 0
input : positive -> output : input

If the input value is negative, the slope is zero (dying ReLU).
Used in the hidden layer
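A quick sketch of f(x) = max(0, x) in PyTorch: negative inputs become 0, positive inputs pass through unchanged.

import torch
import torch.nn as nn

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])
print(nn.ReLU()(x))              # [0., 0., 0., 0.5, 2.] : negatives are zeroed
print(torch.clamp(x, min=0))     # same thing written as max(0, x)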
Leaky ReLU


f(x) = max(ax, x)
If
input : negative -> output : a × input (a is a very small number such as 0.0001)
input : positive -> output : input

Unlike ReLU, if the input value is negative, the slope is not zero.
Used in the hidden layer
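A quick sketch of f(x) = max(ax, x) in PyTorch. The slope a for negative inputs is called negative_slope; PyTorch's default is 0.01, and a smaller value such as the 0.0001 above works the same way.

import torch
import torch.nn as nn

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])
leaky = nn.LeakyReLU(negative_slope=0.01)    # a = 0.01
print(leaky(x))                              # negatives are scaled by a, not zeroed
print(torch.where(x > 0, x, 0.01 * x))       # same thing written as max(ax, x)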
Softmax function
Used in multi-class classification.
Used in the output layer

nn.CrossEntropyLoss()
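A minimal sketch for multi-class classification (assuming PyTorch; the logits and labels are made up). Note that nn.CrossEntropyLoss takes the raw logits and applies log-softmax internally, so no explicit Softmax layer is needed in front of it; softmax is applied below only to look at the class probabilities.

import torch
import torch.nn as nn

logits = torch.tensor([[2.0, 0.5, -1.0],     # raw outputs: 2 samples, 3 classes
                       [0.1, 1.2, 0.3]])
labels = torch.tensor([0, 1])                # correct class index per sample

probs = torch.softmax(logits, dim=1)         # each row sums to 1
print(probs)

loss = nn.CrossEntropyLoss()(logits, labels) # expects raw logits, not probabilities
print(loss)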

 

  • Sigmoid

 

- It limits the output value: even if the raw output of a neuron reaches really high values, which could create a problem for the optimizer, the sigmoid squashes it into a bounded range.

- But it has a steep slope between -2 and 2 on the x axis, so small changes of x in this region lead to significant changes in y.

 

a steep slope between -2 and 2


- The gradient will be very small, which could lead to issues when the sigmoid is used, since the updates of the model weights depend on the value of the gradient.

The gradient will be very small
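A small sketch (assuming PyTorch autograd) of how quickly the sigmoid gradient shrinks: the slope is about 0.25 at x = 0 and almost zero for large |x|, which is why multiplying it repeatedly during backpropagation makes the gradient vanish.

import torch

for value in [0.0, 2.0, 5.0, 10.0]:
    x = torch.tensor(value, requires_grad=True)
    torch.sigmoid(x).backward()              # d(sigmoid)/dx at this point
    print(value, x.grad.item())              # 0.25, then rapidly approaching 0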


  • ReLU(Rectified Linear Unit)

 

- Dying ReLU (neurons permanently deactivated): when x is negative, it outputs zero, which deactivates any neuron producing negative values. The gradient descent algorithm will then no longer update that neuron's weights.

- When x is positive, it acts as a linear function of x, so it can lead to very high values.

- It is nonlinear and differentiable everywhere except at zero.


  • Leaky ReLU

 

- It behaves linearly for negative values of x, but the slope coefficient is very small. By doing so, it does not fully deactivate the neuron, and the neuron keeps the opportunity to be reactivated.
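A small sketch (assuming PyTorch autograd) comparing the two at a negative input: ReLU's gradient is zero, so no update flows through the neuron (dying ReLU), while Leaky ReLU's gradient equals the small slope coefficient, so the neuron can still be pushed back toward positive outputs.

import torch
import torch.nn as nn

x = torch.tensor(-3.0, requires_grad=True)
nn.ReLU()(x).backward()
print(x.grad)                                # tensor(0.) : the neuron gets no update

y = torch.tensor(-3.0, requires_grad=True)
nn.LeakyReLU(negative_slope=0.01)(y).backward()
print(y.grad)                                # tensor(0.0100) : a small gradient survives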

 

https://wikidocs.net/60021
