Deep Learning

Activation function

Naranjito 2022. 3. 17. 13:33
  • Activation function

It takes the output of a neuron and decides whether this neuron is going to fire or not; in other words, it answers the question "should this neuron 'fire' or not?"

Activation function

- Step function : Hardly used.
- Sigmoid function : Used in binary classification, usually in the output layer. Vanishing gradient possible.
- Hyperbolic tangent function
- ReLU (Rectified Linear Unit) function : f(x) = max(0, x)
  input : negative -> output : 0
  input : positive -> output : input
- Leaky ReLU :
  input : negative -> output : input multiplied by a very small slope such as 0.01
  input : positive -> output : input
- Softmax function : Used in multi-class classification, usually in the output layer.

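A minimal NumPy sketch of the functions listed above (the helper names and the 0.01 leak coefficient for Leaky ReLU are illustrative choices, not fixed by the post):

import numpy as np

def step(x):
    return np.where(x >= 0, 1.0, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)               # f(x) = max(0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x >= 0, x, alpha * x)   # small slope instead of 0 for negative inputs

def softmax(x):
    e = np.exp(x - np.max(x))               # subtract the max for numerical stability
    return e / e.sum()

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))      # [0.  0.  0.  0.5 2. ]
print(softmax(x))   # five probabilities that sum to 1
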
  • Sigmoid

- It limits the output value : without it, the output of a neuron could reach really high values, which could create a problem for the optimizer.

- But it has a steep slope between -2 and 2 on the x axis, so small changes of x in this region lead to significant changes in y.

(Figure : a steep slope between -2 and 2)

- Outside this region, the gradient will be very small, which could lead to issues when the Sigmoid is used, as the updates of the model weights depend on the value of the gradient.

(Figure : the gradient will be very small)
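
A quick numerical check of this, as a sketch using the standard identity sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)) (the sample x values are arbitrary):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)            # derivative of the sigmoid

for x in [0.0, 2.0, 5.0, 10.0]:
    print(x, sigmoid_grad(x))
# 0.0  -> 0.25
# 2.0  -> ~0.10
# 5.0  -> ~0.0066
# 10.0 -> ~0.000045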


  • ReLU (Rectified Linear Unit)

- Dying ReLU (neurons permanently deactivated) : When x is negative, it will output zero, meaning it deactivates neurons outputting negative values. So the gradient descent algorithm will not update those neurons' weights anymore.

- When x is positive, it acts as a linear function of x, so it can lead to very high values.

- It is nonlinear, and differentiable everywhere except at zero.
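
A minimal sketch of this behaviour, taking the ReLU gradient as 0 for negative inputs and 1 for positive inputs (the sample array is an arbitrary example):

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    return np.where(x > 0, 1.0, 0.0)    # zero gradient for negative inputs

x = np.array([-3.0, -0.1, 0.5, 4.0])
print(relu(x))        # [0.  0.  0.5 4. ]
print(relu_grad(x))   # [0. 0. 1. 1.] -> no update flows through the zeroed neurons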


  • Leaky ReLU

- It behaves linearly for negative values of x, but the slope coefficient is very small. By doing so, it does not fully deactivate the neuron; we give the neuron the opportunity to be reactivated.
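
A minimal sketch of the same check for Leaky ReLU, assuming an illustrative leak coefficient of 0.01:

import numpy as np

def leaky_relu(x, alpha=0.01):
    return np.where(x >= 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    return np.where(x >= 0, 1.0, alpha)  # small but nonzero gradient for x < 0

x = np.array([-3.0, -0.1, 0.5, 4.0])
print(leaky_relu(x))        # [-0.03  -0.001  0.5  4. ]
print(leaky_relu_grad(x))   # [0.01  0.01  1.  1.] -> the neuron can still be updated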