
The Ways to Avoid Gradient Vanishing and Exploding: Weight Initialization, Xavier, He, Batch Normalization, Internal Covariate Shift, Layer Normalization

Use ReLU, Leaky ReLU
- Use them instead of the sigmoid or hyperbolic tangent function for the hidden layers.
- Do not use the sigmoid function in the hidden layers.
- Leaky ReLU solves the dead ReLU problem because its slope does not converge to zero for any input.
- In the hidden layers, use the ReLU function or its variations, such as Leaky ReLU (see the sketch after this list).

Weight Initialization
The aim of weight initialization is t..
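
A minimal sketch of these two points, assuming PyTorch: Leaky ReLU in the hidden layer combined with He (Kaiming) initialization, which is designed for ReLU-family activations. The network shape (784-256-10) and the negative slope 0.01 are arbitrary choices for illustration.

import torch
import torch.nn as nn

class SmallNet(nn.Module):
    def __init__(self, in_dim=784, hidden_dim=256, out_dim=10):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, out_dim)
        # Non-zero slope for negative inputs keeps the gradient from dying (no dead ReLU).
        self.act = nn.LeakyReLU(negative_slope=0.01)
        # He/Kaiming initialization, matched to the Leaky ReLU slope.
        nn.init.kaiming_normal_(self.fc1.weight, a=0.01, nonlinearity='leaky_relu')
        nn.init.zeros_(self.fc1.bias)
        nn.init.kaiming_normal_(self.fc2.weight, a=0.01, nonlinearity='leaky_relu')
        nn.init.zeros_(self.fc2.bias)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))

net = SmallNet()
out = net(torch.randn(4, 784))  # forward pass on a dummy batch

If the hidden activation were sigmoid or tanh instead, Xavier initialization would be the usual pairing; He initialization is the one tuned to ReLU-like activations.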

Deep Learning 2026.01.22