- Loss function
A gradient-based optimization strategy trains a model using some loss function $L(f(x; \theta), y)$, where $(x, y)$ is an input-output pair. The loss is used to help the model determine how "wrong" it is and, based on that "wrongness," improve itself. It is a measure of error, and our goal throughout training is to minimize this error/loss.
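A minimal NumPy sketch of this idea (the mean-squared-error loss and the toy numbers are illustrative assumptions, not specified in the post):

```python
import numpy as np

def mse_loss(y_pred, y_true):
    """Mean squared error: the average squared difference between
    predictions and targets. Lower means the model is less 'wrong'."""
    return np.mean((y_pred - y_true) ** 2)

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.9, 3.3])
print(mse_loss(y_pred, y_true))  # small value -> small error
```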
- Gradient Descent
An iterative method for reducing the value of the loss function: the parameters are repeatedly updated in the direction opposite to the gradient, $\theta \leftarrow \theta - \eta \nabla_\theta L(\theta)$, where $\eta$ is the learning rate. The variants below differ in how much data is used for each update; a sketch follows the table.
Gradient Descent by batch size | Data used per update | Update frequency
---|---|---
Gradient Descent | all training data | per epoch
Stochastic Gradient Descent (SGD) | randomly selected data | per batch
Mini-batch Gradient Descent | designated chunks (mini-batches) of the data | per batch
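A minimal NumPy sketch of batch versus mini-batch updates on a toy linear-regression problem; the data, learning rate, epoch counts, and batch size of 64 are made-up values for illustration, not from the post:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=3000)                         # 3,000 one-dimensional samples
y = 2.0 * X + 1.0 + rng.normal(scale=0.1, size=3000)

def grad_step(xb, yb, w, b, lr=0.1):
    """One gradient-descent update of w and b on the batch (xb, yb),
    using the squared-error gradient (constant factor folded into lr)."""
    err = (w * xb + b) - yb
    w -= lr * np.mean(err * xb)
    b -= lr * np.mean(err)
    return w, b

# Batch gradient descent: one update per epoch, using all the data at once
w, b = 0.0, 0.0
for epoch in range(200):
    w, b = grad_step(X, y, w, b)

# Mini-batch gradient descent: one update per chunk of 64 samples
w_mb, b_mb = 0.0, 0.0
for epoch in range(5):
    for i in range(0, len(X), 64):
        w_mb, b_mb = grad_step(X[i:i+64], y[i:i+64], w_mb, b_mb)

# SGD would instead use a single randomly chosen sample per update.
print(w, b)          # both runs approach the true values 2.0 and 1.0
print(w_mb, b_mb)
```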
- Optimizer
Linear regression is the task of finding the single straight line that best fits the training data. The hypothesis of linear regression has the following form:
$H(x) = Wx + b$, where $W$ is the weight and $b$ is the bias.
The optimizer is the method used to find the $W$ and $b$ that minimize the value of the cost function.
Optimizer | Idea
---|---
Momentum | Keeps a running "velocity" of past gradients so updates continue in a consistent direction.
Adagrad | Parameters with many changes get a small learning rate; parameters with few changes get a high learning rate.
RMSprop | Improves Adagrad by using an exponential moving average of squared gradients, so the effective learning rate does not keep shrinking.
Adam | Combines RMSprop and momentum.
https://amber-chaeeunk.tistory.com/23
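A hedged sketch of the core update rules behind each optimizer, written for a single scalar parameter; the learning rate and decay hyperparameters below are typical defaults assumed for illustration, not values from the post:

```python
import numpy as np

lr, eps = 0.01, 1e-8

def momentum_step(w, grad, v, beta=0.9):
    """Momentum: accumulate a velocity so updates keep a consistent direction."""
    v = beta * v + grad
    return w - lr * v, v

def adagrad_step(w, grad, g2_sum):
    """Adagrad: frequently updated parameters get a smaller effective learning rate."""
    g2_sum += grad ** 2
    return w - lr * grad / (np.sqrt(g2_sum) + eps), g2_sum

def rmsprop_step(w, grad, s, rho=0.9):
    """RMSprop: like Adagrad, but with an exponential moving average of
    squared gradients, so the step size does not shrink toward zero forever."""
    s = rho * s + (1 - rho) * grad ** 2
    return w - lr * grad / (np.sqrt(s) + eps), s

def adam_step(w, grad, m, v, t, beta1=0.9, beta2=0.999):
    """Adam: combines momentum (first moment m) and RMSprop (second moment v),
    with bias correction for the early steps."""
    t += 1
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v, t
```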
- Epochs
The number of times the entire training dataset is passed through the model.
- Batch size
The unit of data processed together in a single step.
Say one data sample has size 256: for instance, it is a vector [3, 1, 2, 5, ...] of length 256.
In other words, one data size = vector dimension = 256.
If the number of samples is 3,000, the total data size is 3,000 × 256.
The computer processes the data in chunks rather than one sample at a time.
If you take 64 samples out of the 3,000 at a time, the batch size is 64.
Therefore what the computer processes at once is (batch size × dim) = 64 × 256.
- One data sample: [3, 1, 2, 5, ...] (length = 256)
- Number of data samples: 3,000 such vectors, each [3, 1, 2, 5, ...] of length 256
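A short sketch of the numbers above (3,000 vectors of dimension 256, batch size 64), also showing how epochs relate to batches; the array contents are random placeholders and the epoch count of 5 is arbitrary:

```python
import numpy as np

data = np.random.rand(3000, 256)   # 3,000 samples, each a vector of length 256
batch_size = 64
epochs = 5                         # pass over the full dataset 5 times

for epoch in range(epochs):                       # one epoch = one full pass over all 3,000 samples
    for start in range(0, len(data), batch_size): # 47 batches per epoch
        batch = data[start:start + batch_size]    # shape (64, 256); the last batch is (56, 256)
        # a real training step would process batch_size x dim = 64 x 256 numbers here
        _ = batch.mean()                          # placeholder for the training step
```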