Machine Learning

Entropy, Cross-Entropy, Binary cross entropy, SparseCategoricalCrossentropy

Naranjito 2023. 12. 1. 18:56
  • Entropy
The level of uncertainty of a probability distribution. For a binary outcome (such as a coin toss) it ranges between 0 and 1 when measured in bits.

0      <      entropy      <      1
certain                          uncertain

The greater the entropy, the greater the uncertainty of the probability distribution; the smaller the entropy, the less the uncertainty.
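For a concrete feel, here is a minimal NumPy sketch (not from the original post; binary_entropy is a hypothetical helper) that computes the entropy, in bits, of a coin that lands heads with probability p:

import numpy as np

def binary_entropy(p):
    # Entropy (in bits) of a coin that lands heads with probability p
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no uncertainty
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

print(binary_entropy(0.5))   # 1.0   -> fair coin, maximum uncertainty
print(binary_entropy(0.9))   # ~0.47 -> mostly predictable
print(binary_entropy(1.0))   # 0.0   -> completely certain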

 

reference : towardsdatascience.com/cross-entropy-loss-function-f38c4ec8643e

 

  • Cross-Entropy

Cross-entropy measures how far the prediction is from the actual expected value, and it is used as the loss when adjusting model weights during training: the smaller the loss, the better the model.

The cross-entropy between two discrete probability distributions (for example, the two-outcome heads/tails distribution of a coin toss) is a metric that captures how similar the two distributions are.
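As a small illustration (a sketch, not from the referenced article), the cross-entropy H(p, q) = -sum(p * log(q)) is smallest when the predicted distribution q matches the true distribution p:

import numpy as np

def cross_entropy(p, q):
    # H(p, q) = -sum over outcomes of p * log(q)
    return -np.sum(p * np.log(q))

p = np.array([0.5, 0.5])        # fair coin: the true distribution
q_good = np.array([0.5, 0.5])   # prediction identical to p
q_bad = np.array([0.9, 0.1])    # prediction far from p

print(cross_entropy(p, q_good))  # ~0.693, equals the entropy of p (best case)
print(cross_entropy(p, q_bad))   # ~1.204, larger because the distributions differ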

 

  • BinaryCrossentropy

Computes the cross-entropy loss between true labels and predicted labels.

 

Use this cross-entropy loss for binary (0 or 1) classification applications. The loss function requires the following inputs:

  • y_true (true label): This is either 0 or 1.
  • y_pred (predicted value): This is the model's prediction, i.e., a single floating-point value that either represents a logit (a value in [-inf, inf] when from_logits=True) or a probability (a value in [0., 1.] when from_logits=False).
  • In other words, from_logits=True tells the loss function that the output values generated by the model are not normalized: the sigmoid (or softmax) function has not been applied to them to produce probabilities. Therefore, the output layer in this case does not have a sigmoid/softmax activation (see the usage sketch after the signature below):
keras.losses.BinaryCrossentropy(
    from_logits=False,
    label_smoothing=0.0,
    axis=-1,
    reduction="sum_over_batch_size",
    name="binary_crossentropy",
)
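A short usage sketch (the label and prediction values are made up): the first loss object expects probabilities, the second expects raw logits, and both give the same result because sigmoid(logit) reproduces the probabilities.

import tensorflow as tf

y_true = tf.constant([[0.], [1.], [1.]])

# Predictions given as probabilities (from_logits=False, the default)
y_prob = tf.constant([[0.1], [0.8], [0.6]])
bce_prob = tf.keras.losses.BinaryCrossentropy(from_logits=False)
print(bce_prob(y_true, y_prob).numpy())

# The same predictions given as raw logits (from_logits=True)
y_logit = tf.math.log(y_prob / (1 - y_prob))   # inverse of the sigmoid
bce_logit = tf.keras.losses.BinaryCrossentropy(from_logits=True)
print(bce_logit(y_true, y_logit).numpy())      # same value as above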

 

https://datascience.stackexchange.com/questions/73093/what-does-from-logits-true-do-in-sparsecategoricalcrossentropy-loss-function

 

  • Binary cross entropy loss (or log loss)

It works with a single value per prediction. For example, for a coin toss the model only needs to output the probability of heads, say 0.5, because the probability of tails is implied as 1 - 0.5 = 0.5; if the first probability were 0.7, the other would be assumed to be 0.3. It is used in scenarios where there are only two possible outcomes.


 

l = - ( y log ( p ) + ( 1 - y ) log ( 1 - p ) )
where
p is the predicted probability, and
y is the indicator (0 or 1 in the case of binary classification)

Let's walk through what happens for a particular data point. Let's say the correct indicator is y = 1. In this case,

l = - ( 1 log ( p ) + ( 1 - 1 ) log ( 1 - p ) )

 

l = - ( 1 log ( p ) ) = - log ( p )
The value of the loss l therefore depends only on the predicted probability p. The loss function rewards a correct, confident prediction (p close to 1) with a loss close to 0.
However, if the probability p is low, log ( p ) is a large negative number, so - log ( p ) becomes a large positive loss, which penalizes the model for a wrong prediction.
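Plugging a few probabilities into l = - log ( p ) for a positive example (y = 1) shows this behaviour (illustrative numbers only):

import numpy as np

for p in [0.99, 0.7, 0.1]:
    print(p, -np.log(p))
# 0.99 -> ~0.01  confident and correct: tiny loss
# 0.7  -> ~0.36
# 0.1  -> ~2.30  confident and wrong: large loss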

 

  • SparseCategoricalCrossentropy

An extension of the categorical cross-entropy loss function that is used when the training data labels are represented as integers.

 

  • Categorical cross-entropy is used when the labels are one-hot encoded, for example, [1,0,0], [0,1,0] and [0,0,1] for a 3-class classification problem.
  • In sparse categorical cross-entropy, the labels are integer encoded, for example, [0], [1] and [2] for a 3-class problem.

  • Categorical cross-entropy

The labels are one-hot encoded: [0, 0, 1], [0, 1, 0]

import tensorflow as tf

# One-hot encoded targets and the predicted probability for each class
categorical_loss = tf.keras.losses.CategoricalCrossentropy()
y_target_categorical = tf.convert_to_tensor([[0, 0, 1], [0, 1, 0]])
y_prediction = tf.convert_to_tensor([[0, 0.1, 0.9], [0.1, 0.8, 0.2]])
categorical_loss(y_target_categorical, y_prediction).numpy()

 

If we pass integer-encoded labels ([2, 1]) to CategoricalCrossentropy instead, it raises an error:

# Integer-encoded targets do not match the one-hot shape expected by CategoricalCrossentropy
y_target_sparse = tf.convert_to_tensor([2, 1])
categorical_loss(y_target_sparse, y_prediction).numpy()

>>>
InvalidArgumentError: Incompatible shapes: [2] vs. [2,3] [Op:Mul] name: categorical_crossentropy/mul/

  • SparseCategoricalCrossentropy
tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=False,
    ignore_class=None,
    reduction=losses_utils.ReductionV2.AUTO,
    name='sparse_categorical_crossentropy')

 

In the case above, the labels are integer encoded ([2, 1]), so SparseCategoricalCrossentropy handles them directly:

# SparseCategoricalCrossentropy accepts the integer-encoded targets directly
sparse_categorical_loss = tf.keras.losses.SparseCategoricalCrossentropy()
sparse_categorical_loss(y_target_sparse, y_prediction).numpy()
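For reference, since [2, 1] and [[0, 0, 1], [0, 1, 0]] describe the same targets, the sparse loss on the integer labels and the categorical loss on the one-hot labels produce the same value (continuing the example above):

print(categorical_loss(y_target_categorical, y_prediction).numpy())
print(sparse_categorical_loss(y_target_sparse, y_prediction).numpy())
# Both print the same loss value (up to floating-point rounding)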

 

https://jins-sw.tistory.com/16

https://freedium.cfd/https://rmoklesur.medium.com/what-you-need-to-know-about-sparse-categorical-cross-entropy-9f07497e3a6f

 


  • from_logits=True

 

- If we did not use a softmax layer as the final layer, we should set from_logits=True when defining the loss function.

- Instead of applying the softmax function separately, the loss function then includes it in its own calculation.

- This means that whatever inputs you provide to the loss function are not scaled: they are raw numbers from -inf to +inf, not probabilities (a model sketch follows below).
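As a sketch (the layer sizes and input shape are made up for illustration), a model whose final Dense layer has no activation outputs raw logits, so the loss is told from_logits=True:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(3)  # no softmax: the outputs are raw logits
])
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])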


  • from_logits=False (default)

 

- The softmax is expected to have already been applied by the model's output layer, so the values reaching the loss function are probabilities.

- In other words, the loss function assumes that whatever you feed to it is already a probability, so it does not apply the softmax itself (a model sketch follows below).
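The equivalent sketch with from_logits=False (the default) keeps the softmax in the output layer instead, so the loss receives probabilities:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax')  # outputs probabilities
])
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
    metrics=['accuracy'])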

 

https://stackoverflow.com/questions/57253841/from-logits-true-and-from-logits-false-get-different-training-result-for-tf-loss
