Machine Learning

Entropy, Cross-Entropy, Binary cross entropy, SparseCategoricalCrossentropy

Naranjito 2023. 12. 1. 18:56
  • Entropy
The level of uncertainty of a probability distribution. For a binary outcome (such as a coin toss) it ranges between 0 and 1 when measured in bits.

0      <      entropy      <      1
certain                          uncertain

The greater the entropy, the greater the uncertainty of the probability distribution; the smaller the entropy, the less the uncertainty.
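For a concrete feel, here is a minimal NumPy sketch (not from the original post; binary_entropy is a hypothetical helper) that computes the entropy, in bits, of a coin that lands heads with probability p:

import numpy as np

def binary_entropy(p):
    # Entropy (in bits) of a coin that lands heads with probability p
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no uncertainty
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

print(binary_entropy(0.5))   # 1.0   -> fair coin, maximum uncertainty
print(binary_entropy(0.9))   # ~0.47 -> mostly predictable
print(binary_entropy(1.0))   # 0.0   -> completely certain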

 

reference : towardsdatascience.com/cross-entropy-loss-function-f38c4ec8643e

 

  • Cross-Entropy

Cross-entropy measures how far the prediction is from the actual expected value, and it is used as the loss when adjusting model weights during training: the smaller the loss, the better the model.

The cross-entropy between two discrete probability distributions (for example, the two-outcome heads/tails distribution of a coin toss) is a metric that captures how similar the two distributions are.
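As a small illustration (a sketch, not from the referenced article), the cross-entropy H(p, q) = -sum(p * log(q)) is smallest when the predicted distribution q matches the true distribution p:

import numpy as np

def cross_entropy(p, q):
    # H(p, q) = -sum over outcomes of p * log(q)
    return -np.sum(p * np.log(q))

p = np.array([0.5, 0.5])        # fair coin: the true distribution
q_good = np.array([0.5, 0.5])   # prediction identical to p
q_bad = np.array([0.9, 0.1])    # prediction far from p

print(cross_entropy(p, q_good))  # ~0.693, equals the entropy of p (best case)
print(cross_entropy(p, q_bad))   # ~1.204, larger because the distributions differ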

 

  • BinaryCrossentropy

Computes the cross-entropy loss between true labels and predicted labels.

 

Use this cross-entropy loss for binary (0 or 1) classification applications. The loss function requires the following inputs:

  • y_true (true label): This is either 0 or 1.
  • y_pred (predicted value): This is the model's prediction, i.e., a single floating-point value that either represents a logit (a value in [-inf, inf] when from_logits=True) or a probability (a value in [0., 1.] when from_logits=False).
  • In other words, from_logits=True tells the loss function that the output values generated by the model are not normalized: the sigmoid (or softmax) function has not been applied to them to produce probabilities. Therefore, the output layer in this case does not have a sigmoid/softmax activation (see the usage sketch after the signature below):
keras.losses.BinaryCrossentropy(
    from_logits=False,
    label_smoothing=0.0,
    axis=-1,
    reduction="sum_over_batch_size",
    name="binary_crossentropy",
)
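A short usage sketch (the label and prediction values are made up): the first loss object expects probabilities, the second expects raw logits, and both give the same result because sigmoid(logit) reproduces the probabilities.

import tensorflow as tf

y_true = tf.constant([[0.], [1.], [1.]])

# Predictions given as probabilities (from_logits=False, the default)
y_prob = tf.constant([[0.1], [0.8], [0.6]])
bce_prob = tf.keras.losses.BinaryCrossentropy(from_logits=False)
print(bce_prob(y_true, y_prob).numpy())

# The same predictions given as raw logits (from_logits=True)
y_logit = tf.math.log(y_prob / (1 - y_prob))   # inverse of the sigmoid
bce_logit = tf.keras.losses.BinaryCrossentropy(from_logits=True)
print(bce_logit(y_true, y_logit).numpy())      # same value as above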

 

https://datascience.stackexchange.com/questions/73093/what-does-from-logits-true-do-in-sparsecategoricalcrossentropy-loss-function

 

  • Binary cross entropy loss (or log loss)

It works with a single value per prediction. For example, for a coin toss the model only needs to output the probability of heads, say 0.5, because the probability of tails is implied as 1 - 0.5 = 0.5; if the first probability were 0.7, the other would be assumed to be 0.3. It is used in scenarios where there are only two possible outcomes.


 

l = - ( y log ( p ) + ( 1 - y ) log ( 1 - p ) )
where
p is the predicted probability, and
y is the indicator (0 or 1 in the case of binary classification)

Let's walk through what happens for a particular data point. Let's say the correct indicator is y = 1. In this case,

l = - ( 1 log ( p ) + ( 1 - 1 ) log ( 1 - p ) )

 

l = - ( 1 log ( p ) ) = - log ( p )
The value of the loss l therefore depends only on the predicted probability p. The loss function rewards a correct, confident prediction (p close to 1) with a loss close to 0.
However, if the probability p is low, log ( p ) is a large negative number, so - log ( p ) becomes a large positive loss, which penalizes the model for a wrong prediction.
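Plugging a few probabilities into l = - log ( p ) for a positive example (y = 1) shows this behaviour (illustrative numbers only):

import numpy as np

for p in [0.99, 0.7, 0.1]:
    print(p, -np.log(p))
# 0.99 -> ~0.01  confident and correct: tiny loss
# 0.7  -> ~0.36
# 0.1  -> ~2.30  confident and wrong: large loss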

 

  • SparseCategoricalCrossentropy

An extension of the categorical cross-entropy loss function that is used when the training data labels are represented as integers.

 

  • Categorical cross-entropy is used when the labels are one-hot encoded, for example, [1,0,0], [0,1,0] and [0,0,1] for a 3-class classification problem.
  • In sparse categorical cross-entropy, the labels are integer encoded, for example, [0], [1] and [2] for a 3-class problem.

  • Categorical cross-entropy

The labels are one-hot encoded: [0, 0, 1], [0, 1, 0]

import tensorflow as tf

# One-hot encoded targets and the predicted probability for each class
categorical_loss = tf.keras.losses.CategoricalCrossentropy()
y_target_categorical = tf.convert_to_tensor([[0, 0, 1], [0, 1, 0]])
y_prediction = tf.convert_to_tensor([[0, 0.1, 0.9], [0.1, 0.8, 0.2]])
categorical_loss(y_target_categorical, y_prediction).numpy()

 

If we pass integer-encoded labels ([2, 1]) to CategoricalCrossentropy instead, it raises an error:

# Integer-encoded targets do not match the one-hot shape expected by CategoricalCrossentropy
y_target_sparse = tf.convert_to_tensor([2, 1])
categorical_loss(y_target_sparse, y_prediction).numpy()

>>>
InvalidArgumentError: Incompatible shapes: [2] vs. [2,3] [Op:Mul] name: categorical_crossentropy/mul/

  • SparseCategoricalCrossentropy
tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=False,
    ignore_class=None,
    reduction=losses_utils.ReductionV2.AUTO,
    name='sparse_categorical_crossentropy')

 

In the case above, the labels are integer encoded ([2, 1]), so SparseCategoricalCrossentropy handles them directly:

# SparseCategoricalCrossentropy accepts the integer-encoded targets directly
sparse_categorical_loss = tf.keras.losses.SparseCategoricalCrossentropy()
sparse_categorical_loss(y_target_sparse, y_prediction).numpy()
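For reference, since [2, 1] and [[0, 0, 1], [0, 1, 0]] describe the same targets, the sparse loss on the integer labels and the categorical loss on the one-hot labels produce the same value (continuing the example above):

print(categorical_loss(y_target_categorical, y_prediction).numpy())
print(sparse_categorical_loss(y_target_sparse, y_prediction).numpy())
# Both print the same loss value (up to floating-point rounding)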

 

https://jins-sw.tistory.com/16

https://freedium.cfd/https://rmoklesur.medium.com/what-you-need-to-know-about-sparse-categorical-cross-entropy-9f07497e3a6f

 


  • from_logits=True

 

- If we did not use a softmax layer as the final layer, we should set from_logits=True when defining the loss function.

- Instead of applying the softmax function separately, the loss function then includes it in its own calculation.

- This means that whatever inputs you provide to the loss function are not scaled: they are raw numbers from -inf to +inf, not probabilities (a model sketch follows below).
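As a sketch (the layer sizes and input shape are made up for illustration), a model whose final Dense layer has no activation outputs raw logits, so the loss is told from_logits=True:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(3)  # no softmax: the outputs are raw logits
])
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])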


  • from_logits=False (default)

 

- The softmax is expected to have already been applied by the model's output layer, so the values reaching the loss function are probabilities.

- In other words, the loss function assumes that whatever you feed to it is already a probability, so it does not apply the softmax itself (a model sketch follows below).
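The equivalent sketch with from_logits=False (the default) keeps the softmax in the output layer instead, so the loss receives probabilities:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax')  # outputs probabilities
])
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
    metrics=['accuracy'])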

 

https://stackoverflow.com/questions/57253841/from-logits-true-and-from-logits-false-get-different-training-result-for-tf-loss
