Deep Learning

LSA, SVD, Orthogonal matrix, Transposed matrix, Identity matrix, Inverse matrix, Diagonal matrix, Truncated SVD

Naranjito 2021. 3. 11. 16:29
  • LSA

Latent Semantic Analysis (LSA) is an alternative to DTM and TF-IDF (2021.03.10 - [Deep Learning] - BoW, CountVectorizer, fit_transform, vocabulary_, DTM, TDM, TF-IDF, TfidfVectorizer, isnull, fillna, pd.Series), which do not consider the meaning of terms. It applies SVD to a DTM or TF-IDF matrix and reduces its dimensions, drawing out the latent meaning of words.
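As a rough sketch of that idea, the snippet below builds a TF-IDF matrix with scikit-learn's TfidfVectorizer and reduces it with TruncatedSVD. The four documents and the choice of 2 components are hypothetical, made up only for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# Hypothetical toy corpus, invented for illustration only.
docs = [
    "apple banana fruit juice",
    "banana fruit salad",
    "python code library",
    "python library for deep learning code",
]

# TF-IDF matrix: one row per document, one column per term.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

# Reduce every document to 2 latent dimensions (an arbitrary choice here).
lsa = TruncatedSVD(n_components=2, random_state=0)
doc_vectors = lsa.fit_transform(X)

print(X.shape)                # (number of documents, number of terms)
print(doc_vectors.shape)      # (number of documents, 2)
print(lsa.components_.shape)  # (2, number of terms)
```

Each row of doc_vectors is a document in the reduced space, and each row of lsa.components_ weights the original terms for one latent dimension.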


1. SVD

 

Singular Value Decomposition (SVD) decomposes an m × n matrix A into the product of three matrices, A = U Σ V^T:

U : m × m orthogonal matrix

Σ : m × n diagonal matrix, all elements are zero except those on the main diagonal

V : n × n orthogonal matrix
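The shapes of the three matrices can be checked with NumPy. This is a minimal sketch, and the 4 × 3 matrix A is a hypothetical example. Note that np.linalg.svd returns the singular values as a 1-D array, so the diagonal matrix has to be rebuilt by hand.

```python
import numpy as np

# Hypothetical 4 x 3 matrix A (m = 4, n = 3), used only for illustration.
A = np.array([
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 1.0],
    [2.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
])

# full_matrices=True returns U as m x m and Vt (V transposed) as n x n.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# s holds only the singular values, so place them on the diagonal
# of an m x n matrix of zeros to obtain the diagonal matrix.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

print(U.shape, Sigma.shape, Vt.shape)  # (4, 4) (4, 3) (3, 3)
print(np.allclose(A, U @ Sigma @ Vt))  # True
```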


- Orthogonal matrix

An orthogonal matrix is a square matrix whose product with its own transpose, in either order, is the identity matrix.

The matrix A must satisfy both of the following: A × A^T = I and A^T × A = I.


A = [[1, 0], [0, 1]]

A^T = [[1, 0], [0, 1]]

A × A^T = [[1·1 + 0·0, 1·0 + 0·1], [0·1 + 1·0, 0·0 + 1·1]] = [[1, 0], [0, 1]]
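The identity matrix is a trivial case, so here is a small sketch with a less obvious orthogonal matrix, a 2-D rotation. The 30-degree angle is an arbitrary choice for illustration.

```python
import numpy as np

theta = np.pi / 6  # 30-degree rotation, chosen arbitrarily

# A 2-D rotation matrix is a standard example of an orthogonal matrix.
Q = np.array([
    [np.cos(theta), -np.sin(theta)],
    [np.sin(theta),  np.cos(theta)],
])
I = np.eye(2)

print(np.allclose(Q @ Q.T, I))             # True
print(np.allclose(Q.T @ Q, I))             # True
# For an orthogonal matrix the inverse equals the transpose.
print(np.allclose(np.linalg.inv(Q), Q.T))  # True
```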

- Transposed matrix

The matrix obtained by swapping the rows and columns of the original matrix.


- Identity matrix

A square matrix in which every element on the main diagonal is 1 and all other elements are zero.


- Inverse matrix

The inverse of A, written A^-1, is the matrix that gives the identity matrix when multiplied by A: A × A^-1 = A^-1 × A = I.
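The transposed, identity, and inverse matrices can be tried out together in NumPy. The 2 × 2 matrix below is a hypothetical example, picked so that its determinant is 1 and the inverse has whole-number entries.

```python
import numpy as np

# Hypothetical 2 x 2 matrix with determinant 2*3 - 1*5 = 1.
A = np.array([
    [2.0, 1.0],
    [5.0, 3.0],
])

A_T = A.T                 # transposed matrix: rows and columns swapped
I = np.eye(2)             # 2 x 2 identity matrix
A_inv = np.linalg.inv(A)  # inverse matrix

print(A_T)                         # rows [2, 5] and [1, 3]
print(np.allclose(A @ A_inv, I))   # True
print(np.allclose(A_inv @ A, I))   # True
```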


- Diagonal matrix

A matrix in which all elements are zero except those on the main diagonal, which may hold any values. In SVD, the diagonal matrix can be rectangular (m × n).


2. Truncated SVD

 

Truncated SVD keeps only the top t singular values instead of the full SVD, which gives dimensionality reduction, de-noising, and compression.

Let's assume t (a hyperparameter) is 2, so each matrix is truncated to 2. This is Truncated SVD: A ≈ U_t × Σ_t × V_t^T
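A minimal NumPy sketch of the truncation with t = 2 follows. The matrix A is hypothetical, and slicing the output of np.linalg.svd is just one straightforward way to do this.

```python
import numpy as np

# Hypothetical 4 x 3 matrix, used only for illustration.
A = np.array([
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 1.0],
    [2.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
])
t = 2  # hyperparameter: how many singular values to keep

# Singular values come back sorted from largest to smallest.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep the first t columns of U, the top t singular values,
# and the first t rows of V^T.
U_t = U[:, :t]
Sigma_t = np.diag(s[:t])
Vt_t = Vt[:t, :]

A_t = U_t @ Sigma_t @ Vt_t  # rank-t approximation of A

print(U_t.shape, Sigma_t.shape, Vt_t.shape)  # (4, 2) (2, 2) (2, 3)
print(A_t.shape)                             # (4, 3)

# The Frobenius-norm error equals the norm of the discarded singular values.
error = np.linalg.norm(A - A_t)
print(np.isclose(error, np.sqrt(np.sum(s[t:] ** 2))))  # True
```

A_t has the same shape as A but rank at most t, so what is lost is exactly the part carried by the discarded singular values.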


3. LDA

 

Latent Dirichlet Allocation (LDA) is a process for finding the topics (hidden meaning) in a set of documents. It is used to assign the text in a document to a particular topic.

 

 

- The first criterion looks at which topics the words in doc1 are assigned to. Because topics A and B are assigned to the words in doc1 in equal proportion, by this criterion 'apple' is equally likely to belong to topic A or topic B.

- The second criterion looks at which topic the word 'apple' itself is assigned to. By this criterion, 'apple' is more likely to be allocated to topic B.
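The two criteria can be combined in a toy calculation. This is a simplified illustration, not a full LDA implementation. All counts are hypothetical, and two things are assumptions here: that the second criterion is counted over all documents, and that the two proportions are multiplied. Real LDA also uses Dirichlet priors and repeats this step many times.

```python
# Hypothetical toy counts, invented only to mirror the two criteria above.

# Criterion 1: topics assigned to the other words in doc1 (equal proportion).
doc1_topic_counts = {"A": 2, "B": 2}

# Criterion 2 (assumed to be counted over all documents):
# topics currently assigned to the word 'apple'.
apple_topic_counts = {"A": 1, "B": 3}


def proportions(counts):
    """Turn raw counts into proportions that sum to 1."""
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()}


p_topic_in_doc = proportions(doc1_topic_counts)     # criterion 1
p_topic_for_word = proportions(apple_topic_counts)  # criterion 2

# Multiply the two criteria per topic, then normalise.
scores = {t: p_topic_in_doc[t] * p_topic_for_word[t] for t in p_topic_in_doc}
norm = sum(scores.values())
posterior = {t: score / norm for t, score in scores.items()}

print(posterior)  # {'A': 0.25, 'B': 0.75}
```

With these made-up counts the first criterion cannot separate the topics, so the second one decides and 'apple' leans toward topic B.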