- LSA
Latent Semantic Analysis (LSA) is an alternative to DTM and TF-IDF (see the earlier post: 2021.03.10 - [Deep Learning] - BoW, CountVectorizer, fit_transform, vocabulary_, DTM, TDM, TF-IDF, TfidfVectorizer, isnull, fillna, pd.Series), which do not consider the meaning of terms. It applies SVD to the DTM or TF-IDF matrix and reduces its dimensions, bringing out the latent meaning of words.
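The pipeline described above can be sketched roughly as follows. This is only a minimal illustration, assuming NumPy is available; the toy corpus, the raw-count DTM (instead of TF-IDF), and the choice of 2 latent dimensions are all assumptions made up for this example.

```python
import numpy as np

# Hypothetical toy corpus: two "fruit" documents and two "pet" documents.
docs = [
    "apple banana apple fruit",
    "banana fruit juice",
    "dog cat pet",
    "cat pet dog dog",
]

# Build a small DTM by hand: rows are documents, columns are terms.
vocab = sorted({w for d in docs for w in d.split()})
dtm = np.array([[d.split().count(w) for w in vocab] for d in docs], dtype=float)

# Apply SVD to the DTM and keep only the top t singular values.
U, s, VT = np.linalg.svd(dtm, full_matrices=False)
t = 2  # number of latent dimensions (an assumption for this sketch)
doc_vectors = U[:, :t] * s[:t]   # each row: a document in the latent space
term_vectors = VT[:t, :].T       # each row: a term in the latent space

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_fruit_docs = cosine(doc_vectors[0], doc_vectors[1])    # fruit vs fruit
sim_fruit_vs_pet = cosine(doc_vectors[0], doc_vectors[2])  # fruit vs pet
print(sim_fruit_docs, sim_fruit_vs_pet)
```

In this toy corpus the two fruit documents end up pointing in the same latent direction, while a fruit document and a pet document are close to orthogonal. Real corpora are much noisier than this.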
1. SVD
Singular Value Decomposition (SVD) decomposes an m × n matrix A into the product of the following 3 matrices, A = U × ∑ × VT:
U : orthogonal matrix of m × m
∑ : rectangular diagonal matrix of m × n, all elements are zero except those on the main diagonal (the singular values)
V : orthogonal matrix of n × n
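As a quick check of the shapes listed above, here is a small sketch using `np.linalg.svd`. The matrix `A` is a hypothetical example chosen only for illustration.

```python
import numpy as np

# Hypothetical 3 x 4 matrix (m = 3, n = 4), used only for illustration.
A = np.array([
    [1.0, 0.0, 2.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [2.0, 1.0, 0.0, 1.0],
])
m, n = A.shape

# full_matrices=True returns U as m x m and VT as n x n.
U, s, VT = np.linalg.svd(A, full_matrices=True)

# NumPy returns the singular values as a 1-D array,
# so the m x n diagonal matrix has to be built explicitly.
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)

# Multiplying the three matrices back together should recover A.
A_rebuilt = U @ Sigma @ VT
print(U.shape, Sigma.shape, VT.shape)
print(np.allclose(A, A_rebuilt))
```

Note that NumPy returns VT (the transpose of V) directly, not V.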
- Orthogonal matrix
A square matrix A is orthogonal when multiplying it by its transposed matrix, in either order, gives the identity matrix.
That is, the matrix A should satisfy both A × AT = I and AT × A = I, which also means A-1 = AT.
- Transposed matrix
The matrix obtained by swapping the rows and columns of the original matrix.
- Identity matrix
A square matrix in which every element on the main diagonal is 1 and all other elements are zero.
- Inverse matrix
The matrix (A-1) that gives the identity matrix when multiplied with A: A × A-1 = I.
- Diagonal matrix
A matrix in which all elements are zero except those on the main diagonal, which can hold any values. It does not have to be square: in SVD, ∑ is a rectangular m × n diagonal matrix.
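The definitions above can be checked numerically. The sketch below uses a 2-D rotation matrix as a hypothetical example of an orthogonal matrix; the angle and the diagonal values are arbitrary assumptions.

```python
import numpy as np

# Hypothetical example: a 2-D rotation matrix is orthogonal.
theta = np.pi / 6
Q = np.array([
    [np.cos(theta), -np.sin(theta)],
    [np.sin(theta),  np.cos(theta)],
])

I = np.eye(2)   # identity matrix: 1 on the main diagonal, 0 elsewhere
QT = Q.T        # transposed matrix: rows and columns swapped

# Orthogonal matrix: Q x QT and QT x Q both give the identity matrix.
is_orthogonal = np.allclose(Q @ QT, I) and np.allclose(QT @ Q, I)

# For an orthogonal matrix the inverse matrix equals the transposed matrix.
inverse_is_transpose = np.allclose(np.linalg.inv(Q), QT)

# Rectangular (2 x 3) diagonal matrix: zero everywhere except the main diagonal.
D = np.zeros((2, 3))
D[:2, :2] = np.diag([3.0, 1.5])

print(is_orthogonal, inverse_is_transpose)
print(D)
```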
2. Truncated SVD
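Truncated SVD is used in place of the full SVD for dimensionality reduction, de-noising, and compression.
Let's assume the hyperparameter t is 2. Keeping only the top 2 singular values, together with the first 2 columns of U and the first 2 rows of VT, and dropping the rest gives the Truncated SVD: Ut × St × VTt, where S = ∑.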
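The truncation with t = 2 can be sketched as below. The matrix `A` is a hypothetical example, and the slicing approach is just one simple way to do it with NumPy.

```python
import numpy as np

# Hypothetical 4 x 5 matrix, used only for illustration.
A = np.array([
    [2.0, 1.0, 0.0, 0.0, 1.0],
    [1.0, 2.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 3.0, 1.0, 1.0],
    [0.0, 1.0, 1.0, 2.0, 0.0],
])

t = 2  # hyperparameter: how many singular values to keep

U, s, VT = np.linalg.svd(A, full_matrices=False)

# Truncate: first t columns of U, top t singular values, first t rows of VT.
U_t = U[:, :t]
S_t = np.diag(s[:t])
VT_t = VT[:t, :]

# A_t has the same shape as A but rank at most t.
A_t = U_t @ S_t @ VT_t

# The Frobenius-norm error comes entirely from the dropped singular values.
error = np.linalg.norm(A - A_t)
print(U_t.shape, S_t.shape, VT_t.shape)
print(error, np.sqrt(np.sum(s[t:] ** 2)))
```

A smaller t compresses more but discards more of the original matrix, so t has to be tuned for the data.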
3. LDA
Latent Dirichlet Allocation (LDA) is a process for finding the topics (hidden meanings) in a set of documents. It is used to assign the text in a document to particular topics. To decide the topic of a word, such as 'apple' in doc1, it uses two criteria:
- The first criterion is to observe which topics the words in doc1 belong to. By this criterion, 'apple' is equally likely to belong to topic A or topic B, because the words in doc1 are allocated to topics A and B in the same proportion.
- The second criterion is to observe which topic the word 'apple' itself is allocated to across the documents. By this criterion, 'apple' is likely to be allocated to topic B.
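The two criteria can be combined into one score per topic, roughly as in the sketch below. This is a simplified illustration in the style of a Gibbs-sampling update, not a full LDA implementation. The corpus, the topic assignments, and the smoothing values `ALPHA` and `BETA` are all hypothetical.

```python
from collections import Counter

TOPICS = ["A", "B"]
ALPHA = 0.1  # hypothetical smoothing for the document-topic counts
BETA = 0.1   # hypothetical smoothing for the topic-word counts

# Hypothetical corpus: each document is a list of (word, topic) pairs.
# The second 'apple' in doc1 has topic None: it is the word being assigned.
docs = {
    "doc1": [("apple", "B"), ("banana", "B"), ("apple", None),
             ("dog", "A"), ("dog", "A")],
    "doc2": [("cute", "A"), ("book", "A"), ("king", "B"),
             ("apple", "B"), ("apple", "B")],
}

def topic_scores(docs, doc_id, word):
    """Return a normalized score per topic for `word` in document `doc_id`."""
    vocab = {w for doc in docs.values() for w, _ in doc}
    # Criterion 1: topics of the already-assigned words in this document.
    doc_topic = Counter(t for _, t in docs[doc_id] if t is not None)
    # Criterion 2: topics this word is assigned to across all documents.
    word_topic = Counter(t for doc in docs.values()
                         for w, t in doc if w == word and t is not None)
    # Total number of assigned words per topic, used to normalize criterion 2.
    topic_total = Counter(t for doc in docs.values()
                          for _, t in doc if t is not None)

    scores = {}
    for t in TOPICS:
        p_topic_in_doc = (doc_topic[t] + ALPHA) / (
            sum(doc_topic.values()) + ALPHA * len(TOPICS))
        p_word_in_topic = (word_topic[t] + BETA) / (
            topic_total[t] + BETA * len(vocab))
        scores[t] = p_topic_in_doc * p_word_in_topic
    total = sum(scores.values())
    return {t: sc / total for t, sc in scores.items()}

scores = topic_scores(docs, "doc1", "apple")
best_topic = max(scores, key=scores.get)
print(scores, best_topic)
```

With these made-up counts the first criterion is a tie between topics A and B, so the second criterion decides and topic B gets the higher score.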