All Categories (337)

normalization, WordNetLemmatizer, PorterStemmer, LancasterStemmer, Stopword

normalization - Merge different surface forms of the same word into a single form. For example, US and USA mean the same thing, so both are written as US.
1. WordNetLemmatizer - When a word appears in different inflected forms, find its base form (lemma). For example, the lemma of 'am', 'are', and 'is' is 'be'.
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
words = ['have', 'going', 'loves', 'lives', 'flies', 'dies', 'watched', 'has', 'starting']
print('b..

Deep Learning 2021.03.05
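The normalization idea in the excerpt above (treating US and USA as the same word) can be sketched in plain Python. This is only an illustration under stated assumptions: the `CANONICAL` mapping, the `normalize` helper, and the sample sentence are hypothetical and do not come from the original post, which is truncated here.

```python
# Hypothetical mapping from a variant form to its canonical form.
# Only the US/USA pair comes from the excerpt; extend it as needed.
CANONICAL = {"USA": "US"}


def normalize(tokens, mapping):
    """Replace each token with its canonical form if one is defined."""
    return [mapping.get(token, token) for token in tokens]


# Illustrative sentence, split on whitespace for simplicity.
tokens = "The USA and the US are the same country".split()
normalized = normalize(tokens, CANONICAL)
print(normalized)
```

A dictionary lookup like this only covers variants that are listed explicitly; stemmers and lemmatizers such as the ones named in the post title handle inflected forms by rule or by dictionary instead.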

gensim, Scikit-learn, NLTK, TreebankWordTokenizer, WordPunctTokenizer, sent_tokenize, pos_tag, word_tokenize, NLP, text_to_word_sequence, Corpus

Corpus - Natural language data.
NLP - Natural Language Processing.
gensim - An open source library for unsupervised topic modeling and natural language processing, using modern statistical machine learning.
Scikit-learn - SciPy Toolkit. It features various classification, regression, and clustering algorithms, including support vector machines.
NLTK - The Natural Language Toolkit is a suite of ..

pandas-1. Series, reindex, isnull, notnull, fillna, drop, dropna, randn, describe, nan, value_counts, map, apply, concat

pandas - A software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.
1. Series - A one-dimensional array of values in which each value can be assigned an index label.
import pandas as pd
sr = pd.Series([1000, 2000, 3000, 4000], index=['aaa', 'bbb', 'ccc', 'ddd'])
sr
>>>..
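To make the relationship between values and index labels concrete, here is a minimal sketch that rebuilds the `sr` Series from the excerpt above and inspects it. It assumes pandas is installed; the lookups are illustrative additions and are not taken from the truncated post.

```python
import pandas as pd

# Same data and labels as the Series in the excerpt.
sr = pd.Series([1000, 2000, 3000, 4000], index=["aaa", "bbb", "ccc", "ddd"])

# The index labels and the underlying values are stored separately.
labels = sr.index.tolist()
values = sr.values.tolist()
print(labels)
print(values)

# A value can be looked up by its label, much like a dictionary key.
second = sr["bbb"]
print(second)
```

Label-based lookup is what distinguishes a Series from a plain list: the position of a value matters less than the label attached to it.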