- word_index
Dictionary mapping each word to its integer index.
print(tokenizer.word_index)
>>>
{'barber': 1, 'secret': 2, 'huge': 3, 'kept': 4, 'person': 5, 'word': 6, 'keeping': 7, 'good': 8, 'knew': 9, 'driving': 10, 'crazy': 11, 'went': 12, 'mountain': 13}
- word_counts
Counts the number of occurrences of each word.
print(tokenizer.word_counts)
>>>
OrderedDict([('barber', 8), ('person', 3), ('good', 1), ('huge', 5), ('knew', 1), ('secret', 6), ('kept', 4), ('word', 2), ('keeping', 2), ('driving', 1), ('crazy', 1), ('went', 1), ('mountain', 1)])
- texts_to_sequences
Converts each word in the texts to its integer index.
encoded=tokenizer.texts_to_sequences(preprocessed_sentences)
print(encoded)
>>>
[[1, 5], [1, 8, 5], [1, 3, 5], [9, 2], [2, 4, 3, 2], [3, 2], [1, 4, 6], [1, 4, 6], [1, 4, 2], [7, 7, 3, 2, 10, 1, 11], [1, 12, 3, 13]]
- pad_sequences
Pads the sequences into a 2D NumPy array (by default, zeros are prepended).
pad_sequences(encoded)
>>>
array([[ 0,  0,  0,  0,  0,  1,  5],
       [ 0,  0,  0,  0,  1,  8,  5],
       ...], dtype=int32)
- padding : pad either before or after
- truncating : remove values either at the beginning or at the end
pad_sequences(encoded, padding='post', truncating='post', value=len(tokenizer.word_index)+1)
>>>
array([[ 1,  5, 14, 14, 14, 14, 14],
       [ 1,  8,  5, 14, 14, 14, 14],
       ...], dtype=int32)
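The notes above cover padding and truncating but not maxlen, which fixes the row length. A minimal sketch with two hypothetical sequences (not the barber corpus):

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

encoded = [[1, 5], [7, 7, 3, 2, 10, 1, 11]]

# maxlen fixes every row to the same length: shorter rows are
# zero-padded, longer rows are truncated (both at the front by default)
print(pad_sequences(encoded, maxlen=5))
# [[ 0  0  0  1  5]
#  [ 3  2 10  1 11]]
```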
- num_words
The maximum number of words to keep, based on word frequency.
Tokenizer(num_words=10)
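The line above constructs the tokenizer but does not show the effect of num_words. A minimal sketch with a hypothetical two-sentence corpus: word_index still records every word, and the limit is applied only when converting texts, keeping indices strictly below num_words.

```python
from tensorflow.keras.preprocessing.text import Tokenizer

# Hypothetical toy corpus, not the barber corpus used elsewhere in these notes
texts = ["a a a b b c", "a b d"]

tokenizer = Tokenizer(num_words=3)
tokenizer.fit_on_texts(texts)

# The full vocabulary is still indexed...
print(tokenizer.word_index)  # {'a': 1, 'b': 2, 'c': 3, 'd': 4}

# ...but texts_to_sequences keeps only indices < num_words (here 1 and 2)
print(tokenizer.texts_to_sequences(texts))  # [[1, 1, 1, 2, 2], [1, 2]]
```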
- fit_on_texts
Builds the vocabulary from the texts, assigning lower integer indices to more frequent words. It returns nothing; the resulting indices are inspected via word_index.
tokenizer.fit_on_texts(sentences)
print(tokenizer.word_index)
>>>
{'barber': 1, 'secret': 2, 'huge': 3, 'kept': 4, 'person': 5, 'word': 6, 'keeping': 7, 'good': 8, 'knew': 9, 'driving': 10, 'crazy': 11, 'went': 12, 'mountain': 13}
- texts_to_matrix
Converts a list of texts to a NumPy matrix with one row per text and one column per word index.
- mode : one of "binary", "count", "tfidf", "freq"
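A minimal sketch of the modes, using a hypothetical toy corpus (not the barber corpus above). Column 0 of the matrix is reserved and always zero; column j corresponds to the word with index j.

```python
from tensorflow.keras.preprocessing.text import Tokenizer

texts = ["b b b a", "a c"]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)  # word_index: {'b': 1, 'a': 2, 'c': 3}

# 'count' puts the raw occurrence count of each word in its column
print(tokenizer.texts_to_matrix(texts, mode='count'))
# [[0. 3. 1. 0.]
#  [0. 0. 1. 1.]]

# 'binary' records only presence/absence
print(tokenizer.texts_to_matrix(texts, mode='binary'))
# [[0. 1. 1. 0.]
#  [0. 0. 1. 1.]]
```

The 'freq' mode divides each count by the text's length, and 'tfidf' applies a TF-IDF weighting instead of raw counts.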