Deep Learning/Tensorflow

keras-Tokenizer

Naranjito 2021. 3. 8. 17:25
  • word_index

Grants an integer index to each word; more frequent words receive lower indices.

print(tokenizer.word_index)

>>>
{'barber': 1, 'secret': 2, 'huge': 3, 'kept': 4, 'person': 5, 'word': 6, 'keeping': 7, 'good': 8, 'knew': 9, 'driving': 10, 'crazy': 11, 'went': 12, 'mountain': 13}
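A minimal, self-contained sketch of how `word_index` is produced, using a made-up mini-corpus (the corpus below is illustrative, not the one from the output above):

```python
from tensorflow.keras.preprocessing.text import Tokenizer

# Hypothetical mini-corpus for illustration
corpus = ['the cat sat', 'the cat ran', 'the dog ran']

tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)

# Indices start at 1; the most frequent word gets index 1
print(tokenizer.word_index)
```

Note that index 0 is never assigned to a word; it is reserved (typically used for padding).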

 

  • word_counts

Counts the occurrences of each word in the corpus.

print(tokenizer.word_counts)

>>>
OrderedDict([('barber', 8), ('person', 3), ('good', 1), ('huge', 5), ('knew', 1), ('secret', 6), ('kept', 4), ('word', 2), ('keeping', 2), ('driving', 1), ('crazy', 1), ('went', 1), ('mountain', 1)])
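A small sketch showing that `word_counts` holds raw frequencies (not indices), again on a hypothetical mini-corpus:

```python
from tensorflow.keras.preprocessing.text import Tokenizer

# Hypothetical mini-corpus for illustration
tokenizer = Tokenizer()
tokenizer.fit_on_texts(['the cat sat', 'the cat ran'])

# word_counts stores raw frequencies, in order of first appearance
print(tokenizer.word_counts)
```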

 

  • texts_to_sequences

Converts each text to a sequence of word indices.

encoded=tokenizer.texts_to_sequences(preprocessed_sentences)
print(encoded)

>>>
[[1, 5], [1, 8, 5], [1, 3, 5], [9, 2], [2, 4, 3, 2], [3, 2], [1, 4, 6], [1, 4, 6], [1, 4, 2], [7, 7, 3, 2, 10, 1, 11], [1, 12, 3, 13]]
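Words not seen during fitting are silently dropped by `texts_to_sequences`, unless an `oov_token` was set, in which case they map to the reserved OOV index. A sketch with a hypothetical corpus:

```python
from tensorflow.keras.preprocessing.text import Tokenizer

# Hypothetical mini-corpus for illustration
corpus = ['the cat sat', 'the dog ran']

# oov_token reserves index 1 for out-of-vocabulary words
tokenizer = Tokenizer(oov_token='<OOV>')
tokenizer.fit_on_texts(corpus)

# 'flew' was never seen during fitting, so it maps to the <OOV> index (1)
seqs = tokenizer.texts_to_sequences(['the cat flew'])
print(seqs)
```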

 

  • pad_sequences

Pads the sequences to equal length and returns a 2D NumPy array. By default, zeros are added at the beginning of each sequence.

pad_sequences(encoded)

>>>
array([[ 0,  0,  0,  0,  0,  1,  5],
       [ 0,  0,  0,  0,  1,  8,  5],
       ...], dtype=int32)

- padding : 'pre' (default) or 'post'; pad before or after each sequence

- truncating : 'pre' (default) or 'post'; remove values from the beginning or the end of sequences longer than maxlen

pad_sequences(encoded, padding='post', truncating='post', value=len(tokenizer.word_index)+1)

>>>
array([[ 1,  5, 14, 14, 14, 14, 14],
       [ 1,  8,  5, 14, 14, 14, 14],
       ...], dtype=int32)
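A runnable sketch of `maxlen` together with `padding` and `truncating`, on made-up sequences:

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Hypothetical sequences of varying length
seqs = [[1, 5], [1, 8, 5], [7, 7, 3, 2]]

# maxlen fixes the row length; 'post' pads and truncates at the end
padded = pad_sequences(seqs, maxlen=3, padding='post', truncating='post')
print(padded)
# [[1 5 0]
#  [1 8 5]
#  [7 7 3]]
```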

 

  • num_words

The maximum number of words to keep, based on word frequency. Only the most common num_words - 1 words are kept, since index 0 is reserved.

Tokenizer(num_words=10)
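A sketch of what `num_words` actually does (hypothetical corpus): `word_index` still lists every word, but encoding keeps only the most frequent ones.

```python
from tensorflow.keras.preprocessing.text import Tokenizer

# Hypothetical mini-corpus for illustration
corpus = ['the cat sat', 'the cat ran', 'the dog ran']

# num_words=3 keeps only indices 1 and 2 (index 0 is reserved)
tokenizer = Tokenizer(num_words=3)
tokenizer.fit_on_texts(corpus)

# word_index still lists the full vocabulary...
print(tokenizer.word_index)
# ...but texts_to_sequences drops words whose index is >= num_words
print(tokenizer.texts_to_sequences(['the dog sat']))
```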

 

  • fit_on_texts

Builds the vocabulary from the given texts, assigning lower integer indices to more frequent words.

tokenizer.fit_on_texts(sentences)
print(tokenizer.word_index)

>>>
{'barber': 1, 'secret': 2, 'huge': 3, 'kept': 4, 'person': 5, 'word': 6, 'keeping': 7, 'good': 8, 'knew': 9, 'driving': 10, 'crazy': 11, 'went': 12, 'mountain': 13}
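`fit_on_texts` can also be called more than once; counts accumulate across calls and the indices are recomputed. A small sketch with made-up texts:

```python
from tensorflow.keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer()
# Counts accumulate across calls: red=2, blue=1 after the first call...
tokenizer.fit_on_texts(['red red blue'])
# ...then blue=4 after the second, so blue ends up with the lower index
tokenizer.fit_on_texts(['blue blue blue'])

print(tokenizer.word_index)  # {'blue': 1, 'red': 2}
```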

 

  • texts_to_matrix

Converts a list of texts to a NumPy matrix, with one row per text and one column per word index.

- mode : one of "binary", "count", "tfidf", "freq"
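A sketch of `texts_to_matrix` in "count" mode, on a hypothetical corpus; column 0 is unused because index 0 is reserved:

```python
from tensorflow.keras.preprocessing.text import Tokenizer

# Hypothetical mini-corpus for illustration
tokenizer = Tokenizer()
tokenizer.fit_on_texts(['the cat sat', 'the dog'])

# One row per text; column i holds the count of the word with index i
m = tokenizer.texts_to_matrix(['the the cat'], mode='count')
print(m)  # [[0. 2. 1. 0. 0.]]
```

With mode="binary" each column would instead hold 1 if the word occurs at all, and "tfidf" / "freq" give weighted variants.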