
Keras-Preprocessing, One-hot encoding, Word Embedding, Modeling, Compile

Naranjito 2021. 4. 9. 17:12
  • Keras

1. Preprocessing

from tensorflow.keras.preprocessing.text import Tokenizer

t = Tokenizer()
fit_text = 'The earth is an awesome place live'
t.fit_on_texts([fit_text])  # build the vocabulary from the training text

# 'great' is not in the fitted vocabulary, so it is dropped from the sequence
test_text = 'The earth is an great place live'
sequences = t.texts_to_sequences([test_text])[0]
sequences
>>>[1, 2, 3, 4, 6, 7]

t.word_index
>>>{'an': 4, 'awesome': 5, 'earth': 2, 'is': 3, 'live': 7, 'place': 6, 'the': 1}

Tokenizer.fit_on_texts() : Updates the internal vocabulary based on a list of texts. In the case where texts contains lists, each entry of those lists is assumed to be a token.

 

Tokenizer.texts_to_sequences() : Transforms each text in texts to a sequence of integers. Only words known to the tokenizer are taken into account, and when num_words is set, only the most frequent words are kept, as shown below.
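
A minimal sketch of the num_words behaviour, continuing the example above (every word appears once here, so num_words=4 keeps only the indices 1-3):

t2 = Tokenizer(num_words=4)  # keep only the top num_words-1 = 3 most frequent words
t2.fit_on_texts([fit_text])
t2.texts_to_sequences([fit_text])
>>>[[1, 2, 3]]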

 

from tensorflow.keras.preprocessing.sequence import pad_sequences

# shorter sequences are padded with 0; longer ones are truncated
pad_sequences([[1, 2, 3], [3, 4, 5, 6], [7, 8]], maxlen=3, padding='pre')
>>>
array([[1, 2, 3],
       [4, 5, 6],
       [0, 7, 8]], dtype=int32)

pad_sequences() : Pads sequences to the same length.

- maxlen= : Maximum length of all sequences; longer sequences are truncated (from the beginning by default, controlled by truncating=).

- padding= : 'pre' or 'post', pad either before or after each sequence. If 'pre', zeros are added at the beginning; if 'post', they are added at the end, as in the sketch below.
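
For comparison, the same call with padding='post' (truncating= left at its default 'pre'):

pad_sequences([[1, 2, 3], [3, 4, 5, 6], [7, 8]], maxlen=3, padding='post')
>>>
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 0]], dtype=int32)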

 

2. One-hot encoding

Transforms words into one-hot vectors.

e.g. one-hot vector : a sparse vector with exactly one element equal to 1 and all the rest 0, e.g. [0 1 0 0 0 0 ... 0 0 0 0 0 0 0]
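
A minimal sketch using tf.keras.utils.to_categorical (the vocabulary size of 7 is an assumed example):

from tensorflow.keras.utils import to_categorical

# each integer index becomes a row with a single 1
to_categorical([1, 2, 3], num_classes=7)
>>>
array([[0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0.]], dtype=float32)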

 

3. Word Embedding 

Transforms words into dense vectors.

 

e.g. dense vector : an embedding vector whose entries are all floats, e.g. [0.1 -1.2 0.8 0.2 1.8]

 

Embedding takes a 2-D integer tensor of shape (number of samples, input_length) as input.

- number of samples : each sample is the result of integer encoding, in other words, an integer sequence

Implementing Embedding

Embedding returns a 3-D float tensor of shape (number of samples, input_length, embedding dimensionality) as output.

from tensorflow.keras.layers import Embedding

# raw tokenized text...
text=[['Hope', 'to', 'see', 'you', 'soon'],['Nice', 'to', 'see', 'you', 'again']]
# ...after integer encoding
text=[[0, 1, 2, 3, 4],[5, 1, 2, 3, 6]]
Embedding(7, 2, input_length=5)

7 words in the vocabulary in total, each embedded into 2 dimensions; the length of each input sequence is 5.
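
A minimal sketch to confirm the shapes, passing the encoded sequences through the layer (the embedding weights are randomly initialised, so only the shape is meaningful):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding

model = Sequential()
model.add(Embedding(7, 2, input_length=5))
# 2 samples of length 5 in -> 3-D float tensor out
model.predict(np.array([[0, 1, 2, 3, 4], [5, 1, 2, 3, 6]])).shape
>>>(2, 5, 2)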

 

4. Modeling

  • Sequential 

- Used to compose the model as input layer - hidden layers - output layer.

- It lets you stack the layers one by one.

- It chains functions such as the affine transform Wx+b and activations such as sigmoid.


  • model.add()

- Adds layers to the model one step at a time.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
model = Sequential()
model.add(Dense(1, input_dim=3, activation='relu'))

 

1 is the number of output neurons, input_dim is the number of input neurons, and 'relu' is the activation function (options include linear, sigmoid, softmax, relu).


from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
model = Sequential()
model.add(Dense(8, input_dim=4, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
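
model.summary() prints each layer's output shape and parameter count; for a Dense layer the count is (inputs + 1) * units, the +1 being the bias:

model.summary()
# Dense(8, input_dim=4) : (4 + 1) * 8 = 40 parameters
# Dense(1)              : (8 + 1) * 1 = 9 parameters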

 

5. Compile

 

Sets the model's loss function, optimizer, and metrics before training.

import tensorflow as tf

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
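
For the two-layer sigmoid model built above, a matching compile call would use a binary loss instead; binary_crossentropy is the usual choice for a single sigmoid output:

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])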