- vocabulary_
Maps each word (token) to its integer column index. Available only after the vectorizer has been fitted.
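from sklearn.feature_extraction.text import CountVectorizer # Or TfidfVectorizer
text=["Don't be fooled by the dark sounding name, Mr. Jone's Orphanage is as cheery as cheery goes for a pastry-shop."]
vector=CountVectorizer()
vector.fit(text) # vocabulary_ exists only after fitting
print(vector.vocabulary_)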
>>>
{'don': 5, 'be': 1, 'fooled': 6, 'by': 2, 'the': 17, 'dark': 4, 'sounding': 16, 'name': 12, 'mr': 11, 'jone': 10, 'orphanage': 13, 'is': 9, 'as': 0, 'cheery': 3, 'goes': 8, 'for': 7, 'pastry': 14, 'shop': 15}
- fit_transform
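Fits the vectorizer and returns how often each word occurs in each document, as a sparse matrix whose columns follow vocabulary_. With TfidfVectorizer the values are TF-IDF weights instead of raw counts.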
print(vector.fit_transform(text).toarray())
>>>
[[2 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1]]
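To see where the indices and counts above come from, here is a minimal pure-Python sketch that mimics CountVectorizer's defaults (lowercasing, tokens of two or more word characters, alphabetically sorted vocabulary). The names `tokenize`, `vocabulary`, and `counts` are illustrative and not part of scikit-learn, and the real class returns a sparse matrix rather than a list.

```python
import re
from collections import Counter

text = ["Don't be fooled by the dark sounding name, Mr. Jone's Orphanage is as cheery as cheery goes for a pastry-shop."]

def tokenize(doc):
    # Approximates the default token_pattern r"(?u)\b\w\w+\b" after lowercasing,
    # so one-character pieces such as the "t" in "don't" are dropped
    return re.findall(r"\b\w\w+\b", doc.lower())

tokens = tokenize(text[0])

# Sorted unique tokens get consecutive column indices, like vocabulary_
vocabulary = {word: i for i, word in enumerate(sorted(set(tokens)))}

# One count per column, like fit_transform(text).toarray()[0]
freq = Counter(tokens)
counts = [freq[word] for word in sorted(vocabulary, key=vocabulary.get)]

print(vocabulary["cheery"], counts[vocabulary["cheery"]])  # 3 2
```

Because "as" and "cheery" each appear twice, columns 0 and 3 hold 2 and every other column holds 1.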
- preprocessing.LabelEncoder
Encodes labels as integers from 0 to n_classes-1, numbered in sorted order of the distinct labels.
from sklearn import preprocessing
lee=preprocessing.LabelEncoder()
arr=[1,2,2,5]
lee.fit(arr)
lee.transform(arr)
>>>
array([0, 1, 1, 2])
- classes_
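The distinct labels seen during fit, in sorted order. Each label's position is its integer code. Available only on a fitted encoder.
lee.classes_ # array([1, 2, 5]) for the encoder fitted above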
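The relationship between classes_ and the encoded values can be sketched in plain Python. This is only an illustration of the idea, not scikit-learn's implementation; `classes`, `to_code`, `encoded`, and `decoded` are hypothetical names chosen here.

```python
arr = [1, 2, 2, 5]

# Sorted unique labels play the role of classes_
classes = sorted(set(arr))

# Each label's position in classes is its integer code
to_code = {label: code for code, label in enumerate(classes)}

encoded = [to_code[label] for label in arr]    # like transform(arr)
decoded = [classes[code] for code in encoded]  # like inverse_transform(encoded)

print(classes, encoded, decoded)  # [1, 2, 5] [0, 1, 1, 2] [1, 2, 2, 5]
```

Indexing back into the sorted label list recovers the original values, which is what LabelEncoder's inverse_transform does with classes_.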