Feature Engineering for NLP in Python
Rounak Banik
Data Scientist
For any ML algorithm,
Corpus
"The lion is the king of the jungle"
"Lions have lifespans of a decade"
"The lion is an endangered species"
Vocabulary → a, an, decade, endangered, have, is, jungle, king, lifespans, lion, Lions, of, species, the, The
"The lion is the king of the jungle"
[0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 2, 1]
"Lions have lifespans of a decade"
[1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0]
"The lion is an endangered species"
[0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1]
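The three vectors above can be reproduced by hand. The following is a minimal sketch using only the standard library. The helper name `bow_vector` is hypothetical, and splitting on whitespace is an assumption that happens to be enough for this small corpus.

```python
from collections import Counter

corpus = [
    "The lion is the king of the jungle",
    "Lions have lifespans of a decade",
    "The lion is an endangered species",
]

# Build the case-sensitive vocabulary in the order used above:
# alphabetical ignoring case, with the lowercase form first on ties.
vocabulary = sorted(
    {word for doc in corpus for word in doc.split()},
    key=lambda word: (word.lower(), word != word.lower()),
)

def bow_vector(doc, vocabulary):
    """Count how often each vocabulary word occurs in doc (hypothetical helper)."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocabulary]

vectors = [bow_vector(doc, vocabulary) for doc in corpus]
for doc, vector in zip(corpus, vectors):
    print(doc)
    print(vector)
```

Each position in a vector corresponds to one vocabulary word, so every document maps to a vector of the same length (15 here), regardless of how many words it contains.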
Lions, lion → lion
The, the → the
import pandas as pd

corpus = pd.Series([
    'The lion is the king of the jungle',
    'Lions have lifespans of a decade',
    'The lion is an endangered species'
])
# Import CountVectorizer
from sklearn.feature_extraction.text import CountVectorizer
# Create CountVectorizer object
vectorizer = CountVectorizer()
# Generate matrix of word vectors
bow_matrix = vectorizer.fit_transform(corpus)
print(bow_matrix.toarray())
array([[0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 3],
[0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0],
[1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1]], dtype=int64)
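The matrix has 13 columns rather than 15 because CountVectorizer, with its default settings, lowercases the text and keeps only tokens of two or more word characters: "The" merges with "the", and "a" is dropped. The sketch below approximates that default behavior with the standard library so the column-to-word mapping is visible. It is an illustration under those assumptions, not scikit-learn's implementation, and the names `tokenize`, `features`, and `matrix` are hypothetical.

```python
import re
from collections import Counter

corpus = [
    "The lion is the king of the jungle",
    "Lions have lifespans of a decade",
    "The lion is an endangered species",
]

# Same idea as CountVectorizer's default token pattern:
# runs of at least two word characters.
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")

def tokenize(doc):
    """Lowercase the document, then extract tokens of length >= 2."""
    return TOKEN_PATTERN.findall(doc.lower())

# Columns are the sorted unique tokens across the whole corpus.
features = sorted({token for doc in corpus for token in tokenize(doc)})

# One row of counts per document, one column per feature.
matrix = []
for doc in corpus:
    counts = Counter(tokenize(doc))
    matrix.append([counts[token] for token in features])

print(features)
for row in matrix:
    print(row)
```

Note that "lion" and "lions" still occupy separate columns, since lowercasing alone does not reduce words to their base form; that would require a separate lemmatization step before vectorizing.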