How to Answer: Demonstrate the 'bag of words' model and state the benefits and limitations of using this model.
Advice and answer examples written specifically for an Advanced Machine Learning Engineer - Python job interview.
4. Demonstrate the 'bag of words' model and state the benefits and limitations of using this model.
This question tests the developer's knowledge of machine learning terminology and the purpose behind it.
One benefit of the "bag of words" model is that it simplifies some NLP algorithms: each text is reduced to a fixed-length vector of word counts. Its main limitations relate to sparsity and to the meaning of the text.
Sparsity refers to the fact that the bag-of-words model creates "sparse" vectors: each vector has one entry per vocabulary word, and most of those entries are zero for any given text. This increases the space complexity of the algorithm.
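To make the sparsity point concrete, here is a minimal sketch in plain Python. The toy corpus and the helper names dense_vector and to_sparse are assumptions made for illustration, not part of the original answer. It measures what share of one count vector is zero and shows a dictionary that stores only the non-zero entries.

```python
from collections import Counter

# Hypothetical toy corpus, already tokenized and lowercased.
corpus = [
    ["i", "love", "pizza"],
    ["she", "is", "a", "good", "person"],
    ["good", "people", "are", "the", "best"],
]

# Vocabulary in order of first appearance (dict keys keep insertion order).
vocab = list(dict.fromkeys(word for sent in corpus for word in sent))

def dense_vector(sent):
    """Return one count per vocabulary word, zeros included."""
    counts = Counter(sent)
    return [counts[word] for word in vocab]

def to_sparse(vec):
    """Keep only the non-zero entries as {index: count}."""
    return {i: c for i, c in enumerate(vec) if c != 0}

dense = dense_vector(corpus[0])
zero_share = dense.count(0) / len(dense)

print(len(vocab))
print(dense)
print(to_sparse(dense))
print(f"{zero_share:.0%} of the entries are zero")
```

Even with three short sentences, most of the vector is zero. With a realistic vocabulary the share of zeros is typically much higher, which is why count vectors are often stored in a sparse format.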
Meaning refers to the context of the text. The bag-of-words model does not take into account the order of the words in the text, nor does it "understand" the context, so the "meaning" of the sentence is lost.
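A short sketch can illustrate the loss of word order. The two sentences and the helper name bag are assumptions chosen for illustration, and the tokenizer is a plain whitespace split rather than NLTK. The sentences mean opposite things, yet their word counts are identical.

```python
from collections import Counter

def bag(text):
    """Count lowercase, whitespace-separated tokens, ignoring their order."""
    return Counter(text.lower().split())

a = bag("the dog bit the man")
b = bag("the man bit the dog")

# Both bags hold the same counts, so the model cannot tell the sentences apart.
print(a == b)
```

Any model built only on these counts receives the same input for both sentences.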
Ensure that you have installed all the packages required for your algorithm. The example below needs NumPy and NLTK, and word_tokenize also needs NLTK's Punkt tokenizer data to be downloaded.
Below is an example of the implementation of the "bag of words" model for a given sample text. The output shows the first sample text as a list of tokens, followed by its vector of word frequencies.
import numpy as np
import nltk
from nltk.tokenize import word_tokenize
from collections import defaultdict

# word_tokenize needs NLTK's Punkt tokenizer data, e.g. nltk.download('punkt')
# (newer NLTK releases name the resource 'punkt_tab').

data = ['I really love pizza, it is delicious. I think it is the best', 'She is a good person', 'good people are the best']

sentences = []
vocab = []
for sent in data:
    x = word_tokenize(sent)
    # Lowercase each token and keep alphabetic tokens only (drops punctuation)
    sentence = [w.lower() for w in x if w.isalpha()]
    sentences.append(sentence)
    # Build the vocabulary in order of first appearance
    for word in sentence:
        if word not in vocab:
            vocab.append(word)

len_vector = len(vocab)

# Map each vocabulary word to its position in the vector
index_word = {}
i = 0
for word in vocab:
    index_word[word] = i
    i += 1

def bag_of_words(sent):
    # Count how often each word occurs in the tokenized sentence
    count_dict = defaultdict(int)
    vec = np.zeros(len_vector)
    for item in sent:
        count_dict[item] += 1
    # Write each count into the slot that belongs to its word
    for key, item in count_dict.items():
        vec[index_word[key]] = item
    return vec

vector = bag_of_words(sentences[0])
print(sentences[0])
print(vector)
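One practical detail: bag_of_words above looks every word up in index_word, so a sentence containing a word that was not in the sample data raises a KeyError. The sketch below shows one possible way to handle that, under the assumption that unknown words should be skipped and reported. The small vocabulary and the function name bag_of_words_safe are hypothetical, and plain lists stand in for NumPy arrays to keep the sketch dependency-free.

```python
from collections import Counter

# Hypothetical vocabulary; in the example above this would be `vocab`.
vocab = ["i", "love", "pizza", "good", "best"]
index_word = {word: i for i, word in enumerate(vocab)}

def bag_of_words_safe(tokens):
    """Count known words; collect unknown words instead of raising KeyError."""
    vec = [0] * len(vocab)
    unknown = []
    for word, count in Counter(tokens).items():
        if word in index_word:
            vec[index_word[word]] = count
        else:
            unknown.append(word)
    return vec, unknown

vec, unknown = bag_of_words_safe(["i", "love", "good", "good", "pasta"])
print(vec)
print(unknown)
```

Skipping unknown words is only one choice. Another common option is to reserve a single extra slot in the vector for all out-of-vocabulary words.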
Written by Tiarnan Brady on June 13th, 2021