Python: How to filter by frequency and add bigrams in the code?
The code below produces a bar chart with the words on the x-axis and their frequencies on the y-axis. However, I would like to add two enhancements: 1) show only values with a frequency greater than 2, and 2) include bigrams.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(['my big dog', 'my lazy cat'])
df
# 0
#0 my big dog
#1 my lazy cat
value_list = [row[0] for row in df.itertuples(index=False, name=None)]
value_list
#['my big dog', 'my lazy cat']
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer()
x_train = cv.fit_transform(value_list)
x_train.toarray()
x_train_sum = x_train.sum(axis=0)
x_train_sum
result = []
# Pair each vocabulary word with its total count across all rows
for word, col in cv.vocabulary_.items():
    result.append((word, x_train_sum[0, col]))
word = []
frequency = []
for i in range(len(result)):
    word.append(result[i][0])
    frequency.append(result[i][1])
# One bar position per word
indices = np.arange(len(frequency))
plt.bar(indices, frequency, color='r')
plt.xticks(indices, word, rotation = 'vertical')
plt.tight_layout()
plt.show()
I'm not sure what you mean by "include bigrams", but the answer to the first part of your question is as follows:
indices = [i for i in range(len(frequency)) if frequency[i] >= 2]
frequency = [frequency[i] for i in indices]
word = [word[i] for i in indices]
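Adding these three lines before creating the plot keeps only the words whose frequency is at least 2; change `>=` to `>` if you want strictly greater than 2. The bar positions then have to be computed from the filtered lists, e.g. np.arange(len(frequency)).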
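As for the second part, if "bigrams" means two-word sequences such as "my big", one possible approach is CountVectorizer's ngram_range parameter. The following is only a minimal sketch under that assumption. The names min_count and filtered are made up for illustration, and the threshold of 2 is chosen just so that the tiny sample data leaves something to show.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

value_list = ['my big dog', 'my lazy cat']

# ngram_range=(1, 2) counts single words and two-word sequences (bigrams)
cv = CountVectorizer(ngram_range=(1, 2))
x_train = cv.fit_transform(value_list)

# Total count per term, flattened to a 1-D array
counts = np.asarray(x_train.sum(axis=0)).ravel()

# Hypothetical threshold; raise it or switch to ">" as needed
min_count = 2
filtered = {
    term: int(counts[col])
    for term, col in cv.vocabulary_.items()
    if counts[col] >= min_count
}
```

The keys and values of filtered can then take the place of word and frequency in the plotting code from the question. With real data, bigrams usually occur far less often than single words, so the cutoff may need tuning.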