Python: Ngram counts are lower than the desired output


The code below gives me the following output:

        words   freq
0        hello   5
1        yes     10


I would like the same kind of output for ngrams(4), but the results only show a frequency of 1 for every n-gram. Can someone help me adjust the n-gram code so that it produces output like the table above? The requirement is n-grams with their frequencies, written to an Excel (xlsx) file.

For example:

 (('benito', 'kanchan'), 1),
 (('kanchan', 'tata'), 1),
 (('tata', 'arora'), 1),

The code so far:

import pandas as pd

df = pd.read_excel(r"Filename")

#Converting to lowercase
df['Body'] = df['Body'].apply(lambda x: " ".join(x.lower() for x in x.split()))
df['Body'].head()

#Count of Words
df['word_count'] = df['Body'].apply(lambda x: len(str(x).split(" ")))
df[['Body','word_count']].head()

#Removing Punctuation
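# regex=True is required on recent pandas, where str.replace defaults to literal matching
df['Body'] = df['Body'].str.replace(r'[^\w\s]', '', regex=True)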
df['Body'].head()

#Removing Stop Words
from nltk.corpus import stopwords
stop = stopwords.words('english')

df['Body'] = df['Body'].apply(lambda x: " ".join(x for x in x.split() if x not in stop))
df['Body'].head()

#df['Body'] = df['Body'].astype('|S')

# Word Count
# One row per word across all rows of Body, then count each distinct word
tf1 = df['Body'].str.split().explode().value_counts().reset_index()
tf1.columns = ['words','tf']
print(tf1)

#Ngrams
from collections import Counter
from textblob import TextBlob

a = TextBlob(tf1['words'][0]).ngrams(4)
a = [','.join(map(str, l)) for l in a]
print(a)
counter = Counter(a)
counter.most_common(150)
counter.columns = ['ngram', 'tf']
counter
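
Note that the n-gram step above builds its n-grams from `tf1['words'][0]`, a single entry of the word table, rather than from every row of `Body`. A minimal sketch of counting n-grams across all rows might look like the following. It assumes that is the intended behaviour. The helper name `ngram_counts` and the sample texts are hypothetical, and plain whitespace tokenisation stands in for TextBlob.

```python
from collections import Counter


def ngram_counts(texts, n):
    """Count word n-grams over every text in `texts`, not just the first one."""
    counts = Counter()
    for text in texts:
        tokens = str(text).split()
        # Zipping n staggered slices of the token list yields each n-gram as a tuple;
        # texts with fewer than n tokens contribute nothing.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Hypothetical sample standing in for the cleaned df['Body'] column
sample_bodies = [
    "benito kanchan tata arora",
    "benito kanchan tata arora",
    "kanchan tata arora singh",
]

counts = ngram_counts(sample_bodies, 2)

# Flatten to (ngram, tf) rows, most frequent first, like the words/freq table
rows = [(" ".join(gram), freq) for gram, freq in counts.most_common(150)]
for gram, freq in rows:
    print(gram, freq)
```

With the real data, the call would be something like `ngram_counts(df['Body'], 4)`. The resulting rows can then be written to Excel with `pd.DataFrame(rows, columns=['ngram', 'tf']).to_excel("ngrams.xlsx", index=False)`, assuming an Excel writer such as openpyxl is installed; the file name here is only a placeholder.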