Python: how do I build a frequency count table of the items in a DataFrame?

Tags: python, pandas, indexing, word-frequency, frequency-distribution

Suppose I have the following data in a CSV file, example.csv:

Word    Score
Dog     1
Bird    2
Cat     3
Dog     2
Dog     3
Dog     1
Bird    3
Cat     1
Bird    1
Cat     3
I want to count how often each word occurs with each score. The expected output is:

        1   2   3
Dog     2   1   1
Bird    1   1   1
Cat     1   0   2
My code is as follows:

import pandas as pd

x1 = pd.read_csv(r'path\to\example.csv')

def getUniqueWords(allWords) :
    uniqueWords = [] 
    for i in allWords:
        if not i in uniqueWords:
            uniqueWords.append(i)
    return uniqueWords

unique_words = getUniqueWords(x1['Word'])
unique_scores = getUniqueWords(x1['Score'])

scores_matrix = [[0 for x in range(len(unique_words))] for x in range(len(unique_scores)+1)]   
# The '+1' is because Python indexing starts from 0; so if a score of 0 is present in the data, the 0 index will be used for that. 

for i in range(len(unique_words)):
    temp = x1[x1['Word']==unique_words[i]]
    for j, word in temp.iterrows():
        scores_matrix[i][j] += 1  # Supposed to store the count for word i with score j
But this produces the following error:

IndexError                                Traceback (most recent call last)
<ipython-input-123-141ab9cd7847> in <module>()
     19     temp = x1[x1['Word']==unique_words[i]]
     20     for j, word in temp.iterrows():
---> 21         scores_matrix[i][j] += 1

IndexError: list index out of range
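I would also like to be able to look up the count for a given word and score, like this:

scores_matrix['Dog'][1]
>>> 2

scores_matrix['Cat'][2]
>>> 0

So, how can I fix these two problems?

Use groupby with sort=False, together with either value_counts or size, and then unstack:

# df is the DataFrame read from example.csv (x1 in the question)
df1 = df.groupby('Word', sort=False)['Score'].value_counts().unstack(fill_value=0)
# or:
df1 = df.groupby(['Word','Score'], sort=False).size().unstack(fill_value=0)

print (df1)
Score  1  2  3
Word          
Dog    2  1  1
Bird   1  1  1
Cat    1  0  2

If the order of the words does not matter, use crosstab:

df1 = pd.crosstab(df['Word'], df['Score'])
print (df1)
Score  1  2  3
Word          
Bird   1  1  1
Cat    1  0  2
Dog    2  1  1

Finally, select a single count by its labels with loc:

print (df1.loc['Cat', 2])
0

OK, that worked. I didn't know about any of these! By the way, how do I access the count for a specific score of a specific word? Something like df['Dog']['1'] should return 2, df['Cat']['2'] should return 0, and so on.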
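
As for why the loop in the question raises IndexError: iterrows() yields each row's index label as j (0 through 9 here), not its score, and scores_matrix is built with one inner list per score, each only as long as the number of unique words (3). For Dog, the second matching row has index label 3, so scores_matrix[i][3] is already out of range. Below is a minimal sketch of a corrected manual count, checked against crosstab. The names CSV_TEXT, word_pos, score_pos, counts, manual and table are illustrative and not from the original post, and the CSV is inlined (comma-separated) only so that the example runs on its own.

```python
import io

import pandas as pd

# Inline copy of example.csv from the question (assumed comma-separated here).
CSV_TEXT = """Word,Score
Dog,1
Bird,2
Cat,3
Dog,2
Dog,3
Dog,1
Bird,3
Cat,1
Bird,1
Cat,3
"""
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Unique labels: words in first-seen order, scores in ascending order.
words = list(dict.fromkeys(df["Word"].tolist()))
scores = sorted(set(df["Score"].tolist()))

# Map each label to a row/column position instead of using the score
# (or the DataFrame index label) directly as a list index.
word_pos = {word: i for i, word in enumerate(words)}
score_pos = {score: j for j, score in enumerate(scores)}

# One row per word, one column per score.
counts = [[0] * len(scores) for _ in words]
for word, score in zip(df["Word"].tolist(), df["Score"].tolist()):
    counts[word_pos[word]][score_pos[score]] += 1

manual = pd.DataFrame(counts, index=words, columns=scores)
print(manual)

# The same table via crosstab, reindexed to keep first-seen word order.
table = pd.crosstab(df["Word"], df["Score"]).reindex(words)

# Label-based lookup: row label first, then column label (the int 1, not '1').
print(table.loc["Dog", 1])  # 2
print(table.loc["Cat", 2])  # 0
```

Regarding the follow-up about df['Dog']['1']: plain square brackets on a DataFrame select a column first, and the score columns here are integers rather than strings, so the chained form would be table[1]['Dog']. The single loc call with the row label first is usually the clearer choice.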