Counting and probability in Python

python numpy pandas

I have the following data:

Name    Item
peter   apple
peter   apple
Ben     banana
peter   banana

I want to print:

frequency of what peter eat :
apple 2 
banana 1

Here is my code:

u, count = np.unique(data['Item'], return_counts=True)

process = u[np.where(data['Name']= 'peter')[0]]

process2 = dict(Counter(process))
print "Item\frequency"

for k, v in process2.items():
print '{0:.0f}\t{1}'.format(k,v)

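But it raises an error. I would also like to calculate the probability that peter eats an apple next time, but I have no idea how. Any suggestions?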

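I am not very familiar with pandas or NumPy, but one problem I see is that

data['Name'] = 'peter'

is an assignment statement.

You probably want to test for equality instead:

data['Name'] == 'peter'

Also, unless your indentation was lost when you pasted the code, you need to indent the body of the for statement. Otherwise, once the first error is fixed, you will run into another one: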

for k, v in process2.items():
    print '{0:.0f}\t{1}'.format(k,v)
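
Putting both fixes together, a NumPy-only version might look like the sketch below. It assumes Python 3 and rebuilds the sample data as two plain arrays; the variable names names, items, mask and prob are illustrative and not from the original post. The relative frequency is used as a simple estimate of the probability that peter eats an apple next time.

```python
import numpy as np

# Sample data from the question, rebuilt as arrays (assumed layout).
names = np.array(['peter', 'peter', 'Ben', 'peter'])
items = np.array(['apple', 'apple', 'banana', 'banana'])

# Boolean mask built with == (comparison), not = (assignment).
mask = names == 'peter'

# Count each distinct item among peter's rows only.
u, count = np.unique(items[mask], return_counts=True)

print('Item\tfrequency')
for item, n in zip(u, count):
    print('{0}\t{1}'.format(item, n))

# Relative frequency as a naive probability estimate.
prob = dict(zip(u.tolist(), (count / count.sum()).tolist()))
print(prob['apple'])
```

Filtering before calling np.unique matters here: counting over the whole Item column first, as in the question, would mix Ben's rows into peter's totals.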

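As the other answer points out, the error you are getting is because you cannot use

data['Name']='peter'

as a function argument. What you actually meant to use is:

np.where(data['Name']=='peter')

However, since you are using pandas, I am guessing that data is a pandas DataFrame. In that case, what you really want can be done with DataFrame.groupby. Example: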

data[data['Name']=='peter'].groupby('Item').count()

Demo:

In [7]: data[data['Name']=='peter'].groupby('Item').count()
Out[7]:
        Name
Item
apple      2
banana     1

If you want to print it in a loop, you can use:

df = data[data['Name']=='peter'].groupby('Item').count()
# Series.items() replaces iteritems(), which newer pandas versions no longer provide
for fruit,count in df['Name'].items():
    print('{0}\t{1}'.format(fruit,count))

Demo:

In [24]: df = data[data['Name']=='peter'].groupby('Item').count()

In [25]: for fruit,count in df['Name'].items():
   ....:     print('{0}\t{1}'.format(fruit,count))
   ....:
apple   2
banana  1

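For the updated version, the OP gets the following error:

TypeError: invalid type comparison

This happens because, in the OP's actual data, the column holds numeric values (float/int), but the OP is comparing those values with a string, which produces the error. Example: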

In [30]: df
Out[30]:
   0  1
0  1  2

In [31]: df[0]=='asd'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-31-e7bacd79d320> in <module>()
----> 1 df[0]=='asd'

C:\Anaconda3\lib\site-packages\pandas\core\ops.py in wrapper(self, other, axis)
    612
    613             # scalars
--> 614             res = na_op(values, other)
    615             if np.isscalar(res):
    616                 raise TypeError('Could not compare %s type with Series'

C:\Anaconda3\lib\site-packages\pandas\core\ops.py in na_op(x, y)
    566                 result = getattr(x, name)(y)
    567                 if result is NotImplemented:
--> 568                     raise TypeError("invalid type comparison")
    569             except (AttributeError):
    570                 result = op(x, y)

TypeError: invalid type comparison

If the column is numeric, you should compare it with a numeric value, not with a string.
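
One way to check this before filtering is to look at the column dtypes. The sketch below uses a made-up numeric DataFrame; the values are assumptions for illustration, not the OP's real data. Note that whether a mismatched comparison raises TypeError, as in the traceback above, or simply returns all False depends on the pandas version.

```python
import pandas as pd

# Hypothetical data where both columns hold numeric codes, not strings.
data = pd.DataFrame({'Name': [1, 1, 2, 1], 'Item': [10, 10, 20, 20]})

# Inspect the column types before choosing what to compare against.
print(data.dtypes)

# Compare the numeric column with a numeric value, then count per item.
counts = data[data['Name'] == 1].groupby('Item').size()
print(counts)
```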

If you are not dead set on using numpy:

import collections
import csv

# Nested counter: data[name][food] -> number of times name ate food
data = collections.defaultdict(lambda: collections.defaultdict(int))
with open('path/to/file', newline='') as infile:
    infile.readline()  # get rid of the header
    for name, food in csv.reader(infile):  # assumes two comma-separated columns
        data[name][food] += 1

for name, d in data.items():
    print("frequency of what", name, "ate:")
    total = sum(d.values())
    for food, count in d.items():
        print(food, count, "probability:", count / total)

You can group by name and use value_counts:

In [11]: df.groupby("Name")["Item"].value_counts()
Out[11]:
Name
Ben    banana    1
peter  apple     2
       banana    1
dtype: int64

You can then unstack these into columns:

In [12]: df.groupby("Name")["Item"].value_counts().unstack(1)
Out[12]:
       apple  banana
Name
Ben      NaN       1
peter      2       1

In [13]: res = df.groupby("Name")["Item"].value_counts().unstack(1).fillna(0)

In [13]: res
Out[13]:
       apple  banana
Name
Ben        0       1
peter      2       1

To get the probabilities, divide each row by its sum:

In [14]: res = res.div(res.sum(axis=1), axis=0)

In [15]: res
Out[15]:
          apple    banana
Name
Ben    0.000000  1.000000
peter  0.666667  0.333333

The probability that peter eats an apple next time:

In [16]: res.loc["peter", "apple"]
Out[16]: 0.66666666666666663
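
The same table can also be built in one step with pandas.crosstab, shown here as a self-contained sketch on the sample data. Treating the observed relative frequency as the probability of the next choice is an assumption (a simple empirical estimate), and the names probs and p_apple are illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'Name': ['peter', 'peter', 'Ben', 'peter'],
    'Item': ['apple', 'apple', 'banana', 'banana'],
})

# Row-normalised contingency table: each row sums to 1.
probs = pd.crosstab(df['Name'], df['Item'], normalize='index')
print(probs)

# Estimated probability that peter's next item is an apple.
p_apple = probs.loc['peter', 'apple']
print(p_apple)
```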

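Error: keyword can't be an expression

What is data? A pandas DataFrame? A NumPy record array? Which line gives you the error?

I use pandas to read the data file. The line process = u[np.where(data['Name']= 'peter')[0]] is the one with the error.

Thanks for your answer. I changed it, and now it gives an "invalid type comparison" error.

What is the type of data["Name"]? Try printing type(data["Name"]).

Thanks for your answer, but it gives an "invalid type comparison" error.

What gives that error? Your method? If so, why are you using it? If you are using pandas, you should use pandas methods and functions, like the one I gave above.

Yes, so it is a pandas DataFrame; use the method I gave.

Thanks, but I get an "invalid type comparison" error on the line data[data['Name']=='peter'].groupby('Item').count().

Maybe; use your real data instead of 'peter'.

Thanks for the answer, but I need to use numpy. Thanks anyway :)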