Python: counting values across multiple columns in pandas

I have a data frame like this:

Measure1 Measure2 Measure3 ...
0        1         3
1        3         2
3        0        
I want to count how often each value occurs across all the columns, to produce:

Measure Count Percentage
0       2     0.25
1       2     0.25
2       1     0.125
3       3     0.375

I only get the first column (I am actually using the graphlab package, but I would prefer pandas).

Can anyone help me?

You can generate the counts by flattening the df with ravel and calling value_counts on the result; from that you can build the final df:

In [230]:
import io
import pandas as pd

t="""Measure1 Measure2 Measure3
0        1         3
1        3         2
3        0        0"""

# raw string so that \s is not treated as a string escape
df = pd.read_csv(io.StringIO(t), sep=r'\s+')
df

Out[230]:
   Measure1  Measure2  Measure3
0         0         1         3
1         1         3         2
2         3         0         0

In [240]:    
count = pd.Series(df.squeeze().values.ravel()).value_counts()
pd.DataFrame({'Measure': count.index, 'Count':count.values, 'Percentage':(count/count.sum()).values})

Out[240]:
   Count  Measure  Percentage
0      3        3    0.333333
1      3        0    0.333333
2      2        1    0.222222
3      1        2    0.111111

I inserted a 0 just to give the df a rectangular shape, but you should get the point.
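
The flatten-and-count idea above can be wrapped in a small helper. This is only a sketch: the function name count_values is made up for illustration, and it assumes the missing cell from the question should be kept as NaN rather than padded with 0. value_counts skips NaN by default, so the padding is not needed.

```python
import numpy as np
import pandas as pd


def count_values(df):
    """Count every value in df, ignoring which column it came from."""
    # Flatten the 2-D values into one long Series.
    flat = pd.Series(df.to_numpy().ravel())
    # value_counts() drops NaN by default; sort_index() orders rows by value.
    counts = flat.value_counts().sort_index()
    return pd.DataFrame({
        "Measure": counts.index,
        "Count": counts.to_numpy(),
        "Percentage": (counts / counts.sum()).to_numpy(),
    })


df = pd.DataFrame({
    "Measure1": [0, 1, 3],
    "Measure2": [1, 3, 0],
    "Measure3": [3, 2, np.nan],  # the empty cell from the question, kept as NaN
})
result = count_values(df)
print(result)
```

Because one column contains NaN, the Measure values come back as floats (0.0, 1.0, ...). The counts are 2, 2, 1 and 3 out of 8 non-missing cells, which matches the table asked for in the question.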

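What if this is part of a larger df? Do I need to specify the columns? When I use count = pd.Series(cdss_data['measure1','measure2'].squeeze().values.ravel()).value_counts() I get an error (cdss_data is my df).

You need double subscripting:
count = pd.Series(cdss_data[['measure1','measure2']].squeeze().values.ravel()).value_counts()

Awesome! Is there a way to force the order of the columns and the order of the rows?

You can use fancy indexing, for example:
desired_col_list = ['a', 'b', 'c', 'd']
df = df.loc[:, desired_col_list]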
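
Putting the two comments together, a minimal sketch might look like the following. The contents of cdss_data (including the patient column) are invented for illustration. A single pair of brackets with two names is treated as one tuple key and raises a KeyError, whereas double brackets select a sub-DataFrame.

```python
import pandas as pd

# Hypothetical wider frame; only measure1 and measure2 should be counted.
cdss_data = pd.DataFrame({
    "patient": ["a", "b", "c"],
    "measure1": [0, 1, 3],
    "measure2": [1, 3, 0],
})

# Double brackets: a list of column names selects a sub-DataFrame.
subset = cdss_data[["measure1", "measure2"]]
count = pd.Series(subset.to_numpy().ravel()).value_counts()

summary = pd.DataFrame({
    "Measure": count.index,
    "Count": count.to_numpy(),
    "Percentage": (count / count.sum()).to_numpy(),
})

# Force the row order by sorting on Measure, and the column order with an explicit list.
summary = summary.sort_values("Measure").reset_index(drop=True)
summary = summary[["Measure", "Count", "Percentage"]]
print(summary)
```

Sorting on Count instead (sort_values("Count", ascending=False)) would give the most frequent values first.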

In [68]: import numpy as np
    ...: from pandas import DataFrame, Series
    ...: df = DataFrame({'m1':[0,1,3], 'm2':[1,3,0], 'm3':[3,2, np.nan]})

In [69]: df
Out[69]:
   m1  m2   m3
0   0   1  3.0
1   1   3  2.0
2   3   0  NaN

In [70]: df=df.apply(Series.value_counts).sum(1).to_frame(name='Count')

In [71]: df
Out[71]:
     Count
0.0    2.0
1.0    2.0
2.0    1.0
3.0    3.0

In [72]: df.index.name='Measure'

In [73]: df
Out[73]:
         Count
Measure
0.0        2.0
1.0        2.0
2.0        1.0
3.0        3.0

In [74]: df['Percentage']=df.Count.div(df.Count.sum())

In [75]: df
Out[75]:
         Count  Percentage
Measure
0.0        2.0       0.250
1.0        2.0       0.250
2.0        1.0       0.125
3.0        3.0       0.375
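
The session above can also be written as a standalone script. This is a sketch of the same per-column value_counts approach, with one assumed tweak: the summed counts are cast back to integers, since they come out as floats once columns with missing values are combined. The variable names per_column and out are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"m1": [0, 1, 3], "m2": [1, 3, 0], "m3": [3, 2, np.nan]})

# One value_counts per column; a value absent from a column becomes NaN in that column.
per_column = df.apply(pd.Series.value_counts)

# sum(axis=1) skips those NaN entries, giving the total count per value.
out = per_column.sum(axis=1).astype(int).sort_index().to_frame(name="Count")
out.index.name = "Measure"
out["Percentage"] = out["Count"] / out["Count"].sum()
print(out)
```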