Python pandas: how to get the percentage for each row


When I use the pandas value_counts method, I get the following data:

new_df['mark'].value_counts()

1   1349110
2   1606640
3   175629
4   790062
5   330978

How can I get the percentage for each row, like this?

1   1349110 31.7%
2   1606640 37.8%
3   175629  4.1%
4   790062  18.6%
5   330978  7.8%

I need to divide each row by the sum of these values.

I think you need:

#if output is Series, convert it to DataFrame
df = df.rename('a').to_frame()

df['per'] = (df.a * 100 / df.a.sum()).round(1).astype(str) + '%'

print (df)
         a    per
1  1349110  31.7%
2  1606640  37.8%
3   175629   4.1%
4   790062  18.6%
5   330978   7.8%
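
For reference, here is a minimal self-contained sketch of the approach above. It assumes the counts from the question are rebuilt by hand as a Series; the names counts and df are illustrative, and in practice the Series would come from new_df['mark'].value_counts().

```python
import pandas as pd

# Counts copied from the question (assumption: rebuilt by hand here;
# normally this is the result of new_df['mark'].value_counts()).
counts = pd.Series(
    [1349110, 1606640, 175629, 790062, 330978],
    index=[1, 2, 3, 4, 5],
    name="a",
)

df = counts.to_frame()
# Divide each count by the total, scale to percent, round to one decimal,
# then turn the numbers into strings so a "%" sign can be appended.
df["per"] = (df["a"] * 100 / df["a"].sum()).round(1).astype(str) + "%"
print(df)
```

Note that the per column holds strings after this step, so it is meant for display only; keep the numeric ratio in a separate column if it is needed for further calculations.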

Timings:

Using sum seems to be faster than calling value_counts twice:
In [184]: %timeit (jez(s))
10 loops, best of 3: 38.9 ms per loop

In [185]: %timeit (pir(s))
10 loops, best of 3: 76 ms per loop

Code for timings:
import numpy as np
import pandas as pd

np.random.seed([3,1415])
s = pd.Series(np.random.choice(list('ABCDEFGHIJ'), 1000, p=np.arange(1, 11) / 55.))
s = pd.concat([s]*1000)#.reset_index(drop=True)

def jez(s):
    df = s.value_counts()
    df = df.rename('a').to_frame()
    df['per'] = (df.a * 100 / df.a.sum()).round(1).astype(str) + '%'
    return df

def pir(s):
    return pd.DataFrame({'a':s.value_counts(), 
                         'per':s.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'})

print (jez(s))
print (pir(s))
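
The %timeit lines above only work inside IPython. As a rough sketch, the same comparison can be run as a plain script with the standard timeit module. The function names one_pass and two_pass, the seed, and the sample size are assumptions made for this example, and the measured times will differ from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for this sketch
s = pd.Series(rng.choice(list("ABCDEFGHIJ"), size=100_000))

def one_pass(s):
    # Call value_counts once, then divide by the total.
    counts = s.value_counts()
    return pd.DataFrame({"a": counts, "per": counts * 100 / counts.sum()})

def two_pass(s):
    # Call value_counts twice: once for raw counts, once normalized.
    return pd.DataFrame({"a": s.value_counts(),
                         "per": s.value_counts(normalize=True) * 100})

for func in (one_pass, two_pass):
    seconds = timeit.timeit(lambda: func(s), number=10)
    print(f"{func.__name__}: {seconds / 10 * 1000:.1f} ms per call")
```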


As percentages:
s.value_counts(normalize=True)

I    0.176
J    0.167
H    0.136
F    0.128
G    0.111
E    0.085
D    0.083
C    0.052
B    0.038
A    0.024
dtype: float64
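
As a quick check, normalize=True should give the same fractions as dividing the raw counts by their sum. The sketch below uses a small made-up Series (toy data, not the data from the question) to show this.

```python
import pandas as pd

# Hypothetical toy data: two A's, three B's, one C.
s = pd.Series(list("AABBBC"))

counts = s.value_counts()
by_sum = counts / counts.sum()                 # manual division
by_normalize = s.value_counts(normalize=True)  # built-in normalization

print(by_sum)
print(by_normalize)
```

Both results are fractions between 0 and 1, so they still need to be multiplied by 100 to read as percentages.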


I think this is a more Pythonic snippet:

def aspercent(column, decimals=2):
    assert decimals >= 0
    return (round(column*100, decimals).astype(str) + "%")

aspercent(df['mark'].value_counts(normalize=True), decimals=1)

This will output:
1   1349110 31.7%
2   1606640 37.8%
3   175629  4.1%
4   790062  18.6%
5   330978  7.8%

This also lets you adjust the number of decimal places.

Alternatively, how about pd.DataFrame({'a': s.value_counts(), 'per': s.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'})?
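
A helper like aspercent returns only the formatted percentages, so to show them next to the counts the two Series have to be combined. The following is a minimal sketch under that assumption; the marks Series is a made-up stand-in for new_df['mark'], and the column names are illustrative.

```python
import pandas as pd

def aspercent(column, decimals=2):
    # Scale fractions to percent, round, and append a "%" sign.
    assert decimals >= 0
    return round(column * 100, decimals).astype(str) + "%"

# Hypothetical stand-in for new_df['mark'].
marks = pd.Series([1, 1, 1, 2, 2, 3])

result = pd.DataFrame({
    "counts": marks.value_counts(),
    "per": aspercent(marks.value_counts(normalize=True), decimals=1),
})
print(result)
```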

counts = s.value_counts()
percent = counts / counts.sum()
fmt = '{:.1%}'.format
pd.DataFrame({'counts': counts, 'per': percent.map(fmt)})

   counts    per
I     176  17.6%
J     167  16.7%
H     136  13.6%
F     128  12.8%
G     111  11.1%
E      85   8.5%
D      83   8.3%
C      52   5.2%
B      38   3.8%
A      24   2.4%
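
The '{:.1%}' format used above relies on Python's percent presentation type, which multiplies the value by 100, formats it with the given number of decimals, and appends the percent sign. A small sketch of how the number of decimals could be made adjustable follows; the helper name as_percent is an assumption for illustration and not part of pandas.

```python
import pandas as pd

fmt = "{:.1%}".format
print(fmt(0.176))  # 17.6%
print(fmt(0.024))  # 2.4%

def as_percent(series, decimals=1):
    # Hypothetical helper: format each fraction with a configurable
    # number of decimal places using the "%" presentation type.
    return series.map(lambda x: f"{x:.{decimals}%}")

fractions = pd.Series([0.176, 0.024], index=["I", "A"])
print(as_percent(fractions, decimals=0).tolist())  # ['18%', '2%']
```

Unlike rounding and then converting with astype(str), this always prints exactly the requested number of decimals, including trailing zeros.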
