Python: count specific values in a column and tabulate the results

python pandas

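I have a dataset with 10 million rows. I want to count how many times certain numbers occur in the Values column and, at the same time, create a column holding the results. Specifically, I want to count how many times 0 and every number up to 100000 occur in the Values column. Previously I used Excel with the formula

=COUNTIF(A:A, ROW(A1))

Counting one specific number is straightforward with the following code:

df.loc[df.Values == '21288', 'Values'].count()

I'm afraid of crashing my computer, so before I try the code below I'd like you to tell me whether it is correct: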

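import pandas as pd
df = pd.read_csv('Hello world')

# Intended: for each row, count how often that row's New_Value occurs.
# Note: each iteration scans the whole column, and assigning a scalar to
# df['Counts'] overwrites the entire column every time.
for index in df.index:
    df['Counts'] = df.loc[df.New_Value == df.loc[index,'New_Value'], 'New_Value'].count()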
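To see what the loop actually does, here it is on a tiny hypothetical frame (the four New_Value numbers are made up for illustration). Assigning a scalar to df['Counts'] fills the whole column, so each pass overwrites the previous one and every row ends up with the count for the last row's value. Each pass also rescans the full column, so n rows cost roughly n × n comparisons.

```python
import pandas as pd

# Hypothetical small frame standing in for the real 10M-row CSV
df = pd.DataFrame({'New_Value': [5, 7, 5, 9]})

# The loop from the question: every iteration overwrites the whole column
for index in df.index:
    df['Counts'] = df.loc[df.New_Value == df.loc[index, 'New_Value'], 'New_Value'].count()

# Every row holds the count of the last row's value (9 occurs once)
print(df['Counts'].tolist())  # [1, 1, 1, 1]
```

The per-row counts one would want here are [2, 1, 2, 1].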

You can use value_counts:

>>> df.Values.value_counts()
#output
26972    2
55795    2
28446    1
78957    1
54796    1
32698    1
75894    1
78469    1
28784    1
Name: Values, dtype: int64
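value_counts only reports values that actually occur. To get an Excel-style table with one row for 0 and for every number up to 100000, one option is to reindex the counts. This is a minimal sketch assuming the column holds integers; the sample values are the ones from the Values/Count table in this answer, and the names counts and table are illustrative.

```python
import pandas as pd

# Sample values taken from the Values/Count table in this answer
df = pd.DataFrame({'Values': [54796, 78957, 75894, 78469, 26972, 28446,
                              28784, 55795, 32698, 55795, 26972]})

# value_counts() lists only values that occur; reindex over 0..100000 so every
# number gets a row, with 0 for numbers that never appear
counts = df['Values'].value_counts().reindex(range(100001), fill_value=0)

# Turn the Series into a two-column table, like the Excel COUNTIF column
table = counts.rename_axis('Value').reset_index(name='Count')

print(counts.loc[26972], counts.loc[0], len(counts))  # 2 0 100001
```

value_counts does a single pass over the column, so this scales to 10 million rows far better than a per-row loop.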

Filter the value_counts result:

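# value_counts() is indexed by the distinct values; turn it into a DataFrame
value_df = df.Values.value_counts().to_frame().astype(int)
# keep only the values below 40000 (they are in the index)
value_df[value_df.index < 40000]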
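One caveat when running this filter on a current pandas: the column produced by to_frame() is named Values in older releases but count in pandas 2.x, so code that selects it by name may break across versions. A sketch that selects the column by position instead, using the same sample values (the name below is illustrative):

```python
import pandas as pd

# Same sample values as the Values/Count table in this answer
df = pd.DataFrame({'Values': [54796, 78957, 75894, 78469, 26972, 28446,
                              28784, 55795, 32698, 55795, 26972]})

value_df = df.Values.value_counts().to_frame()
# The distinct values are the index, so filter on the index; select the
# counts column by position because its name depends on the pandas version
below = value_df[value_df.index < 40000].iloc[:, 0].sort_index()

print(below.to_dict())
```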

If you want to add another column, Count, to the original DataFrame:

#creating a dictionary based on the value counts
>>> d = df.Values.value_counts().to_dict()

#mapping the count to the Values columns
>>> df['Count'] = df.Values.map(d)

Output:

    Values  Count
0    54796      1
1    78957      1
2    75894      1
3    78469      1
4    26972      2
5    28446      1
6    28784      1
7    55795      2
8    32698      1
9    55795      2
10   26972      2
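The mapping step can be checked end to end on the sample values from the table. This sketch passes the value_counts Series straight to map (map accepts a Series as well as a dict), and also shows groupby(...).transform('count') as an alternative that is not part of the original answer; both should reproduce the Count column in the table.

```python
import pandas as pd

# Sample values taken from the Values/Count table in this answer
df = pd.DataFrame({'Values': [54796, 78957, 75894, 78469, 26972, 28446,
                              28784, 55795, 32698, 55795, 26972]})

# Series.map also accepts a Series: each row looks up its own value in the
# index of value_counts() and receives the matching count
df['Count'] = df.Values.map(df.Values.value_counts())

# Alternative (not from the original answer): number of rows in the group
# that each row belongs to, aligned back to the original rows
alt = df.groupby('Values')['Values'].transform('count')

print(df['Count'].tolist())  # [1, 1, 1, 1, 2, 1, 1, 2, 1, 2, 2]
```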

Confirm using your method:

>>> df.loc[df.Values == 26972, 'Values'].count()
2
>>> df.loc[df.Values == 55795, 'Values'].count()
2
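To confirm that the two methods agree for every distinct value rather than spot-checking two, a small comparison loop works on a sample. It is only a check: the boolean mask rescans the column for each value, so on 10 million rows with many distinct values it would be slow. The sample values are the ones from the table; the names vc and manual are illustrative.

```python
import pandas as pd

# Sample values taken from the Values/Count table in this answer
df = pd.DataFrame({'Values': [54796, 78957, 75894, 78469, 26972, 28446,
                              28784, 55795, 32698, 55795, 26972]})

vc = df.Values.value_counts()

# Re-count each distinct value with the boolean-mask method from the question
manual = {v: df.loc[df.Values == v, 'Values'].count() for v in vc.index}

print(all(manual[v] == vc.loc[v] for v in vc.index))  # True
```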

For your V100.csv:

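import pandas as pd

df = pd.read_csv('V100.csv', delimiter=',')
# Convert every column to numbers; entries that cannot be parsed become NaN
df = df.apply(pd.to_numeric, errors='coerce').dropna()
# NaN forces a float dtype, so cast the remaining values back to int
df = df.astype(int)

print(df['Fives'].value_counts())
print(df.loc[df.Fives == 9100, 'Fives'].count())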

Note that the counts are the same.
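The coercion step can be tried without the real file. The CSV text below is hypothetical (the actual layout of V100.csv is unknown). It shows that one non-numeric entry stops read_csv from parsing the Fives column as numbers, and that to_numeric with errors='coerce', followed by dropna and astype(int), turns it back into integers. Passing errors='coerce' as a keyword is equivalent to args=('coerce',).

```python
import io

import pandas as pd

# Hypothetical stand-in for V100.csv: one non-numeric entry in the Fives column
csv_text = "Fives\n9100\n9100\nabc\n54796\n"

# 'abc' keeps read_csv from parsing Fives as numbers, so it is loaded as text
raw = pd.read_csv(io.StringIO(csv_text), delimiter=',')

# errors='coerce' turns 'abc' into NaN; dropna removes that row
df = raw.apply(pd.to_numeric, errors='coerce').dropna()
df = df.astype(int)

print(df.loc[df.Fives == 9100, 'Fives'].count())  # 2
```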

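Checked value_counts, but I'm not sure. Take 54796 from the example I gave: it occurs only once, so the count should be 1. But if I use

df.loc[df.Fives == '54796', 'Fives'].count()

the result is different. The two approaches return different counts on my real data. I don't know whether the element type is causing the problem.

I'm sure value_counts does not give wrong results. I will try your method on another small test df. Added an update to show that the results are the same.

The counting is on the column named Fives. I would need access to that link.

There are a few things you can try. First, convert df to the int dtype with

df = df.astype(int)

Also, try counting with df.loc[df.Fives == 54796, 'Fives'].count(). Note that I removed the quotes around the number.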
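The quotes matter because the comparison does not convert types. With a hypothetical integer Fives column, comparing against the string '54796' matches no rows (pandas returns False for every element of an == comparison between numbers and a string), while comparing against the integer matches one. value_counts is unaffected because it never compares against a literal.

```python
import pandas as pd

# Hypothetical integer column standing in for the real Fives data
df = pd.DataFrame({'Fives': [54796, 78957, 26972, 26972]})

# Integer column vs. string literal: no element is equal, so nothing matches
as_str = df.loc[df.Fives == '54796', 'Fives'].count()

# Integer column vs. integer literal: the single 54796 row matches
as_int = df.loc[df.Fives == 54796, 'Fives'].count()

print(as_str, as_int)  # 0 1
```

The reverse mismatch (a column read as strings compared with an integer) also silently returns 0, which is why converting with astype(int) first helps.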