Python: a faster way to count DataFrame rows by a criterion?


python python-2.7 pandas


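I want to count the number of DataFrame rows that fall into each bin and build a list of those counts.

I think there should be a faster way than mine. Could you give me some suggestions?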

script.py

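import pandas

binwidth = 10
data = pandas.read_csv('sample.csv', sep=' ', names=['time', 'value'], header=None, comment='#')

mylist = []

# Walk the rows one by one and increment the counter of the bin each row falls into.
for _, row in data.iterrows():
    index = int(row['time'] // binwidth)  # bin number; // keeps it an integer
    # Grow the list with zeros until this bin exists (this also covers empty bins).
    while len(mylist) <= index:
        mylist.append(0)
    mylist[index] += 1

print(mylist)  # which outputs [8, 4, 4]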

I think this should do it:

import pandas

# set the time column as index for the groupby function
df = pandas.read_csv('sample.csv', sep=' ', names=['time', 'value'],
    header=None, comment='#', index_col=['time'])

binwidth = 10
# the lambda is applied to each index label (the time) and returns its bin number
grouped_df = df.groupby(lambda x: int(x // binwidth)).count()
mylist = grouped_df['value'].tolist()
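
The lambda above is called once per index label, and groupby only reports bins that actually contain rows. As a rough sketch (not the answerer's code), the same grouping can be keyed on a vectorized integer division of the column, with a reindex to make empty bins show up as zeros. The DataFrame below is rebuilt by hand from the sample table shown elsewhere in this thread, which is an assumption standing in for sample.csv.

```python
import pandas as pd

# Sample data copied from the table shown in this thread (stand-in for sample.csv).
df = pd.DataFrame({
    "time": [1, 2, 3, 4, 6, 7, 8, 9, 10, 12, 15, 17, 21, 22, 26, 29],
    "value": list("abcdefghijklmnop"),
})

binwidth = 10

# Integer-divide the whole column at once instead of calling a Python
# lambda per index label; size() counts the rows in each group.
counts = df.groupby(df["time"] // binwidth).size()

# groupby skips bins without rows, so reindex over the full range of bin
# numbers to get an explicit zero for any empty bin.
counts = counts.reindex(range(int(counts.index.max()) + 1), fill_value=0)

mylist = counts.tolist()
print(mylist)  # [8, 4, 4]
```

Whether this is noticeably faster depends on the data size; timing both variants on the real file is the only reliable check.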


Or, a NumPy version:

In [1096]: np.bincount(df.time//10).tolist()
Out[1096]: [8L, 4L, 4L]

Details

In [1087]: df    
Out[1087]:       
    time value   
0      1     a   
1      2     b   
2      3     c   
3      4     d   
4      6     e   
5      7     f   
6      8     g   
7      9     h   
8     10     i   
9     12     j   
10    15     k   
11    17     l   
12    21     m   
13    22     n   
14    26     o   
15    29     p
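
For readers who want to run the NumPy approach outside an IPython session, here is a minimal self-contained sketch. It rebuilds the table above by hand and assumes the time values are non-negative integers, which np.bincount requires. The second call uses made-up values purely to illustrate how gaps are handled.

```python
import numpy as np
import pandas as pd

# Same rows as the table above, typed in by hand.
df = pd.DataFrame({
    "time": [1, 2, 3, 4, 6, 7, 8, 9, 10, 12, 15, 17, 21, 22, 26, 29],
    "value": list("abcdefghijklmnop"),
})

binwidth = 10

# Bin number of every row; np.bincount then counts how often each
# non-negative integer occurs.
bin_index = (df["time"] // binwidth).to_numpy()
counts_list = np.bincount(bin_index).tolist()
print(counts_list)  # [8, 4, 4]

# Bins with no rows come back as zeros automatically. The values here are
# hypothetical, chosen only to leave bins 1 to 3 empty.
gap_counts = np.bincount(np.array([1, 45]) // binwidth).tolist()
print(gap_counts)  # [1, 0, 0, 0, 1]
```

If the time column holds floats, the bin numbers would need an explicit cast such as `.astype(int)` before being passed to np.bincount.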


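You can use pandas.cut to do this:

import pandas as pd

binwidth = 10
data = pd.read_csv('sample.csv', sep=' ', names=['time', 'value'], header=None, comment='#')

# Upper edge: the first multiple of binwidth strictly above the maximum time,
# so the largest value still falls inside the last half-open bin.
max_bin_edge = (int(data['time'].max()) // binwidth + 1) * binwidth
bin_edges = list(range(0, max_bin_edge + 1, binwidth))

# right=False makes each bin closed on the left: [0, 10), [10, 20), ...
bins = pd.cut(data['time'], bins=bin_edges, right=False)

bin_counts = bins.groupby(bins).count()

print(bin_counts)

This also gives you the bin edges: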
time
[0, 10)     8
[10, 20)    4
[20, 30)    4
Name: time, dtype: int64
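
Since the question asks for a plain list of counts, the following sketch shows one way to get from the pandas.cut result to such a list. It swaps the groupby for value_counts(sort=False), which is a substitution and not what the answer used, and it rebuilds the sample data by hand instead of reading sample.csv.

```python
import pandas as pd

# Stand-in for sample.csv, using the time values shown in this thread.
data = pd.DataFrame({
    "time": [1, 2, 3, 4, 6, 7, 8, 9, 10, 12, 15, 17, 21, 22, 26, 29],
    "value": list("abcdefghijklmnop"),
})

binwidth = 10

# Edges 0, 10, 20, ... up to the first multiple of binwidth above the maximum.
upper = (int(data["time"].max()) // binwidth + 1) * binwidth
bin_edges = list(range(0, upper + 1, binwidth))

# Half-open bins [0, 10), [10, 20), ... as a categorical Series.
bins = pd.cut(data["time"], bins=bin_edges, right=False)

# sort=False keeps the bins in edge order instead of ordering by frequency.
bin_counts = bins.value_counts(sort=False)

mylist = bin_counts.tolist()
print(bin_edges)  # [0, 10, 20, 30]
print(mylist)     # [8, 4, 4]
```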


@BradSolomon Sorry for the incomplete description. I am binning the data based on the "time" column.