Python: splitting a dataset by day using pandas

I have a dataset like this:

"2018-05-30 21:26:43",20.61129150,-100.40933971
"2018-05-30 21:26:43",20.61127415,-100.41146822
"2018-06-02 21:56:12",21.15633228,-100.93766080
"2018-06-05 22:57:40",20.59734201,-100.38091286
"2018-06-05 22:57:40",20.59875096,-100.37821426
"2018-06-06 20:56:22",20.61278120,-100.38446619
"2018-06-06 20:56:22",20.59865452,-100.37827264
"2018-06-06 21:57:15",20.59862012,-100.37817348
"2018-06-06 21:57:15",20.59864713,-100.37821263
"2018-06-06 21:57:15",20.59862915,-100.37825902
"2018-06-07 15:54:29",20.61280757,-100.39768857
"2018-06-07 15:54:29",20.61276216,-100.39769379
I want to split my data into groups so that I can calculate distances and get the average distance traveled per day.

Right now I am separating it by my date column, like this:

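import pandas as pd

col_names = ['date', 'latitude', 'longitude']
# skiprows=1 skips the first line of the file
df = pd.read_csv('marco.csv', names=col_names, sep=',', skiprows=1)

# self-merge on the full timestamp in the date column
m = df.reset_index().merge(df.reset_index(), on='date')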
But I want to separate it by day, so that I get:

2018-05-30, 2018-06-05, 2018-06-06, 2018-06-07

How would I go about this?

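As Yuca mentioned, groupby should do the trick. I would create a new column called 'day' that contains only the date part of the timestamp, sort by date, group by 'day', and then calculate the trips within each group.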

import pandas as pd

a = pd.DataFrame(
    [["2018-05-30 21:26:43",20.61129150,-100.40933971],
    ["2018-05-30 21:26:43",20.61127415,-100.41146822],
    ["2018-06-02 21:56:12",21.15633228,-100.93766080],
    ["2018-06-05 22:57:40",20.59734201,-100.38091286]], 
    columns=['date', 'lat', 'lng'])

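# Parse the timestamp strings into datetime values
a['date'] = pd.to_datetime(a['date'])

# Keep only the calendar date, then sort chronologically
a['day'] = a['date'].dt.date
a = a.sort_values('date')

b = a.groupby('day')

# Loop over the groups and do whatever calculation you need
for day, group in b:
    print(group['lat'].sum())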
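
For the per-day distance itself, one possible sketch follows. It assumes the points within a day are visited in timestamp order and that great-circle (haversine) distance is an acceptable measure of travel. The names `haversine_km`, `daily_distance_km` and `EARTH_RADIUS_KM` are made up for illustration and are not part of pandas or of the answer above.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lng1, lat2, lng2 = map(np.radians, (lat1, lng1, lat2, lng2))
    h = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(h))


def daily_distance_km(df):
    """Sum the distances between consecutive points within each calendar day."""
    # mergesort is stable, so rows sharing a timestamp keep their file order
    df = df.sort_values('date', kind='mergesort')
    day = df['date'].dt.date
    # Previous point within the same day; the first row of each day gets NaN
    prev = df.groupby(day)[['lat', 'lng']].shift()
    step = haversine_km(prev['lat'], prev['lng'], df['lat'], df['lng'])
    # NaN steps are skipped, so a day with a single point sums to 0.0
    return step.groupby(day).sum()


# Same sample rows as in the answer above
a = pd.DataFrame(
    [["2018-05-30 21:26:43", 20.61129150, -100.40933971],
     ["2018-05-30 21:26:43", 20.61127415, -100.41146822],
     ["2018-06-02 21:56:12", 21.15633228, -100.93766080],
     ["2018-06-05 22:57:40", 20.59734201, -100.38091286]],
    columns=['date', 'lat', 'lng'])
a['date'] = pd.to_datetime(a['date'])

per_day = daily_distance_km(a)   # one value per calendar day, in km
average_km = per_day.mean()      # average distance traveled per day
```

Days with only one point contribute 0 km to the average; if those should be excluded instead, filter `per_day` before calling `mean()`.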

Have you tried groupby? Take a look at pd.to_datetime.
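
If the goal is literally one separate DataFrame per day, a minimal sketch along those lines could look like this. It groups on the calendar date directly rather than adding a helper column; the names `by_day` and `days` are illustrative only, and the rows are the first five from the question.

```python
import pandas as pd

df = pd.DataFrame(
    [["2018-05-30 21:26:43", 20.61129150, -100.40933971],
     ["2018-05-30 21:26:43", 20.61127415, -100.41146822],
     ["2018-06-02 21:56:12", 21.15633228, -100.93766080],
     ["2018-06-05 22:57:40", 20.59734201, -100.38091286],
     ["2018-06-05 22:57:40", 20.59875096, -100.37821426]],
    columns=['date', 'latitude', 'longitude'])
df['date'] = pd.to_datetime(df['date'])

# Map each calendar day (a datetime.date) to the rows recorded on that day
by_day = {day: group for day, group in df.groupby(df['date'].dt.date)}

# Chronologically ordered list of the days that actually occur in the data
days = sorted(by_day)
```

Each value in `by_day` is an ordinary DataFrame, so any per-day calculation can be applied to it on its own.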