Python pandas: calculation based on multiple rows and multiple conditions


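I'm a pandas novice. I need to calculate the time each person spent at each location, and drop the rows whose date has no paired entry. My data looks like this: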

Unit    Name    Location    Date    Time
0  K1  Somebody1    LOC1  2020-05-12  07:00
1  K1  Somebody1    LOC1  2020-05-12  20:10
2  K1  Somebody1    LOC1  2020-05-13  06:00
3  K1  Somebody1    LOC1  2020-05-13  20:00
4  K1  Somebody1    LOC1  2020-05-14  06:37
5  K1  Somebody1    LOC2  2020-05-15  07:00
6  K1  Somebody1    LOC2  2020-05-15  20:10
7  K1  Somebody1    LOC2  2020-05-16  06:00
8  K1  Somebody1    LOC2  2020-05-16  20:00
9  K1  Somebody1    LOC2  2020-05-17  06:37
10  K1  Somebody2    LOC2  2020-05-13  07:00
11  K1  Somebody2    LOC2  2020-05-14  10:10
12  K1  Somebody2    LOC2  2020-05-14  16:50
13  K1  Somebody2    LOC2  2020-05-15  05:36
14  K1  Somebody3    LOC1  2020-05-13  07:00
15  K1  Somebody3    LOC1  2020-05-14  10:10
16  K1  Somebody3    LOC1  2020-05-14  16:50
17  K1  Somebody3    LOC1  2020-05-15  05:36
So far I have only worked out how to convert Time into datetime objects, like this:

df['Time'] = df['Time'].apply(lambda x: datetime.strptime(x,'%H:%M').time())
I have tried pivot tables, groupby, and loops, and I'm out of ideas. I would like the output to look like this:

LOC1
      Somebody1  2020-05-12  13h 10m
                 2020-05-13  14h 00m
TOTAL                        27h 10m
      Somebody2  date        hours
                 date        hours
TOTAL                        sum for somebody2
      Somebody3  date        hours
                 date        hours
TOTAL                        sum for somebody3

LOC2
      Somebody1  date        hours
                 date        hours
TOTAL                        sum for somebody1
      Somebody2  date        hours   
                 date        hours
TOTAL                        sum for somebody2

Or something similar.

IIUC, use groupby and combine_first:

import numpy as np
import pandas as pd

# Combine the Date and Time strings into a single timestamp.
df['datetime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])

# First and last timestamp per person, location and calendar day.
df1 = df.groupby(['Name', 'Location', df['datetime'].dt.normalize()])\
        .agg(start=('datetime', 'first'),
             end=('datetime', 'last'))

# Time spent per day, in hours.
df1['timespent'] = (df1['end'] - df1['start']) / np.timedelta64(1, 'h')

# Create the total rows.
m = df1.unstack(['Name', 'Location'])['timespent'].sum().unstack()
m = m.assign(TOTAL=m.sum(1)).stack().to_frame('timespent')

final = df1.drop(['start', 'end'], axis=1).combine_first(m)

# If you want to remove single-entry days (zero time spent):
final[final['timespent'] > 0]
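
The question asks for durations shown like `13h 10m`, while `timespent` above is a float number of hours. A minimal sketch of one way to render such values follows; `format_hours` is a hypothetical helper introduced here for illustration, not part of pandas or of the answer above.

```python
import pandas as pd


def format_hours(hours: float) -> str:
    """Render fractional hours as 'HHh MMm' (hypothetical helper)."""
    # Round to whole minutes first so float noise does not drop a minute.
    total_minutes = int(round(hours * 60))
    h, m = divmod(total_minutes, 60)
    return f"{h:02d}h {m:02d}m"


# Sample values in hours, standing in for a 'timespent' column.
timespent = pd.Series([13 + 10 / 60, 14.0])
formatted = timespent.map(format_hours)
print(formatted.tolist())
```

Rounding to whole minutes is a design choice that suits data recorded at minute precision, as in the question.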


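You could start with grep to collect the times two rows at a time, and then compute the time difference. For example, put the person names into a list file and then run grep over it: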

for i in $(cat list-names);do grep $i a.csv | awk '{print$6}';done 
where a.csv contains:

0  K1  Somebody1    LOC1  2020-05-12  17:00
1  K1  Somebody1    LOC1  2020-05-12  20:10
Then, to get the difference in hours, do the following:

awk '
    NR == 1{old = $6; next}     
    {print $6 - old; old = $6}  
' a.csv
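
Note that awk converts a string such as `20:10` to the number `20` when subtracting, so the script above yields whole-hour differences and ignores the minutes. As a hedged alternative in pandas (the library the question uses), the row-to-row difference can be sketched with full timestamps; the `delta` column name is an assumption made for this example, and the two rows are the ones shown for a.csv.

```python
import pandas as pd

# The two sample rows from a.csv, reduced to the columns needed here.
df = pd.DataFrame({
    "Name": ["Somebody1", "Somebody1"],
    "Date": ["2020-05-12", "2020-05-12"],
    "Time": ["17:00", "20:10"],
})

# Build full timestamps so that minutes take part in the subtraction.
df["datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"])

# Difference to the previous row of the same person; the first row has none.
df["delta"] = df.groupby("Name")["datetime"].diff()
minutes = df["delta"].dt.total_seconds() / 60
print(minutes.tolist())
```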

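Edited, because I gave the wrong example. Your suggestion is really good. I tried changing the order of the groupby to put Location first, but the result wasn't right: it totals all of a person's work, whereas I only need totals per location.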
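
One possible way to get totals per location for each person, rather than across all of a person's work, is to group by Location and Name after computing the daily durations. This is a sketch under stated assumptions, not the answerer's method: the `entries` and `hours` column names are introduced here for illustration, a "paired" day is taken to mean a day with more than one entry, and the sample rows are a subset of the question's data.

```python
import pandas as pd

# Subset of the question's data: Somebody1 at LOC1 and LOC2.
rows = [
    ("K1", "Somebody1", "LOC1", "2020-05-12", "07:00"),
    ("K1", "Somebody1", "LOC1", "2020-05-12", "20:10"),
    ("K1", "Somebody1", "LOC1", "2020-05-13", "06:00"),
    ("K1", "Somebody1", "LOC1", "2020-05-13", "20:00"),
    ("K1", "Somebody1", "LOC1", "2020-05-14", "06:37"),
    ("K1", "Somebody1", "LOC2", "2020-05-15", "07:00"),
    ("K1", "Somebody1", "LOC2", "2020-05-15", "20:10"),
]
df = pd.DataFrame(rows, columns=["Unit", "Name", "Location", "Date", "Time"])
df["datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"])

# Earliest and latest entry per location, person and day, plus a row count.
daily = (
    df.groupby(["Location", "Name", "Date"])["datetime"]
      .agg(start="min", end="max", entries="count")
)

# Assumption: a day with a single entry has no pair and is dropped.
daily = daily[daily["entries"] > 1].copy()
daily["hours"] = (daily["end"] - daily["start"]).dt.total_seconds() / 3600

# Totals per location and person only, not across locations.
totals = daily.groupby(level=["Location", "Name"])["hours"].sum()
print(daily["hours"])
print(totals)
```

Keeping `daily` and `totals` as separate objects avoids mixing per-day rows and total rows in one index; they can be printed one after the other to approximate the layout in the question.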