Python: How do I pivot a dataframe but keep the duplicates as duplicates?

I have a dataframe containing some clock-in/clock-out times, like this:

                  Date In/Out       Id
0  2020-11-04 14:25:25     In   912907
1  2020-11-04 14:25:43     In  1111111
2  2020-11-04 14:26:20    Out  1111111
3  2020-11-04 14:26:29    Out   912907
4  2020-11-05 14:25:25     In   912907
5  2020-11-05 14:26:29    Out   912907
I would like to turn it into something like this:

In/Out       Id                   In                  Out
0       1111111  2020-11-04 14:25:43  2020-11-04 14:26:20
1        912907  2020-11-04 14:25:25  2020-11-04 14:26:29
2        912907  2020-11-05 14:25:25  2020-11-05 14:26:29
I have tried pivoting with Id as the index, but the duplicate Ids cause a ValueError. How can I do this?

import pandas

df = pandas.DataFrame(data={
    'Date': ['2020-11-04 14:25:25', '2020-11-04 14:25:43', '2020-11-04 14:26:20',
             '2020-11-04 14:26:29', '2020-11-05 14:25:25', '2020-11-05 14:26:29'],
    'In/Out': ['In', 'In', 'Out', 'Out', 'In', 'Out'],
    'Id': ['912907', '1111111', '1111111', '912907', '912907', '912907']
})

print(df)

# pivoting works when each Id appears at most once per In/Out value
print(df.drop(index=[4, 5]).pivot(index='Id', columns='In/Out', values='Date').reset_index())

# with the full data, the repeated Id 912907 makes pivot fail
try:
    print(df.pivot(index='Id', columns='In/Out', values='Date').reset_index())
except ValueError:
    print('ValueError: Index contains duplicate entries, cannot reshape')

One approach is to assign a unique number to each In/Out session. We can do that by sorting the visits by person and time, so that each 'In' is immediately followed by its 'Out'. Then we take the row number for the 'In' rows and the row number minus one for the 'Out' rows, and pivot on that.

# sort so each 'In' is immediately followed by its matching 'Out'
df = df.sort_values(by=['Id', 'Date']).reset_index(drop=True).reset_index()
# now the column named 'index' is equal to the row number
# give each 'Out' row the same number as the 'In' row above it
df.loc[df['In/Out'] == 'Out', 'index'] -= 1

# each (index, Id) pair now identifies one session, so pivot sees no duplicates
print(df.pivot(index=['index', 'Id'], columns='In/Out', values='Date').reset_index())
This prints:

In/Out  index       Id                   In                  Out
0           0  1111111  2020-11-04 14:25:43  2020-11-04 14:26:20
1           2   912907  2020-11-04 14:25:25  2020-11-04 14:26:29
2           4   912907  2020-11-05 14:25:25  2020-11-05 14:26:29
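
If you do not want the helper 'index' column in the final output, one possible cleanup (a small sketch; the drop and rename_axis steps are just one way to do it, not part of the answer above) is:

result = (
    df.pivot(index=['index', 'Id'], columns='In/Out', values='Date')
      .reset_index()
      .drop(columns='index')      # remove the helper row-number column
      .rename_axis(None, axis=1)  # drop the leftover 'In/Out' columns name
)
print(result)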

You can also use groupby().cumcount() to enumerate the relative row order:

# chain with `reset_index` if you want
(df.assign(index=df.sort_values(['Date']).groupby(['Id','In/Out']).cumcount())
   .pivot(index=['Id','index'], columns='In/Out', values="Date")
)
Output:

In/Out                          In                  Out
Id      index                                          
912907  0      2020-11-04 14:25:25  2020-11-04 14:26:29
        1      2020-11-05 14:25:25  2020-11-05 14:26:29
1111111 0      2020-11-04 14:25:43  2020-11-04 14:26:20
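
As the comment in the snippet says, you can chain reset_index to get a flat frame like the one asked for in the question (a sketch, starting from the original df; the rename_axis cleanup is an extra step):

flat = (
    df.assign(index=df.sort_values(['Date']).groupby(['Id', 'In/Out']).cumcount())
      .pivot(index=['Id', 'index'], columns='In/Out', values='Date')
      .reset_index()              # turn Id and the session number back into columns
      .rename_axis(None, axis=1)  # drop the 'In/Out' columns name
)
print(flat)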

Nice. I prefer using cumcount together with unstack, but this feels more explicit:

df.set_index([df.groupby(['Id','In/Out']).cumcount(),'Id','In/Out']).unstack(-1)
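
Expanded into a runnable form (a sketch of that one-liner, starting from the original df and assuming the rows within each (Id, In/Out) group are already in chronological order; otherwise sort by Date first):

# number the sessions within each (Id, In/Out) group, use that together with
# Id and In/Out as a MultiIndex, then move the In/Out level into the columns
session = df.groupby(['Id', 'In/Out']).cumcount()
result = df.set_index([session, 'Id', 'In/Out']).unstack(-1)
print(result)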