Python: assign multiple rows to one index in a table
I have a DataFrame in pandas that looks like this:
   Activity Name   Activity Start  Activity End
0  Phone           04:00           08:00
1  Lunch           08:00           08:30
2  Coffee          08:30           08:45
3  Phone           08:45           10:30
4  WrittenSupport  10:30           12:30
5  Phone           04:00           08:00
6  Lunch           08:00           08:30
7  Coffee          08:30           08:45
8  Phone           08:45           09:00
9  Phone           06:00           09:00
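The data in my DataFrame describes the different activities assigned to agents during a shift. The problem is that the other DataFrame, the one with the agents, has only 57 names, while a single person is usually assigned 4-5 activities. When I merge the two DataFrames, I end up with 57 agents and 265 activities, which obviously don't line up with the people they were assigned to.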
What might help: every person works 8 hours.
How can I convert it to look like this:
   Activity Name   Activity Start  Activity End
0  Phone           04:00           08:00
   Lunch           08:00           08:30
   Coffee          08:30           08:45
   Phone           08:45           10:30
   WrittenSupport  10:30           12:30
1  Phone           04:00           08:00
   Lunch           08:00           08:30
   Coffee          08:30           08:45
   Phone           08:45           09:00
   Phone           06:00           09:00
If you have separate rows per agent and activity, you can create a MultiIndex like this:
import pandas as pd
# This is the dataframe data with activities you got from a single agent
agent_1 = [['Phone', 'Phone', 'Coffee', 'Lunch', 'Phone', 'Phone', 'Lunch', 'Lunch'],
['04:00', '08:30', '10:30', '04:00', '10:30', '04:00', '08:30', '10:30']]
# This is the dataframe data from a second agent
agent_2 = [['Phone', 'Pooping', 'Coffee', 'Lunch', 'Phone', 'Meeting', 'Lunch', 'Lunch'],
['08:45', '08:50', '10:30', '04:00', '10:30', '04:00', '08:30', '10:30']]
# We create the dataframe for agent 1
df1 = pd.DataFrame(agent_1).T
df1.columns = ['activity', 'time']
# We create the dataframe for agent 2
df2 = pd.DataFrame(agent_2).T
df2.columns = ['activity', 'time']
# Now we have two dataframes we can't really put together
print(df1)
print("----")
print(df2)
print("----")
# So we should give each dataframe a column with its agent.
df1['agent'] = "Agent_1"
df2['agent'] = "Agent_2"
# Now each dataframe has data on its agent
print(df1)
print("----")
print(df2)
print("----")
# Let's combine them
overview = pd.concat([df1, df2])
print(overview)
print("----")
# To make it even better, we could make a multi-index so we can index both agents AND activities
overview.set_index(['agent', 'activity'], inplace=True)
print(overview)
Output:
activity time
0 Phone 04:00
1 Phone 08:30
2 Coffee 10:30
3 Lunch 04:00
4 Phone 10:30
5 Phone 04:00
6 Lunch 08:30
7 Lunch 10:30
----
activity time
0 Phone 08:45
1 Pooping 08:50
2 Coffee 10:30
3 Lunch 04:00
4 Phone 10:30
5 Meeting 04:00
6 Lunch 08:30
7 Lunch 10:30
----
activity time agent
0 Phone 04:00 Agent_1
1 Phone 08:30 Agent_1
2 Coffee 10:30 Agent_1
3 Lunch 04:00 Agent_1
4 Phone 10:30 Agent_1
5 Phone 04:00 Agent_1
6 Lunch 08:30 Agent_1
7 Lunch 10:30 Agent_1
----
activity time agent
0 Phone 08:45 Agent_2
1 Pooping 08:50 Agent_2
2 Coffee 10:30 Agent_2
3 Lunch 04:00 Agent_2
4 Phone 10:30 Agent_2
5 Meeting 04:00 Agent_2
6 Lunch 08:30 Agent_2
7 Lunch 10:30 Agent_2
----
activity time agent
0 Phone 04:00 Agent_1
1 Phone 08:30 Agent_1
2 Coffee 10:30 Agent_1
3 Lunch 04:00 Agent_1
4 Phone 10:30 Agent_1
5 Phone 04:00 Agent_1
6 Lunch 08:30 Agent_1
7 Lunch 10:30 Agent_1
0 Phone 08:45 Agent_2
1 Pooping 08:50 Agent_2
2 Coffee 10:30 Agent_2
3 Lunch 04:00 Agent_2
4 Phone 10:30 Agent_2
5 Meeting 04:00 Agent_2
6 Lunch 08:30 Agent_2
7 Lunch 10:30 Agent_2
----
                  time
agent   activity
Agent_1 Phone    04:00
        Phone    08:30
        Coffee   10:30
        Lunch    04:00
        Phone    10:30
        Phone    04:00
        Lunch    08:30
        Lunch    10:30
Agent_2 Phone    08:45
        Pooping  08:50
        Coffee   10:30
        Lunch    04:00
        Phone    10:30
        Meeting  04:00
        Lunch    08:30
        Lunch    10:30
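With the MultiIndex in place, pulling out one agent's activities is a plain `.loc` lookup on the outer level, and `DataFrame.xs` selects on the inner level. The sketch below rebuilds a trimmed copy of `overview` (two rows per agent, picked from the output above only to keep it short) instead of reusing the full frame.

```python
import pandas as pd

# Trimmed copy of `overview` from the answer above: two rows per agent.
overview = pd.DataFrame({
    'agent': ['Agent_1', 'Agent_1', 'Agent_2', 'Agent_2'],
    'activity': ['Phone', 'Phone', 'Phone', 'Coffee'],
    'time': ['04:00', '08:30', '08:45', '10:30'],
}).set_index(['agent', 'activity'])

# Outer level: every activity of one agent, now indexed by 'activity' only.
agent_1 = overview.loc['Agent_1']
print(agent_1)

# Inner level: every 'Phone' row across all agents, indexed by 'agent'.
phones = overview.xs('Phone', level='activity')
print(phones)
```

Because `.loc` with a partial key drops the level it matched, `agent_1` comes back with a single-level `activity` index.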
Maybe try building a list of distinct index labels, like this:
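# your_array holds [name, start, end] rows; take the start hour, e.g. '08:45' -> 8
times = [int(x[1][:2]) for x in your_array]
previous = 0
index = []
next_agent = 1
for time in times:
    if index and time >= previous:
        # still the same shift: leave the label empty
        index.append('')
    else:
        # first row, or the start hour went back: a new agent begins
        index.append(next_agent)
        next_agent += 1
    previous = time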
Then build the DataFrame:
df = pd.DataFrame(your_array, index=index, columns=['Activity Name', 'Activity Start', 'Activity End'])
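Run end to end, the start-hour approach might look like the sketch below. `your_array` is a hypothetical stand-in filled with the question's table, in parse order. On this data the last row (Phone, 06:00) starts at an earlier hour than the row before it, so it gets its own label 3 rather than joining agent 1 as in the desired table. The blank labels are only cosmetic, so the sketch also turns them into a real per-row agent number that `groupby` or a later merge could use.

```python
import itertools

import pandas as pd

# Hypothetical `your_array`: the question's table as [name, start, end] rows.
your_array = [
    ['Phone', '04:00', '08:00'],
    ['Lunch', '08:00', '08:30'],
    ['Coffee', '08:30', '08:45'],
    ['Phone', '08:45', '10:30'],
    ['WrittenSupport', '10:30', '12:30'],
    ['Phone', '04:00', '08:00'],
    ['Lunch', '08:00', '08:30'],
    ['Coffee', '08:30', '08:45'],
    ['Phone', '08:45', '09:00'],
    ['Phone', '06:00', '09:00'],
]
columns = ['Activity Name', 'Activity Start', 'Activity End']

# Start hour of every row, e.g. '08:45' -> 8.
times = [int(row[1][:2]) for row in your_array]

index = []
previous = 0
next_agent = 1
for time in times:
    if index and time >= previous:
        index.append('')          # same shift: blank label
    else:
        index.append(next_agent)  # first row or hour went back: new agent
        next_agent += 1
    previous = time

df = pd.DataFrame(your_array, index=index, columns=columns)
print(df)

# Running count of the non-blank labels gives every row a usable agent number.
agent = list(itertools.accumulate(1 if label != '' else 0 for label in index))
sizes = df.assign(agent=agent).groupby('agent').size()
print(sizes)
```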
Consider the following data (I added some rows for validation):
Use the following commands:
df['index_col'] = df[~df.duplicated('Activity Name', keep=False)].expanding().count().iloc[:, 1]
df_new = df.set_index(df.index_col.ffill().fillna(0)).rename_axis(None).drop(columns='index_col')
print(df_new)
Activity Name Activity Start Activity End
0.0 Phone 04:00:00 08:00:00
0.0 Lunch 08:00:00 08:30:00
0.0 Coffee 08:30:00 08:45:00
0.0 Phone 08:45:00 10:30:00
1.0 WrittenSupport 10:30:00 12:30:00
1.0 Phone 04:00:00 08:00:00
1.0 Lunch 08:00:00 08:30:00
1.0 Coffee 08:30:00 08:45:00
1.0 Phone 08:45:00 09:00:00
1.0 Phone 06:00:00 09:00:00
2.0 Someother Name 10:30:00 12:30:00
2.0 Phone 04:00:00 08:00:00
2.0 Lunch 08:00:00 08:30:00
2.0 Coffee 08:30:00 08:45:00
2.0 Phone 08:45:00 09:00:00
2.0 Phone 06:00:00 09:00:00
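The `expanding().count()` plus `ffill().fillna(0)` chain numbers each row by how many unique activity names have appeared so far. The sketch below gets the same labels from a cumulative sum of the boolean mask, which yields integers 0, 1, 2 instead of floats. It rebuilds only the `Activity Name` column, copied from the output above.

```python
import pandas as pd

# Activity names in the order shown in the output above.
names = ['Phone', 'Lunch', 'Coffee', 'Phone', 'WrittenSupport',
         'Phone', 'Lunch', 'Coffee', 'Phone', 'Phone',
         'Someother Name',
         'Phone', 'Lunch', 'Coffee', 'Phone', 'Phone']
df = pd.DataFrame({'Activity Name': names})

# True for names that occur exactly once in the whole column.
is_unique_name = ~df.duplicated('Activity Name', keep=False)

# Each unique name bumps the running total by one: rows before the first
# unique name get 0, rows from it up to the next unique name get 1, and so on.
df['index_col'] = is_unique_name.cumsum()

df_new = df.set_index('index_col').rename_axis(None)
print(df_new)
```

Like the original, this treats a name that appears only once as the start of a new group, so `WrittenSupport` opens group 1 rather than closing group 0.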
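Are the numbers in the example the agents' IDs?
@NoSplitSherlock No, those are row indexes. But they could also be the agents' names. The idea is to have one agent in the DataFrame with 4-5 activities clearly assigned to them.
Is there an extra column showing the date of the shift? It's problematic to tell apart two shifts that happen on different days but at the same hour.
@OhadChaet The whole DataFrame covers only one day. This has to happen automatically: I parse an HTML page and get lists of agents, activity names, and activity time spans.
@anyplane I'm not sure where the problem is. Can you explain how this solution differs from what you're after?
Maybe I don't understand your solution, but all I know is each agent's name and a lot of different activities. The only two things that can help us are: each agent works 8 hours, and the activity start times increase and then reset (e.g. 04:00, 06:00, 08:00, 12:00, 04:00). The jump from 12:00 back to 04:00 is where the shift passes to the next agent.
I edited the solution. I think it now shows more clearly how it works.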
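The rule described in the comments (start times rise during a shift and jump back when the next agent's shift begins) can be turned into agent numbers directly. This sketch swaps in a different technique from the answers above. It flags each row whose start time is earlier than the previous row's, takes a cumulative sum of the flags, and uses that as the outer level of a MultiIndex. It assumes the rows stay in the order they were parsed, and `agent_names` is a hypothetical stand-in for the 57 real names. On the question's sample data the last row (Phone, 06:00) starts before the 08:45 row above it, so it is counted as a third agent instead of being grouped under agent 1 as in the desired table.

```python
import pandas as pd

# The question's table; in practice this comes from the parsed HTML page.
df = pd.DataFrame(
    [
        ['Phone', '04:00', '08:00'],
        ['Lunch', '08:00', '08:30'],
        ['Coffee', '08:30', '08:45'],
        ['Phone', '08:45', '10:30'],
        ['WrittenSupport', '10:30', '12:30'],
        ['Phone', '04:00', '08:00'],
        ['Lunch', '08:00', '08:30'],
        ['Coffee', '08:30', '08:45'],
        ['Phone', '08:45', '09:00'],
        ['Phone', '06:00', '09:00'],
    ],
    columns=['Activity Name', 'Activity Start', 'Activity End'],
)

# Parse 'HH:MM' strings so the comparison is between times.
start = pd.to_datetime(df['Activity Start'], format='%H:%M')

# A row opens a new shift when it starts earlier than the previous row
# (the first row compares against NaT, which yields False).
new_shift = start < start.shift()

# Running count of resets: a 0-based agent number for every row.
agent_id = new_shift.cumsum()

# Two-level index: agent number, then the activity's position in that shift.
step = df.groupby(agent_id).cumcount()
grouped = df.set_index([agent_id.rename('agent'), step.rename('step')])
print(grouped)

# Hypothetical agent names, listed in the same order as the shifts.
agent_names = ['Agent A', 'Agent B', 'Agent C']
df['Agent'] = agent_id.map(dict(enumerate(agent_names)))
print(df)
```

Mapping the 0-based numbers through `dict(enumerate(agent_names))` only lines up if the names are listed in the same order as the shifts appear on the page.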