Python: Using pandas to transform the parameters of two dfs into a new one


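I have two DataFrames that both refer to the same events (identified by id). One df is discrete and shows the course of the events at a certain resolution over several months (the df below is only an excerpt); the other df summarizes the parameters of each event (df_event).

Simplified data: df (the original df has many more rows!)

Output: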

                    id                 date numb
2020-01-01 12:00:00 1   2020-01-01 12:00:00 1
2020-01-01 13:00:00 1   2020-01-01 12:00:00 5
2020-01-01 14:00:00 1   2020-01-01 12:00:00 8
2020-01-01 15:00:00 2   2020-01-05 15:00:00 0
2020-01-01 16:00:00 2   2020-01-05 15:00:00 4
2020-01-01 17:00:00 2   2020-01-05 15:00:00 11
2020-01-01 18:00:00 2   2020-01-05 15:00:00 25
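
The code that builds df is not included in the post. A minimal sketch that reproduces the df output above could look like this; the hourly index and the way the columns are constructed are assumptions read off the printed table, not the asker's original code.

```python
import pandas as pd

# Assumed reconstruction: an hourly DatetimeIndex, with 'date' holding
# the start timestamp of the event each row belongs to.
idx = pd.date_range('2020-01-01 12:00:00', periods=7, freq=pd.Timedelta(hours=1))

df = pd.DataFrame(
    {
        'id': [1, 1, 1, 2, 2, 2, 2],
        'date': pd.to_datetime(
            ['2020-01-01 12:00:00'] * 3 + ['2020-01-05 15:00:00'] * 4
        ),
        'numb': [1, 5, 8, 0, 4, 11, 25],
    },
    index=idx,
)

print(df)
```

With this frame in place, the snippets further down that call df['date'].dt or df.index.hour can be run as written.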
df_event:

import pandas as pd

df_event = pd.DataFrame({'id':[1,2,3,4,5],
                         'date':['2020-01-01 12:00:00','2020-01-01 15:00:00','2020-01-08 07:00:00','2020-01-15 13:00:00','2020-01-22 12:00:00'],
                         'numb_total':[8,25,11,14,8],
                         'timedelta': [55,60,45,15,30]})  # duration in minutes

df_event = df_event.set_index('id')
df_event['date'] = pd.to_datetime(df_event['date'])
# 'min' replaces the alias 'T', which is deprecated in recent pandas versions
df_event['timedelta'] = pd.to_timedelta(df_event['timedelta'], unit='min')
Output:

                   date numb_total  timedelta
id          
1   2020-01-01 12:00:00          8   00:55:00
2   2020-01-01 15:00:00         25   01:00:00
3   2020-01-08 07:00:00         11   00:45:00
4   2020-01-15 13:00:00         14   00:15:00
5   2020-01-22 12:00:00          8   00:30:00
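Now I want to link the two dfs so that I get a daily/weekly profile. The resulting df should be ordered by day and hour and show the mean of numb and timedelta for each time slot.

The weekly profile should show what numb and timedelta (from df_event) are on average at each moment, where moment = day + time (the minimum and maximum at each moment would also be interesting).

For example, create a new df2 like this: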

df['day'] = df['date'].dt.day_name()
df['time'] = df['date'].dt.time   
df_event = df.groupby(['day', 'time'])...
Then add the data from df_event to get a result like this:

                       timedelta  numb_total
day             time    
Monday      00:00:00    00:00:00          0
Monday      01:00:00    00:00:00          0 
...
Wednesday   11:00:00    00:00:00          0
Wednesday   12:00:00    00:55:00          8
...
Sunday      14:00:00    00:00:00          0
Sunday      15:00:00    01:00:00         25
Sunday      16:00:00    00:00:00          0
...
Sunday      23:00:00    00:00:00          0
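
One way such a complete weekly grid might be produced is to reindex an aggregated (day, time) frame onto every weekday/hour combination and fill the empty slots with zero. The sketch below is only an illustration: profile is a hypothetical stand-in holding the two non-zero rows from the table above, and the names days, hours, full_week and weekly are made up for this example.

```python
import datetime as dt

import pandas as pd

# Hypothetical stand-in for an already aggregated profile; it holds only
# the two non-zero rows from the desired result.
profile = pd.DataFrame(
    {
        'timedelta': [pd.Timedelta(minutes=55), pd.Timedelta(minutes=60)],
        'numb_total': [8, 25],
    },
    index=pd.MultiIndex.from_tuples(
        [('Wednesday', dt.time(12)), ('Sunday', dt.time(15))],
        names=['day', 'time'],
    ),
)

# Every weekday/hour combination: 7 * 24 = 168 slots.
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
        'Friday', 'Saturday', 'Sunday']
hours = [dt.time(h) for h in range(24)]
full_week = pd.MultiIndex.from_product([days, hours], names=['day', 'time'])

# Slots without data become NaT / NaN after reindexing; replace them by zero.
weekly = profile.reindex(index=full_week)
weekly['timedelta'] = weekly['timedelta'].fillna(pd.Timedelta(0))
weekly['numb_total'] = weekly['numb_total'].fillna(0).astype(int)

print(weekly.loc['Wednesday'].iloc[11:14])
```

Building the grid with from_product keeps the rows in Monday-to-Sunday order, which a plain groupby would not do because it sorts day names alphabetically.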

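IIUC, first aggregate both DataFrames, then join them. The code starts from the freshly constructed df_event, with id still a column and timedelta still in plain minutes: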

df_event = df_event.set_index('id')
df_event['date'] = pd.to_datetime(df_event['date'])

df_event['day'] = df_event['date'].dt.day_name()
df_event['time'] = df_event['date'].dt.time   
df_event1 = df_event.groupby(['day', 'time'])[['timedelta', 'numb_total']].mean()
print (df_event1)
                    timedelta  numb_total
day       time                           
Wednesday 07:00:00       45.0        11.0
          12:00:00       42.5         8.0
          13:00:00       15.0        14.0
          15:00:00       60.0        25.0
          
df['day'] = df['date'].dt.day_name()
df['time'] = df['date'].dt.time   
df_event2 = df.groupby(['day', 'time'])['numb'].mean()
print (df_event2)
day        time    
Sunday     15:00:00    10.000000
Wednesday  12:00:00     4.666667
Name: numb, dtype: float64

df = df_event1.join(df_event2, how='inner' )
df['timedelta'] = pd.to_timedelta(df['timedelta'], unit='min')
print (df)
                         timedelta  numb_total      numb
day       time                                          
Wednesday 12:00:00 0 days 00:42:30         8.0  4.666667
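
The question also mentions the minimum and maximum per slot. A minimal sketch, assuming the same raw df_event as in this answer (timedelta still in minutes), could pass several function names to agg; the name stats is introduced here for illustration only.

```python
import pandas as pd

# Same sample data as in the question, with timedelta left as plain minutes.
df_event = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'date': pd.to_datetime(['2020-01-01 12:00:00', '2020-01-01 15:00:00',
                            '2020-01-08 07:00:00', '2020-01-15 13:00:00',
                            '2020-01-22 12:00:00']),
    'numb_total': [8, 25, 11, 14, 8],
    'timedelta': [55, 60, 45, 15, 30],
}).set_index('id')

df_event['day'] = df_event['date'].dt.day_name()
df_event['time'] = df_event['date'].dt.time

# One row per (day, time) slot, with mean, min and max for each value column.
stats = (df_event
         .groupby(['day', 'time'])[['timedelta', 'numb_total']]
         .agg(['mean', 'min', 'max']))

print(stats)
```

The result has two-level column labels such as ('timedelta', 'mean'), so a single statistic can be selected with stats[('timedelta', 'max')].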
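What is the relationship between the index and date in df? Both are dates. Which of them relates to the date in df_event?

Happy to revisit this once you clarify.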

# Generate a key column in each DataFrame by extracting the hour. Merge the two DataFrames on key and date. Drop the columns that are not required

df2 = (pd.merge(df.assign(key=df.index.hour),
                df_event.assign(key=df_event.set_index('date').index.hour),
                on=['key', 'date'], how='right')
       .dropna().drop_duplicates(keep='last')[['date', 'numb_total', 'timedelta']])


#Extract time and  day_name 


df2['time']=df2.date.dt.strftime('%H:%M:%S')
df2['day']=df2.date.dt.day_name()



                 date  numb_total timedelta      time        day
0 2020-01-01 12:00:00           8  00:55:00  12:00:00  Wednesday