Python: get row indices and write them into a DataFrame

python pandas dataframe

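I have two DataFrames with the following structure: DF1 and DF2.

The AA, BB and CC columns store measurement information about each site. They may contain numeric values or be empty. The records in AA, BB and CC depend on the date and the site. So, basically, my steps are:
1. Get the rows in DF1 that have a record in AA (then BB, then CC).
2. Use the time interval and the site name as keys to look up rows in DF2.
3. Put the index of the row from DF1 into the corresponding AA/BB/CC column of DF3, which holds the Time, Error and Site records that match the given time interval and site name.

The goal is to end up with the final DF3. To make this easier, I split the Time column in DF1 into two columns, Start and End: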

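df1['Start'] = df1['Time'].str.split(' - ').str[0]
df1['End'] = df1['Time'].str.split(' - ').str[1]
df1['Start'] = pd.to_datetime(df1['Start'])
df1['End'] = pd.to_datetime(df1['End'])
cols = ['AA', 'BB', 'CC']
for column in df1[cols]:
    # step 1: keep only the rows that have a real record in this column
    df1 = df1[(df1[column] != 'NS') & (df1[column] != '0')]
    for name in df1['Site'].unique():
        # step 2: try to match DF2 rows on site name and time interval
        if df2['Site'].str.contains(name) & df2['Time'].between(df1['Start'].values[0], df2['End'].values[0]):
            # step 3 (the part that does not work): collect the DF1 indices
            values = df1.index.values.tolist()
            df3[column] = [values]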

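Steps 1 and 2 are fine, but I am stuck on step 3. I cannot work out how to collect the indices so that df3 gets the structure I want, because each of the AA, BB and CC columns may hold several indices, including repeated ones.

Is the expected result achievable? If so, I would appreciate some pointers on what I need to do.

Thanks in advance.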

Use:

df1['Start'] = df1['Time'].str.split(' - ').str[0]
df1['End'] = df1['Time'].str.split(' - ').str[1]
df1['Start'] = pd.to_datetime(df1['Start'])
df1['End'] = pd.to_datetime(df1['End'])
df2['Time'] = pd.to_datetime(df2['Time'])

# reset_index on both frames so neither original index is lost in the merge
df = df2.reset_index().merge(df1.reset_index(), on='Site', how='left', suffixes=('','_'))
# keep only the rows whose Time falls inside the Start..End interval
df = df[df['Time'].between(df['Start'],df['End'])]

cols = ['AA','BB','CC']
# mask of real records - not sure if 0 is a number or a string, so both are excluded
m = ~df[cols].isin(['NS', 0, '0'])
# put the df1 index into every cell that holds a real record, 0 elsewhere
df[cols] = m.astype(int).mul(df.pop('index_'), axis=0)

# join the indices into one string, filtering out the `0` placeholders
f = lambda x: ','.join(x[x!=0].astype(str))

c = df2.columns.tolist()
# aggregate per original df2 row
df = df.groupby(['index'] + c)[cols].agg(f).reset_index(level=c)
print (df)
                     Time  Error  Site  AA     BB     CC
index                                                   
10    2019-04-20 09:25:15    401  AR25  58            58
11    2019-04-20 11:00:10    401  AR25  58            58
15    2019-04-21 23:25:16    404  DP88  60  59,60  59,60
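
For readers who want to run the idea end to end, here is a minimal self-contained sketch of the same merge, filter and aggregate steps. It is not the answer's exact code: the contents of `df1` are invented for illustration (the post does not show DF1), the helper name `collect_matching_indices` is hypothetical, an inner merge replaces `how='left'`, and the DF1 labels are collected with `Series.where` rather than by multiplying a mask by the index. The date format string is an assumption based on the `20-04-2019 09:25:15` values in the tables.

```python
import pandas as pd

FMT = "%d-%m-%Y %H:%M:%S"  # assumed day-first layout, as in the DF2 table

# Hypothetical DF1: intervals, sites and measurements are made up.
df1 = pd.DataFrame(
    {
        "Time": [
            "20-04-2019 09:00:00 - 20-04-2019 12:00:00",
            "21-04-2019 23:00:00 - 21-04-2019 23:59:59",
            "21-04-2019 23:00:00 - 21-04-2019 23:59:59",
        ],
        "Site": ["AR25", "DP88", "DP88"],
        "AA": ["5", "NS", "7"],
        "BB": ["NS", "3", "4"],
        "CC": ["2", "1", "6"],
    },
    index=[58, 59, 60],
)

# DF2 as shown in the post.
df2 = pd.DataFrame(
    {
        "Time": ["20-04-2019 09:25:15", "20-04-2019 11:00:10", "21-04-2019 23:25:16"],
        "Error": [401, 401, 404],
        "Site": ["AR25", "AR25", "DP88"],
    },
    index=[10, 11, 15],
)


def collect_matching_indices(df1, df2, cols=("AA", "BB", "CC"), empty=("NS", 0, "0")):
    """For each df2 row, list the df1 row labels that hold a record per column."""
    cols = list(cols)

    # Move both indices into named columns so they survive the merge.
    left = df2.rename_axis("index").reset_index()
    left["Time"] = pd.to_datetime(left["Time"], format=FMT)

    right = df1.rename_axis("index_").reset_index()
    bounds = right.pop("Time").str.split(" - ", expand=True)
    right["Start"] = pd.to_datetime(bounds[0], format=FMT)
    right["End"] = pd.to_datetime(bounds[1], format=FMT)

    # Pair every df2 row with every df1 row of the same site,
    # then keep the pairs whose exact time lies inside the interval.
    merged = left.merge(right, on="Site", how="inner")
    merged = merged[merged["Time"].between(merged["Start"], merged["End"])]

    # True where the cell holds a real measurement.
    has_value = ~merged[cols].isin(list(empty)) & merged[cols].notna()

    # Replace each real measurement with the df1 label (as text), NaN elsewhere.
    labels = has_value.apply(lambda flag: merged["index_"].astype(str).where(flag))
    labels["index"] = merged["index"]

    # One comma-separated string of df1 labels per df2 row and column.
    joined = labels.groupby("index")[cols].agg(lambda s: ",".join(s.dropna()))
    return df2.join(joined, how="inner")


result = collect_matching_indices(df1, df2)
print(result)
```

Because the labels are carried as strings and missing matches as NaN, a DF1 row whose index label happens to be 0 is kept, whereas an approach that filters with `x != 0` would drop it. DF2 rows without any matching interval are absent from the result, which matches the behaviour of filtering after the merge.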

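Do you need the rows where df1['Time']==df2['Time'] and df1['Site']==df2['Site']?

Can you explain the logic for adding the columns? For example, in the final output there is no value in BB. Why?

@jezrael I edited the post. Not every site has records in the AA to CC columns. For example, site1 can have AA and CC data, site2 can have BB data, and Site3 can have records in all three columns.

@ShanAli Sort of. df1['Time'] holds a time interval, whereas df2['Time'] holds an exact time. So df2['Time'] must fall within the interval in df1['Time']. That is why I split df1['Time'] into Start and End.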
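
The interval condition described in that last comment can be checked row by row with `Series.between`, which accepts Series as bounds and compares element-wise. The sketch below uses made-up interval strings in the "start - end" layout implied by the `split(' - ')` call; the format string is an assumption based on the dates shown in the tables.

```python
import pandas as pd

FMT = "%d-%m-%Y %H:%M:%S"  # assumed day-first layout

# Hypothetical interval strings, one per row.
intervals = pd.Series(
    [
        "20-04-2019 09:00:00 - 20-04-2019 10:00:00",
        "20-04-2019 09:00:00 - 20-04-2019 10:00:00",
    ]
)
bounds = intervals.str.split(" - ", expand=True)
start = pd.to_datetime(bounds[0], format=FMT)
end = pd.to_datetime(bounds[1], format=FMT)

# Exact timestamps to test against the interval on the same row.
moments = pd.to_datetime(
    pd.Series(["20-04-2019 09:25:15", "20-04-2019 11:00:10"]), format=FMT
)

# Element-wise check: start[i] <= moments[i] <= end[i] (bounds inclusive by default).
inside = moments.between(start, end)
print(inside.tolist())  # [True, False]
```

Passing an explicit `format` avoids any ambiguity between day-first and month-first dates such as `01-04-2019`.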

DF2:

.... | Time                | Error | Site |
  10 | 20-04-2019 09:25:15 | 401   | AR25 |
  11 | 20-04-2019 11:00:10 | 401   | AR25 |
  15 | 21-04-2019 23:25:16 | 404   | DP88 |

Expected DF3:

.... | Time                | Error | Site | AA    | BB    | CC    |
  1  | 20-04-2019 09:25:15 | 401   | AR25 | 58    |       | 58    |
  2  | 20-04-2019 11:00:10 | 401   | AR25 | 58    | 58    |       |
  2  | 21-04-2019 23:25:16 | 404   | DP88 | 59,60 | 59,60 | 59,60 |
