Python: conditionally create a dataframe from the elements of other dataframes


Happy 2020! I would like to create a dataframe based on two other dataframes. I have the following two dataframes:

import pandas as pd

df1 = pd.DataFrame({'date':['03.05.1982','04.05.1982','05.05.1982','06.05.1982','07.05.1982','10.05.1982','11.05.1982'],'A': [63.63,64.08,64.19,65.11,65.36,65.25,65.36], 'B': [63.83, 64.10, 64.19, 65.08, 65.33, 65.28, 65.36], 'C':[63.99, 64.22, 64.30, 65.16, 65.41, 65.36, 65.44]})

df2 = pd.DataFrame({'Name':['A','B','C'],'Notice': ['05.05.1982','07.05.1982','12.05.1982']})

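The idea is to create df3 so that it takes the values of A until A's notice date (found in df2) is reached, then switches to the values of B until B's notice date is reached, and so on. On a notice date itself, it should take the average of the current column and the next one.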

In the example above, df3 should look like this (illustrated with formulas):
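The formula table from the original post did not come through. Applying the rule from the previous paragraph to the sample data gives the target below; treat it as a reconstruction under that assumption, not a copy of the missing table. The dates are read as day.month.year, which matches the comments further down (05.05.82 is A's notice date).

```python
import pandas as pd

A = [63.63, 64.08, 64.19, 65.11, 65.36, 65.25, 65.36]
B = [63.83, 64.10, 64.19, 65.08, 65.33, 65.28, 65.36]
C = [63.99, 64.22, 64.30, 65.16, 65.41, 65.36, 65.44]

expected = pd.DataFrame({
    "date": ["03.05.1982", "04.05.1982", "05.05.1982", "06.05.1982",
             "07.05.1982", "10.05.1982", "11.05.1982"],
    "Result": [
        A[0],               # before A's notice date: take A
        A[1],
        (A[2] + B[2]) / 2,  # 05.05.1982 is A's notice date: mean of A and B
        B[3],               # after A's notice, before B's: take B
        (B[4] + C[4]) / 2,  # 07.05.1982 is B's notice date: mean of B and C
        C[5],               # after B's notice date: take C
        C[6],
    ],
})
print(expected)
```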

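My idea was to first create a temporary dataframe with the same dimensions as df1 and fill it with 1 while the index date is on or before the notice date and 0 after it. A rolling mean with a window of 1 would then give each column a series of 1s until I reach 0.5 (indicating the switch). Is there a better way to get df3?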
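A sketch of the 1/0 temporary-frame idea, assuming day-first dates and that no row lies after the last notice date. Instead of a rolling mean, it treats the first column whose mask is still 1 as the active one and averages with the next column only where the row date equals a notice date; that substitution is an assumption, not the approach described above. The names mask, pos, current and following are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "date": ["03.05.1982", "04.05.1982", "05.05.1982", "06.05.1982",
             "07.05.1982", "10.05.1982", "11.05.1982"],
    "A": [63.63, 64.08, 64.19, 65.11, 65.36, 65.25, 65.36],
    "B": [63.83, 64.10, 64.19, 65.08, 65.33, 65.28, 65.36],
    "C": [63.99, 64.22, 64.30, 65.16, 65.41, 65.36, 65.44],
})
df2 = pd.DataFrame({"Name": ["A", "B", "C"],
                    "Notice": ["05.05.1982", "07.05.1982", "12.05.1982"]})

dates = pd.to_datetime(df1["date"], format="%d.%m.%Y")
notice = pd.to_datetime(df2.set_index("Name")["Notice"], format="%d.%m.%Y")
cols = list(notice.index)  # columns in the order they expire: A, B, C

# The temporary 1/0 frame: 1 while the row date is on or before the
# column's notice date, 0 afterwards
mask = pd.DataFrame({c: (dates <= notice[c]).astype(int) for c in cols})

# Active column = first column still at 1 (a row after every notice date
# would be all 0 and idxmax would wrongly fall back to 'A')
pos = mask.idxmax(axis=1).map(cols.index).to_numpy()

vals = df1[cols].to_numpy()
rows = np.arange(len(df1))
current = vals[rows, pos]
# Value of the next column (the last column is paired with itself)
following = vals[rows, np.minimum(pos + 1, len(cols) - 1)]

# On a notice date, average the expiring column with the next one
on_notice = dates.isin(notice).to_numpy()
df3 = pd.DataFrame({"date": dates,
                    "Result": np.where(on_notice, (current + following) / 2, current)})
print(df3)
```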

I tried the following:

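import numpy as np
import pandas as pd

def fill_rule(df_p, df_t):
    # 1 up to and including this column's notice date, 0 after it
    return np.where(df_p.index > df_t[df_t.Name == df_p.name]['Notice'][0], 0, 1)

# The dates are dd.mm.yyyy, so parse them with an explicit format
df1['date'] = pd.to_datetime(df1['date'], format='%d.%m.%Y')
df2['Notice'] = pd.to_datetime(df2['Notice'], format='%d.%m.%Y')
df1.set_index("date", inplace = True)

temp = df1.apply(lambda x: fill_rule(x, df2), axis = 0)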

I got the following error:

KeyError: (0, 'occurred at index B')
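The KeyError comes from ['Notice'][0]: boolean filtering keeps the original row labels, so for column B the filtered Series has the label 1, and [0] on an integer index is a label lookup that finds nothing (column A only works because its row happens to have label 0). A minimal sketch of the fix, assuming positional access with .iloc[0] is what was intended, and parsing the strings as day-first dates:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "date": ["03.05.1982", "04.05.1982", "05.05.1982", "06.05.1982",
             "07.05.1982", "10.05.1982", "11.05.1982"],
    "A": [63.63, 64.08, 64.19, 65.11, 65.36, 65.25, 65.36],
    "B": [63.83, 64.10, 64.19, 65.08, 65.33, 65.28, 65.36],
    "C": [63.99, 64.22, 64.30, 65.16, 65.41, 65.36, 65.44],
})
df2 = pd.DataFrame({"Name": ["A", "B", "C"],
                    "Notice": ["05.05.1982", "07.05.1982", "12.05.1982"]})

def fill_rule(df_p, df_t):
    # Take the first matching notice date by position, not by label
    notice = df_t.loc[df_t["Name"] == df_p.name, "Notice"].iloc[0]
    # 1 up to and including this column's notice date, 0 after it
    return np.where(df_p.index > notice, 0, 1)

df1["date"] = pd.to_datetime(df1["date"], format="%d.%m.%Y")
df2["Notice"] = pd.to_datetime(df2["Notice"], format="%d.%m.%Y")
df1 = df1.set_index("date")

temp = df1.apply(lambda x: fill_rule(x, df2), axis=0)
print(temp)
```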

You can use the between method to select the relevant date ranges from the two dataframes, then use .loc to fill in the matching values.
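To see what between selects, here is one window on its own: B's window, from A's notice date to B's notice date. Both endpoints are included, so each notice date falls into two consecutive windows; that overlap is what lets the answer average on those days. The string form inclusive="both" is assumed (pandas 1.3 or later); older versions accepted inclusive=True for the same behaviour.

```python
import pandas as pd

dates = pd.to_datetime(
    pd.Series(["03.05.1982", "04.05.1982", "05.05.1982", "06.05.1982",
               "07.05.1982", "10.05.1982", "11.05.1982"]),
    format="%d.%m.%Y",
)

# B's window: from A's notice date up to and including B's notice date
start = pd.Timestamp("1982-05-05")
end = pd.Timestamp("1982-05-07")

in_window = dates.between(start, end, inclusive="both")
print(in_window.tolist())
# [False, False, True, True, True, False, False]
```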

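import pandas as pd

# Assumes df1 and df2 from the question, with the dd.mm.yyyy strings
# converted to real dates
df1['date'] = pd.to_datetime(df1['date'], format='%d.%m.%Y')
df2['Notice'] = pd.to_datetime(df2['Notice'], format='%d.%m.%Y')

# Initialize the output: a running sum and a counter for every row
df3 = df1[['date']].copy()
df3['Result'] = 0.0
df3['count'] = 0

# Prepend a dummy notice well before the first row, so the first
# column's window has a start date
dummy = pd.DataFrame({'Name': ['Z'], 'Notice': [pd.Timestamp('1980-05-05')]})
df2 = pd.concat([dummy, df2], ignore_index=True)

for i in range(len(df2) - 1):
    startDate = df2.iloc[i]['Notice']
    endDate = df2.iloc[i + 1]['Notice']
    name = df2.iloc[i + 1]['Name']

    # Rows from the previous notice date up to and including this one
    # (inclusive='both' replaces the older inclusive=True)
    indices = df1['date'].between(startDate, endDate, inclusive='both')

    df3.loc[indices, 'Result'] += df1.loc[indices, name]
    df3.loc[indices, 'count'] += 1

# Notice dates fall into two windows, so dividing by count averages them
df3['Result'] = df3['Result'] / df3['count']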
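The same windows can also be built without the loop or the dummy row: one between mask per column, with each window starting at the previous notice date (or the earliest row, for the first column). This is a sketch assuming day-first dates and pandas 1.3 or later; starts and active are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({
    "date": ["03.05.1982", "04.05.1982", "05.05.1982", "06.05.1982",
             "07.05.1982", "10.05.1982", "11.05.1982"],
    "A": [63.63, 64.08, 64.19, 65.11, 65.36, 65.25, 65.36],
    "B": [63.83, 64.10, 64.19, 65.08, 65.33, 65.28, 65.36],
    "C": [63.99, 64.22, 64.30, 65.16, 65.41, 65.36, 65.44],
})
df2 = pd.DataFrame({"Name": ["A", "B", "C"],
                    "Notice": ["05.05.1982", "07.05.1982", "12.05.1982"]})

dates = pd.to_datetime(df1["date"], format="%d.%m.%Y")
notice = pd.to_datetime(df2["Notice"], format="%d.%m.%Y")

# Window start = previous notice date; the earliest row date stands in
# for the dummy 'Z' row of the loop version
starts = notice.shift(1).fillna(dates.min())

# One boolean column per name: True where the row lies in that window
active = pd.DataFrame({
    name: dates.between(start, end, inclusive="both")
    for name, start, end in zip(df2["Name"], starts, notice)
})

# Sum the values inside each row's windows and divide by the number of
# windows; a row outside every window would give 0/0 = NaN
values = df1[active.columns]
df3 = pd.DataFrame({
    "date": dates,
    "Result": values.where(active, 0).sum(axis=1) / active.sum(axis=1),
})
print(df3)
```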


Your df2 has no date?

It's "Notice", sorry, not date. It has been modified now.

This is great..! Thanks. One thing: if I'm not mistaken, it does not take the average on the notice dates (between A and B on 05.05.82, and between B and C on 07.05.82, respectively).

Hi Roshan, thanks a lot! It seems to work; I didn't know about this. However, I have the same comment as above (on Akhilesh's answer): if I'm not mistaken, it does not take the average on the notice dates (between A and B on 05.05.82, and between B and C on 07.05.82, respectively).

I didn't know you needed the average on those days. This can be done with a modification; I have updated my answer.

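import pandas as pd

# Map each notice date to its column name, then back-fill so every earlier
# row is tagged with the column still running; rows after the last notice
# date present in df1 get the last column
df1['t'] = df1['date'].map(df2.set_index('Notice')['Name'])
df1['t'] = df1['t'].bfill().fillna(df2['Name'].iloc[-1])

df3 = pd.DataFrame()
df3['Result'] = df1.apply(lambda row: row[row['t']], axis=1)
df3['date'] = df1['date']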
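As noted in the comments, the map/bfill answer always picks a single column, so it does not average on notice dates. A sketch of one way to add that, assuming df1 is sorted by date and both frames use the same date strings; next_name, is_notice and pick are hypothetical helpers introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "date": ["03.05.1982", "04.05.1982", "05.05.1982", "06.05.1982",
             "07.05.1982", "10.05.1982", "11.05.1982"],
    "A": [63.63, 64.08, 64.19, 65.11, 65.36, 65.25, 65.36],
    "B": [63.83, 64.10, 64.19, 65.08, 65.33, 65.28, 65.36],
    "C": [63.99, 64.22, 64.30, 65.16, 65.41, 65.36, 65.44],
})
df2 = pd.DataFrame({"Name": ["A", "B", "C"],
                    "Notice": ["05.05.1982", "07.05.1982", "12.05.1982"]})

names = list(df2["Name"])
# As in the answer: tag each row with the column whose notice date comes next
t = df1["date"].map(df2.set_index("Notice")["Name"]).bfill().fillna(names[-1])

# Hypothetical helpers: which column follows which, and which rows are notice dates
next_name = dict(zip(names[:-1], names[1:]))
is_notice = df1["date"].isin(df2["Notice"])

def pick(i):
    col = t[i]
    value = df1.at[i, col]
    # On a notice date, average the expiring column with the next one
    if is_notice[i] and col in next_name:
        value = (value + df1.at[i, next_name[col]]) / 2
    return value

df3 = pd.DataFrame({"date": df1["date"],
                    "Result": [pick(i) for i in df1.index]})
print(df3)
```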