Python: Iterating over dataframes of different lengths and computing new values using repeated values
Edit: I realized my example was set up incorrectly; the corrected version is below.
I have two dataframes:
import pandas as pd
df1 = pd.DataFrame({'x values': [11, 12, 13], 'time': [1, 2.2, 3.5]})
df2 = pd.DataFrame({'x values': [11, 21, 12, 43], 'time': [1, 2.1, 2.6, 3.1]})
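What I need to do is iterate over both dataframes and compute a new value: the ratio of the x values in df1 and df2. The difficulty is that the dataframes have different lengths.
If I only wanted to compute values for positions present in both, I know I could use zip, or even something like map. Unfortunately, I do not want to drop any values. Instead, I need to compare the time columns of the two frames to decide whether the value from the previous time should be carried over into the calculation for the next time period.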
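As a small illustration of why zip does not fit here: it stops at the shorter input, so the last df2 value is never paired. This is a minimal sketch that uses plain lists mirroring the 'x values' columns above; the names x1, x2, pairs and ratios are only for illustration.

```python
# Plain lists mirroring the 'x values' columns of df1 and df2
x1 = [11, 12, 13]
x2 = [11, 21, 12, 43]

# zip stops at the shorter input, so the last value of x2 (43) is dropped
pairs = list(zip(x1, x2))
ratios = [a / b for a, b in pairs]
print(len(pairs))  # 3
```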
For example, I would compute the first ratio:
df1["x values"][0]/df2["x values"][0]
Then, for the second update, I check which update occurs next. In this case it is an update to df2, so df1["time"] … df1["x values"][1]/df2["x values"][1]
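If the times in the two dataframes are equal, the ratio should be computed using both values from the same "position".
And so on. I am unsure whether this can be done with a lambda function or something like itertools. I have made a few attempts, but most of them raised errors. Any help would be greatly appreciated.
You can merge the two dataframes on time and then compute the ratio: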
new_df = df1.merge(df2, on = 'time', how = 'outer')
new_df['ratio'] = new_df['x values_x'] / new_df['x values_y']
You get:
   time  x values_x  x values_y     ratio
0     1          11          11  1.000000
1     2          12          21  0.571429
2     2          12          12  1.000000
3     3          13          43  0.302326
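When the times in the two frames do not line up exactly, the outer merge leaves NaN wherever one frame has no row for a given time. One possible extension, sketched here under the assumption that the most recent value of each frame should stay in effect until the next update, is to sort the merged frame by time and forward-fill before dividing. The sample data is the corrected example from the question; the ffill step and the names merged and filled are assumptions, not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'x values': [11, 12, 13], 'time': [1, 2.2, 3.5]})
df2 = pd.DataFrame({'x values': [11, 21, 12, 43], 'time': [1, 2.1, 2.6, 3.1]})

# Outer merge keeps every time from both frames; unmatched cells become NaN
merged = df1.merge(df2, on='time', how='outer').sort_values('time')
merged = merged.reset_index(drop=True)

# Assumption: the last known value stays in effect until the next update
value_cols = ['x values_x', 'x values_y']
filled = merged.copy()
filled[value_cols] = filled[value_cols].ffill()
filled['ratio'] = filled['x values_x'] / filled['x values_y']
print(filled)
```

If one frame starts later than the other, its leading rows stay NaN after the forward fill, and so does the ratio for those rows.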
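This is what I ended up doing. Hopefully it helps clarify what my question was. Also, if anyone can come up with a more Pythonic way to do this, I would greatly appreciate the feedback.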
# add a column indicating which dataframe each row came from
df1['type'] = 'type1'
df2['type'] = 'type2'
# concatenate the dataframes
df = pd.concat((df1, df2), axis=0, ignore_index=True)
# sort by time; mergesort is stable, so df1 rows stay ahead of df2 rows on ties
df = df.sort_values(by='time', kind='mergesort').reset_index(drop=True)
# we create empty lists in order to track records
# in a way that will let us compute ratios
x1 = []
x2 = []
# we will iterate through the dataframe row by row
for i in range(len(df)):
    value = df['x values'][i]
    # if the row contains data from df1
    if df['type'][i] == 'type1':
        # if this is the first x1 value, repeat it to match the number of x2
        # records so far; this is useful if one time starts before the other
        if not x1:
            x1 = [value] * len(x2)
        # we append the x value for that type
        x1.append(value)
        # add a copy of the previous x2 record to correspond
        # to the new x1 record
        if x2:
            x2.append(x2[-1])
    # if the row contains data from df2
    else:
        # if this is the first x2 value, repeat it to match the number of x1
        # records so far
        if not x2:
            x2 = [value] * len(x1)
        # we append the x value for that type
        x2.append(value)
        # add a copy of the previous x1 record to correspond
        # to the new x2 record
        if x1:
            x1.append(x1[-1])
# combine the records
new_df = pd.DataFrame({'Type 1': x1, 'Type 2': x2})
# compute the ratio
new_df['Ratio'] = new_df['Type 1'] / new_df['Type 2']
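Since the question mentions itertools, the same carry-forward idea can also be sketched without pandas. This is only one possible approach, not the method from either answer: it assumes each series is available as (time, value) pairs already sorted by time, and the function name carried_ratios and the variables s1, s2 and rows are hypothetical. Updates that share the same time are grouped, so both values are taken from the same "position".

```python
import heapq
from itertools import groupby
from operator import itemgetter


def carried_ratios(updates1, updates2):
    """Yield (time, x1, x2, ratio) once per distinct update time.

    updates1 and updates2 are iterables of (time, value) pairs sorted by time.
    Nothing is yielded until both series have produced at least one value.
    """
    # Tag each update with its source so we know which side changed
    tagged1 = ((t, 0, v) for t, v in updates1)
    tagged2 = ((t, 1, v) for t, v in updates2)
    latest = [None, None]
    merged = heapq.merge(tagged1, tagged2)
    # Group updates sharing the same time so they are applied together
    for t, group in groupby(merged, key=itemgetter(0)):
        for _, source, value in group:
            latest[source] = value
        if latest[0] is not None and latest[1] is not None:
            yield t, latest[0], latest[1], latest[0] / latest[1]


s1 = [(1, 11), (2.2, 12), (3.5, 13)]
s2 = [(1, 11), (2.1, 21), (2.6, 12), (3.1, 43)]
rows = list(carried_ratios(s1, s2))
for row in rows:
    print(row)
```

With dataframes like those in the question, the input pairs could be produced with zip(df1['time'], df1['x values']) and zip(df2['time'], df2['x values']), provided each frame is already sorted by time.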
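The problem is that I simplified my example, so the times do not actually line up between the two frames; they have different dates.
So what is the basis for finding the ratio?
I basically just need the ratio for any "updated" value. If you think of time as linear, any time I have an x value associated with a new time period, I need to update the ratio calculation. So I am tracking how the ratio changes over time. I realize now that my example was set up somewhat incorrectly.
@mildlyillogical You should provide a reproducible example.
I may have found a solution, which I will post shortly.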