Pandas: creating and merging DataFrames in a loop


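I need to read in a number of input (i/p) dataframes, filter them on some conditions, merge them, and end up with dataframes named "merge_m0", "merge_m1", "merge_m2" and so on.

In the real code I need to read about 20 dataframes. To keep things simple, I created 3 dataframes here and use a for loop to read and merge them.

# Input: sample input dataframes df0, df1 and df2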

df0=pd.DataFrame({'id':[100,101,102,103],'m0_val_mthd':[1,8,25,41],'name':['AAA','BBB','CCC','DDD'],'m0_orig_val_mthd':[2,3,4,5]})
df1=pd.DataFrame({'id':[100,104,102,103],'m1_val_mthd':[1,8,10,25],'name':['EEE','FFF','GGG','HHH'],'m1_orig_val_mthd':[2,3,4,5]})
df2=pd.DataFrame({'id':[100,104,102,103],'m2_val_mthd':[1,8,10,25],'name':['III','JJJ','KKK','LLL'],'m2_orig_val_mthd':[2,3,4,5]})
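To do this, I am using globals() to create the dataframes in a loop and merge them, but it does not work and throws a "'DataFrame' object has no attribute 'globals'" error.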

# Code:

I have tried the following in place of the first line of the function above:

#globals()[f"m{x}"] = globals()[f'df{x}'][globals()[f'df{x}'].m{x}_val_mthd.isin([1,25])]
#globals()[f"m{x}"] = globals()[f'df{x}']["[f'm{x}_val_mthd']"].isin([1,25])
I think there must be a better and simpler way to do this. I would appreciate any help. Thanks.

Edit: my latest code:

df0=pd.DataFrame({'id':[100,101,102,103],'m0_val_mthd':[1,8,25,41],'name':['AAA','BBB','CCC','DDD'],'m0_orig_val_mthd':[2,3,4,5]})
df1=pd.DataFrame({'id':[100,104,102,103],'m1_val_mthd':[1,8,10,25],'name':['EEE','FFF','GGG','HHH'],'m1_orig_val_mthd':[2,3,4,5]})
df2=pd.DataFrame({'id':[100,104,102,103],'m2_val_mthd':[1,8,10,25],'name':['III','JJJ','KKK','LLL'],'m2_orig_val_mthd':[2,3,4,5]})

df_list=[]
for i in range(0,3):
    df_list.append(globals()[f'df{i}']) #I'm appending all the i/p dataframes which are created already by other step in the code and hope this works

def comb_mths(i):
    dfa = df_list[i]
    dfb = df_list[i+1]
    dfma = dfa[dfa.iloc[:, 1].isin([1,25])] 
    dfmb = dfb[(dfb.iloc[:, 1].isin([8,10,11,12])) & (dfb.iloc[:, 3].isin([2,3,4,5]))]
    print(dfma)
    print(dfmb)
    print('\n'*3)

    globals()[f"merge_m{i}"]  = dfma.merge(dfmb, how='inner', on=['id'])
    return globals()[f"merge_m{i}"] 

for i in range(0,2): 
    comb_mths(i)

print(merge_m0)    
print(merge_m1)
In the function above, after the "merge_m{i}" dataframe is created, I need to check one more if/else condition and compute a variable, say "mths".


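I took a different approach, assuming you can create all of the input dataframes first. If you can create the dataframes and put them in a list, they become easier to work with and the code becomes easier to read.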

import pandas as pd

df0=pd.DataFrame({'id':[100,101,102,103],'m0_val_mthd':[1,8,25,41],'name':['AAA','BBB','CCC','DDD'],'m0_orig_val_mthd':[2,3,4,5]})
df1=pd.DataFrame({'id':[100,104,102,103],'m1_val_mthd':[1,8,10,25],'name':['EEE','FFF','GGG','HHH'],'m1_orig_val_mthd':[2,3,4,5]})
df2=pd.DataFrame({'id':[100,104,102,103],'m2_val_mthd':[1,8,10,25],'name':['III','JJJ','KKK','LLL'],'m2_orig_val_mthd':[2,3,4,5]})


# add your inputs to the list    
df_list = [df0, df1, df2]

# only pass in i, then call dfa, dfb by position in the list
def comb_mths(i):
    dfa = df_list[i]
    dfb = df_list[i+1]
    # print(dfa)
    # print(dfb)
    # print('\n'*3)
    
    # I wasn't exactly sure what you wanted here, but I think the original issue was you were calling your new dataframe before it was created.
    dfma = dfa[dfa.iloc[:, 1].isin([1,25])] # as long as columns are in the same position, you don't need to call them by name, just position
    dfmb = dfb[(dfb.iloc[:, 1].isin([8,10,11,12])) & (dfb.iloc[:, 3].isin([2,3,4,5]))]
    print(dfma)
    print(dfmb)
    print('\n'*3)

    # creating new merged dataframes. cleaned this up too
    globals()[f"merge_m{i}"]  = dfma.merge(dfmb, how='inner', on=['id'])
    return globals()[f"merge_m{i}"] #added return statement

for i in range(0,2): # watch range end or you'll get an error
    comb_mths(i)

print(merge_m0)    
print(merge_m1)
Additional code:

# to populate the df_list, do this
# you aren't actually naming them, I only did that in example above due to your Example
# when you call them, you are calling the position in the list
df_list = []
for i in range(0,20):
    df = 'do your code here'
    df_list.append(df)

# print the df to verify they are created
for df in df_list:
    print(df)
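
As a rough sketch of the same idea without globals() for the outputs: the merged frames can be kept in a dictionary keyed by name, and the filters can select columns by name instead of by position. This assumes the column names follow the m{i}_val_mthd / m{i}_orig_val_mthd pattern from the sample data. The dictionary name `merged` is an illustrative choice, not something from the original post.

```python
import pandas as pd

df0 = pd.DataFrame({'id': [100, 101, 102, 103], 'm0_val_mthd': [1, 8, 25, 41],
                    'name': ['AAA', 'BBB', 'CCC', 'DDD'], 'm0_orig_val_mthd': [2, 3, 4, 5]})
df1 = pd.DataFrame({'id': [100, 104, 102, 103], 'm1_val_mthd': [1, 8, 10, 25],
                    'name': ['EEE', 'FFF', 'GGG', 'HHH'], 'm1_orig_val_mthd': [2, 3, 4, 5]})
df2 = pd.DataFrame({'id': [100, 104, 102, 103], 'm2_val_mthd': [1, 8, 10, 25],
                    'name': ['III', 'JJJ', 'KKK', 'LLL'], 'm2_orig_val_mthd': [2, 3, 4, 5]})

df_list = [df0, df1, df2]

def comb_mths(i):
    """Filter df_list[i] and df_list[i + 1] by column name, then inner-merge on 'id'."""
    dfa, dfb = df_list[i], df_list[i + 1]
    # Column names are built from the index (assumed naming pattern).
    dfma = dfa[dfa[f'm{i}_val_mthd'].isin([1, 25])]
    dfmb = dfb[dfb[f'm{i + 1}_val_mthd'].isin([8, 10, 11, 12])
               & dfb[f'm{i + 1}_orig_val_mthd'].isin([2, 3, 4, 5])]
    return dfma.merge(dfmb, how='inner', on=['id'])

# One merged frame per adjacent pair, stored under the name it would have had.
merged = {f'merge_m{i}': comb_mths(i) for i in range(len(df_list) - 1)}

print(merged['merge_m0'])
print(merged['merge_m1'])
```

Looking a frame up as merged['merge_m0'] plays the role of the merge_m0 global, and the loop bound is derived from len(df_list), so it should not need changing when there are 20 inputs instead of 3.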

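Comments:

Your example dataframes don't have correct syntax. Can you fix those first?

@JonathanLeon Sorry, that was a copy-paste problem… fixed the i/p dataframes.

Thanks for your solution. A few questions. 1) Yes, all the i/p dataframes are created first, based on the value of the "mths" variable from earlier code, which can be any number between 0 and 20. So is there a way to populate df_list automatically with all the i/p dataframes? I tried creating an empty list and adding the dataframe names to it as elements, but they are added as strings, like df_list=['df0','df1','df2'…]. 2) For the dfa.iloc[:, 1] and dfb.iloc[:, 1] lines: no, the variable names may not be in the same position.

See the answer for the additional code. Regarding "for the dfa.iloc[:, 1] and dfb.iloc[:, 1] lines: no, the variable names may not be in the same position": it is not clear why that would be, but it's your data. If you can rearrange the columns so that they are always in the first and second positions, you will save yourself some trouble.

Thanks a lot, I appreciate your help! 1) To populate df_list with the dataframes that have already been created, I used globals(); I hope that is correct. 2) If I want to use variable names instead of positions, can I use df.loc['var'] in place of iloc and the position? 3) Also, in the same function you created, I need to check one more if/else condition and compute a new variable. Please see my updated post for the details and let me know how to add it.

1. I don't know how you are populating the list. If your method works, go with it. 2. Search this forum for how to select rows using column names and see if you can get it to work. 3. Add the if/else statement after the merged df is created (before the return statement), and make sure you reference the correct df in it.

3) I tried adding the if/else condition after globals()[f"merge_m{i}"] = dfma.merge(dfmb, how='inner', on=['id']), but it throws various errors, because I am not sure how to use column names with globals(), or how to add an if/else condition after creating a dataframe with globals(). If you don't mind, could you help me add it? Thanks for your help!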
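
A minimal sketch of where an if/else check could go, following the suggestion to place it after the merge and before the return. The thread never states the actual condition or how "mths" should be computed, so the rule below (use i when the merge is empty, otherwise i + 1) is purely hypothetical. The helper name merge_and_check and the small sample frames are also made up for illustration.

```python
import pandas as pd

def merge_and_check(dfma, dfmb, i):
    """Inner-merge two filtered frames on 'id', then derive a value from the result."""
    merged = dfma.merge(dfmb, how='inner', on=['id'])
    # Hypothetical rule, NOT from the original post: replace with the real condition.
    if merged.empty:
        mths = i
    else:
        mths = i + 1
    # Returning both values avoids writing either one into globals().
    return merged, mths

# Tiny made-up inputs, only to show the call pattern.
left = pd.DataFrame({'id': [100, 102], 'm0_val_mthd': [1, 25]})
right_match = pd.DataFrame({'id': [104, 102], 'm1_val_mthd': [8, 10]})
right_no_match = pd.DataFrame({'id': [104, 105], 'm1_val_mthd': [8, 10]})

merged0, mths0 = merge_and_check(left, right_match, 0)
merged1, mths1 = merge_and_check(left, right_no_match, 0)
print(mths0, mths1)
```

Because the merged frame is an ordinary local variable inside the function, its columns can be referenced by name in the condition (for example merged['m0_val_mthd']) without going through globals().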