Python: how to merge rows when the number of columns is dynamic

python pandas

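I am working on an AI project that involves processing a large number of DataFrames in Python. I am trying to append values to df, but I want the number of columns in df to change dynamically with the number of columns in the DataFrame a. rowMerger is a function that takes two arguments (a and b). a is the DataFrame we pass in, and b is the DataFrame we expect the function to return. When a has five columns, this function lets me merge rows: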

import pandas as pd

def rowMerger(a,b):
    try:
        # b is rebuilt here with one column per column of a
        b = pd.DataFrame(data=None, columns =[f'Column{i}' for i in range(0, len(a.columns))])
        rule1 = lambda x: x not in ['']
        # rows where Column0, Column1 and Column2 are all non-empty start a new merged row
        u = a.loc[a['Column0'].apply(rule1) & a['Column1'].apply(rule1) & a['Column2'].apply(rule1)].index
        findMergerindexs = list(u)
        findMergerindexs.sort()
        a = pd.DataFrame(a)
        if (len(findMergerindexs) > 0):
            for m in range(len(findMergerindexs)):
                if not (m == (len(findMergerindexs)-1)):
                    startLoop = findMergerindexs[m]
                    endLoop = findMergerindexs[m+1]
                else:
                    startLoop = findMergerindexs[m]
                    endLoop = len(a)
                # one hard-coded accumulator per column: this is the non-dynamic part
                Column0 = ''
                Column1 = ''
                Column2 = ''
                Column3 = ''
                Column4 = ''
                for n in range(startLoop,endLoop):
                    Column0 = Column0 + str(a.iloc[n,0])
                    Column1 = Column1 + str(a.iloc[n,1])
                    Column2 = Column2 + str(a.iloc[n,2])
                    Column3 = Column3 + str(a.iloc[n,3])
                    Column4 = Column4 + str(a.iloc[n,4])
                # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
                new_row = pd.DataFrame([{'Column0': Column0.strip(), 'Column1': Column1.strip(), 'Column2': Column2.strip(), 'Column3': Column3.strip(), 'Column4': Column4.strip()}])
                b = pd.concat([b, new_row], ignore_index=True)
        else:
            print("File is not having a row for merging instances - Please check the file manually for instance - ")
    except:
        print("Error - While merging the rows")
    return b

The function above is what I use to merge rows so that I can get rid of the blank cells between rows. For example, I have a DataFrame like this:

df=[['7','4','5','7','8'],["","","",'7','4'],['9','4','7','8','4'],["","","",'7','5'],['4','8','5','4','6']]
df=pd.DataFrame(df)
df.columns=[f'Column{i}' for i in range(0, len(df.columns))]



Column0 Column1 Column2 Column3 Column4
7       4       5       7       8 
                        7       4
9       4       7       8       4
                        7       5
4       8       5       4       6

The rowMerger function removes the blank cells between rows and returns the DataFrame shown below:

rowMerger(df,0)

Column0 Column1 Column2 Column3 Column4
7       4       5       77      84
9       4       7       87      45
4       8       5       4       6

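However, this function is not dynamic. That is, the number of columns of b is set by hand. Instead, I want the number of columns generated inside the function to depend on the number of columns of a. For example, if a has three columns, I want to create three columns (Column0, Column1, Column2), append the values to those columns, and return a DataFrame with three columns.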
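To make the goal concrete, here is a minimal sketch of what a dynamic version could look like while keeping the same loop structure as rowMerger. The name row_merger_dynamic is hypothetical, and one assumption differs from the original rule: a row is treated as the start of a new merged row when none of its cells is empty, instead of checking Column0, Column1 and Column2 by name. The rows are collected in a list and the result is built once, so no call to DataFrame.append is needed.

```python
import pandas as pd

def row_merger_dynamic(a):
    """Hypothetical dynamic variant of rowMerger: works for any number of columns."""
    a = pd.DataFrame(a).astype(str)
    n_cols = len(a.columns)
    names = [f'Column{i}' for i in range(n_cols)]

    # Assumption: a row with no empty cell starts a new merged row.
    is_start = a.ne('').all(axis=1).tolist()
    starts = [pos for pos, flag in enumerate(is_start) if flag]

    rows = []
    for k, start in enumerate(starts):
        # Merge everything up to the next start row (or the end of the frame).
        end = starts[k + 1] if k + 1 < len(starts) else len(a)
        parts = [''] * n_cols  # one accumulator per column, however many there are
        for n in range(start, end):
            for c in range(n_cols):
                parts[c] += a.iloc[n, c]
        rows.append({name: part.strip() for name, part in zip(names, parts)})

    # Build the result once from the collected rows.
    return pd.DataFrame(rows, columns=names)

# Usage with a three-column input
df3 = pd.DataFrame([['7', '4', '5'], ['', '', '7'], ['9', '4', '7']])
result3 = row_merger_dynamic(df3)
print(result3)
```

The per-column variables Column0 to Column4 become a single list whose length follows len(a.columns), which is the only change needed to remove the hard-coded column count.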

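I have tried my best, but this is beyond me. I am still learning Python, and I would be grateful if someone could help.

This works for the example you provided, but you will have to adapt it for many other scenarios. The idea is to find the rows that contain empty strings, get the columns for those rows, combine them, and then pass the result back into the original DataFrame. I have added comments in the code; hopefully they explain it well. Let me know how it goes. Someone else may have a better approach, so just play with it and c...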

def process_data(df):

    #convert to string
    #easier to merge rows
    df = df.astype(str)

    #find rows where there are empty strings
    empty_rows_index = df.loc[df.eq('').any(axis=1)].index

    #find columns where there are no empty strings
    non_empty_cols = df.loc[:,df.ne('').all()].columns.tolist()

    #this gets us the index above the rows with empty strings
    empty_rows_pair = [[ind-1,ind] for ind in empty_rows_index]

    #pair index with columns
    rows_cols = [[entry,non_empty_cols] for entry in empty_rows_pair]

    #this combines the columns where empty strings are in the next row
    #with the non empty string row in the previous row
    #(summing strings concatenates them, e.g. '7' + '7' -> '77')
    lump = [df.loc[x,y].sum().astype('int') for x,y in rows_cols]

    #combine and flip, so that the column names are the headers
    merger = pd.concat(lump,axis=1).T

    #to ensure complete reintegration back to the dataframe
    #set the merger index to the previous row index
    merger.index = [i for i,j in empty_rows_pair]

    #drop the empty string rows
    df = df.drop(empty_rows_index)

    #set the rows in df to match with
    #the rows and columns in merger
    #and assign merger to that section
    df.loc[merger.index,merger.columns] = merger

    df = df.astype(int).reset_index(drop=True)
    return df

process_data(df)

    Column0 Column1 Column2 Column3 Column4
0       7      4       5      77     84
1       9      4       7      87     45
2       4      8       5      4      6

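IIUC, you want to remove the blanks in the columns, right? Is that your main goal?

Yes, but I also want to merge those rows into the row above. However, I want to take the number of columns of the input DataFrame into account. I would like to get the same result for input DataFrames with different numbers of columns.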
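For what the last comment asks, one possible approach (a sketch, not the method from the answer above) is to label each "complete" row and the partial rows that follow it with the same group id, then join the strings per group. The function name merge_continuation_rows is made up for illustration, and the rule that any row containing an empty string is a continuation of the row above is an assumption carried over from the example data. Nothing in it depends on how many columns the input has.

```python
import pandas as pd

def merge_continuation_rows(a):
    """Hypothetical helper: fold rows containing empty strings into the row above."""
    a = pd.DataFrame(a).astype(str)

    # Assumption: a row with no empty cell starts a new group.
    starts = a.ne('').all(axis=1)

    # Running count of start rows gives every row the id of its group.
    group_ids = starts.cumsum()

    # Rows before the first start row (id 0) have nothing to merge into.
    keep = group_ids > 0
    a, group_ids = a[keep], group_ids[keep]

    # Concatenate the strings of each column within each group.
    merged = a.groupby(group_ids).agg(lambda s: ''.join(s))
    merged = merged.apply(lambda col: col.str.strip())
    return merged.reset_index(drop=True)

# Usage with the five-column example from the question
df = pd.DataFrame([['7', '4', '5', '7', '8'],
                   ['', '', '', '7', '4'],
                   ['9', '4', '7', '8', '4'],
                   ['', '', '', '7', '5'],
                   ['4', '8', '5', '4', '6']])
df.columns = [f'Column{i}' for i in range(len(df.columns))]
merged = merge_continuation_rows(df)
print(merged)
```

Unlike process_data, this also handles several consecutive partial rows under the same complete row, because grouping is driven by the running count instead of by index-1 pairs. The values stay as strings; add .astype(int) at the end if every cell is numeric.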