Python: How to merge columns with a dynamic number of columns
I'm working on an AI project that involves processing a large number of DataFrames in Python. I'm trying to append values to df, but I want the number of columns in df to change dynamically based on the number of columns in DataFrame a. rowMerger is a function that takes two variables (a and b): a is the DataFrame we supply, and b is the DataFrame we expect the function to return. When a has five columns, this function lets me merge rows:
import pandas as pd

def rowMerger(a, b):
    try:
        b = pd.DataFrame(data=None, columns=[f'Column{i}' for i in range(0, len(a.columns))])
        rule1 = lambda x: x not in ['']
        u = a.loc[a['Column0'].apply(rule1) & a['Column1'].apply(rule1) & a['Column2'].apply(rule1)].index
        findMergerindexs = list(u)
        findMergerindexs.sort()
        a = pd.DataFrame(a)
        if (len(findMergerindexs) > 0):
            for m in range(len(findMergerindexs)):
                if not (m == (len(findMergerindexs) - 1)):
                    startLoop = findMergerindexs[m]
                    endLoop = findMergerindexs[m + 1]
                else:
                    startLoop = findMergerindexs[m]
                    endLoop = len(a)
                Column0 = ''
                Column1 = ''
                Column2 = ''
                Column3 = ''
                Column4 = ''
                for n in range(startLoop, endLoop):
                    Column0 = Column0 + str(a.iloc[n, 0])
                    Column1 = Column1 + str(a.iloc[n, 1])
                    Column2 = Column2 + str(a.iloc[n, 2])
                    Column3 = Column3 + str(a.iloc[n, 3])
                    Column4 = Column4 + str(a.iloc[n, 4])
                # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
                b = pd.concat([b, pd.DataFrame([{'Column0': Column0.strip(), 'Column1': Column1.strip(), 'Column2': Column2.strip(), 'Column3': Column3.strip(), 'Column4': Column4.strip()}])], ignore_index=True)
        else:
            print("File is not having a row for merging instances - Please check the file manually for instance - ")
    except:
        print("Error - While merging the rows")
    return b
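Only the five accumulators (Column0 … Column4) and the dict passed to append are tied to five columns (rule1 additionally assumes at least three). A minimal sketch, assuming the cells hold strings: a hypothetical helper, merge_block, that builds the merged row for any slice of rows with a dict comprehension over that slice's columns, so the column count no longer matters.

```python
import pandas as pd


def merge_block(block: pd.DataFrame) -> dict:
    """Hypothetical helper: concatenate each column of a slice of rows.

    Stands in for the hard-coded Column0 ... Column4 accumulators, so it
    works for any number of columns.
    """
    # One stripped, concatenated string per column name.
    return {col: ''.join(block[col].astype(str)).strip() for col in block.columns}


# Usage on rows 0-1 of the example, i.e. what a.iloc[startLoop:endLoop] holds.
example = pd.DataFrame([['7', '4', '5', '7', '8'], ['', '', '', '7', '4']],
                       columns=[f'Column{i}' for i in range(5)])
print(merge_block(example))
# {'Column0': '7', 'Column1': '4', 'Column2': '5', 'Column3': '77', 'Column4': '84'}
```

Inside the for m loop, the Column0 … Column4 block and the inner for n loop could then collapse into a single merge_block(a.iloc[startLoop:endLoop]) call, with the resulting dict appended to b as before.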
I use the function above to merge rows so that I can close the gaps between them. For example, I have a DataFrame like this:
df=[['7','4','5','7','8'],["","","",'7','4'],['9','4','7','8','4'],["","","",'7','5'],['4','8','5','4','6']]
df=pd.DataFrame(df)
df.columns=[f'Column{i}' for i in range(0, len(df.columns))]
Column0  Column1  Column2  Column3  Column4
      7        4        5        7        8
                                 7        4
      9        4        7        8        4
                                 7        5
      4        8        5        4        6
The rowMerger function removes the gaps between the rows and returns the DataFrame below:
rowMerger(df,0)
Column0  Column1  Column2  Column3  Column4
      7        4        5       77       84
      9        4        7       87       45
      4        8        5        4        6
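However, this function is not dynamic: the number of columns it fills in for b is fixed by hand (the Column0 … Column4 accumulators and the dict passed to append). Instead, I want the number of columns generated inside the function to follow the number of columns in a. For example, if a has three columns, I want to create three columns (Column0, Column1, Column2), append the values to them, and return a DataFrame with three columns.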
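A sketch of the dynamic behavior described above, not taken from the thread. row_merger_dynamic is a hypothetical name; it drops the unused b argument, keeps rule1's start-of-record test but applies it only to whichever of the first three columns exist (an assumption carried over from the question), and builds the result once at the end instead of appending row by row.

```python
import pandas as pd


def row_merger_dynamic(a: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical rewrite of rowMerger for any number of columns.

    Assumption (from rule1): a row starts a new record when its first
    (up to) three columns are all non-empty; the rows after it, up to the
    next such row, are merged into it. Rows before the first start are dropped.
    """
    a = a.astype(str).reset_index(drop=True)
    cols = list(a.columns)
    # Same test as rule1, restricted to the first three columns that exist.
    is_start = a[cols[:3]].ne('').all(axis=1)
    starts = list(a.index[is_start])
    merged_rows = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(a)
        block = a.iloc[start:end]
        # One concatenated string per column instead of Column0 ... Column4.
        merged_rows.append({c: ''.join(block[c]).strip() for c in cols})
    # Build the result once; the output has exactly as many columns as `a`.
    return pd.DataFrame(merged_rows, columns=cols)


df = pd.DataFrame([['7', '4', '5', '7', '8'], ['', '', '', '7', '4'],
                   ['9', '4', '7', '8', '4'], ['', '', '', '7', '5'],
                   ['4', '8', '5', '4', '6']])
df.columns = [f'Column{i}' for i in range(len(df.columns))]
print(row_merger_dynamic(df))
```

With a three-column input the same call returns three columns; with one or two columns the start test simply uses the columns that exist instead of raising a KeyError on Column2.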
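I've done my best, but this is beyond me. I'm still learning Python, so I'd be grateful if someone could help.
This works for the example you provided, but you'll have to adapt it to many other scenarios. The idea is to find the rows that contain empty strings, take the columns of those rows, combine them, and then pass them back into the original DataFrame. I've added comments to the code; hopefully they explain it well. Let me know how it goes. Someone else may have a better solution, so just play with it and c…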
import pandas as pd

def process_data(df):
    # convert to string
    # easier to merge rows
    df = df.astype(str)
    # find rows where there are empty strings
    empty_rows_index = df.loc[df.eq('').any(axis=1)].index
    # find columns where there are no empty strings
    non_empty_cols = df.loc[:, df.ne('').all()].columns.tolist()
    # this gets us the index above the rows with empty strings
    empty_rows_pair = [[ind - 1, ind] for ind in empty_rows_index]
    # pair index with columns
    rows_cols = [[entry, non_empty_cols] for entry in empty_rows_pair]
    # in the columns that have no empty strings, combine the row with
    # empty strings with the row above it
    # (.sum() on string columns concatenates them, e.g. '7' + '7' -> '77')
    lump = [df.loc[x, y].sum().astype('int') for x, y in rows_cols]
    # combine and flip, so that the column names are the headers
    merger = pd.concat(lump, axis=1).T
    # to ensure complete reintegration back to the dataframe
    # set the merger index to the previous row index
    merger.index = [i for i, j in empty_rows_pair]
    # drop the empty string rows
    df = df.drop(empty_rows_index)
    # set the rows in df to match with
    # the rows and columns in merger
    # and assign merger to that section
    df.loc[merger.index, merger.columns] = merger
    df = df.astype(int).reset_index(drop=True)
    return df
process_data(df)
Column0 Column1 Column2 Column3 Column4
0 7 4 5 77 84
1 9 4 7 87 45
2 4 8 5 4 6
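For comparison, a different technique than the answer's, not taken from the thread: label each record with a group id (a cumulative sum over a boolean "row starts a record" mask) and let groupby concatenate every column at once, so the column count never appears in the code. merge_rows_groupby is a hypothetical name, and the rule that a row with no empty cells starts a record is an assumption that holds for the example data.

```python
import pandas as pd


def merge_rows_groupby(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical groupby-based variant; not the answer's method.

    Assumption: a row with no empty cells starts a new record, and each
    following row that contains an empty cell belongs to that record.
    """
    s = df.astype(str)
    # True where a record starts; the running total gives each record a group id.
    group_id = s.ne('').all(axis=1).cumsum()
    # Concatenate the strings of every column within each group.
    merged = s.groupby(group_id).agg(lambda col: ''.join(col))
    return merged.reset_index(drop=True)


df = pd.DataFrame([['7', '4', '5', '7', '8'], ['', '', '', '7', '4'],
                   ['9', '4', '7', '8', '4'], ['', '', '', '7', '5'],
                   ['4', '8', '5', '4', '6']])
df.columns = [f'Column{i}' for i in range(len(df.columns))]
print(merge_rows_groupby(df))
```

Unlike rowMerger, any rows that appear before the first fully populated row are merged into an extra group of their own rather than dropped. The result holds strings; call .astype(int) on it if numeric columns are needed, as process_data does.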
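IIUC, you want to remove the blanks in the columns, right? Is that your main goal?
@IIUC Yes, but I also want to merge those rows into the row above. And I want to take the number of columns of the input DataFrame into account: I want the same result for input DataFrames with different numbers of columns.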