Python: How do I concatenate values onto a set of strings?
Let's suppose I have a DataFrame that looks like this:
REFERENCE_CODE DUMMY_DATA
dog foo
cat fi
fish fo
bird fum
1 u
2 v
3 x
4 y
My goal is to create a DataFrame that becomes:
REFERENCE_CODE DUMMY_DATA
dog foo
cat fi
fish fo
bird fum
dog_1 u
dog_2 v
dog_3 x
dog_4 y
cat_1 u
cat_2 v
cat_3 x
cat_4 y
fish_1 u
fish_2 v
fish_3 x
fish_4 y
bird_1 u
bird_2 v
bird_3 x
bird_4 y
So far I have been able to get this:
REFERENCE_CODE DUMMY_DATA
dog foo
cat fi
fish fo
bird fum
bird_1 u
bird_2 v
bird_3 x
bird_4 y
by using the following code:
df.REFERENCE_CODE = df.REFERENCE_CODE.fillna('')
df['REFERENCE_CODE'] = df['REFERENCE_CODE'].apply(lambda x: str(x))
headers = (df.REFERENCE_CODE != '') & ~df['REFERENCE_CODE'].fillna('').str.isnumeric()
res = df.groupby(headers.cumsum())['REFERENCE_CODE'].apply(lambda x: x.iloc[0] + '_' + x)
df.REFERENCE_CODE.update(res[df.REFERENCE_CODE.str.isnumeric()])
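How can I make this apply to all the others and expand the DataFrame, without losing the integrity of the other columns?
The idea is to use a cross join between the filtered rows whose REFERENCE_CODE column is non-numeric and the filtered rows whose REFERENCE_CODE is numeric: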
import pandas as pd
# simplified setup: replace missing codes with '' and cast everything to str
df['REFERENCE_CODE'] = df.REFERENCE_CODE.fillna('').astype(str)
# True for non-empty, non-numeric codes (dog, cat, fish, bird)
mask = (df.REFERENCE_CODE != '') & ~df['REFERENCE_CODE'].str.isnumeric()
# split into rows that match the condition and rows that do not
df1 = df[mask]
df2 = df[~mask]
# cross join through a constant helper key A
df = df1[['REFERENCE_CODE']].assign(A=1).merge(df2.assign(A=1), on='A')
# join the two code columns together
df['REFERENCE_CODE'] = df['REFERENCE_CODE_x'] + '_' + df['REFERENCE_CODE_y']
# concatenate the first filtered DataFrame with the new one
df = pd.concat([df1, df[['REFERENCE_CODE','DUMMY_DATA']]], ignore_index=True)
print(df)
REFERENCE_CODE DUMMY_DATA
0 dog foo
1 cat fi
2 fish fo
3 bird fum
4 dog_1 u
5 dog_2 v
6 dog_3 x
7 dog_4 y
8 cat_1 u
9 cat_2 v
10 cat_3 x
11 cat_4 y
12 fish_1 u
13 fish_2 v
14 fish_3 x
15 fish_4 y
16 bird_1 u
17 bird_2 v
18 bird_3 x
19 bird_4 y
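For reference, here is a self-contained sketch of the same cross-join idea that can be run as is. It assumes pandas 1.2 or later, where merge accepts how="cross", so the helper key A is not needed. The sample data is rebuilt from the question, and the names is_header, headers, details, crossed and result are illustrative only, not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question (codes stored as strings).
df = pd.DataFrame({
    "REFERENCE_CODE": ["dog", "cat", "fish", "bird", "1", "2", "3", "4"],
    "DUMMY_DATA": ["foo", "fi", "fo", "fum", "u", "v", "x", "y"],
})

# Normalise the code column: missing values become '' and everything is str.
df = df.assign(REFERENCE_CODE=df["REFERENCE_CODE"].fillna("").astype(str))

# True for non-empty, non-numeric codes (dog, cat, fish, bird).
is_header = (df["REFERENCE_CODE"] != "") & ~df["REFERENCE_CODE"].str.isnumeric()

headers = df[is_header]    # rows to keep as they are
details = df[~is_header]   # rows to repeat once per header code

# Cross join: every header code is paired with every remaining row.
# The overlapping REFERENCE_CODE column gets the _x / _y suffixes.
crossed = headers[["REFERENCE_CODE"]].merge(
    details, how="cross", suffixes=("_x", "_y")
)
crossed["REFERENCE_CODE"] = (
    crossed["REFERENCE_CODE_x"] + "_" + crossed["REFERENCE_CODE_y"]
)

# Original header rows first, then the expanded rows.
result = pd.concat(
    [headers, crossed[["REFERENCE_CODE", "DUMMY_DATA"]]],
    ignore_index=True,
)
print(result)
```

Because each pairing carries the whole row from details, any additional columns in the DataFrame travel along with DUMMY_DATA; they only need to be added to the column list passed to pd.concat.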