Concatenating consecutive pandas rows based on a regex
I have a DataFrame containing dates that has been garbled. Each record is split across two consecutive rows:
index Date Particulars
0 01-12- AVON AGRO
1 2018 NaN
2 01-12- CASH
3 2018 NaN
4 03-12- NEFTOut/UTBIN18337459966/LUNI
5 2018 A MARKETING/SBIN00019
6 03-12- ANJANI TRADERS
7 2018 NaN
8 03-12- NEFTOut/UTBIN18337484160/BIGS
9 2018 MILE PRODUCTS/UTIB000
But I want the following output:
index Date Particulars
0 01-12-2018 AVON AGRO
2 01-12-2018 CASH
4 03-12-2018 NEFTOut/UTBIN18337459966/LUNIA MARKETING/SBIN00019
6 03-12-2018 ANJANI TRADERS
8 03-12-2018 NEFTOut/UTBIN18337484160/BIGSMILE PRODUCTS/UTIB000
I tried df.apply(lambda x:x if re.search('\d{4}$',str(x))else str(x.shift(-1))+str(x)), but it gave me:
Date 0 2018\n1 01-12-\n2 2018...
Particulars 0 NaN\n1 ...
dtype: object
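The attempt returns one long string per column because `DataFrame.apply` with its default `axis=0` calls the function once per column. So `x` is a whole Series, and `str(x)` is that column's printed text rather than a single cell. A minimal check, using a small hypothetical four-row excerpt modeled on the data above:

```python
import numpy as np
import pandas as pd

# Hypothetical four-row excerpt of the split data.
df = pd.DataFrame({"Date": ["01-12-", "2018", "03-12-", "2018"],
                   "Particulars": ["CASH", np.nan, "ANJANI TRADERS", np.nan]})

# With the default axis=0, apply passes each column to the function,
# so `x` is an entire Series, never an individual cell.
received = df.apply(lambda x: type(x).__name__)
print(received)

# str() of a Series is its multi-line text representation, which is why
# the attempt produced one long string per column.
print(repr(str(df["Date"])))
```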
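First replace the missing values with empty strings, then concatenate each pair of consecutive rows (an even-numbered row with the odd-numbered row after it) using groupby with ''.join: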
df1 = df.fillna('').groupby(df.index // 2).agg(''.join)
print (df1)
Date Particulars
index
0 01-12-2018 AVON AGRO
1 01-12-2018 CASH
2 03-12-2018 NEFTOut/UTBIN18337459966/LUNIA MARKETING/SBIN0...
3 03-12-2018 ANJANI TRADERS
4 03-12-2018 NEFTOut/UTBIN18337484160/BIGSMILE PRODUCTS/UTI...
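A self-contained sketch of the groupby approach. The sample frame is rebuilt by hand from the printed table (an assumption, since the original was presumably loaded from a file). It relies on a default 0..n-1 index and strictly alternating rows, because `df.index // 2` maps rows 0 and 1 to group 0, rows 2 and 3 to group 1, and so on.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "Date": ["01-12-", "2018", "01-12-", "2018", "03-12-",
             "2018", "03-12-", "2018", "03-12-", "2018"],
    "Particulars": ["AVON AGRO", np.nan, "CASH", np.nan,
                    "NEFTOut/UTBIN18337459966/LUNI", "A MARKETING/SBIN00019",
                    "ANJANI TRADERS", np.nan,
                    "NEFTOut/UTBIN18337484160/BIGS", "MILE PRODUCTS/UTIB000"],
})

# Group key: 0, 0, 1, 1, 2, 2, ... (assumes a default RangeIndex).
groups = df.index // 2

# fillna("") first, because "".join fails on float NaN values.
df1 = df.fillna("").groupby(groups).agg("".join)
print(df1)
```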
Alternatively, select the even and odd rows by position and add them together:
df1 = df.fillna('')
df1 = df1.iloc[::2].reset_index(drop=True) + df1.iloc[1::2].reset_index(drop=True)
print (df1)
Date Particulars
0 01-12-2018 AVON AGRO
1 01-12-2018 CASH
2 03-12-2018 NEFTOut/UTBIN18337459966/LUNIA MARKETING/SBIN0...
3 03-12-2018 ANJANI TRADERS
4 03-12-2018 NEFTOut/UTBIN18337484160/BIGSMILE PRODUCTS/UTI...
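A runnable sketch of the positional approach, again with sample data reconstructed from the question's table. After `reset_index(drop=True)` both halves share the index 0..k-1 and the same columns, so `+` concatenates the strings cell by cell. If the frame had an odd number of rows, the last first-half row would have no partner and would come out as NaN after index alignment.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "Date": ["01-12-", "2018", "01-12-", "2018", "03-12-",
             "2018", "03-12-", "2018", "03-12-", "2018"],
    "Particulars": ["AVON AGRO", np.nan, "CASH", np.nan,
                    "NEFTOut/UTBIN18337459966/LUNI", "A MARKETING/SBIN00019",
                    "ANJANI TRADERS", np.nan,
                    "NEFTOut/UTBIN18337484160/BIGS", "MILE PRODUCTS/UTIB000"],
})

df1 = df.fillna("")
# Even positions (0, 2, 4, ...) hold the first part of each record.
first = df1.iloc[::2].reset_index(drop=True)
# Odd positions (1, 3, 5, ...) hold the continuation.
second = df1.iloc[1::2].reset_index(drop=True)
# Aligned on the new 0..k-1 index, + concatenates the strings elementwise.
df1 = first + second
print(df1)
```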
A regex-based solution is also possible:
df1 = df.fillna('')
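# True for rows whose Date ends with a four-digit year (the second halves)
m = df1['Date'].str.contains(r'\d{4}$')
# Rows directly followed by a year row are the first halves of each record
df1 = df1[m.shift(-1, fill_value=False)].reset_index(drop=True) + df1[m].reset_index(drop=True)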
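A runnable sketch of the regex solution on data reconstructed from the question. The pattern is written as a raw string so that `\d` is not treated as a string escape, and `shift(-1, fill_value=False)` keeps the mask boolean instead of introducing NaN that must then be filled. The second part is a hypothetical extension, not from the original answer: starting a new group at every row whose Date is not a bare four-digit year also handles a record whose date was never split. The extra row and the names `mixed`, `is_continuation`, `group_id` and `merged` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "Date": ["01-12-", "2018", "01-12-", "2018", "03-12-",
             "2018", "03-12-", "2018", "03-12-", "2018"],
    "Particulars": ["AVON AGRO", np.nan, "CASH", np.nan,
                    "NEFTOut/UTBIN18337459966/LUNI", "A MARKETING/SBIN00019",
                    "ANJANI TRADERS", np.nan,
                    "NEFTOut/UTBIN18337484160/BIGS", "MILE PRODUCTS/UTIB000"],
})

df1 = df.fillna("")
# True on rows whose Date ends with a four-digit year (the second halves).
m = df1["Date"].str.contains(r"\d{4}$")
# Rows immediately followed by a year row are the first halves.
first = df1[m.shift(-1, fill_value=False)].reset_index(drop=True)
second = df1[m].reset_index(drop=True)
df1 = first + second
print(df1)

# Hypothetical extension: any row that is not a bare year starts a new record,
# so an already-complete date forms a group of its own.
extra = pd.DataFrame({"Date": ["05-12-2018"], "Particulars": ["HYPOTHETICAL ROW"]})
mixed = pd.concat([df, extra], ignore_index=True).fillna("")
is_continuation = mixed["Date"].str.fullmatch(r"\d{4}")
# Running count of record starts; renamed so it does not clash with "Date".
group_id = (~is_continuation).cumsum().rename("record")
merged = mixed.groupby(group_id).agg("".join).reset_index(drop=True)
print(merged)
```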
Thank you very much. I like the regex solution; it works well for me.