Python: splitting a column

python pandas

I have a string column that I want to split into three columns based on its contents. The column looks like this:

full_string
x a b c
d e
m n o
y m n
y d e f
d e f

x and y are prefixes. I want to convert this column into three columns:

prefix_string  first_string last_string
x              a            c
               d            e
               m            o
y              m            n
y              d            f
               d            f

I have this code:

df['first_string'] = df[df['full_string'].str.split().str.len() == 2]['full_string'].str.split().str[0] 
df['first_string'] = df[df['full_string'].str.split().str.len() > 2]['full_string'].str.split().str[1]

df['last_string'] = df['full_string'].str.split().str[-1]

prefix_string = ['x', 'y'] 
df['prefix_string'] = df[df['full_string'].str.split().str[0].isin(prefix_string)]['full_string'].str.split().str[0]

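This code does not work properly for first_string. Is there a way to extract first_string regardless of prefix_string and the number of tokens in the string?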
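To see where the code above goes wrong, here is a minimal sketch that rebuilds the sample column and runs the two first_string assignments one after the other. The variable names tokens, n_tokens and after_first_assignment are introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"full_string": ["x a b c", "d e", "m n o", "y m n", "y d e f", "d e f"]}
)
tokens = df["full_string"].str.split()
n_tokens = tokens.str.len()

# First assignment: only the two-token row gets a value, every other row is NaN.
df["first_string"] = tokens[n_tokens == 2].str[0]
after_first_assignment = df["first_string"].copy()

# Second assignment replaces the whole column, so the two-token row is lost,
# and every longer row takes its second token whether or not it has a prefix.
df["first_string"] = tokens[n_tokens > 2].str[1]

print(after_first_assignment.tolist())
print(df["first_string"].tolist())
```

Each assignment to df['first_string'] overwrites the entire column rather than filling in only the selected rows, and the token count alone cannot tell a prefixed row such as "y m n" from an unprefixed one such as "m n o".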

Instead of these lines in the code above:

df['first_string'] = df[df['full_string'].str.split().str.len() == 2]['full_string'].str.split().str[0] 
df['first_string'] = df[df['full_string'].str.split().str.len() > 2]['full_string'].str.split().str[1]

use the split(), contains() and fillna() methods:

df['first_string']=df['full_string'].str.split(expand=True).loc[~df['full_string'].str.split(expand=True)[0].str.contains('x|y'),0]
df['first_string']=df['first_string'].fillna(df['full_string'].str.split(expand=True)[1])

Output of df:

    full_string     first_string    last_string     prefix_string
0   x a b c             a               c                   x
1   d e                 d               e                   NaN
2   m n o               m               o                   NaN
3   y m n               m               n                   y
4   y d e f             d               f                   y
5   d e f               d               f                   NaN
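Note that str.contains('x|y') is a regular-expression substring match, so a first token such as "xyz" would also be treated as a prefix. A possible variant, sketched here under the assumption that prefixes must match the first token exactly, splits once and uses isin() with Series.where(). The names parts and has_prefix are made up for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {"full_string": ["x a b c", "d e", "m n o", "y m n", "y d e f", "d e f"]}
)

# Split once into one column per token; short rows are padded with missing values.
parts = df["full_string"].str.split(expand=True)

# True where the first token is exactly one of the known prefixes.
has_prefix = parts[0].isin(["x", "y"])

# Keep the first token as the prefix only on prefixed rows (NaN elsewhere).
df["prefix_string"] = parts[0].where(has_prefix)

# On prefixed rows the first string is the second token, otherwise the first token.
df["first_string"] = parts[1].where(has_prefix, parts[0])

# The last token is taken from the un-padded split.
df["last_string"] = df["full_string"].str.split().str[-1]

print(df)
```

This builds all three columns from a single mask, so no row is overwritten by a later assignment.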

Try using numpy.where and pandas.Series.str.split:

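import numpy as np

prefix_str = ["x", "y"]
# Split into one column per token and forward-fill along each row,
# so the last column always holds the final token.
res = df["full_string"].str.split(" ", expand=True).ffill(axis=1)
res["last_string"] = res.iloc[:, -1]
res["prefix_string"] = np.where(res[0].isin(prefix_str), res[0], "")
res["first_string"] = np.where(res["prefix_string"].ne(""), res[1], res[0])
res = res[["prefix_string", "first_string", "last_string"]]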

Output:

  prefix_string first_string last_string
0             x            a           c
1                          d           e
2                          m           o
3             y            m           n
4             y            d           f
5                          d           f
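The same three columns could also be pulled out in one pass with Series.str.extract and named groups. This is an alternative sketch rather than either answer's method. The pattern is an assumption: it hard-codes the prefixes x and y and expects at least two whitespace-separated tokens per row.

```python
import pandas as pd

df = pd.DataFrame(
    {"full_string": ["x a b c", "d e", "m n o", "y m n", "y d e f", "d e f"]}
)

# Optional prefix, then the first token, any middle tokens, then the last token.
pattern = (
    r"^(?:(?P<prefix_string>x|y)\s+)?"  # optional prefix, NaN when absent
    r"(?P<first_string>\S+)"            # first token after the prefix
    r"(?:\s+.*)?"                       # middle tokens, not captured
    r"\s+(?P<last_string>\S+)$"         # last token
)

# Each named group becomes a column of the result.
out = df["full_string"].str.extract(pattern)
print(out)
```

Rows that do not match the pattern, such as a single-token string, come back as all NaN, so this form makes the two-token assumption visible in the result.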

There was an error with res and df. I have updated my answer… please take a look.