Partially splitting a string column in pandas (Python)

I have the following DataFrame in Python:

import pandas as pd

df = pd.DataFrame({'name': ['Vinay', 'Kushal', 'Aman', 'Saif'],
                   'age': [22, 25, 24, 28],
                   'occupation': ['A1|A2|A3', 'B1|B2|B3', 'C1|C2|C3', 'D1|D2|D3']})

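Note the occupation field, whose values are delimited by "|".

I want to add two new columns to the DataFrame, say new1 and new2, holding A1 and A2, B1 and B2, and so on.

I tried to achieve this with the following code: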

df['new1'] = df['occupation'].str.split("|", n=2, expand=False)

The result I get is:

    name    age occupation  new1
0   Vinay   22  A1|A2|A3    [A1, A2, A3]
1   Kushal  25  B1|B2|B3    [B1, B2, B3]
2   Aman    24  C1|C2|C3    [C1, C2, C3]
3   Saif    28  D1|D2|D3    [D1, D2, D3]

I don't want the whole list (A1, A2, A3, and so on) in the new field. Expected output:

        name    age occupation  new1 new2
    0   Vinay   22  A1|A2|A3    [A1] [A2]
    1   Kushal  25  B1|B2|B3    [B1] [B2]
    2   Aman    24  C1|C2|C3    [C1] [C2]
    3   Saif    28  D1|D2|D3    [D1] [D2]

Please suggest a possible solution.

For performance, use str.split inside a list comprehension:

u = pd.DataFrame([
    x.split('|')[:2] for x in df.occupation], columns=['new1', 'new2'], index=df.index)
u

  new1 new2
0   A1   A2
1   B1   B2
2   C1   C2
3   D1   D2

pd.concat([df, u], axis=1)

     name  age occupation new1 new2
0   Vinay   22   A1|A2|A3   A1   A2
1  Kushal   25   B1|B2|B3   B1   B2
2    Aman   24   C1|C2|C3   C1   C2
3    Saif   28   D1|D2|D3   D1   D2
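For comparison, here is a minimal sketch of the same result using only the .str accessor. It assumes every row contains at least two "|"-separated pieces. The names parts, first_two and result are illustrative and not part of the original answer. The attempt in the question returned whole lists because expand=False keeps each split as a single list-valued cell; expand=True spreads the pieces across columns instead.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Vinay', 'Kushal', 'Aman', 'Saif'],
                   'age': [22, 25, 24, 28],
                   'occupation': ['A1|A2|A3', 'B1|B2|B3', 'C1|C2|C3', 'D1|D2|D3']})

# expand=True returns a DataFrame with one column per piece, labelled 0, 1, 2, ...
parts = df['occupation'].str.split('|', expand=True)

# Keep only the first two pieces and give them the requested column names
first_two = parts[[0, 1]].rename(columns={0: 'new1', 1: 'new2'})

# join aligns on the index, so each row keeps its own pieces
result = df.join(first_two)
print(result)
```

This is usually more readable than the list comprehension, at the cost of materialising all pieces before discarding the ones that are not needed.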

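Why is the list comprehension fast? The .str accessor methods add per-element overhead, such as missing-value handling, that plain Python string methods skip; the original answer linked to further reading on this.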
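One way to check the performance claim on your own machine is a small timing sketch like the one below. The input size, the helper names with_list_comprehension and with_str_accessor, and the repeat count are all assumptions chosen for illustration; actual timings depend on the pandas version and the data.

```python
import timeit

import pandas as pd

# Hypothetical larger input: the four example values repeated many times
occupation = pd.Series(['A1|A2|A3', 'B1|B2|B3', 'C1|C2|C3', 'D1|D2|D3'] * 10_000)


def with_list_comprehension(s):
    # Plain Python str.split on each element, keeping the first two pieces
    return pd.DataFrame([x.split('|')[:2] for x in s],
                        columns=['new1', 'new2'], index=s.index)


def with_str_accessor(s):
    # pandas .str.split with expand=True, then keep the first two columns
    parts = s.str.split('|', expand=True)
    return parts[[0, 1]].rename(columns={0: 'new1', 1: 'new2'})


a = with_list_comprehension(occupation)
b = with_str_accessor(occupation)

# Both approaches should yield the same values before we compare their speed
same_values = bool((a.to_numpy() == b.to_numpy()).all())

t_listcomp = timeit.timeit(lambda: with_list_comprehension(occupation), number=3)
t_accessor = timeit.timeit(lambda: with_str_accessor(occupation), number=3)
print(f'same values: {same_values}')
print(f'list comprehension: {t_listcomp:.3f}s, .str accessor: {t_accessor:.3f}s')
```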

Here is an option using a regular expression with named capture groups. For more details, you can read the docstring by running pd.Series.str.extract? in an IPython session.

# get the new columns in a separate dataframe
# (raw string so that \w and \| reach the regex engine unchanged)
df_ = df['occupation'].str.extract(r'^(?P<new1>\w{2})\|(?P<new2>\w{2})')

# add brackets around each item in the new dataframe
# (string concatenation; applymap is deprecated in recent pandas releases)
df_ = '[' + df_ + ']'

# add the new dataframe to your original to get the desired result
df = df.join(df_)
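The pattern above relies on each piece being exactly two word characters. If the pieces can vary in length, a sketch like the following may be more robust. The sample values ('Manager|B2|B3' and 'C1') and the pattern are assumptions added for illustration, not part of the original answer. Rows that do not contain two pieces simply get missing values.

```python
import pandas as pd

# Hypothetical data with pieces of varying length and one row with no delimiter
df = pd.DataFrame({'occupation': ['A1|A2|A3', 'Manager|B2|B3', 'C1']})

# [^|]+ matches one or more characters that are not the "|" delimiter,
# so each named group captures a whole piece regardless of its length
pattern = r'^(?P<new1>[^|]+)\|(?P<new2>[^|]+)'

# Named groups become the column names of the extracted DataFrame
extracted = df['occupation'].str.extract(pattern)

result = df.join(extracted)
print(result)
```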
