Concatenating words across columns in Python

python string pandas dataframe nlp

Background

I have the following code

import pandas as pd
#create df
df = pd.DataFrame({'Before' : ['there are many different', 
                               'i like a lot of sports ', 
                               'the middle east has many '], 
                   'After' : ['in the bright blue box', 
                               'because they go really fast ', 
                               'to ride and have fun '],

                  'P_ID': [1,2,3], 
                  'Word' : ['crayons', 'cars', 'camels'],
                  'N_ID' : ['A1', 'A2', 'A3']

                 })

#rearrange
df = df[['P_ID', 'N_ID', 'Before', 'Word','After']]

This creates the following

df

   P_ID  N_ID  Before                     Word     After
0  1     A1    there are many different   crayons  in the bright blue box
1  2     A2    i like a lot of sports     cars     because they go really fast
2  3     A3    the middle east has many   camels   to ride and have fun

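Goal

1) Concatenate the words in the Before and After columns with the word in the Word column

2) Create a new_column

Desired output

A new_column with the following output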

new_column
there are many different crayons in the bright blue box
i like a lot of sports cars because they go really fast
the middle east has many camels to ride and have fun

Question

How do I achieve my goal?

You can add the column like this, joining Before, Word and After with a single space:

df['new_column'] = df['Before'] + ' ' + df['Word'] + ' ' + df['After']

Here is the complete code:

import pandas as pd
#create df
df = pd.DataFrame({'Before' : ['there are many different', 
                               'i like a lot of sports ', 
                               'the middle east has many '], 
                   'After' : ['in the bright blue box', 
                               'because they go really fast ', 
                               'to ride and have fun '],

                  'P_ID': [1,2,3], 
                  'Word' : ['crayons', 'cars', 'camels'],
                  'N_ID' : ['A1', 'A2', 'A3']

                 })

#rearrange
df = df[['P_ID', 'N_ID', 'Word', 'Before', 'After']]
df['new_column'] = df['Before'] + ' ' + df['Word'] + ' ' + df['After']
df['new_column']
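
Some of the Before and After values in the sample data end with a space, so joining them with ' ' can leave double spaces or a trailing space in the result. A minimal sketch of one way to avoid that, assuming it is acceptable to strip surrounding whitespace from every part first (the `parts` name is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Before': ['there are many different',
               'i like a lot of sports ',
               'the middle east has many '],
    'Word': ['crayons', 'cars', 'camels'],
    'After': ['in the bright blue box',
              'because they go really fast ',
              'to ride and have fun '],
})

# Strip surrounding whitespace from each part so the single-space
# separator does not produce double spaces or a trailing space.
parts = [df[col].str.strip() for col in ['Before', 'Word', 'After']]
df['new_column'] = parts[0] + ' ' + parts[1] + ' ' + parts[2]

print(df['new_column'].tolist())
```

Whether stripping is appropriate depends on the data; if the whitespace is meaningful, the plain concatenation above is the one to keep.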

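You can add the column as suggested above, or use a more general row-wise approach that carries over to many similar problems:

df['new_column'] = df.apply(lambda x: ' '.join([x.Before, x.Word, x.After]), axis=1)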
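
The row-wise idea can be generalized to any list of columns. The sketch below is one possible way to do it; `join_columns` is a hypothetical helper name, not a pandas function, and it assumes every value can be converted with `str()` and that stripping surrounding whitespace is wanted.

```python
import pandas as pd


def join_columns(frame, columns, sep=' '):
    """Hypothetical helper: join the given columns row by row.

    Each value is converted to str and stripped before joining.
    """
    return frame[columns].apply(
        lambda row: sep.join(str(value).strip() for value in row),
        axis=1,
    )


df = pd.DataFrame({
    'Before': ['there are many different',
               'i like a lot of sports ',
               'the middle east has many '],
    'Word': ['crayons', 'cars', 'camels'],
    'After': ['in the bright blue box',
              'because they go really fast ',
              'to ride and have fun '],
})

df['new_column'] = join_columns(df, ['Before', 'Word', 'After'])
print(df['new_column'].tolist())
```

Row-wise `apply` is flexible but usually slower than vectorized string operations on large frames, so it is best suited to cases where the per-row logic is more than a plain concatenation.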

You can use the cat() method of the .str accessor

df['New_column'] = df['Before'].str.cat(df[['Word','After']],sep=" ")

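cat() also lets you specify a separator.
To concatenate multiple columns, pass either a list of Series or a DataFrame containing every column except the first as the argument to str.cat(), called on the first column (Before):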

Complete code:

import pandas as pd
#create df
df = pd.DataFrame({'Before' : ['there are many different',
                               'i like a lot of sports ',
                               'the middle east has many '],
                   'After' : ['in the bright blue box',
                               'because they go really fast ',
                               'to ride and have fun '],

                  'P_ID': [1,2,3],
                  'Word' : ['crayons', 'cars', 'camels'],
                  'N_ID' : ['A1', 'A2', 'A3']

                 })

#rearrange
df = df[['P_ID', 'N_ID', 'Before', 'Word','After']]
print (df)
df['New_column'] = df['Before'].str.cat(df[['Word','After']],sep=" ")
print (df)
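
One detail worth knowing about str.cat() is how it treats missing values: without `na_rep`, a row with a missing value in any of the joined columns comes out as missing. The sketch below illustrates this on a small made-up frame (the `None` entry is an assumption added for the demonstration and is not part of the original data).

```python
import pandas as pd

# Illustrative data: the second row has a missing 'Before' value.
df = pd.DataFrame({
    'Before': ['there are many different', None],
    'Word': ['crayons', 'cars'],
    'After': ['in the bright blue box', 'because they go really fast'],
})

# Default behaviour: the row with a missing value yields a missing result.
without_na_rep = df['Before'].str.cat(df[['Word', 'After']], sep=' ')

# With na_rep='', missing values are treated as empty strings; the separator
# is still inserted, so strip the result to drop the leading space.
with_na_rep = (
    df['Before']
    .str.cat(df[['Word', 'After']], sep=' ', na_rep='')
    .str.strip()
)

print(without_na_rep.tolist())
print(with_na_rep.tolist())
```

Choosing `na_rep=''` keeps the rest of the row instead of discarding it; whether that is the right behaviour depends on how missing text should be interpreted in the data.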