Python: concatenating words across columns
Background
I have the following code:
import pandas as pd

# create df
df = pd.DataFrame({'Before': ['there are many different',
                              'i like a lot of sports ',
                              'the middle east has many '],
                   'After': ['in the bright blue box',
                             'because they go really fast ',
                             'to ride and have fun '],
                   'P_ID': [1, 2, 3],
                   'Word': ['crayons', 'cars', 'camels'],
                   'N_ID': ['A1', 'A2', 'A3']
                   })

# rearrange
df = df[['P_ID', 'N_ID', 'Before', 'Word', 'After']]
This creates the following df:
   P_ID  N_ID  Before                     Word     After
0  1     A1    there are many different   crayons  in the bright blue box
1  2     A2    i like a lot of sports     cars     because they go really fast
2  3     A3    the middle east has many   camels   to ride and have fun
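Goal
1) Concatenate the words in the Before and After columns with the word in the Word column
2) Create a new column
Desired output
A new column with the following output: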
new_column
there are many different crayons in the bright blue box
i like a lot of sports cars because they go really fast
the middle east has many camels to ride and have fun
Question
How do I achieve my goal?
You can add the column like this:
df['new_column'] = df['Before'] + ' ' + df['Word'] + ' ' + df['After']
Here is the complete code:
import pandas as pd

# create df
df = pd.DataFrame({'Before': ['there are many different',
                              'i like a lot of sports ',
                              'the middle east has many '],
                   'After': ['in the bright blue box',
                             'because they go really fast ',
                             'to ride and have fun '],
                   'P_ID': [1, 2, 3],
                   'Word': ['crayons', 'cars', 'camels'],
                   'N_ID': ['A1', 'A2', 'A3']
                   })

# rearrange
df = df[['P_ID', 'N_ID', 'Word', 'Before', 'After']]
df['new_column'] = df['Before'] + ' ' + df['Word'] + ' ' + df['After']
df['new_column']
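One detail worth noting: some of the Before and After strings in the sample data end with a space, so joining with ' ' leaves a double space in the result. A minimal sketch of one way to handle that, assuming surrounding whitespace is not meaningful in your data, is to strip each part before joining (the `parts` list is only an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    'Before': ['there are many different',
               'i like a lot of sports ',
               'the middle east has many '],
    'Word': ['crayons', 'cars', 'camels'],
    'After': ['in the bright blue box',
              'because they go really fast ',
              'to ride and have fun '],
})

# Assumption: leading/trailing whitespace carries no meaning, so strip it
# from every part before joining with exactly one space.
parts = [df[col].str.strip() for col in ['Before', 'Word', 'After']]
df['new_column'] = parts[0] + ' ' + parts[1] + ' ' + parts[2]
```

This yields exactly the strings listed under the desired output, with single spaces between the parts.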
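You can add the column as suggested above, or use apply for a more general solution that also covers many similar problems:
df['new_column'] = df.apply(lambda x: x.Before + ' ' + x.Word + ' ' + x.After, axis=1)
You can also use the cat() method of the .str accessor: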
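If the set of columns may change, the row-wise idea can be wrapped in a small helper. The sketch below is only one possible shape: `join_columns` is a hypothetical name, not a pandas function, and it assumes every selected value can be converted with `str`.

```python
import pandas as pd

def join_columns(frame, columns, sep=' '):
    """Join the given columns of each row into one string (hypothetical helper)."""
    # astype(str) guards against non-string values such as integers;
    # apply(..., axis=1) then joins the values of each row with `sep`.
    return frame[columns].astype(str).apply(lambda row: sep.join(row), axis=1)

df = pd.DataFrame({
    'Before': ['there are many different', 'i like a lot of'],
    'Word': ['crayons', 'cars'],
    'After': ['in the box', 'because they go fast'],
})

df['new_column'] = join_columns(df, ['Before', 'Word', 'After'])
```

Row-wise apply is flexible but usually slower than the vectorized `+` or `str.cat` approaches on large frames, so it is mainly useful when the column list is dynamic.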
df['New_column'] = df['Before'].str.cat(df[['Word','After']],sep=" ")
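- cat() also lets you specify a separator via sep.
- To concatenate multiple columns, call str.cat() on the first column (Before) and pass it either a list of Series or a DataFrame containing all the remaining columns:
df['New_column'] = df['Before'].str.cat(df[['Word','After']],sep=" ")
Full example: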
import pandas as pd

# create df
df = pd.DataFrame({'Before': ['there are many different',
                              'i like a lot of sports ',
                              'the middle east has many '],
                   'After': ['in the bright blue box',
                             'because they go really fast ',
                             'to ride and have fun '],
                   'P_ID': [1, 2, 3],
                   'Word': ['crayons', 'cars', 'camels'],
                   'N_ID': ['A1', 'A2', 'A3']
                   })

# rearrange
df = df[['P_ID', 'N_ID', 'Before', 'Word', 'After']]
print(df)

df['New_column'] = df['Before'].str.cat(df[['Word', 'After']], sep=" ")
print(df)
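One behaviour of str.cat() that may matter in practice is how it treats missing values: by default a missing value in any of the joined columns makes the whole result for that row missing, while the na_rep parameter substitutes a replacement string instead. The sketch below uses made-up data with a missing Word to illustrate this; the variable names `default` and `filled` are only for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Before': ['there are many different', 'i like a lot of sports'],
    'Word': ['crayons', None],  # second row has no word (illustrative data)
    'After': ['in the bright blue box', 'because they go really fast'],
})

# Default: a missing value in any part makes the whole row's result missing.
default = df['Before'].str.cat(df[['Word', 'After']], sep=' ')

# With na_rep, missing parts are replaced by the given string before joining.
filled = df['Before'].str.cat(df[['Word', 'After']], sep=' ', na_rep='')
```

Note that with na_rep='' the separator is still inserted on both sides of the empty part, so the second row of `filled` contains two consecutive spaces.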