Python: merge two DataFrames if one string column is contained in another column in pandas
I need to merge the following df1 and df2 under this condition: the address in df1 contains the state in df2.
df1:
   address \
0  Cecilia Chapman 711-2880 Nulla St. Mankato Mississippi 96522 (257) 563-7401
1  Iris Watson P.O. Box 283 8562 Fusce Rd. Frederick Nebraska 20620 (372) 587-2335
2  Celeste Slater 606-3727 Ullamcorper. Street Roseville NH 11523 (786) 713-8616
3  Theodore Lowe Ap #867-859 Sit Rd. Azusa New York 39531 (793) 151-6230
4  Calista Wise 7292 Dictum Av. San Antonio MI 47096 (492) 709-6392
   quantity  price
0         2     20
1         3     13
2         5     23
3         3     32
4         5     45
df2:
   id        state
0   1  Mississippi
1   2     Nebraska
2   3     New York
My expected output looks like the following. How can I do this? Thanks, everyone.
address \
0 Cecilia Chapman 711-2880 Nulla St. Mankato Mississippi 96522 (257) 563-7401
1 Iris Watson P.O. Box 283 8562 Fusce Rd. Frederick Nebraska 20620 (372) 587-2335
2 Celeste Slater 606-3727 Ullamcorper. Street Roseville NH 11523 (786) 713-8616
3 Theodore Lowe Ap #867-859 Sit Rd. Azusa New York 39531 (793) 151-6230
4 Calista Wise 7292 Dictum Av. San Antonio MI 47096 (492) 709-6392
   quantity  price   id        state
0         2     20  1.0  Mississippi
1         3     13  2.0     Nebraska
2         5     23  NaN          NaN
3         3     32  3.0     New York
4         5     45  NaN          NaN
Update: output of
pat = '|'.join(r"\b{}\b".format(x) for x in df2['state'])
print(df1['address'].str.extract('(' + pat + ')', expand=False))
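You can extract all possible states into a new column with str.extract, wrapping each state in word boundaries (\b ... \b) so that only whole words match, and then merge with a left join: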
pat = '|'.join(r"\b{}\b".format(x) for x in df2['state'])
df1['state']= df1['address'].str.extract('('+ pat + ')', expand=False)
print (df1)
address quantity price \
0 Cecilia Chapman 711-2880 Nulla St. Mankato Mis... 2 20
1 Iris Watson P.O. Box 283 8562 Fusce Rd. Freder... 3 13
2 Celeste Slater 606-3727 Ullamcorper. Street Ro... 5 23
3 Theodore Lowe Ap #867-859 Sit Rd. Azusa New Yo... 3 32
4 Calista Wise 7292 Dictum Av. San Antonio MI 47... 5 45
         state
0  Mississippi
1     Nebraska
2          NaN
3     New York
4          NaN
df = df1.merge(df2, on='state', how='left')
print (df)
address quantity price \
0 Cecilia Chapman 711-2880 Nulla St. Mankato Mis... 2 20
1 Iris Watson P.O. Box 283 8562 Fusce Rd. Freder... 3 13
2 Celeste Slater 606-3727 Ullamcorper. Street Ro... 5 23
3 Theodore Lowe Ap #867-859 Sit Rd. Azusa New Yo... 3 32
4 Calista Wise 7292 Dictum Av. San Antonio MI 47... 5 45
         state   id
0  Mississippi  1.0
1     Nebraska  2.0
2          NaN  NaN
3     New York  3.0
4          NaN  NaN
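For anyone who wants to run the steps above end to end, here is a minimal sketch. The DataFrame construction is an assumption rebuilt from the printed tables in the question, and the names matched and result are only illustrative; the extract and merge calls are the ones used in this answer.

```python
import pandas as pd

# Assumed reconstruction of df1 and df2 from the tables printed in the question.
df1 = pd.DataFrame({
    "address": [
        "Cecilia Chapman 711-2880 Nulla St. Mankato Mississippi 96522 (257) 563-7401",
        "Iris Watson P.O. Box 283 8562 Fusce Rd. Frederick Nebraska 20620 (372) 587-2335",
        "Celeste Slater 606-3727 Ullamcorper. Street Roseville NH 11523 (786) 713-8616",
        "Theodore Lowe Ap #867-859 Sit Rd. Azusa New York 39531 (793) 151-6230",
        "Calista Wise 7292 Dictum Av. San Antonio MI 47096 (492) 709-6392",
    ],
    "quantity": [2, 3, 5, 3, 5],
    "price": [20, 13, 23, 32, 45],
})
df2 = pd.DataFrame({"id": [1, 2, 3],
                    "state": ["Mississippi", "Nebraska", "New York"]})

# One alternation of whole-word patterns, one branch per state in df2.
pat = "|".join(r"\b{}\b".format(x) for x in df2["state"])

# The outer parentheses form the capture group that str.extract returns.
matched = df1["address"].str.extract("(" + pat + ")", expand=False)

# Left join keeps every row of df1; rows without a matching state get NaN for id.
result = df1.assign(state=matched).merge(df2, on="state", how="left")
print(result[["quantity", "price", "state", "id"]])
```

Using assign instead of writing df1['state'] directly leaves df1 unchanged, which is only a convenience for re-running the snippet.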
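Thanks, but I don't understand the two parentheses in str.extract('(' + ... + ')'). Could you explain more?
@ahbon - It is because str.extract needs a capture group, written (regex), to know which part of the match to return, so () was added around pat.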
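A small sketch of what the parentheses change, in case it helps. The two sample strings and the variable names are made up for illustration; the point is only that str.extract rejects a pattern with no capture group and returns the group's text once the pattern is wrapped in ().

```python
import pandas as pd

# Illustrative sample data, not taken from the question.
s = pd.Series(["Azusa New York 39531", "San Antonio MI 47096"])
pat = r"\bNew York\b|\bNebraska\b"

# Without parentheses the pattern has no capture group, so pandas raises.
try:
    s.str.extract(pat, expand=False)
    no_group_error = None
except ValueError as exc:
    no_group_error = str(exc)

# Wrapping the whole alternation in () makes it a single capture group.
with_group = s.str.extract("(" + pat + ")", expand=False)

print(no_group_error)
print(with_group.tolist())
```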
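Yes, I think so.
@ahbon - Could you test whether it works if you change pat = '|'.join(r"\b{}\b".format(x) for x in df2['state']) to import re and pat = '|'.join(r"\b{}\b".format(re.escape(x)) for x in df2['state'])?
No problem. Thanks. :) But I think the logic should be the same; it is just that English text has more spaces to split words on.
Sorry, I don't think so :(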
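To illustrate why re.escape was suggested, here is a hedged sketch with a hypothetical key that contains a regex metacharacter. The key "St. Paul" and both addresses are invented for the example and do not come from the question's data; whether escaping resolves the asker's actual case is not established in this thread.

```python
import re
import pandas as pd

# Hypothetical lookup value containing ".", which is a regex wildcard.
keys = pd.Series(["St. Paul"])
addresses = pd.Series(["12 Main Rd Stx Paul 55101",
                       "9 Oak Av St. Paul 55102"])

# Unescaped: "." matches any character, so "Stx Paul" is a false positive.
raw_pat = "|".join(r"\b{}\b".format(x) for x in keys)

# Escaped: re.escape turns "." into a literal dot.
safe_pat = "|".join(r"\b{}\b".format(re.escape(x)) for x in keys)

raw_hits = addresses.str.extract("(" + raw_pat + ")", expand=False)
safe_hits = addresses.str.extract("(" + safe_pat + ")", expand=False)

print(raw_hits.tolist())
print(safe_hits.tolist())
```

For plain alphabetic names such as Mississippi or New York, escaping does not change which rows match; it matters only when the lookup values may contain characters like ".", "(" or "+".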