Warning: file_get_contents(/data/phpspider/zhask/data//catemap/2/python/309.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181

Warning: file_get_contents(/data/phpspider/zhask/data//catemap/9/ios/107.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
Python 如果一个字符串列包含在数据库中的另一列中,则合并两个数据帧_Python_String_Pandas_Dataframe - Fatal编程技术网

Python 如果一个字符串列包含在数据库中的另一列中,则合并两个数据帧

Python 如果一个字符串列包含在数据库中的另一列中,则合并两个数据帧,python,string,pandas,dataframe,Python,String,Pandas,Dataframe,我需要根据以下条件合并以下df1和df2:如果df1中的address包含state在df2中 df1: address \ 0 Cecilia Chapman 711-2880 Nulla St. Mankato Mississippi 96522 (257) 563-7401 1 Iris Watson P.O. Box 2

我需要根据以下条件合并以下
df1
df2
:如果
df1
中的
address
包含
state
df2

df1:

                                                                           address  \
0      Cecilia Chapman 711-2880 Nulla St. Mankato Mississippi 96522 (257) 563-7401   
1  Iris Watson P.O. Box 283 8562 Fusce Rd. Frederick Nebraska 20620 (372) 587-2335   
2    Celeste Slater 606-3727 Ullamcorper. Street Roseville NH 11523 (786) 713-8616   
3            Theodore Lowe Ap #867-859 Sit Rd. Azusa New York 39531 (793) 151-6230   
4                 Calista Wise 7292 Dictum Av. San Antonio MI 47096 (492) 709-6392   

   quantity  price  
0         2     20  
1         3     13  
2         5     23  
3         3     32  
4         5     45  
   id        state
0   1  Mississippi
1   2     Nebraska
2   3     New York
df2:

                                                                           address  \
0      Cecilia Chapman 711-2880 Nulla St. Mankato Mississippi 96522 (257) 563-7401   
1  Iris Watson P.O. Box 283 8562 Fusce Rd. Frederick Nebraska 20620 (372) 587-2335   
2    Celeste Slater 606-3727 Ullamcorper. Street Roseville NH 11523 (786) 713-8616   
3            Theodore Lowe Ap #867-859 Sit Rd. Azusa New York 39531 (793) 151-6230   
4                 Calista Wise 7292 Dictum Av. San Antonio MI 47096 (492) 709-6392   

   quantity  price  
0         2     20  
1         3     13  
2         5     23  
3         3     32  
4         5     45  
   id        state
0   1  Mississippi
1   2     Nebraska
2   3     New York
我的预期输出将如下所示。我怎么能这么做?多谢各位

                                                                           address  \
0      Cecilia Chapman 711-2880 Nulla St. Mankato Mississippi 96522 (257) 563-7401   
1  Iris Watson P.O. Box 283 8562 Fusce Rd. Frederick Nebraska 20620 (372) 587-2335   
2    Celeste Slater 606-3727 Ullamcorper. Street Roseville NH 11523 (786) 713-8616   
3            Theodore Lowe Ap #867-859 Sit Rd. Azusa New York 39531 (793) 151-6230   
4                 Calista Wise 7292 Dictum Av. San Antonio MI 47096 (492) 709-6392   

   quantity  price   id        state  
0         2     20  1.0  Mississippi  
1         3     13  2.0     Nebraska  
2         5     23  NaN          NaN  
3         3     32  3.0     New York  
4         5     45  NaN          NaN  
更新:pat='|'.join(r“\b{}\b.”的输出,df2['state']中x的格式(x); 打印(df1['address'].str.extract('('+pat+'),expand=False))


您可以通过使用
\b\b
将单词边界提取到新列,然后使用左连接进行
合并来提取所有可能的状态:

pat = '|'.join(r"\b{}\b".format(x) for x in df2['state'])
df1['state']= df1['address'].str.extract('('+ pat + ')', expand=False)
print (df1)
                                             address  quantity  price  \
0  Cecilia Chapman 711-2880 Nulla St. Mankato Mis...         2     20   
1  Iris Watson P.O. Box 283 8562 Fusce Rd. Freder...         3     13   
2  Celeste Slater 606-3727 Ullamcorper. Street Ro...         5     23   
3  Theodore Lowe Ap #867-859 Sit Rd. Azusa New Yo...         3     32   
4  Calista Wise 7292 Dictum Av. San Antonio MI 47...         5     45   

         state  
0  Mississippi  
1     Nebraska  
2          NaN  
3     New York  
4          NaN  

df = df1.merge(df2, on='state', how='left')
print (df)
                                             address  quantity  price  \
0  Cecilia Chapman 711-2880 Nulla St. Mankato Mis...         2     20   
1  Iris Watson P.O. Box 283 8562 Fusce Rd. Freder...         3     13   
2  Celeste Slater 606-3727 Ullamcorper. Street Ro...         5     23   
3  Theodore Lowe Ap #867-859 Sit Rd. Azusa New Yo...         3     32   
4  Calista Wise 7292 Dictum Av. San Antonio MI 47...         5     45   

         state   id  
0  Mississippi  1.0  
1     Nebraska  2.0  
2          NaN  NaN  
3     New York  3.0  
4          NaN  NaN  

谢谢,但我不明白
str.extract(“(“+…+”)”
中的两个
括号,你能解释更多吗?@ahbon-这是因为匹配的正则表达式模式需要
(regex)
,所以在
pat
中添加了
()
,是的,我想是的。@ahbon-如果更改
pat='.'124;',可能会进行测试。join(r”\b{\b}\b)。format(x)对于df2['state']]中的x而言
导入re
pat=''124;'。对于df2['state']]中的x而言,连接(r“\b{}\b).格式化(re.escape(x))
?没问题。谢谢。:)但我认为逻辑应该是一样的,只是英文字符有更多的空间来拆分单词。对不起,我不这么认为(