String matching between two columns in an R data frame
I have a data frame (df) with three columns, "Address", "District" and "State":
Input:
Address                                  District   State
132, 1st block, Mysore,Karnataka         Mysore     Karnataka
24, 4th Block, Jayanagar India           Bangalore  Karnataka
Prestige owen, M.G Road                  Bangalore  Karnataka
Opp: Reliance trend, Mantri Mall,-Delhi  New Delhi  New Delhi
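Basically, I want to flag (as a new column) the rows where the entity in the "District" column appears in the "Address" column.
Expected output: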
Address                                  District   State      Dist_match
132, 1st block, Mysore,Karnataka         Mysore     Karnataka  TRUE
24, 4th Block, Jayanagar India           Bangalore  Karnataka  FALSE
Prestige owen, M.G Road                  Bangalore  Karnataka  FALSE
Opp: Reliance trend, Mantri Mall,-Delhi  New Delhi  New Delhi  TRUE
I tried the following, but I got a warning and it did not work well:
df$Dist_match <- mapply(grepl, pattern=df$District, x=df$Address)
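df$Dist_match
Why do the first two lines start with #?
@Pascal is right; I removed it to avoid confusion.
It works for me (based on the example provided): mapply(grepl, pattern=df$District, x=df$Address) # Mysore Bangalore Bangalore New Delhi TRUE FALSE FALSE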
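To check the comment above, here is a minimal self-contained sketch that rebuilds the example data and runs the same row-wise match. The warning from the question is not quoted, so its cause is unknown. The use of `stringsAsFactors = FALSE`, `fixed = TRUE` and `USE.NAMES = FALSE` is an assumption about what makes the match robust, not something the thread confirms.

```r
# Rebuild the example data (values copied from the table in the question).
df <- data.frame(
  Address = c("132, 1st block, Mysore,Karnataka",
              "24, 4th Block, Jayanagar India",
              "Prestige owen, M.G Road",
              "Opp: Reliance trend, Mantri Mall,-Delhi"),
  District = c("Mysore", "Bangalore", "Bangalore", "New Delhi"),
  State = c("Karnataka", "Karnataka", "Karnataka", "New Delhi"),
  stringsAsFactors = FALSE  # assumption: keep the columns as character
)

# Row-wise test: is District[i] a literal substring of Address[i]?
# fixed = TRUE treats the district as plain text rather than a regex.
# USE.NAMES = FALSE stops mapply from naming the result after the patterns.
df$Dist_match <- mapply(grepl, pattern = df$District, x = df$Address,
                        MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

print(df$Dist_match)
```

With the data exactly as shown, the last row comes out FALSE rather than the expected TRUE, because the address contains "-Delhi" but not the full string "New Delhi". Getting TRUE there would need a looser rule than a plain substring test.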
For this case, with Address "CHOWK,MAUNATH BHANJAN DIST,MAU-220187" and District "MAU", the code returns FALSE. What should the pattern be?
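With the two strings exactly as quoted in this comment, a plain `grepl` call does find a match, so the FALSE probably comes from something not visible here. The sketch below assumes, purely for illustration, that the stored district value has different case and a trailing space; the `dirty` value is hypothetical and not taken from the thread.

```r
address  <- "CHOWK,MAUNATH BHANJAN DIST,MAU-220187"
district <- "MAU"

# Strings as quoted: a literal substring match succeeds.
exact_match <- grepl(district, address, fixed = TRUE)

# Hypothetical stored value with lower case and a trailing space (assumed).
dirty <- "mau "
dirty_match <- grepl(dirty, address, fixed = TRUE)

# Normalise both sides before matching: trim whitespace, unify case.
clean <- toupper(trimws(dirty))
clean_match <- grepl(clean, toupper(address), fixed = TRUE)

# Optional whole-word match, so "MAU" inside "MAUNATH" alone would not count.
# This builds a regex, so it is only safe for districts without metacharacters.
word_match <- grepl(paste0("\\b", clean, "\\b"), toupper(address), perl = TRUE)

print(c(exact = exact_match, dirty = dirty_match,
        clean = clean_match, word = word_match))
```

If normalising does not change the result on the real data, inspecting the raw values (for example with `nchar(df$District)`) may reveal hidden characters.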