Python: Using regex to map values to the keys of a dictionary with multiple values per key

python regex python-3.x pandas dictionary

Sample of the location column:

import pandas as pd

file = pd.DataFrame({'location': ['India, city3', 'city3', 'city2', 'china']})

Sample of new_dict (it is a defaultdict):

new_dict = {'India':['India','city1', 'city2', 'city3'],'China':['China','city4','city5']}

Expected output:

India
India
India
China

Sample code:

import re

for x in file['location']:
    for Country,Cities in new_dict.items():
        if re.findall('(?<![a-zA-Z])'+str(Cities).lower()+'(?![a-zA-Z])', str(x).lower()) != None:
            file['COUNTRY'] = Country
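The error most likely comes from `str(Cities)`. It turns the list into text such as `"['india', 'city1', ...]"`, and the square brackets are then parsed as a regex character class, so a hyphen inside some city name in the real (not shown) data becomes an invalid range like `i-d`. Two further problems: `re.findall` returns a list, never `None`, so the `if` is always true; and `file['COUNTRY'] = Country` overwrites the whole column on every pass. A minimal sketch of a fix, assuming case-insensitive whole-word matching is the goal: escape each name with `re.escape`, join the names into one alternation per country, and assign one value per row. The helper `find_country` and the dict `patterns` are hypothetical names.

```python
import re

import pandas as pd

file = pd.DataFrame({'location': ['India, city3', 'city3', 'city2', 'china']})
new_dict = {'India': ['India', 'city1', 'city2', 'city3'],
            'China': ['China', 'city4', 'city5']}

# One compiled pattern per country: an escaped, case-insensitive alternation of
# all its names, bounded by the same letter look-arounds as the original code.
patterns = {
    country: re.compile(
        r'(?<![a-zA-Z])(?:' + '|'.join(map(re.escape, names)) + r')(?![a-zA-Z])',
        re.IGNORECASE,
    )
    for country, names in new_dict.items()
}


def find_country(location):
    """Return the first country whose pattern matches, else None (hypothetical helper)."""
    text = str(location)
    for country, pattern in patterns.items():
        if pattern.search(text):  # search() returns None when there is no match
            return country
    return None


# Compute one value per row instead of broadcasting a single country to the column.
file['COUNTRY'] = file['location'].apply(find_country)
print(file['COUNTRY'].tolist())
```

Because `re.escape` neutralises `-`, `[`, `.` and other metacharacters, a hyphenated name can no longer form a character range. The look-arounds only exclude letters, so `city1` would still match inside `city10`; use `\b` instead if that matters.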

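I'm currently trying to map cities to countries using a dictionary. I'm trying to incorporate some regex, because the location column won't always give exact matches. I get the error "bad character range i-d at position 1408". How can I fix this?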

First, you need to flatten your new_dict using ChainMap.
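The answer does not show the flattening code. A minimal sketch, assuming the goal is a flat `{name: country}` dict named `d` (the name used in the `replace` call): invert each country's list with `dict.fromkeys`, then merge the per-country dicts with `collections.ChainMap`.

```python
from collections import ChainMap

new_dict = {'India': ['India', 'city1', 'city2', 'city3'],
            'China': ['China', 'city4', 'city5']}

# dict.fromkeys(names, country) maps every name in the list to its country;
# ChainMap merges those per-country dicts, and dict() makes it a plain dict.
d = dict(ChainMap(*(dict.fromkeys(names, country)
                    for country, names in new_dict.items())))
print(d)
```

If a name appears under two countries, `ChainMap` returns the value from the first mapping, i.e. the country listed first in `new_dict`. A plain comprehension, `{name: country for country, names in new_dict.items() for name in names}`, yields the same dict here, but the later country would win on duplicates.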

Then use replace and split to produce the result:

sample_df.replace(d,regex=True).location.str.split(',').str[0]
Out[53]: 
0    India
1    India
2    India
3    china
Name: location, dtype: object
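Row 3 stays `china` because the replacement is case-sensitive and the dictionary only contains `China`. One way to handle that, assuming case-insensitive whole-word matching is wanted, is to build the keys of `d` as regex patterns with an inline `(?i)` flag and escaped names. The variable names follow the answer; the pattern construction is an assumption, not part of the original answer.

```python
import re

import pandas as pd

sample_df = pd.DataFrame({'location': ['India, city3', 'city3', 'city2', 'china']})
new_dict = {'India': ['India', 'city1', 'city2', 'city3'],
            'China': ['China', 'city4', 'city5']}

# Each key is a regex: the inline (?i) flag (it must come first in the pattern),
# the escaped name, and look-arounds so only whole alphabetic words are replaced.
d = {
    rf'(?i)(?<![a-zA-Z]){re.escape(name)}(?![a-zA-Z])': country
    for country, names in new_dict.items()
    for name in names
}

# Replace every known name with its country, then keep the part before the first comma.
result = sample_df.replace(d, regex=True).location.str.split(',').str[0]
print(result.tolist())
```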

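You are asking two questions: (1) how to map cities to countries using a dictionary, and (2) how to fix the error "bad character range i-d at position 1408".

I can map cities to countries, but only exact matches work; nothing else gets picked up. For example, a value like "India city1" is not matched, only exact values such as "city2" or "India".

Sorry, I wasn't clear. The code above works, but not for every row, because the rows are not all formatted the same way, e.g. "Shanghai China", "Shanghai, China", "Shanghai, China, Building 3", and so on.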
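For rows whose format varies, one option (an assumption, not taken from the answer above) is to search each free-form string for the first known name anywhere in it, rather than expecting the whole cell to match. A sketch using `Series.str.extract` with a single case-insensitive alternation of all escaped names, then mapping the matched name back to its country. The sample rows are modelled on the formats described in the comments (`'Paris'` is made up to show a non-match), and `lookup`/`pattern` are hypothetical names.

```python
import re

import pandas as pd

new_dict = {'India': ['India', 'city1', 'city2', 'city3'],
            'China': ['China', 'city4', 'city5']}

# Lower-cased name -> country, used to translate whatever the regex captured.
lookup = {name.lower(): country
          for country, names in new_dict.items() for name in names}

# One capture group holding every escaped name. Longest names go first so that,
# at the same position, e.g. 'city10' would be tried before 'city1'.
names = sorted(lookup, key=len, reverse=True)
pattern = r'(?<![a-zA-Z])(' + '|'.join(map(re.escape, names)) + r')(?![a-zA-Z])'

locations = pd.Series(['Shanghai, China', 'Shanghai, China, Building 3',
                       'India city1', 'Paris'])

# expand=False with a single group returns a Series; non-matching rows become NaN.
matched = locations.str.extract(pattern, flags=re.IGNORECASE, expand=False)
countries = matched.str.lower().map(lookup)
print(countries.tolist())
```

Rows with no known name come back as NaN, and a row that mentions names from two countries takes whichever name appears first in the string.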
