Python str.match does not match exactly because it ignores the characters that follow
I have a CSV file:
State, Region
AK, Pacific Non Continuous
HI, Pacific Non Continuous
AL, East South Central
AZ, Mountain
CA, Pacific
OR, Pacific
When I run:
import numpy as np
import pandas as pd

df = pd.read_csv(r'C:...\input.csv', skipinitialspace=True)  # strip the blank after each comma
df['SuperRegion'] = np.where(df.Region.str.match("New England|Middle Atlantic|South Atlantic"), "East",
                    np.where(df.Region.str.match("East North Central|East South Central|West North Central|West South Central"), "Mid West",
                    np.where(df.Region.str.match("Mountain|Pacific"), "West", "Other")))
df.to_csv(r'C:...\Output.csv', index=False)
I expected the first two rows to get a SuperRegion value of Other:
State, Region, SuperRegion
AK, Pacific Non Continuous, **Other**
HI, Pacific Non Continuous, **Other**
AL, East South Central, Mid West
AZ, Mountain, West
CA, Pacific, West
OR, Pacific, West
But what I get instead is:
State, Region, SuperRegion
AK, Pacific Non Continuous, **West**
HI, Pacific Non Continuous, **West**
AL, East South Central, Mid West
AZ, Mountain, West
CA, Pacific, West
OR, Pacific, West
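The output above follows from how `Series.str.match` works: it checks whether each string *starts* with a match of the pattern (the same semantics as Python's `re.match`), so `Pacific` also matches `Pacific Non Continuous`. A minimal reproduction, assuming the sample rows are read from an in-memory string via `io.StringIO` instead of the original file; appending `$` to a grouped alternation forces the match to cover the whole value.

```python
import io

import pandas as pd

# Sample data from the question, read from memory instead of input.csv.
# skipinitialspace=True drops the blank after each comma (header included).
csv_text = """State, Region
AK, Pacific Non Continuous
HI, Pacific Non Continuous
AL, East South Central
AZ, Mountain
CA, Pacific
OR, Pacific
"""
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# str.match is anchored only at the start, so the prefix "Pacific" is enough
prefix_hits = df.Region.str.match("Mountain|Pacific")

# Grouping the alternatives and ending with $ requires the whole value to match.
# Without the group, "Mountain|Pacific$" would anchor only the "Pacific" branch.
exact_hits = df.Region.str.match("(?:Mountain|Pacific)$")

print(prefix_hits.tolist())  # [True, True, False, True, True, True]
print(exact_hits.tolist())   # [False, False, False, True, True, True]
```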
I assume that when it runs, it does not distinguish between Pacific and Pacific Non Continuous the way I want. Any suggestions?
Why not change:
np.where(df.Region.str.match("Mountain|Pacific"), "West", "Other")))
to:
or add the case separately:
import numpy as np

# Check the specific "Pacific Non Continuous" case before the general "Pacific" prefix
df['SuperRegion'] = np.where(df.Region.str.match("New England|Middle Atlantic|South Atlantic"), "East",
                    np.where(df.Region.str.match("East North Central|East South Central|West North Central|West South Central"), "Mid West",
                    np.where(df.Region.str.match("Pacific Non Continuous"), "Other",
                    np.where(df.Region.str.match("Mountain|Pacific"), "West", "Other"))))
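The nested `np.where` above relies on checking conditions in order. A sketch of the same first-match-wins logic written flat with `numpy.select` (swapped in here as an alternative, not part of the answer): for each row it returns the choice belonging to the first true condition, so the more specific `Pacific Non Continuous` check must still come before `Mountain|Pacific`. The DataFrame is built inline for illustration.

```python
import numpy as np
import pandas as pd

# Sample regions from the question, built inline for illustration
df = pd.DataFrame({
    "State": ["AK", "HI", "AL", "AZ", "CA", "OR"],
    "Region": ["Pacific Non Continuous", "Pacific Non Continuous",
               "East South Central", "Mountain", "Pacific", "Pacific"],
})

# Conditions are evaluated in list order; the first True one wins per row,
# so the specific "Pacific Non Continuous" case is listed before "Pacific"
conditions = [
    df.Region.str.match("New England|Middle Atlantic|South Atlantic"),
    df.Region.str.match("East North Central|East South Central|West North Central|West South Central"),
    df.Region.str.match("Pacific Non Continuous"),
    df.Region.str.match("Mountain|Pacific"),
]
choices = ["East", "Mid West", "Other", "West"]

# Rows where no condition is True fall back to the default
df["SuperRegion"] = np.select(conditions, choices, default="Other")
print(df)
```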
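The ideal solution is to build a dictionary with the regions as keys and the super regions as values, and then pass it to map (here region_dict stands for that dictionary):
df['Region'].map(region_dict)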
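A minimal sketch of the dictionary approach. The name `region_to_super` is hypothetical, and filling unmapped regions with "Other" via `fillna` is an assumption chosen to reproduce the expected output in the question. `Series.map` with a dict does exact key lookups and returns NaN for values that are not keys, so `Pacific Non Continuous` never picks up the `Pacific` entry.

```python
import pandas as pd

# Sample regions from the question, built inline for illustration
df = pd.DataFrame({
    "State": ["AK", "HI", "AL", "AZ", "CA", "OR"],
    "Region": ["Pacific Non Continuous", "Pacific Non Continuous",
               "East South Central", "Mountain", "Pacific", "Pacific"],
})

# Hypothetical lookup table: exact region name -> super region
region_to_super = {
    "New England": "East",
    "Middle Atlantic": "East",
    "South Atlantic": "East",
    "East North Central": "Mid West",
    "East South Central": "Mid West",
    "West North Central": "Mid West",
    "West South Central": "Mid West",
    "Mountain": "West",
    "Pacific": "West",
}

# Regions missing from the dict map to NaN, which fillna turns into "Other"
df["SuperRegion"] = df["Region"].map(region_to_super).fillna("Other")
print(df)
```

A dictionary also keeps the region list in one place, so adding a region means adding one key rather than editing a regex.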
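You can use isin(), which tests each value for exact membership in a list instead of matching a pattern at the start of the string.
You get: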
State Region SuperRegion
0 AK Pacific Non Continuous Other
1 HI Pacific Non Continuous Other
2 AL East South Central Mid West
3 AZ Mountain West
4 CA Pacific West
5 OR Pacific West
I made the change to the match that you mentioned above, and it works great! Thanks. I'm surprised there isn't an exact-match command.
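On the exact-match point: pandas 1.1 added `Series.str.fullmatch`, which, like Python's `re.fullmatch`, succeeds only when the entire string matches the pattern. A sketch of the original `np.where` chain using it, assuming pandas 1.1 or later; it calls `np.where` directly because `pd.np` was deprecated in pandas 1.0 and removed in 2.0. The DataFrame is built inline for illustration.

```python
import numpy as np
import pandas as pd

# Sample regions from the question, built inline for illustration
df = pd.DataFrame({
    "State": ["AK", "HI", "AL", "AZ", "CA", "OR"],
    "Region": ["Pacific Non Continuous", "Pacific Non Continuous",
               "East South Central", "Mountain", "Pacific", "Pacific"],
})

# fullmatch requires the whole value to equal one alternative,
# so "Pacific Non Continuous" no longer matches "Mountain|Pacific"
df["SuperRegion"] = np.where(
    df.Region.str.fullmatch("New England|Middle Atlantic|South Atlantic"), "East",
    np.where(
        df.Region.str.fullmatch("East North Central|East South Central|West North Central|West South Central"), "Mid West",
        np.where(df.Region.str.fullmatch("Mountain|Pacific"), "West", "Other"),
    ),
)
print(df)
```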
import numpy as np

df['SuperRegion'] = np.where(df.Region.isin(['New England', 'Middle Atlantic', 'South Atlantic']), "East",
                    np.where(df.Region.isin(["East North Central", "East South Central", "West North Central", "West South Central"]), "Mid West",
                    np.where(df.Region.isin(["Mountain", "Pacific"]), "West", "Other")))