Replace multiple strings in a column with NaN in Python
I want to replace the location words in the strings column when they appear alone, or when several of them appear joined by a comma and a space.
id strings
0 1 south
1 2 north
2 3 east
3 4 west
4 5 west, east, south
5 6 west, west
6 7 north, north
7 8 north, south
8 9 West Corporation global office
9 10 West-Riding
10 11 University of West Florida
11 12 Southwest
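My expected result looks like this. Note that when the location words are part of a phrase or of a longer word, I do not want them replaced.
Is this possible? Thanks, everyone.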
id strings
0 1 NaN
1 2 NaN
2 3 NaN
3 4 NaN
4 5 NaN
5 6 NaN
6 7 NaN
7 8 NaN
8 9 West Corporation global office
9 10 West-Riding
10 11 University of West Florida
11 12 Southwest
The code below works, but I am wondering whether there is a more concise way:
df['strings'].astype(str).replace('south', np.nan).replace('north', np.nan)\
.replace('west', np.nan).replace('east', np.nan).replace('west, east', np.nan)\
.replace('west, west', np.nan).replace('north, north', np.nan).replace('west, east', np.nan)\
.replace('north, south', np.nan)
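One approach: split each value into its parts, forward-fill to replace the missing values, test whether all values in a row match, and finally set the matching rows to missing values with a mask.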
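The code for these steps did not survive in this copy of the answer. A minimal sketch of what they could look like, assuming the parts are separated by a comma and a space and that the column holds no missing values to begin with (the names words, parts and only_locations are illustrative, not taken from the answer):

```python
import pandas as pd

df = pd.DataFrame({"strings": [
    "south", "north", "east", "west",
    "west, east, south", "west, west", "north, north", "north, south",
    "West Corporation global office", "West-Riding",
    "University of West Florida", "Southwest",
]})
words = ["south", "north", "east", "west"]

# One column per comma-separated part; shorter rows are padded with missing values.
parts = df["strings"].str.split(", ", expand=True)
# Forward-fill along each row so the padding repeats the last real part.
parts = parts.ffill(axis=1)
# True only where every part of the row is one of the location words.
only_locations = parts.isin(words).all(axis=1)
# mask() puts NaN wherever the condition is True and keeps the rest.
df["strings"] = df["strings"].mask(only_locations)
print(df)
```

Because the comparison is against whole parts and is case-sensitive, Southwest and West-Riding are left alone.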
Another idea uses sets and isdisjoint.
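The original snippet for the set-based idea is not preserved here either. The sketch below is one possible reading and swaps in a subset test (<=) instead of isdisjoint, on the assumption that a row should become NaN only when every one of its parts is a location word. It also assumes the column contains only strings.

```python
import pandas as pd

df = pd.DataFrame({"strings": [
    "south", "north", "east", "west",
    "west, east, south", "west, west", "north, north", "north, south",
    "West Corporation global office", "West-Riding",
    "University of West Florida", "Southwest",
]})
locations = {"south", "north", "east", "west"}

# For each row, build the set of its parts and check it is a subset of the
# location words. Assumes every value is a string (no NaN in the column).
only_locations = pd.Series(
    [set(value.split(", ")) <= locations for value in df["strings"]],
    index=df.index,
)
df["strings"] = df["strings"].mask(only_locations)
print(df)
```

With isdisjoint the test would flip to "at least one part is a location word", which also blanks mixed rows; which of the two is wanted depends on the data.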
Use a regular expression.
Ex:
import numpy as np
import pandas as pd
df = pd.DataFrame({'strings': ['south', 'north', 'east', 'west', 'west, east, south', 'west, west', 'north, north', 'north, south', 'West Corporation global office', 'West-Riding', 'University of West Florida', 'Southwest']})
# np.nan instead of np.NAN: the upper-case alias was removed in NumPy 2.0
df['R'] = df['strings'].replace(r"\b(south|north|east|west)\b,?", np.nan, regex=True)
print(df)
Output:
strings R
0 south NaN
1 north NaN
2 east NaN
3 west NaN
4 west, east, south NaN
5 west, west NaN
6 north, north NaN
7 north, south NaN
8 West Corporation global office West Corporation global office
9 West-Riding West-Riding
10 University of West Florida University of West Florida
11 Southwest Southwest
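One caveat, as far as I can tell: the pattern above is searched for anywhere in the cell, so it leaves the last four rows alone only because they spell West and South with a capital letter. A stricter variant is to require that the whole cell consists of location words. The sketch below uses Series.str.fullmatch (available in pandas 1.1 and later); the row west coast office is a made-up example that is not in the question's data.

```python
import pandas as pd

df = pd.DataFrame({"strings": [
    "west",
    "west, east, south",
    "Southwest",
    "west coast office",  # hypothetical row, not from the question
]})

word = r"(?:south|north|east|west)"
# One location word, optionally followed by more, each joined by ", ".
pattern = rf"{word}(?:, {word})*"

# fullmatch() is True only when the entire cell matches the pattern.
only_locations = df["strings"].str.fullmatch(pattern)
df["R"] = df["strings"].mask(only_locations)
print(df)
```

Here only the first two rows become NaN in R; Southwest and west coast office are kept.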
Thanks for this solution. \b means "do not match", right?
@ahbon No, it means a word boundary.
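To illustrate the point about \b, here is a small check with Python's re module. \b matches the empty position between a word character and a non-word character (or the start or end of the string), so west inside Southwest does not count as a separate word. The variable names are only for illustration.

```python
import re

pattern = re.compile(r"\bwest\b")

# "west" stands alone: string start before it, a comma after it.
alone = bool(pattern.search("west, east"))
# "west" is preceded by the letter "h", so there is no boundary in front of it.
inside_word = bool(pattern.search("Southwest"))
# A hyphen is a non-word character, so "West" is a separate word here;
# the default search is case-sensitive, hence the IGNORECASE variant.
case_sensitive = bool(pattern.search("West-Riding"))
ignore_case = bool(re.search(r"\bwest\b", "West-Riding", re.IGNORECASE))

print(alone, inside_word, case_sensitive, ignore_case)
# True False False True
```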