Replacing multiple strings in a column with NaN in Python


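I want to replace the location words in the strings column with NaN when they appear on their own, or when several of them appear joined by a comma and a space.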

    id                         strings
0    1                           south
1    2                           north
2    3                            east
3    4                            west
4    5               west, east, south
5    6                      west, west
6    7                    north, north
7    8                    north, south
8    9  West Corporation global office
9   10                     West-Riding
10  11      University of West Florida
11  12                       Southwest
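Is this possible? My expected result is shown below. Note that location words that are part of a longer phrase or word should not be replaced. Thank you.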

    id                         strings
0    1                             NaN
1    2                             NaN
2    3                             NaN
3    4                             NaN
4    5                             NaN
5    6                             NaN
6    7                             NaN
7    8                             NaN
8    9  West Corporation global office
9   10                     West-Riding
10  11      University of West Florida
11  12                       Southwest
The following code works, but I am wondering whether there is a more concise way:

df['strings'].astype(str).replace('south', np.nan).replace('north', np.nan)\
.replace('west', np.nan).replace('east', np.nan).replace('west, east', np.nan)\
.replace('west, west', np.nan).replace('north, north', np.nan).replace('west, east', np.nan)\
.replace('north, south', np.nan)
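One approach: split each string into its parts, forward fill the missing values that the split leaves behind, test whether all values in a row match the location words to build a mask, and finally set the rows selected by the mask to missing values.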
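A minimal sketch of these steps. The specific calls (str.split with expand=True, ffill, isin, all and Series.mask) and the names words, parts, mask and result are assumptions chosen to fit the description, not code taken from the answer.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"strings": [
    "south", "north", "east", "west", "west, east, south", "west, west",
    "north, north", "north, south", "West Corporation global office",
    "West-Riding", "University of West Florida", "Southwest",
]})

# Location words to look for (illustrative name).
words = ["south", "north", "east", "west"]

# One column per comma-separated part; shorter rows are padded with missing values.
parts = df["strings"].str.split(", ", expand=True)

# Forward fill along each row so the padding repeats the last real part.
parts = parts.ffill(axis=1)

# True only where every part of the row is a location word.
mask = parts.isin(words).all(axis=1)

# Replace the selected rows with NaN and keep everything else.
df["result"] = df["strings"].mask(mask)
print(df)
```

The forward fill matters because a padded missing value would otherwise fail the isin test and keep a row such as "south" from being selected.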

Another idea uses sets and isdisjoint.
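A hedged sketch of the set-based idea. Only set and isdisjoint are named in the answer; the use of str.split, map and Series.mask, and the names words, has_word and only_words, are assumptions. The issuperset variant is an addition for comparison, not part of the answer.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"strings": [
    "south", "north", "east", "west", "west, east, south", "west, west",
    "north, north", "north, south", "West Corporation global office",
    "West-Riding", "University of West Florida", "Southwest",
]})

# Location words as a set (illustrative name).
words = {"south", "north", "east", "west"}

# Each cell becomes a list of its comma-separated parts.
parts = df["strings"].str.split(", ")

# isdisjoint is True when no part is a location word, so negate it:
# has_word is True when at least one part is a location word.
has_word = ~parts.map(words.isdisjoint)

# Stricter variant: True only when every part is a location word.
only_words = parts.map(words.issuperset)

df["result"] = df["strings"].mask(has_word)
print(df)
```

On the sample data both masks select the same rows. They differ on a cell such as "north, Southwest", which has_word would select and only_words would not.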

Use a regular expression.

Example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'strings': ['south', 'north', 'east', 'west', 'west, east, south', 'west, west', 'north, north', 'north, south', 'West Corporation global office', 'West-Riding', 'University of West Florida', 'Southwest']})
# Cells containing a lowercase location word as a whole word become NaN
df['R'] = df['strings'].replace(r"\b(south|north|east|west)\b,?", np.nan, regex=True)
print(df)

Output:

                           strings                               R
0                            south                             NaN
1                            north                             NaN
2                             east                             NaN
3                             west                             NaN
4                west, east, south                             NaN
5                       west, west                             NaN
6                     north, north                             NaN
7                     north, south                             NaN
8   West Corporation global office  West Corporation global office
9                      West-Riding                     West-Riding
10      University of West Florida      University of West Florida
11                       Southwest                       Southwest
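The pattern above is case-sensitive and matches a location word anywhere in the cell, so it leaves the sample phrases alone only because they are capitalized. A stricter sketch, assuming the goal is to replace a cell only when the whole cell is a comma-and-space separated list of location words, could use Series.str.fullmatch. The names word, pattern and extra, and the extra test string, are illustrative and not from the answer.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"strings": [
    "south", "north", "east", "west", "west, east, south", "west, west",
    "north, north", "north, south", "West Corporation global office",
    "West-Riding", "University of West Florida", "Southwest",
]})

# One location word, then zero or more ", word" repetitions.
word = r"(?:south|north|east|west)"
pattern = rf"{word}(?:, {word})*"

# fullmatch requires the pattern to cover the entire cell.
mask = df["strings"].str.fullmatch(pattern)
df["R"] = df["strings"].mask(mask)
print(df)

# A lowercase phrase that merely contains a location word is not selected.
extra = pd.Series(["west, east", "go west, young man"])
print(extra.str.fullmatch(pattern).tolist())
```

Anchoring on the whole cell means the result no longer depends on the phrases happening to start with a capital letter.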

Thanks for this solution. \b means "no match", right?

@ahbon No, it means a word boundary.
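To illustrate the word-boundary point, here is a small sketch using Python's re module directly. The test strings are illustrative; the pattern is the one from the answer without the optional trailing comma.

```python
import re

# \b matches the position between a word character and a non-word character
# (or the start/end of the string); it does not consume any text.
pattern = re.compile(r"\b(south|north|east|west)\b")

in_list = bool(pattern.search("north, south"))      # True: each word stands alone
in_compound = bool(pattern.search("southwest"))     # False: no boundary inside the word
in_hyphenated = bool(pattern.search("west-riding")) # True: "-" is a non-word character
capitalized = bool(pattern.search("West-Riding"))   # False: the pattern is case-sensitive

print(in_list, in_compound, in_hyphenated, capitalized)
```

The last two lines show that "West-Riding" is left alone in the sample data because of its capital letter, not because of \b.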