How do I remove words/characters from strings in DataFrame cells in Python?

python pandas

I have a DataFrame with a column containing street intersections:

|          Locations           |
--------------------------------
|W Madison Ave & S Randall Blvd|
|N Clemson St & E Tower Ave    |
|E Thompson St & S Garfield Ln |

I want to remove the direction characters (N, S, E, W) and the street suffixes (Blvd, St, Ave, etc.) so that my output looks like this:

|     Locations     |
---------------------
|Madison & Randall  |
|Clemson & Tower    |
|Thompson & Garfield|

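I can't use str.replace() because it would remove characters from words I need to keep. I tried lstrip() and rstrip(), but they cannot remove characters from the middle of the string. I also tried Series.apply(), but that effectively performs a str.replace() and puts everything into a list in the DataFrame.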

You're close. You can split the values first and then join them back together:
f = lambda x: ' '.join([item for item in x.split() if item not in banned])
df["Locations"] = df["Locations"].apply(f)

Or with a list comprehension:
df["Locations"] = [' '.join([item for item in x.split() 
                  if item not in banned]) 
                  for x in df["Locations"]]


print (df)
             Locations
0    Madison & Randall
1      Clemson & Tower
2  Thompson & Garfield
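
The snippets in this answer rely on a `banned` collection that is not shown. A minimal, self-contained sketch of the split/filter/join idea follows; the contents of `banned` and the helper name `drop_banned` are assumptions made for illustration, based on the directions and suffixes named in the question.

```python
import pandas as pd

# Assumed contents: the answer does not show how `banned` is defined.
banned = {"N", "S", "E", "W", "Blvd", "St", "Ave", "Ln"}

df = pd.DataFrame({
    "Locations": [
        "W Madison Ave & S Randall Blvd",
        "N Clemson St & E Tower Ave",
        "E Thompson St & S Garfield Ln",
    ]
})

def drop_banned(text):
    """Keep only the whitespace-separated tokens that are not in `banned`."""
    return " ".join(tok for tok in text.split() if tok not in banned)

df["Locations"] = df["Locations"].apply(drop_banned)
print(df)
```

Because whole tokens are compared, "S" is dropped while "St" and "Madison" are each judged as complete words, which avoids the partial matches the question worries about.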

Perhaps use replace, as you mentioned:
df.replace(dict(zip(banned,['']*len(banned))),regex=True)
Out[54]: 
                      Locations           
0           Madison  &  Randall 
1            Clemson t &  Tower     
2        Thompson t &  Garfield  
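
The output above still contains a stray "t" (as in "Clemson t") and doubled spaces, presumably because "S" is removed from "St" before "St" itself can match. One possible way around this, sketched under the assumption that `banned` holds the words listed in the question, is to anchor the pattern on word boundaries and then tidy the whitespace:

```python
import re
import pandas as pd

# Assumed contents: the answer does not show how `banned` is defined.
banned = ["N", "S", "E", "W", "Blvd", "St", "Ave", "Ln"]

df = pd.DataFrame({
    "Locations": [
        "W Madison Ave & S Randall Blvd",
        "N Clemson St & E Tower Ave",
        "E Thompson St & S Garfield Ln",
    ]
})

# \b restricts matches to whole words, so "S" cannot match inside "St" or "Madison".
pattern = r"\b(?:" + "|".join(re.escape(word) for word in banned) + r")\b"

cleaned = (
    df["Locations"]
    .str.replace(pattern, "", regex=True)   # drop whole banned words
    .str.replace(r"\s+", " ", regex=True)   # collapse leftover runs of whitespace
    .str.strip()                            # trim leading/trailing spaces
)
print(cleaned)
```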

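As an alternative to removing the unwanted words, you can select the words you want. Since the sample rows all follow the same pattern, it looks like you want to pick the 2nd and 6th words and use them to build the new location name. That would look like this:
df['new_location'] = ''

# Write through a single indexer; chained df.new_location.iloc[i] = ... may not update df
col = df.columns.get_loc('new_location')
for i, location in enumerate(df.Locations):
    words = location.split(' ')
    df.iloc[i, col] = words[1] + ' & ' + words[5]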
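
The same positional selection can also be sketched without an explicit loop, using the pandas string accessor. This assumes every row has at least six whitespace-separated tokens in the pattern shown in the question; rows with fewer tokens would yield NaN rather than raising an error.

```python
import pandas as pd

df = pd.DataFrame({
    "Locations": [
        "W Madison Ave & S Randall Blvd",
        "N Clemson St & E Tower Ave",
        "E Thompson St & S Garfield Ln",
    ]
})

# Split each string into a list of tokens, then pick tokens by position.
words = df["Locations"].str.split()
df["new_location"] = words.str[1] + " & " + words.str[5]
print(df["new_location"])
```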

Given that s is the following Series:
0    |          Locations           |
1    --------------------------------
2    |W Madison Ave & S Randall Blvd|
3    |N Clemson St & E Tower Ave    |
4    |E Thompson St & S Garfield Ln |
Name: 0, dtype: object

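you can use the following regular expression (regex=True is passed explicitly because pandas 2.0 and later treat the pattern as a literal string by default):
s.str.replace(r'(?:E|W|N|St?|Blvd|Ave|Ln)', '', regex=True)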

which gives:
0    |          Locations           |
1    --------------------------------
2             | Madison  &  Randall |
3           | Clemson  &  Tower     |
4          | Thompson  &  Garfield  |
Name: 0, dtype: object

apply would work if you did df.col.apply(lambda r: ' '.join(k for k in r.split() if not np.isin(k, banned))).
Do all entries follow exactly the same pattern? If so, it might be easier to just take the second and sixth strings after splitting on whitespace.
You need to order them from longest string to shortest: df.Locations.str.replace(rf"\s*({'|'.join(sorted(banned, key=len, reverse=True))})\s*", "", regex=True).str.strip()
Do you know where I could put .lower() to cover all my bases? The data contains some banned words in uppercase.
@ATCH_-train - Then use banned1 = [a.lower() for a in banned] and f = lambda x: ' '.join([item for item in x.split() if item.lower() not in banned1])
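
The case-insensitive variant suggested in the last comment can be sketched as a runnable example. The contents of `banned`, the mixed-case sample rows, and the helper name `drop_banned_ci` are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Assumed contents: the thread does not show how `banned` is defined.
banned = ["N", "S", "E", "W", "Blvd", "St", "Ave", "Ln"]
banned_lower = {word.lower() for word in banned}

# Hypothetical rows with inconsistent capitalization.
df = pd.DataFrame({
    "Locations": [
        "w Madison AVE & s Randall BLVD",
        "N Clemson St & E Tower Ave",
    ]
})

def drop_banned_ci(text):
    """Drop tokens whose lowercase form is banned; kept tokens retain their case."""
    return " ".join(tok for tok in text.split() if tok.lower() not in banned_lower)

df["Locations"] = df["Locations"].apply(drop_banned_ci)
print(df)
```

Lowercasing only for the comparison leaves the capitalization of the remaining street names untouched.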