Python str.replace in a DataFrame, starting from the end

python string pandas

I have two columns like this:

                                       string                    s
0    the best new york cheesecake new york ny             new york
1               houston public school houston              houston

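I want to remove the last occurrence of `s` in `string`. For context, my DataFrame has hundreds of thousands of rows. I know about `str.replace` and `str.rfind`, but nothing that gives me the combination of the two I need, and I'm drawing a blank trying to improvise a solution.

Thanks in advance for any help.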

You can use `rsplit` and `join`:

df.apply(lambda x: ''.join(x['string'].rsplit(x['s'],1)),axis=1)

Output:

0    the best new york cheesecake  ny
1              houston public school 
dtype: object
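To see why this works, here is a minimal pure-Python sketch of the same idea. The helper name `remove_last` is hypothetical (it is not part of pandas or the answer above): `rsplit` scans from the right, and `maxsplit=1` splits only at the last match, so joining the two pieces drops exactly that occurrence.

```python
def remove_last(text, sub):
    """Remove the last occurrence of `sub` from `text` (hypothetical helper)."""
    # rsplit scans from the right; maxsplit=1 splits only at the last match,
    # so joining the pieces back together drops exactly that one occurrence.
    # If `sub` is not found, rsplit returns [text] and the text is unchanged.
    return ''.join(text.rsplit(sub, 1))


rows = [
    ("the best new york cheesecake new york ny", "new york"),
    ("houston public school houston", "houston"),
]
result = [remove_last(text, sub) for text, sub in rows]
print(result)
```

With a DataFrame, the same helper could be applied as `[remove_last(t, sub) for t, sub in zip(df['string'], df['s'])]`, which avoids the per-row overhead of `apply(axis=1)` and may matter with hundreds of thousands of rows. Note that an empty `sub` makes `rsplit` raise a `ValueError`.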

Edit:
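A minimal sketch of what this edit amounts to, assuming the goal is to write the result back into the `string` column (so that other columns such as `third` are kept) and to collapse the double space left behind by the split. The sample frame is rebuilt here for illustration, and the `\s+` pattern is an assumption; the comments on this answer use `'\s\s'`.

```python
import pandas as pd

df = pd.DataFrame({
    'string': ['the best new york cheesecake new york ny',
               'houston public school houston'],
    's': ['new york', 'houston'],
    'third': [1, 1],  # extra column, only here to show that it is preserved
})

# Assign back to the 'string' column instead of replacing the whole frame,
# then collapse runs of whitespace left behind by the split.
df['string'] = (
    df.apply(lambda x: ''.join(x['string'].rsplit(x['s'], 1)), axis=1)
      .str.replace(r'\s+', ' ', regex=True)
)
print(df)
```

A trailing space remains when `s` was the last word of the string; appending `.str.strip()` would remove it.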

Output:

                            string         s  third
0  the best new york cheesecake ny  new york      1
1           houston public school    houston      1

Option 1
Vectorized `rsplit` with a comprehension

# numpy.char is the public alias; numpy.core.defchararray is deprecated in NumPy 2.x
from numpy.char import rsplit

v = df.string.values.astype(str)
s = df.s.values.astype(str)

df.assign(string=[' '.join([x.strip() for x in y]) for y in rsplit(v, s, 1)])

                            string         s
0  the best new york cheesecake ny  new york
1           houston public school    houston

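Option 2
Use `re.sub`

The regex here looks for the value from `s` that is not followed by another occurrence of the same value, i.e. its last occurrence.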

import re

v = df.string.values.astype(str)
s = df.s.values.astype(str)
f = lambda i, j: re.sub(r' *{0} *(?!.*{0}.*)'.format(i), ' ', j).strip()

df.assign(string=[f(i, j) for i, j in zip(s, v)])

                            string         s
0  the best new york cheesecake ny  new york
1            houston public school   houston
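One caveat with building a pattern from data: if a value in `s` contains regex metacharacters (`+`, `.`, `(` and so on), the pattern no longer matches it literally. A hedged variant of the same idea, with the value passed through `re.escape` first; the function name `remove_last_re` is hypothetical:

```python
import re


def remove_last_re(text, sub):
    """Remove the last occurrence of `sub` from `text` using a regex (sketch)."""
    # Escape the value so characters like '+' or '.' are matched literally.
    pat = re.escape(sub)
    # Match `sub` (with surrounding spaces) only when no further `sub` follows,
    # replace it with a single space, then trim the ends.
    return re.sub(r' *{0} *(?!.*{0})'.format(pat), ' ', text).strip()


print(remove_last_re("the best new york cheesecake new york ny", "new york"))
print(remove_last_re("a+b c a+b d", "a+b"))
```

It can be dropped in as `df.assign(string=[remove_last_re(j, i) for i, j in zip(s, v)])`.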

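Very nice. If the last occurrence is in the middle of the string, could you add a `replace` or a similar function to get rid of the double space left after the split?

`df.apply(lambda x: ''.join(x['string'].rsplit(x['s'],1)),axis=1).str.replace('\s\s',' ')`

I added a third column, and this seems to keep only the `string` column. Is there a way to keep the other columns?

My pleasure. Happy coding!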