Python: removing a string from one column based on the value in another column


I have an example dataframe:

      col1                                   col2  
0     Hello, is it me you're looking for     Hello   
1     Hello, is it me you're looking for     me 
2     Hello, is it me you're looking for     looking 
3     Hello, is it me you're looking for     for   
4     Hello, is it me you're looking for     Lionel  
5     Hello, is it me you're looking for     Richie   
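I want to change col1 so that the string in col2 is removed from it, and get the amended dataframe back. I also want to remove the one character before and the one character after that string. For example, the desired output for index 1 is:

      col1                                   col2
1     Hello, is ityou're looking for         me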
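I have tried using pd.apply() and pd.map() together with the .replace() function, but I can't get .replace() to take df['col2'] as an argument. I also have the feeling that this is not the best way to go about it.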

Any help? I'm basically new to pandas and keen to learn, so please ELI5.


Thanks

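My guess is that you are missing axis=1. Without it, apply passes each column to your function rather than each row, so the function never sees col1 and col2 side by side.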

import pandas as pd

A = """Hello, is it me you're looking for;Hello
Hello, is it me you're looking for;me
Hello, is it me you're looking for;looking
Hello, is it me you're looking for;for
Hello, is it me you're looking for;Lionel
Hello, is it me you're looking for;Richie
"""
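# Build the example frame: one row per line, split on ";" into the two columns
df = pd.DataFrame([a.split(";") for a in A.splitlines()],
                  columns=["col1", "col2"])

# axis=1 hands each row to the lambda, so x.col1 and x.col2 are that row's strings
df["col1"] = df.apply(lambda x: x.col1.replace(x.col2, ""), axis=1)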

To run a function on every row of a dataframe you can use:

df.apply(func, axis=1)
func receives each row as a Series, so it can read the row's values by column name.
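To see what the axis argument changes, here is a minimal sketch on a two-row frame. The variable names row_lengths and col_lengths are only illustrative and do not come from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["Hello, is it me you're looking for"] * 2,
    "col2": ["Hello", "me"],
})

# axis=1: the function is called once per row; `row` is a Series indexed by column name
row_lengths = df.apply(lambda row: len(row["col1"]) - len(row["col2"]), axis=1)

# default axis=0: the function is called once per column; `col` is the whole column
col_lengths = df.apply(lambda col: len(col))

print(row_lengths.tolist())   # one value per row
print(col_lengths.to_dict())  # one value per column
```

With axis=1 the result has one entry per row, which is what you need when a value in col1 has to be combined with the value in col2 on the same row.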

df['col1'] = df.apply(lambda row: row['col1'].replace(row['col2'], ''), axis=1)
However, also removing one character before and one after the match takes a bit more work.

So define func:

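def func(row):
    c1 = row['col1']  # the full string in col1
    c2 = row['col2']  # the substring to remove
    find_index = c1.find(c2)  # index of the first occurrence of c2, searching from the left
    if find_index == -1:  # c2 does not occur in c1
        return c1  # leave the string unchanged
    else:
        start_index = max(find_index - 1, 0)  # one character before, but never negative
        end_index = find_index + len(c2) + 1  # one character after; slicing past the end is safe
        # cut out exactly this span instead of using replace(), which would
        # also remove any other occurrence of the same slice
        return c1[:start_index] + c1[end_index:]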
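Then apply it to every row with df.apply(func, axis=1) and assign the result back to col1.

To avoid the SettingWithCopyWarning, use: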

df = df.assign(col1=df.apply(func, axis=1))
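Put together on the sample data from the question, the approach could look like the sketch below. The helper name remove_with_padding and its pad parameter are made up for this illustration, and it cuts the matched span out by slicing, so only the first occurrence is touched.

```python
import pandas as pd

def remove_with_padding(text, target, pad=1):
    """Remove the first occurrence of target plus `pad` characters on each side."""
    pos = text.find(target)
    if pos == -1:
        return text  # target not present: nothing to remove
    start = max(pos - pad, 0)       # clamp so we never slice from a negative index
    end = pos + len(target) + pad   # may run past the end; slicing tolerates that
    return text[:start] + text[end:]

df = pd.DataFrame({
    "col1": ["Hello, is it me you're looking for"] * 6,
    "col2": ["Hello", "me", "looking", "for", "Lionel", "Richie"],
})

# assign returns a new frame and leaves df itself untouched
result = df.assign(
    col1=df.apply(lambda row: remove_with_padding(row["col1"], row["col2"]), axis=1)
)

print(result.loc[1, "col1"])  # Hello, is ityou're looking for
```

Rows 4 and 5 stay unchanged because "Lionel" and "Richie" do not occur in col1.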

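There may be a more Pythonic or elegant way, but here is how I got the above done quickly. If you don't need much flexibility in how the strings are manipulated, and a quick fix matters more to you than performance, this will do the job.

I pull the two columns of the dataframe out as two separate Series: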

col1Series = df['col1']
col2Series = df['col2']
Next, create an empty list to hold the final string values:

rowxList = []
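Iterate over both Series in parallel to fill the list:

for x, y in zip(col1Series, col2Series):
    rowx = x.replace(y, '')  # remove every occurrence of the col2 string from the col1 string
    rowxList.append(rowx)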
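Finally, put rowxList back into the original dataframe as a new column. You could overwrite the old column, but it is safer to write to a new column, check the output against the original two columns, and only then drop the old columns you no longer need: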

df['newCol'] = rowxList
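The same loop can be written as a list comprehension. This is only a compact sketch of the steps above on two of the sample rows. Like the loop, it removes the col2 string itself and not the characters around it.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["Hello, is it me you're looking for"] * 2,
    "col2": ["me", "Lionel"],
})

# zip pairs each col1 string with the col2 string from the same row
df["newCol"] = [text.replace(target, "") for text, target in zip(df["col1"], df["col2"])]

# compare the new column against the originals before dropping anything
print(df[["col1", "col2", "newCol"]])
```

Because the surrounding spaces are kept, removing "me" leaves two consecutive spaces in the first row.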

Can you show us your code? How close did you get?