Python: remove a word unless it is part of another word (python, pandas)


I want to remove specific words, except where they occur as part of another word. Here is an example:

data1 
    name    

    here is a this       
    company 
    there is no food      

data2
    words   count

    is       56
    com     17
    no      22
I wrote this function, but the problem is that it also removes a word when it is part of another word:


def drop(y):
    for x in data2.words.values:
        y['name']= y['name'].str.replace(x, '')

    return y
Output:

    name

    here a th       
    pany    
    there food 
What I expect:

    name    

    here a this       
    company 
    there food   

To avoid multiple spaces, you can split each value on whitespace, filter out the matching words, and join the rest back together:

s = set(data2['words'])
data1['name'] = [' '.join(y for y in x.split() if y not in s) for x in data1['name']]
print (data1)
          name
0  here a this
1      company
2   there food
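The split-and-filter idea can also be tried outside pandas. The following is a minimal sketch, assuming plain Python lists stand in for the two columns; the function name `drop_whole_words` is hypothetical and not part of the original answer.

```python
def drop_whole_words(names, words):
    """Return each string in names without the tokens listed in words."""
    stop = set(words)  # set lookup keeps the membership test cheap
    # str.split() with no argument splits on any run of whitespace,
    # so joining with a single space cannot leave double spaces behind.
    return [" ".join(tok for tok in s.split() if tok not in stop) for s in names]

# Sample values copied from the question
names = ["here is a this", "company", "there is no food"]
words = ["is", "com", "no"]

result = drop_whole_words(names, words)
print(result)
```

Because a token is dropped only when it equals an entry in `words`, "this" and "company" survive even though they contain "is" and "com". With the question's frames, the same function should be usable as `data1['name'] = drop_whole_words(data1['name'], data2['words'])`.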
If you wrap each word in word boundaries (\b...\b) in a regular expression, a replace-based solution also works, but it leaves multiple spaces behind:

pat = '|'.join(r"\b{}\b".format(x) for x in data2['words'])
# regex=True is needed in pandas 2.0+, where str.replace matches literally by default
data1['name'] = data1['name'].str.replace('('+ pat + ')', '', regex=True)
print (data1)
           name
0  here  a this
1       company
2  there   food
So, as a last step, the extra spaces have to be removed:

pat = '|'.join(r"\b{}\b".format(x) for x in data2['words'])
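# both patterns are regular expressions, so pass regex=True explicitly (required in pandas 2.0+)
data1['name'] = data1['name'].str.replace('('+ pat + ')', '', regex=True).str.replace(' +', ' ', regex=True)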
print (data1)
          name
0  here a this
1      company
2   there food
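The word-boundary approach can be sketched with the standard `re` module alone. This is an illustration under assumptions, not the answer's own code: the helper names `build_word_pattern` and `remove_words` are hypothetical, and `re.escape` is added here in case a word to remove contains regex metacharacters such as `.` or `+`.

```python
import re

def build_word_pattern(words):
    """Compile a pattern matching any entry of words as a whole word only."""
    # re.escape stops characters like '.' or '+' from acting as regex syntax
    alternatives = "|".join(re.escape(w) for w in words)
    return re.compile(r"\b(?:{})\b".format(alternatives))

def remove_words(text, pattern):
    """Delete whole-word matches, then collapse the leftover whitespace."""
    stripped = pattern.sub("", text)
    return re.sub(r"\s+", " ", stripped).strip()

# Sample values copied from the question
pattern = build_word_pattern(["is", "com", "no"])
cleaned = [remove_words(s, pattern)
           for s in ["here is a this", "company", "there is no food"]]
print(cleaned)
```

Note that `\b` only marks a transition between word and non-word characters, so a word to remove that starts or ends with punctuation may not match as intended. In pandas, the same pattern string should work with `Series.str.replace(..., regex=True)`.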

The problem is that you do not split your sentences into words, so fragments of words get replaced as well. This should work:

def drop(y):
    words = set(data2.words.values)
    # Split each sentence into words and keep only those not listed in data2.words
    y['name'] = y['name'].apply(lambda s: " ".join(w for w in s.split() if w not in words))
    return y

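Here is a solution to your problem: split the sentence into words before removing anything, so that each value is compared as a whole word rather than as a substring.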

 import pandas as pd

 data1 = pd.DataFrame(data = {"name":["here is a this company there is no food"]})
 data2 = pd.DataFrame(data = {"words": ["is", "com", "no"]})

 def drop(data1, data2):
     words = set(data2["words"])
     # Split each sentence and keep only the tokens that are not exact matches
     data1['name'] = data1['name'].apply(
         lambda s: " ".join(j for j in s.split() if j not in words))
     return data1