Python: compare two dataframe columns of sentence strings and create new values for a third frame

Tags: python, regex, pandas, dataframe, nlp

Here I have two dataframe columns, A and B. For each row [i], all of B is contained in A. I am now trying to test for B within A, returning 1 for every word in the matching phrase and 0 for every other word outside phrase B, thereby creating a new dataframe of 0s and 1s.


    Why would it be competitive, so it's wond...        if the teabaggers hadn't ousted Sen
    Had he refused to attempt something so partisa...   Had he refused to attempt something so partisa...
    "This study would then have to be conducted an...   This study would then have to be conducted and 


['0', '0', '0', '0' , '0', '1', '1', '1', '1', '1', '1'........]


['0', '1', '0', '0' , '0', '1', '1', '1', '1', '1', '1', ........]



import re

results = []
# Alternation pattern that matches any phrase in B literally
rx = '({})'.format('|'.join(re.escape(el) for el in B))
# Generator to yield replaced sentences; rep_lace is a column of 1's for each word in B
it = (re.sub(rx, rep_lace, sentence) for sentence in A)
# Build list of paired new sentences and old to filter out where not the same
results.append([new_sentence for old_sentence, new_sentence in zip(A, it) if old_sentence != new_sentence])
nw_results = ' '.join([str(elem) for elem in results])
ew_results = nw_results.split(" ")
# Compare by value: "is not" tests object identity and is unreliable for strings
new_results = ['0' if i != '1' else i for i in ew_results]
labels = [int(e) for e in new_results]



def word_match(col_1, col_2):
    # Gather all words in column B to check column A against
    targets = set(col_2.split())
    # For each word in A, if it's in B then 1, else 0
    output = [1 if x in targets else 0 for x in col_1.split()]
    return output

# Create new column, C, whose value on each row is word_match(A, B) on each row
df['C'] = df.apply(lambda x: word_match(x.A, x.B), axis=1)
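
Note that word_match labels a word 1 whenever it appears anywhere in B, so a word that occurs both inside and outside the phrase (for example "the") is marked 1 in both places. If the labels should cover only the contiguous phrase, a variant along the following lines may be closer to what the question asks for. This is a minimal sketch, not the answer's method: the function name phrase_labels and the sample sentences are hypothetical, and it assumes whitespace tokenization and that only the first occurrence of the phrase matters.

```python
def phrase_labels(sentence, phrase):
    """Return one 0/1 label per word of `sentence`.

    Words inside the first contiguous occurrence of `phrase` get 1,
    all other words get 0. Tokenization is a plain whitespace split
    (an assumption; punctuation stays attached to words).
    """
    words = sentence.split()
    target = phrase.split()
    labels = [0] * len(words)
    n = len(target)
    if n == 0:
        return labels
    # Slide a window of len(target) words across the sentence
    for start in range(len(words) - n + 1):
        if words[start:start + n] == target:
            labels[start:start + n] = [1] * n
            break
    return labels


# Hypothetical sample rows, not taken from the question's data
example = phrase_labels("the cat sat on the mat", "on the mat")
```

With the dataframe from the question, this could be applied row by row in the same way as above, for example df['C'] = [phrase_labels(a, b) for a, b in zip(df['A'], df['B'])]. For the sample row, word_match would also mark the first "the" as 1, while phrase_labels marks only the last three words.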

