Python: find the words from a row of one dataframe in the rows of another dataframe
I want to check whether the words in a row of dataframe B are present in a row of another dataframe A, and retrieve the line number from dataframe A.
Example of dataframe A:
      LineNumber  Description
2539     5401845  Either the well was very deep, or she fell very slowly,
4546     5409117  for she had plenty of time as she went down to look about her,
4368     5408517  and to wonder what was going to happen next
Example of dataframe B:
       Words
50062  well deep fell
44263  plenty time above
4731   plenty time down look
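I now want to know whether all the words in each row of dataframe B occur in any single row of dataframe A. If they do, I want to retrieve the LineNumber from dataframe A and assign it to that row of dataframe B.
The output should look like this: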
       Words                  LineNumber
50062  well deep fell         5401845
44263  plenty time above
4731   plenty time down look  5409117
I tried something like this, but it does not work:
a = 'for she had plenty of time as she went down to look about her,'
str = 'plenty time down look'
if all(x in str for x in a):
    print(True)
else:
    print(False)
Thanks.
You are close to what you are trying to do. Try it like this:
a = 'for she had plenty of time as she went down to look about her,'
string = 'plenty time down look'
a = a.split(' ')
string = string.split(' ')
if all(x in a for x in string):
    print(True)
else:
    print(False)
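There were two problems with the way you originally used x in string for x in a.
The first is that each element of string and of a is a single character, so to compare words you need to build lists of words, which is why I included the split.
The second is that the logic x in string for x in a returns True if every element of a is in string, whereas what you need is x in a for x in string, which returns True if every element of string is in a.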
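One caveat with splitting on spaces is that punctuation stays attached to the words: in the first sample row, "deep," and "slowly," keep their commas, so a lookup for "deep" would fail. Below is a minimal sketch of a punctuation-tolerant check. It assumes that a word is a run of letters, digits, or underscores and that matching should ignore case; the helper name contains_all_words is made up for illustration.

```python
import re

def contains_all_words(text, words):
    """Return True if every word in `words` occurs as a whole word in `text`."""
    # \w+ extracts runs of letters/digits/underscores, which drops punctuation.
    text_words = set(re.findall(r"\w+", text.lower()))
    wanted = set(re.findall(r"\w+", words.lower()))
    # Subset test: every wanted word must be among the words of the text.
    return wanted <= text_words

a = 'Either the well was very deep, or she fell very slowly,'
print(contains_all_words(a, 'well deep fell'))
print(contains_all_words(a, 'plenty time above'))
```

Because the comparison is between sets of whole words, a word such as "time" is not matched inside a longer word such as "sometimes".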
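Make the dataframes.
Match the description in dataframe y against dataframe x, and get the index of the matching row from dataframe x.
Check out this post: Thanks, iterating over dataframe x is more efficient.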
import pandas as pd

x = pd.DataFrame({"Description": ["for she had plenty of time as she went down to look about her",
                                  "for she had of time as she went down to look about her"]})
>>> x
   Description
0  for she had plenty of time as she went down to look about her
1  for she had of time as she went down to look about her
y = pd.DataFrame({"Description": ["plenty time down look"]})
>>> y
   Description
0  plenty time down look
# Split the first row of y into its individual words
with_words = y["Description"].iloc[0].split()
# One lookahead per word, so every word must appear somewhere in the string
with_regex = "".join(['(?=.*{})'.format(word) for word in with_words])
>>> with_regex
'(?=.*plenty)(?=.*time)(?=.*down)(?=.*look)'
# Index of the single row of x whose Description satisfies every lookahead
>>> x.loc[(x.Description.str.contains(with_regex))].index.item()
0
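To fill in the LineNumber column for every row of dataframe B, as the question asks, the same idea can be applied row by row. This is a sketch rather than the method used above. It assumes the column names from the question (LineNumber, Description, Words), compares sets of whole words instead of using the lookahead regex, and takes the first matching row of A when several rows match. The helper names tokenize and first_match are made up for illustration.

```python
import re
import pandas as pd

# Sample data copied from the question.
A = pd.DataFrame(
    {
        "LineNumber": [5401845, 5409117, 5408517],
        "Description": [
            "Either the well was very deep, or she fell very slowly,",
            "for she had plenty of time as she went down to look about her,",
            "and to wonder what was going to happen next",
        ],
    },
    index=[2539, 4546, 4368],
)
B = pd.DataFrame(
    {"Words": ["well deep fell", "plenty time above", "plenty time down look"]},
    index=[50062, 44263, 4731],
)

def tokenize(text):
    # Lower-case and keep only runs of word characters, so "deep," becomes "deep".
    return set(re.findall(r"\w+", text.lower()))

# Pre-compute one word set per row of A, paired with its line number.
a_rows = [(ln, tokenize(desc)) for ln, desc in zip(A["LineNumber"], A["Description"])]

def first_match(words):
    # LineNumber of the first row of A that contains every word, else None.
    wanted = tokenize(words)
    for line_number, desc_words in a_rows:
        if wanted <= desc_words:
            return line_number
    return None

# The nullable Int64 dtype keeps line numbers as integers and allows missing values.
B["LineNumber"] = pd.array([first_match(w) for w in B["Words"]], dtype="Int64")
print(B)
```

With the sample data, rows 50062 and 4731 receive 5401845 and 5409117, and row 44263 is left missing because "above" does not occur in any description. This loops over every row of A for each row of B, which is fine for small frames but may be slow for large ones.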