Python pandas filtering: returning True/False vs. the actual value

python python-3.x pandas

My DataFrame:

df_all_xml_mfiles_tgther

      file_names     searching_for                                 everything
0          a.txt             where              Dave Ran Away. Where is Dave?
1          a.txt             candy                                mmmm, candy
2          b.txt              time                We are looking for the book.
3          b.txt             where                   where the red fern grows

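My question:

I am trying to filter for records that contain a word found in the search criteria. I need to check one record at a time and return the actual record, not just the word True.

What I have tried: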

import re
import pandas as pd

search_content_array = ['where', 'candy', 'time']
file_names_only = ['a.txt', 'b.txt']

df_regex_test = pd.DataFrame()  # accumulates the results; assumed to start empty

for cc in range(0, len(file_names_only), 1):
    for bb in range(0, len(search_content_array), 1):

        # check whether the text of this record contains the current search term
        regex_stuff = df_all_xml_mfiles_tgther[cc:cc+1].everything.str.contains(search_content_array[bb], flags=re.IGNORECASE, na=False, regex=True)

        if not regex_stuff.empty:
            regex_stuff_new = pd.DataFrame([regex_stuff.rename(None)])
            regex_stuff_new.columns = ['everything']
            regex_stuff_new['searched_for_found'] = search_content_array[bb]
            regex_stuff_new['file_names'] = file_names_only[cc]

        regex_stuff_new = regex_stuff_new[['file_names', 'searched_for_found', 'everything']]  # rearranges the columns

        # note: DataFrame.append was removed in pandas 2.0; use pd.concat there
        df_regex_test = df_regex_test.append(regex_stuff_new, ignore_index=True, sort=False)

The result I get:

    file_names  searched_for_found  everything
0        a.txt               where        True
1        a.txt               candy        True
2        b.txt               where        True

The result I want:

    file_names  searched_for_found                           everything
0        a.txt               where        Dave Ran Away. Where is Dave?
1        a.txt               candy                          mmmm, candy
3        b.txt               where             where the red fern grows

How do I get the actual values back, rather than just True/False?

Use a list comprehension to do this element-wise:

df[[y.lower() in x.lower() for x, y in zip(df['everything'], df['searching_for'])]]
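
For reference, here is a self-contained sketch of the list-comprehension approach. It assumes the frame is rebuilt from the sample data in the question under the name df, as in the answer above; the names mask and result are only illustrative. The comprehension yields one boolean per row, and indexing df with that list keeps the whole matching rows instead of the True/False flags.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "file_names": ["a.txt", "a.txt", "b.txt", "b.txt"],
    "searching_for": ["where", "candy", "time", "where"],
    "everything": [
        "Dave Ran Away. Where is Dave?",
        "mmmm, candy",
        "We are looking for the book.",
        "where the red fern grows",
    ],
})

# One boolean per row: does the row's text contain the row's own search term?
mask = [term.lower() in text.lower()
        for text, term in zip(df["everything"], df["searching_for"])]

# Indexing with the boolean list returns the rows themselves
result = df[mask]
print(result)
```

This is a plain substring test, so a search term containing regex metacharacters is matched literally.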

Or use replace and str.contains, although I think cold's method is neater:

s=df.everything.replace(regex=r'(?i)'+ df.searching_for,value='OkIFINDIT')
df[s.str.contains('OkIFINDIT')]
Out[405]: 
  file_names searching_for                  everything
0      a.txt         where Dave Ran Away Where is Dave
1      a.txt         candy                  mmmm,candy
3      b.txt         where    where the red fern grows
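
Since str.contains returns a boolean Series, it can also be used directly as a row mask, which is close to the loop in the question. The following is only a sketch under some assumptions: it rebuilds the sample frame as df, loops over the search terms from the question (the names search_terms, pieces, hits and found are illustrative), and swaps the regex match for a literal, case-insensitive one via case=False and regex=False.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "file_names": ["a.txt", "a.txt", "b.txt", "b.txt"],
    "searching_for": ["where", "candy", "time", "where"],
    "everything": [
        "Dave Ran Away. Where is Dave?",
        "mmmm, candy",
        "We are looking for the book.",
        "where the red fern grows",
    ],
})
search_terms = ["where", "candy", "time"]

pieces = []
for term in search_terms:
    # Boolean Series: True where 'everything' contains the term (literal, case-insensitive)
    mask = df["everything"].str.contains(term, case=False, na=False, regex=False)
    if mask.any():
        # Use the mask to select the rows, keeping the text instead of the booleans
        hits = df.loc[mask, ["file_names", "everything"]].copy()
        hits["searched_for_found"] = term
        pieces.append(hits)

# Stack the per-term hits and put the columns in the order the question asks for
found = pd.concat(pieces)[["file_names", "searched_for_found", "everything"]]
print(found)
```

Unlike the answers above, this checks every search term against every row, so a row could appear more than once if it contained several terms.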

You can replace the rows that do not match with np.nan, and then drop the nan values:

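import re
import numpy as np

# keep the row if its search term is found in its text (case-insensitive), else replace it with NaN
df.apply(lambda x: x if re.search(x['searching_for'], x['everything'], re.I) else np.nan, axis=1).dropna()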

 file_names searching_for                     everything
0      a.txt         where  Dave Ran Away. Where is Dave?
1      a.txt         candy                    mmmm, candy
3      b.txt         where       where the red fern grows
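
A variant of the same idea, sketched here as an alternative and not as the answerer's method: have apply return one boolean per row and index with it, which avoids the detour through NaN and dropna. It assumes the sample frame from the question rebuilt as df; the name keep is illustrative. As in the answer above, the search term is treated as a regular expression.

```python
import re
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "file_names": ["a.txt", "a.txt", "b.txt", "b.txt"],
    "searching_for": ["where", "candy", "time", "where"],
    "everything": [
        "Dave Ran Away. Where is Dave?",
        "mmmm, candy",
        "We are looking for the book.",
        "where the red fern grows",
    ],
})

# Row-wise boolean: True when the row's search term matches the row's text
keep = df.apply(
    lambda row: bool(re.search(row["searching_for"], row["everything"], re.I)),
    axis=1,
)

# Boolean indexing keeps the full matching rows
print(df[keep])
```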

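Why are you keeping the "candy" row?

I just edited my post. Also, because it contains the word candy. If a record contains the searched word anywhere in the row, I want to keep the whole row.

Have you tried iterating over the values instead of over len? For example, if where in string: #set searched_For_found=where. Wouldn't that have the same effect?

The goal is to go through the array one value at a time. How exactly does the one at the bottom work? Are you telling it to search the value at row 0, that is, to search in the "everything" column and only in the value at row 0? I have been using Python for all of 4 months, so I am still learning :)

@LunchBox The second method, df[['everything', 'search_for']].values.tolist(), returns a 2D list. Each row has 2 columns, which are unpacked into x and y in a similar way to the above.

I keep getting an error message: ['where'] not in index. I am still working on it before I check out the other response.

@LunchBox I run through a list of possible options in my head and then work out the best one for you. I guess that comes from having answered thousands of questions.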
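
The values.tolist() variant mentioned in the comments can be sketched as follows. This assumes the sample frame from the question rebuilt as df, and it uses the column name searching_for as it appears in that frame; pairs and mask are illustrative names. Each inner list holds the two selected columns of one row, so it unpacks into x and y just like the zip version.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "file_names": ["a.txt", "a.txt", "b.txt", "b.txt"],
    "searching_for": ["where", "candy", "time", "where"],
    "everything": [
        "Dave Ran Away. Where is Dave?",
        "mmmm, candy",
        "We are looking for the book.",
        "where the red fern grows",
    ],
})

# 2D list: one [everything, searching_for] pair per row
pairs = df[["everything", "searching_for"]].values.tolist()

# Unpack each pair into x (text) and y (search term)
mask = [y.lower() in x.lower() for x, y in pairs]
print(df[mask])
```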