Warning: file_get_contents(/data/phpspider/zhask/data//catemap/8/python-3.x/19.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
Python 熊猫过滤:返回真/假与实际值_Python_Python 3.x_Pandas - Fatal编程技术网

Python 熊猫过滤:返回真/假与实际值

Python 熊猫过滤:返回真/假与实际值,python,python-3.x,pandas,Python,Python 3.x,Pandas,我的数据帧: df_all_xml_mfiles_tgther file_names searching_for everything 0 a.txt where Dave Ran Away. Where is Dave? 1 a.txt candy

我的数据帧:

df_all_xml_mfiles_tgther

      file_names     searching_for                                 everything
0          a.txt             where              Dave Ran Away. Where is Dave?
1          a.txt             candy                                mmmm, candy
2          b.txt              time                We are looking for the book.
3          b.txt             where                   where the red fern grows
我的问题是:

我正在尝试筛选包含在搜索条件中找到的单词的记录。我需要一次检查一条记录,并返回实际记录,而不仅仅是单词true

我所尝试的:

search_content_array = ['where', 'candy', 'time']
file_names_only = ['a.txt', 'b.txt']


for cc in range(0, len(file_names_only), 1):
     for bb in range(0, len(search_content_array), 1):

            stuff = `df_all_xml_mfiles_tgther[cc:cc+1].everything.str.contains(search_content_array[bb], flags=re.IGNORECASE, na=False, regex=True)`

            if not regex_stuff.empty:
                 regex_stuff_new = pd.DataFrame([regex_stuff.rename(None)])
                 regex_stuff_new.columns = ['everything']
                 regex_stuff_new['searched_for_found'] = search_content_array[bb]
                 regex_stuff_new['file_names'] = file_names_only[cc]

            regex_stuff_new = regex_stuff_new[['file_names', 'searched_for_found', 'everything']] ##This rearranges the columns

            df_regex_test =  df_regex_test.append(regex_stuff_new, ignore_index=True, sort=False)
我得到的结果是:

    file_names  searched_for_found  everything
0        a.txt               where        True
1        a.txt               candy        True
2        b.txt               where        True
    file_names  searched_for_found                           everything
0        a.txt               where        Dave Ran Away. Where is Dave?
1        a.txt               candy                          mmmm, candy
3        b.txt               where             where the red fern grows
我想要的结果是:

    file_names  searched_for_found  everything
0        a.txt               where        True
1        a.txt               candy        True
2        b.txt               where        True
    file_names  searched_for_found                           everything
0        a.txt               where        Dave Ran Away. Where is Dave?
1        a.txt               candy                          mmmm, candy
3        b.txt               where             where the red fern grows

如何获取返回结果的实际值,而不仅仅是true/false?

使用列表理解按元素执行此操作

df[[y.lower() in x.lower() for x, y in zip(df['everything'], df['searching_for'])]]
或者


使用
replace
str.contains
,我认为cold的方法更简洁

s=df.everything.replace(regex=r'(?i)'+ df.searching_for,value='OkIFINDIT')
df[s.str.contains('OkIFINDIT')]
Out[405]: 
  file_names searching_for                  everything
0      a.txt         where Dave Ran Away Where is Dave
1      a.txt         candy                  mmmm,candy
3      b.txt         where    where the red fern grows

您可以替换与
np.nan
不匹配的行,然后删除
nan

 import numpy as np,re

 df.apply(lambda x: x if re.search(x[1], x[2],re.I) else np.nan,axis=1).dropna()

 file_names searching_for                     everything
0      a.txt         where  Dave Ran Away. Where is Dave?
1      a.txt         candy                    mmmm, candy
3      b.txt         where       where the red fern grows

你为什么保留“candy”这一行?我刚刚编辑了我的帖子。另外,因为它包含单词candy。如果记录包含搜索到的单词,则在该行的任意位置,我希望保留整行内容。您是否尝试过使用值而不是len进行迭代?例如,
如果字符串中的where:#set searched_For_found=where
这不具有相同的效果吗?目标是一次遍历数组1的值。底部的值到底是如何工作的?您是在告诉我搜索第0行时的值吗?在“一切”列中搜索并仅搜索第0行的值?我已经使用python整整4个月了,所以我还在学习:)@午餐盒第二种方法是
df[['everything','search\u for']].values.tolist()
返回一个2D列表。每行有2列,并以与上面类似的方式解压为x和y。我不断收到一条错误消息['where']不在索引中。在签出另一个响应之前,我仍在处理它。@午餐盒我在脑海中运行了一个可能选项的列表,然后为您找出最佳选项。我想这是因为我已经回答了成千上万的问题。