Python pandas filtering: returning True/False vs. the actual values
My DataFrame:
df_all_xml_mfiles_tgther
   file_names  searching_for  everything
0  a.txt       where          Dave Ran Away. Where is Dave?
1  a.txt       candy          mmmm, candy
2  b.txt       time           We are looking for the book.
3  b.txt       where          where the red fern grows
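My question:
I am trying to filter for the records that contain a word found in the search criteria. I need to check one record at a time and get back the actual record, not just the word True.
What I tried: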
import re
import pandas as pd

search_content_array = ['where', 'candy', 'time']
file_names_only = ['a.txt', 'b.txt']
# df_regex_test is assumed to be an existing (possibly empty) DataFrame
for cc in range(0, len(file_names_only), 1):
    for bb in range(0, len(search_content_array), 1):
        # str.contains returns a boolean Series (True/False), not the matching text
        regex_stuff = df_all_xml_mfiles_tgther[cc:cc+1].everything.str.contains(search_content_array[bb], flags=re.IGNORECASE, na=False, regex=True)
        if not regex_stuff.empty:
            regex_stuff_new = pd.DataFrame([regex_stuff.rename(None)])
            regex_stuff_new.columns = ['everything']
            regex_stuff_new['searched_for_found'] = search_content_array[bb]
            regex_stuff_new['file_names'] = file_names_only[cc]
            regex_stuff_new = regex_stuff_new[['file_names', 'searched_for_found', 'everything']]  # rearrange the columns
            # DataFrame.append was removed in pandas 2.0; pd.concat is its replacement there
            df_regex_test = df_regex_test.append(regex_stuff_new, ignore_index=True, sort=False)
The result I get:
file_names searched_for_found everything
0 a.txt where True
1 a.txt candy True
2 b.txt where True
The result I want:
file_names searched_for_found everything
0 a.txt where Dave Ran Away. Where is Dave?
1 a.txt candy mmmm, candy
3 b.txt where where the red fern grows
How can I get the actual values returned, rather than just True/False?

Use a list comprehension to do this element-wise:
df[[y.lower() in x.lower() for x, y in zip(df['everything'], df['searching_for'])]]
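To make the one-liner above easier to follow, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question, and the names `mask` and `result` are only illustrative. The point it shows is that the comprehension yields one True/False per row, and indexing the DataFrame with that list returns the rows themselves.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "file_names": ["a.txt", "a.txt", "b.txt", "b.txt"],
    "searching_for": ["where", "candy", "time", "where"],
    "everything": [
        "Dave Ran Away. Where is Dave?",
        "mmmm, candy",
        "We are looking for the book.",
        "where the red fern grows",
    ],
})

# One boolean per row: is this row's search term inside this row's text?
# Lower-casing both sides makes the check case-insensitive.
mask = [term.lower() in text.lower()
        for text, term in zip(df["everything"], df["searching_for"])]

# Indexing with the boolean list keeps the matching rows,
# rather than returning the True/False values.
result = df[mask]
print(result)
```

Note that this is a plain substring test, so a term such as "time" would also match inside a longer word like "sometimes".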
Or use replace and str.contains, although I think cold's approach is cleaner:
s=df.everything.replace(regex=r'(?i)'+ df.searching_for,value='OkIFINDIT')
df[s.str.contains('OkIFINDIT')]
Out[405]:
file_names searching_for everything
0 a.txt where Dave Ran Away Where is Dave
1 a.txt candy mmmm,candy
3 b.txt where where the red fern grows
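The same boolean-mask idea can be applied to the list of search terms from the question, which also yields the `searched_for_found` column the asker wanted. This is only a sketch under some assumptions: the DataFrame is rebuilt by hand, `search_terms`, `pieces` and `result` are illustrative names, and `re.escape` is added so that each term is treated as literal text rather than as a regex.

```python
import re
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "file_names": ["a.txt", "a.txt", "b.txt", "b.txt"],
    "searching_for": ["where", "candy", "time", "where"],
    "everything": [
        "Dave Ran Away. Where is Dave?",
        "mmmm, candy",
        "We are looking for the book.",
        "where the red fern grows",
    ],
})

search_terms = ["where", "candy", "time"]

pieces = []
for term in search_terms:
    # str.contains gives a boolean Series; use it as a mask instead of storing it.
    mask = df["everything"].str.contains(
        re.escape(term), flags=re.IGNORECASE, na=False, regex=True
    )
    found = df.loc[mask, ["file_names", "everything"]].copy()
    if not found.empty:
        # Record which term matched, placed between the other two columns.
        found.insert(1, "searched_for_found", term)
        pieces.append(found)

# Stack the per-term matches and restore the original row order.
result = pd.concat(pieces).sort_index()
print(result)
```

Unlike the row-by-row answers, this checks every term against every row, so a row that contains two of the terms would appear twice, once per term.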
You can replace the rows that do not match with np.nan and then drop the NaN values:
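import re
import numpy as np
# x is one row; look the columns up by label rather than by position
df.apply(lambda x: x if re.search(x['searching_for'], x['everything'], re.I) else np.nan, axis=1).dropna()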
file_names searching_for everything
0 a.txt where Dave Ran Away. Where is Dave?
1 a.txt candy mmmm, candy
3 b.txt where where the red fern grows
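A variation on the same row-wise idea is to have `apply` return only a boolean per row and then index with it, which avoids mixing whole rows and NaN in one result. The sketch below assumes the sample data from the question; `row_matches` is a hypothetical helper name, and `re.escape` is an addition so the search term is matched literally.

```python
import re
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "file_names": ["a.txt", "a.txt", "b.txt", "b.txt"],
    "searching_for": ["where", "candy", "time", "where"],
    "everything": [
        "Dave Ran Away. Where is Dave?",
        "mmmm, candy",
        "We are looking for the book.",
        "where the red fern grows",
    ],
})


def row_matches(row):
    """Return True if this row's search term occurs in this row's text."""
    pattern = re.escape(row["searching_for"])  # treat the term as literal text
    return re.search(pattern, row["everything"], flags=re.IGNORECASE) is not None


# axis=1 passes one row at a time; the result is a boolean Series.
mask = df.apply(row_matches, axis=1)

# Boolean indexing returns the full matching rows.
matches = df[mask]
print(matches)
```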
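Why did you keep the "candy" row?

I just edited my post. Also, because it contains the word candy. If a record contains the searched-for word anywhere in the row, I want to keep the whole row.

Have you tried iterating over the values instead of len? For example, if 'where' in string: # set searched_for_found = where

Wouldn't that have the same effect? The goal is to go through the values of the array one at a time. How exactly does the code at the bottom work? Are you telling me to search the value in row 0? To search in the "everything" column, and only the value in row 0? I have been using Python for all of 4 months, so I am still learning :)

@Lunchbox In the second approach, df[['everything', 'searching_for']].values.tolist() returns a 2D list. Each row has 2 columns, and is unpacked into x and y in a similar way as above.

I keep getting the error message "['where'] not in index". I am still working through it before checking out the other response.

@Lunchbox I ran through a list of possible options in my head and then worked out the best one for you. I guess that comes from having answered thousands of questions.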
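The unpacking described in the comments can be sketched as follows, assuming the sample data from the question (`pairs`, `mask` and `result` are illustrative names). Each inner list holds one row's `everything` and `searching_for` values, so the loop compares a row's text only with that same row's search term.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "file_names": ["a.txt", "a.txt", "b.txt", "b.txt"],
    "searching_for": ["where", "candy", "time", "where"],
    "everything": [
        "Dave Ran Away. Where is Dave?",
        "mmmm, candy",
        "We are looking for the book.",
        "where the red fern grows",
    ],
})

# A list of [text, term] pairs, one pair per row.
pairs = df[["everything", "searching_for"]].values.tolist()

# Unpack each pair and test the term against its own row's text.
mask = [term.lower() in text.lower() for text, term in pairs]

result = df[mask]
print(result)
```

As for the error in the comments, pandas raises a KeyError of the form "['where'] not in index" when a label in the list passed to `df[[...]]` is not a column name. Passing a search term there instead of the column name `searching_for` would be one possible cause, but that is a guess, since the failing code is not shown.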