Python: how do I pass a list of queries to a DataFrame and output the list of results?
When selecting rows whose value in column column_name equals a scalar, we use ==:
df.loc[df['column_name'] == some_value]
or use .query().
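For reference, here is a minimal sketch of the two spellings side by side. The three-row frame `demo` and the value assigned to `some_value` are made up for illustration. Inside `.query()`, a Python variable is referenced with the `@` prefix.

```python
import pandas as pd

# Hypothetical three-row frame, only used to compare the two spellings.
demo = pd.DataFrame({"column_name": ["a", "b", "a"], "n": [1, 2, 3]})
some_value = "a"

# Boolean mask: compare the column to the scalar, then index with the mask.
by_mask = demo.loc[demo["column_name"] == some_value]

# Query string: "@some_value" refers to the Python variable of that name.
by_query = demo.query("column_name == @some_value")

print(by_mask.equals(by_query))
```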
As a concrete example:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Col1': 'what are men to rocks and mountains'.split(),
                   'Col2': 'the curves of your lips rewrite history.'.split(),
                   'Col3': np.arange(7),
                   'Col4': np.arange(7) * 8})
print(df)
        Col1      Col2  Col3  Col4
0       what       the     0     0
1        are    curves     1     8
2        men        of     2    16
3         to      your     3    24
4      rocks      lips     4    32
5        and   rewrite     5    40
6  mountains  history.     6    48
A query could be
rocks_row = df.loc[df['Col1'] == "rocks"]
which outputs
print(rocks_row)
    Col1  Col2  Col3  Col4
4  rocks  lips     4    32
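I would like to pass a list of values to query against the DataFrame and get back all of the matching rows.
The values to query for would be in a list, e.g.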
list_match = ['men', 'curves', 'history']
and the output would be every row satisfying one of these conditions, i.e.
matches = pd.concat([df1, df2, df3])
where
df1 = df.loc[df['Col1'] == "men"]
df2 = df.loc[df['Col1'] == "curves"]
df3 = df.loc[df['Col1'] == "history"]
My idea is to create a function that accepts the list of values:
output = []
def find_queries(dataframe, column, value, output):
    for scalar in value:
        query = dataframe.loc[dataframe[column] == scalar]
        output.append(query)  # append each query result to the list
    return pd.concat(output)  # return the concatenated DataFrames
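However, this seems exceptionally slow and does not really take advantage of the pandas data structures. What is the "standard" way to pass a list of queries to a DataFrame?
Edit: how does this translate to "more complex" queries in pandas, e.g. where with an HDF5 document?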
df.to_hdf('test.h5','df',mode='w',format='table',data_columns=['A','B'])
pd.read_hdf('test.h5','df')
pd.read_hdf('test.h5','df',where='A=["foo","bar"] & B=1')
The best way to handle this is to index the rows with a boolean Series, as you would in R. Using your df as an example:
In [5]: df.Col1 == "what"
Out[5]:
0     True
1    False
2    False
3    False
4    False
5    False
6    False
Name: Col1, dtype: bool
In [6]: df[df.Col1 == "what"]
Out[6]:
   Col1 Col2  Col3  Col4
0  what  the     0     0
Now we combine this with the isin function:
In [8]: df[df.Col1.isin(["men","rocks","mountains"])]
Out[8]:
        Col1      Col2  Col3  Col4
2        men        of     2    16
4      rocks      lips     4    32
6  mountains  history.     6    48
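As a sketch of how this replaces the loop from the question, the `isin` filter can be wrapped in a small function. The names `filter_by_values`, `wanted`, and `looped` are hypothetical, and the frame is a two-column copy of the example data.

```python
import pandas as pd

def filter_by_values(frame, column, values):
    """Return the rows of `frame` whose `column` entry is one of `values`."""
    return frame[frame[column].isin(values)]

df = pd.DataFrame({
    "Col1": "what are men to rocks and mountains".split(),
    "Col2": "the curves of your lips rewrite history.".split(),
})

wanted = ["rocks", "men"]
matches = filter_by_values(df, "Col1", wanted)

# The loop-and-concat approach selects the same rows, but in the order of
# `wanted`; isin keeps the original row order of the frame.
looped = pd.concat([df.loc[df["Col1"] == v] for v in wanted])

print(matches.equals(looped.sort_index()))
```

The single `isin` call builds one boolean mask for the whole column, so the frame is scanned once rather than once per value.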
To filter on multiple columns, we can chain the conditions together with the & and | operators:
In [10]: df[df.Col1.isin(["men","rocks","mountains"]) | df.Col2.isin(["lips","your"])]
Out[10]:
        Col1      Col2  Col3  Col4
2        men        of     2    16
3         to      your     3    24
4      rocks      lips     4    32
6  mountains  history.     6    48
In [11]: df[df.Col1.isin(["men","rocks","mountains"]) & df.Col2.isin(["lips","your"])]
Out[11]:
    Col1  Col2  Col3  Col4
4  rocks  lips     4    32
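When the number of columns is not known in advance, the masks can be built in a loop and combined. The following is only a sketch: `filter_by_columns`, its `how` parameter, and the `criteria` dict are hypothetical names, and it assumes `criteria` is non-empty.

```python
from functools import reduce
import operator

import pandas as pd

def filter_by_columns(frame, criteria, how="and"):
    """Filter `frame` with {column: allowed values}; `how` is "and" or "or"."""
    # One boolean mask per column, each built with isin.
    masks = [frame[col].isin(vals) for col, vals in criteria.items()]
    # Combine the masks element-wise with & or |.
    combine = operator.and_ if how == "and" else operator.or_
    return frame[reduce(combine, masks)]

df = pd.DataFrame({
    "Col1": "what are men to rocks and mountains".split(),
    "Col2": "the curves of your lips rewrite history.".split(),
})

criteria = {"Col1": ["men", "rocks", "mountains"], "Col2": ["lips", "your"]}
both = filter_by_columns(df, criteria, how="and")
either = filter_by_columns(df, criteria, how="or")

print(both)
print(either)
```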
If I understood your question correctly, you can use either boolean indexing or the .query() method.
Using the .query() method:
In [32]: df.query('Col1 in @search_list and Col4 > 40')
Out[32]:
        Col1      Col2  Col3  Col4
6  mountains  history.     6    48
In [33]: df.query('Col1 in @search_list')
Out[33]:
        Col1      Col2  Col3  Col4
4      rocks      lips     4    32
6  mountains  history.     6    48
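The session above does not show how `search_list` was defined. The sketch below assumes it held 'rocks' and 'mountains', which is consistent with the rows returned; that value is an assumption, not something stated in the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Col1": "what are men to rocks and mountains".split(),
    "Col4": np.arange(7) * 8,
})

# Assumed contents; chosen only because they reproduce the rows shown above.
search_list = ["rocks", "mountains"]

# "@search_list" is looked up among the Python variables visible to the caller.
hits = df.query("Col1 in @search_list")
big_hits = df.query("Col1 in @search_list and Col4 > 40")

print(hits)
print(big_hits)
```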
Using boolean indexing:
In [34]: df.loc[df.Col1.isin(search_list) & (df.Col4 > 40)]
Out[34]:
        Col1      Col2  Col3  Col4
6  mountains  history.     6    48
In [35]: df.loc[df.Col1.isin(search_list)]
Out[35]:
        Col1      Col2  Col3  Col4
4      rocks      lips     4    32
6  mountains  history.     6    48
Update: using a function:
def find_queries(df, qry, debug=0, **parms):
    if debug:
        print('[DEBUG]: Query:\t' + qry.format(**parms))
    return df.query(qry.format(**parms))
In [31]: find_queries(df, 'Col1 in {Col1} and Col4 > {Col4}', Col1='@search_list', Col4=40)
Out[31]:
        Col1      Col2  Col3  Col4
6  mountains  history.     6    48
In [32]: find_queries(df, 'Col1 in {Col1} and Col4 > {Col4}', Col1='@search_list', Col4=10)
Out[32]:
        Col1      Col2  Col3  Col4
4      rocks      lips     4    32
6  mountains  history.     6    48
Including debug information (printing the query):
In [40]: find_queries(df, 'Col1 in {Col1} and Col4 > {Col4}', Col1='@search_list', Col4=10, debug=1)
[DEBUG]: Query: Col1 in @search_list and Col4 > 10
Out[40]:
        Col1      Col2  Col3  Col4
4      rocks      lips     4    32
6  mountains  history.     6    48
Comments:
The question above may not have been clear: I am looking for a function that does this. The user inputs a list of values and gets back the list of query hits.
I'm not sure I understand the problem here. You can use the isin function to do what you need. If I were to rewrite the find_querys function, I would do it like this:
find_querys = lambda df, col, values: df[df[col].isin(values)]
It is not quite clear what you want to achieve... Do you want a list of DFs that satisfy different conditions, or one DF that satisfies all conditions?
@MaxU That was unclear, sorry. One DF satisfying all conditions. The function here would be find_querys = lambda df, col, values: df.…("col in values")
Perfect! You are several steps ahead of me in implementing the debug feature.
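A query-based helper in the spirit of the comments above might look like the following sketch. The name `find_values` is hypothetical, and the sketch assumes `column` is a valid Python identifier. Passing the list as an argument means `@values` refers to a local name inside the function, so no variable such as `search_list` has to exist where the function is defined.

```python
import numpy as np
import pandas as pd

def find_values(frame, column, values):
    """Return the rows of `frame` whose `column` entry is one of `values`."""
    # `values` is a local variable here, so "@values" resolves in this scope.
    return frame.query(f"{column} in @values")

df = pd.DataFrame({
    "Col1": "what are men to rocks and mountains".split(),
    "Col4": np.arange(7) * 8,
})

result = find_values(df, "Col1", ["rocks", "mountains"])
print(result)
```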