Python: How to pass a list of queries to a DataFrame and output a list of results?

python python-3.x pandas dataframe

To select the rows whose value in column_name equals a scalar, we use ==:

df.loc[df['column_name'] == some_value]

or use .query().

A concrete example:

import pandas as pd
import numpy as np
df = pd.DataFrame({'Col1': 'what are men to rocks and mountains'.split(),
                   'Col2': 'the curves of your lips rewrite history.'.split(),
                   'Col3': np.arange(7),
                   'Col4': np.arange(7) * 8})

print(df)

         Col1      Col2  Col3  Col4
0       what       the     0     0
1        are    curves     1     8
2        men        of     2    16
3         to      your     3    24
4      rocks      lips     4    32
5        and   rewrite     5    40
6  mountains  history.     6    48

A query could be:

rocks_row = df.loc[df['Col1'] == "rocks"]

which outputs:

print(rocks_row)
    Col1  Col2  Col3  Col4
4  rocks  lips     4    32

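I would like to pass a list of values to query against the DataFrame and get back the list of "correct queries", i.e. the rows that match.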

The queries to run would be in a list, for example:

list_match = ['men', 'curves', 'history']

This should output all rows that satisfy any of these conditions, i.e.

matches = pd.concat([df1, df2, df3])

where

df1 = df.loc[df['Col1'] == "men"]

df2 = df.loc[df['Col1'] == "curves"]

df3 = df.loc[df['Col1'] == "history"]

My idea was to create a function that accepts the DataFrame, a column name, and the list of values:

output = []
def find_queries(dataframe, column, value, output):
    for scalar in value:
        query = dataframe.loc[dataframe[column] == scalar]
        output.append(query)    # append all query results to a list
    return pd.concat(output)    # return concatenated list of dataframes

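However, this seems exceptionally slow and doesn't really take advantage of the pandas data structures. What is the "standard" way to pass a list of queries through a DataFrame?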

EDIT: How does this translate to "more complex" queries in pandas, e.g. where with an HDF5 document?

df.to_hdf('test.h5','df',mode='w',format='table',data_columns=['A','B'])

pd.read_hdf('test.h5','df')

pd.read_hdf('test.h5','df',where='A=["foo","bar"] & B=1')

The best way to handle this is to index the rows with a boolean Series, as you would in R.

Using your df as an example:

In [5]: df.Col1 == "what"
Out[5]:
0     True
1    False
2    False
3    False
4    False
5    False
6    False
Name: Col1, dtype: bool

In [6]: df[df.Col1 == "what"]
Out[6]:
   Col1 Col2  Col3  Col4
0  what  the     0     0

Now we combine this with the isin function:

In [8]: df[df.Col1.isin(["men","rocks","mountains"])]
Out[8]:
        Col1      Col2  Col3  Col4
2        men        of     2    16
4      rocks      lips     4    32
6  mountains  history.     6    48
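
As a rough sketch of how the question's loop could be replaced by a single isin mask, the snippet below rebuilds the example DataFrame and compares the two approaches. The names find_queries_isin and find_queries_loop are hypothetical and chosen only for this illustration. One difference worth noticing: the isin version returns rows in DataFrame order, while the loop-and-concat version returns them in the order of the value list.

```python
import numpy as np
import pandas as pd

# Same example DataFrame as in the question.
df = pd.DataFrame({
    'Col1': 'what are men to rocks and mountains'.split(),
    'Col2': 'the curves of your lips rewrite history.'.split(),
    'Col3': np.arange(7),
    'Col4': np.arange(7) * 8,
})


def find_queries_isin(dataframe, column, values):
    """Return the rows whose `column` value appears in `values` (one boolean mask)."""
    return dataframe[dataframe[column].isin(values)]


def find_queries_loop(dataframe, column, values):
    """Loop-and-concat approach from the question, kept only for comparison."""
    parts = [dataframe.loc[dataframe[column] == v] for v in values]
    return pd.concat(parts)


wanted = ['rocks', 'men', 'mountains']
by_mask = find_queries_isin(df, 'Col1', wanted)
by_loop = find_queries_loop(df, 'Col1', wanted)

print(by_mask.index.tolist())  # rows come back in DataFrame order
print(by_loop.index.tolist())  # rows come back in the order of `wanted`
```

If the order of the value list matters, the isin result can be reordered afterwards; otherwise the single mask avoids building one intermediate DataFrame per value.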

To filter on multiple columns, we can chain the conditions together with the & and | operators:

In [10]: df[df.Col1.isin(["men","rocks","mountains"]) | df.Col2.isin(["lips","your"])]
Out[10]:
        Col1      Col2  Col3  Col4
2        men        of     2    16
3         to      your     3    24
4      rocks      lips     4    32
6  mountains  history.     6    48

In [11]: df[df.Col1.isin(["men","rocks","mountains"]) & df.Col2.isin(["lips","your"])]
Out[11]:
    Col1  Col2  Col3  Col4
4  rocks  lips     4    32
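
When the number of columns to filter on is not known in advance, the same & / | chaining can be generated from a mapping. The following is a minimal sketch under that assumption; filter_columns, criteria and how are hypothetical names that do not come from the answer, and the helper expects at least one entry in criteria.

```python
from functools import reduce
import operator

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Col1': 'what are men to rocks and mountains'.split(),
    'Col2': 'the curves of your lips rewrite history.'.split(),
    'Col3': np.arange(7),
    'Col4': np.arange(7) * 8,
})


def filter_columns(dataframe, criteria, how='and'):
    """Filter rows using a dict that maps column name -> list of accepted values.

    how='and' keeps rows matching every column's list,
    how='or' keeps rows matching at least one of them.
    """
    combine = operator.and_ if how == 'and' else operator.or_
    # One boolean Series per column, then fold them together with & or |.
    masks = [dataframe[col].isin(vals) for col, vals in criteria.items()]
    return dataframe[reduce(combine, masks)]


criteria = {'Col1': ['men', 'rocks', 'mountains'], 'Col2': ['lips', 'your']}
both = filter_columns(df, criteria, how='and')
either = filter_columns(df, criteria, how='or')

print(both.index.tolist())
print(either.index.tolist())
```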

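If I understood your question correctly, you can use either boolean indexing or the .query() method. In the examples below, search_list is assumed to be ['rocks', 'mountains'], which is consistent with the output shown.

Using the .query() method: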

In [32]: df.query('Col1 in @search_list and Col4 > 40')
Out[32]:
        Col1      Col2  Col3  Col4
6  mountains  history.     6    48

In [33]: df.query('Col1 in @search_list')
Out[33]:
        Col1      Col2  Col3  Col4
4      rocks      lips     4    32
6  mountains  history.     6    48

Using boolean indexing:

In [34]: df.loc[df.Col1.isin(search_list) & (df.Col4 > 40)]
Out[34]:
        Col1      Col2  Col3  Col4
6  mountains  history.     6    48

In [35]: df.loc[df.Col1.isin(search_list)]
Out[35]:
        Col1      Col2  Col3  Col4
4      rocks      lips     4    32
6  mountains  history.     6    48
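
To check that the two styles select the same rows, here is a small self-contained sketch. It assumes search_list = ['rocks', 'mountains'], which is inferred from the output above rather than stated in the answer, and uses .loc for the boolean-indexing version. The helper name compare_styles is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Col1': 'what are men to rocks and mountains'.split(),
    'Col2': 'the curves of your lips rewrite history.'.split(),
    'Col3': np.arange(7),
    'Col4': np.arange(7) * 8,
})


def compare_styles(dataframe, search_list, threshold):
    """Run the same filter via .query() and via a boolean mask."""
    # '@name' inside the query string refers to a Python variable in scope here.
    by_query = dataframe.query('Col1 in @search_list and Col4 > @threshold')
    by_mask = dataframe.loc[
        dataframe['Col1'].isin(search_list) & (dataframe['Col4'] > threshold)
    ]
    return by_query, by_mask


by_query, by_mask = compare_styles(df, ['rocks', 'mountains'], 40)
print(by_query.equals(by_mask))
```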

UPDATE: using a function:

def find_queries(df, qry, debug=0, **parms):
    if debug:
        print('[DEBUG]: Query:\t' + qry.format(**parms))
    return df.query(qry.format(**parms))

In [31]: find_queries(df, 'Col1 in {Col1} and Col4 > {Col4}', Col1='@search_list', Col4=40)
    ...:
Out[31]:
        Col1      Col2  Col3  Col4
6  mountains  history.     6    48

In [32]: find_queries(df, 'Col1 in {Col1} and Col4 > {Col4}', Col1='@search_list', Col4=10)
Out[32]:
        Col1      Col2  Col3  Col4
4      rocks      lips     4    32
6  mountains  history.     6    48
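
Note that the function above formats '@search_list' into the query string, so it relies on a variable called search_list being visible when the query runs. One possible variation, sketched below, passes the values as ordinary arguments and references them with @ from inside the function. The name find_queries_explicit and its parameters are hypothetical, and the column name is assumed to be a valid Python identifier.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Col1': 'what are men to rocks and mountains'.split(),
    'Col2': 'the curves of your lips rewrite history.'.split(),
    'Col3': np.arange(7),
    'Col4': np.arange(7) * 8,
})


def find_queries_explicit(dataframe, column, values, min_col4, debug=False):
    """Select rows where `column` is in `values` and Col4 exceeds `min_col4`."""
    # `values` and `min_col4` are local variables, so '@values' and
    # '@min_col4' resolve here without depending on any global name.
    qry = f'{column} in @values and Col4 > @min_col4'
    if debug:
        print('[DEBUG]: Query:\t' + qry)
    return dataframe.query(qry)


strict = find_queries_explicit(df, 'Col1', ['rocks', 'mountains'], 40)
loose = find_queries_explicit(df, 'Col1', ['rocks', 'mountains'], 10, debug=True)
print(strict.index.tolist())
print(loose.index.tolist())
```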

To include debugging info (print the query), pass debug=1.

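The question above may not have been clear: I am looking for a function that does this. The user inputs a list of values, and the output is a list of query hits.

I'm not sure I understand the problem here. You can use the isin function to get what you need. If I were to rewrite the find_queries function, I would do it like this:

find_queries = lambda df, col, values: df[df[col].isin(values)]

It's not quite clear what you want to achieve... Do you want a list of DFs that each satisfy a different condition, or one DF that satisfies all conditions?

@MaxU That wasn't clear, sorry. One DF that satisfies all conditions. The function here would be

find_queries = lambda df, col, values: df.query("col in values")

Perfect! You are several steps ahead of me in implementing the debugging feature.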


With debug=1, the query string is printed before the result:

In [40]: find_queries(df, 'Col1 in {Col1} and Col4 > {Col4}', Col1='@search_list', Col4=10, debug=1)
[DEBUG]: Query: Col1 in @search_list and Col4 > 10
Out[40]:
        Col1      Col2  Col3  Col4
4      rocks      lips     4    32
6  mountains  history.     6    48