Indexing a Python DataFrame with multiple conditions, like a WHERE statement

python sql indexing pandas

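I have experience with R and am new to Python pandas. I am trying to index a DataFrame to retrieve the rows that meet a set of logical conditions, much like the "where" statement in SQL.

I know how to do this in R with data frames (and with R's data.table package, which is more like a pandas DataFrame than R's native data frame).

Below is some sample code that constructs a DataFrame, followed by a description of how I would like to index it. Is there an easy way to do this?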

import pandas as pd
import numpy as np

# generate some data
mult = 10000
fruits = ['Apple', 'Banana', 'Kiwi', 'Grape', 'Orange', 'Strawberry']*mult
vegetables = ['Asparagus', 'Broccoli', 'Carrot', 'Lettuce', 'Rutabaga', 'Spinach']*mult
animals = ['Dog', 'Cat', 'Bird', 'Fish', 'Lion', 'Mouse']*mult
xValues = np.random.normal(loc=80, scale=2, size=6*mult)
yValues = np.random.normal(loc=79, scale=2, size=6*mult)

data = {'Fruit': fruits,
        'Vegetable': vegetables, 
        'Animal': animals, 
        'xValue': xValues,
        'yValue': yValues,}

df = pd.DataFrame(data)

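# shuffle each column independently to break the repeating fruit/vegetable/animal pattern
# (assigning a permuted copy back avoids relying on in-place shuffling of a Series)
df['Fruit'] = np.random.permutation(df['Fruit'].to_numpy())
df['Vegetable'] = np.random.permutation(df['Vegetable'].to_numpy())
df['Animal'] = np.random.permutation(df['Animal'].to_numpy())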

df.head(30)

# filter sets
fruitsInclude = ['Apple', 'Banana', 'Grape']
vegetablesExclude = ['Asparagus', 'Broccoli']

# subset1:  All rows and columns where:
#   (Fruit in fruitsInclude) AND (Vegetable not in vegetablesExclude)

# subset2:  All rows and columns where:
#   (Fruit in fruitsInclude) AND [(Vegetable not in vegetablesExclude) OR (Animal == 'Dog')]

# subset3:  All rows and specific columns where above logical conditions are true.

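All help and input is welcome and highly appreciated.

Thanks,

Randall

Barely beat me to it! This is exactly the same solution I came up with. +1

If I only want the index, is there a shorter way than this:

df.ix[df['Fruit'].isin(fruitsInclude)].index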

@Zhubarb:

df.index[df['Fruit'].isin(fruitsInclude)]

is shorter (and about 33% faster on my machine) than

df.ix[df['Fruit'].isin(fruitsInclude)].index
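For anyone trying the two index lookups from the comment above on a current pandas, here is a minimal sketch. It uses `.loc` in place of `.ix`, since `.ix` was removed in pandas 1.0, and the toy frame is made up for illustration rather than taken from the question. It only checks that both forms return the same labels and makes no claim about timing.

```python
import pandas as pd

# Hypothetical toy data, not the frame from the question.
df = pd.DataFrame({'Fruit': ['Apple', 'Kiwi', 'Banana', 'Orange', 'Grape']})
fruitsInclude = ['Apple', 'Banana', 'Grape']

# Boolean Series with one True/False entry per row.
mask = df['Fruit'].isin(fruitsInclude)

# Form 1: apply the mask to the Index object directly.
idx_direct = df.index[mask]

# Form 2: build the filtered frame first, then take its index.
idx_via_loc = df.loc[mask].index

print(idx_direct.tolist())  # [0, 2, 4]
```

Form 1 skips building the intermediate filtered frame, which is one plausible reason it was measured as faster in the comment.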

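Wow. Exactly what I needed. Thanks for the quick and direct answer. Note that I misspelled vegetablesExclude as vegetablesExlude; it should be vegetablesExclude (with a c). I corrected it in the code above, so it should copy and paste for testing. Thanks again. Randall.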

# Note: .ix was removed in pandas 1.0; .loc accepts the same boolean masks.

# subset1:  All rows and columns where:
#   (Fruit in fruitsInclude) AND (Vegetable not in vegetablesExclude)
df.loc[df['Fruit'].isin(fruitsInclude) & ~df['Vegetable'].isin(vegetablesExclude)]

# subset2:  All rows and columns where:
#   (Fruit in fruitsInclude) AND [(Vegetable not in vegetablesExclude) OR (Animal == 'Dog')]
df.loc[df['Fruit'].isin(fruitsInclude) & (~df['Vegetable'].isin(vegetablesExclude) | (df['Animal']=='Dog'))]

# subset3:  All rows and specific columns where above logical conditions are true.
# (as written this returns every column; a column list can be passed as the second .loc argument)
df.loc[df['Fruit'].isin(fruitsInclude) & ~df['Vegetable'].isin(vegetablesExclude) & (df['Animal']=='Dog')]
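The "specific columns" part of subset3 can be sketched as follows. This is one possible reading, not the answerer's own code: it reuses the subset2 condition, assumes the wanted columns are `Fruit` and `xValue`, and runs on a small made-up frame so the result is easy to check by eye. The names `fruit_ok`, `veg_ok`, `is_dog` and `subset` are introduced here for illustration.

```python
import pandas as pd

# Hypothetical toy data using the question's column names.
df = pd.DataFrame({
    'Fruit':     ['Apple', 'Banana', 'Kiwi', 'Grape', 'Apple'],
    'Vegetable': ['Carrot', 'Broccoli', 'Spinach', 'Asparagus', 'Lettuce'],
    'Animal':    ['Cat', 'Dog', 'Dog', 'Bird', 'Dog'],
    'xValue':    [80.1, 79.5, 81.2, 78.8, 80.0],
})
fruitsInclude = ['Apple', 'Banana', 'Grape']
vegetablesExclude = ['Asparagus', 'Broccoli']

# Name each condition so the combined mask reads like a WHERE clause.
fruit_ok = df['Fruit'].isin(fruitsInclude)
veg_ok = ~df['Vegetable'].isin(vegetablesExclude)
is_dog = df['Animal'] == 'Dog'

# Parentheses matter: & and | do not follow SQL's AND/OR precedence.
mask = fruit_ok & (veg_ok | is_dog)

# The row mask goes in the first slot of .loc, the column list in the second.
subset = df.loc[mask, ['Fruit', 'xValue']]
print(subset)
```

Naming the intermediate masks costs a few lines but makes it easier to see where the parentheses belong when the conditions get longer.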