Python: extracting rows where a list column in a DataFrame contains specific values

I have a DataFrame that looks like this:

    ID   AgeGroups        PaperIDs
0   1    [3, 3, 10]       [A, B, C]
1   2    [5]              [D]
2   3    [4, 12]          [A, D]
3   4    [2, 6, 13, 12]   [X, Z, T, D]

I want to extract the rows where the list in the AgeGroups column has at least 2 values less than 7 and at least 1 value greater than 8.

So the result should look like this:

    ID   AgeGroups        PaperIDs
0   1    [3, 3, 10]       [A, B, C]
3   4    [2, 6, 13, 12]   [X, Z, T, D]

I don't know how to do this.

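First create a helper DataFrame from the lists and compare it against the two thresholds (less than 7 and greater than 8). Then count the matches in each row, check each count against its required minimum, and chain the two resulting masks with & for a bitwise AND:

import ast
# Only needed if the column holds strings such as "[3, 3, 10]" rather than real lists:
# df['AgeGroups'] = df['AgeGroups'].apply(ast.literal_eval)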
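
The code for the helper-DataFrame step itself does not appear in this copy of the answer. A minimal sketch of what the description suggests might look like the following. The use of `DataFrame.lt`, `DataFrame.gt` and `Series.ge`, and the names `helper` and `result`, are assumptions made for illustration rather than the answer's confirmed code.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'AgeGroups': [[3, 3, 10], [5], [4, 12], [2, 6, 13, 12]],
    'PaperIDs': [['A', 'B', 'C'], ['D'], ['A', 'D'], ['X', 'Z', 'T', 'D']],
})

# Spread each list across the columns of a helper DataFrame;
# shorter lists are padded with NaN.
helper = pd.DataFrame(df['AgeGroups'].tolist(), index=df.index)

# Comparisons against NaN are False, so the padding never adds to a count.
at_least_two_below_7 = helper.lt(7).sum(axis=1).ge(2)
at_least_one_above_8 = helper.gt(8).sum(axis=1).ge(1)

# Both masks are boolean Series, so they are combined element-wise with &.
result = df[at_least_two_below_7 & at_least_one_above_8]
print(result)
```

Under these assumptions the rows with ID 1 and ID 4 are kept, which matches the result requested in the question.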
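Alternatively, use a list comprehension that converts each list to a NumPy array, counts the matching values with sum, and compares the two counts. The two conditions are joined with and rather than &, because each count is a scalar:

import numpy as np

# One boolean per row: at least 2 values below 7 and at least 1 value above 8
m = [(np.array(x) < 7).sum() >= 2 and (np.array(x) > 8).sum() >= 1 for x in df['AgeGroups']]

df = df[m]
print(df)
   ID       AgeGroups      PaperIDs
0   1      [3, 3, 10]     [A, B, C]
3   4  [2, 6, 13, 12]  [X, Z, T, D]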
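
The remark about scalars can be illustrated with a short sketch. It contrasts the per-row case, where each count is a single number and plain `and` works, with whole-column masks, which need `&`. The variable names and the two sample masks are made up for this example.

```python
import numpy as np
import pandas as pd

ages = [3, 3, 10]

# Inside the list comprehension each count is one number, so each
# comparison yields a single boolean and plain `and` is fine.
n_below = (np.array(ages) < 7).sum()
n_above = (np.array(ages) > 8).sum()
row_matches = bool(n_below >= 2 and n_above >= 1)

# Whole-column masks are boolean Series and are combined element-wise with &.
mask_a = pd.Series([True, False, False, True])
mask_b = pd.Series([True, True, False, True])
combined = mask_a & mask_b

# `and` asks for the truth value of an entire Series, which pandas refuses.
try:
    mask_a and mask_b
    raised = False
except ValueError:
    raised = True

print(row_matches, combined.tolist(), raised)
```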

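I used the apply function to run simple if-else logic on each row; you could also use a list comprehension for this.

import pandas as pd

data = {'ID':['1', '2', '3', '4'], 'AgeGroups':[[3,3,10],[2],[4,12],[2,6,13,12]],'PaperIDs':[['A','B','C'],['D'],['A','D'],['X','Z','T','D']]}
df = pd.DataFrame(data)

def extract_age(row):
    my_list = row['AgeGroups']
    count1 = 0  # values below 7
    count2 = 0  # values above 8
    # A matching list needs at least 3 values (2 below 7 plus 1 above 8)
    if len(my_list) >= 3:
        for i in my_list:
            if i < 7:
                count1 += 1
            elif i > 8:
                count2 += 1
    if count1 >= 2 and count2 >= 1:
        # Prints the matching rows; it does not filter the DataFrame
        print(row['AgeGroups'], row['PaperIDs'])


# axis=1 passes each row to extract_age
df.apply(extract_age, axis=1)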
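
The function above prints the matching rows rather than returning a filtered DataFrame. If a filtered DataFrame is wanted, one possible variant is to return a boolean from the function and use the result as a mask. This is a sketch, not the answerer's code; `matches_age_rule`, `mask` and `filtered` are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['1', '2', '3', '4'],
    'AgeGroups': [[3, 3, 10], [2], [4, 12], [2, 6, 13, 12]],
    'PaperIDs': [['A', 'B', 'C'], ['D'], ['A', 'D'], ['X', 'Z', 'T', 'D']],
})

def matches_age_rule(ages):
    """Return True if at least 2 values are below 7 and at least 1 is above 8."""
    below = sum(1 for age in ages if age < 7)
    above = sum(1 for age in ages if age > 8)
    return below >= 2 and above >= 1

# Series.apply calls the function once per list and yields a boolean Series.
mask = df['AgeGroups'].apply(matches_age_rule)
filtered = df[mask]
print(filtered)
```

Applying the function to the single column avoids passing whole rows with axis=1, and the mask can be reused or combined with other conditions.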
