Python: How can I find how many objects are in the overlap of two different subsets?


python pandas numpy if-statement


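I have one category defined by certain characteristics (height and weight, built with np.where) and another category defined by other characteristics (whether someone is a twin or not, and how many siblings they have, also built with np.where). I want to see how many people belong to both categories (i.e. if I drew a Venn diagram, how many people would be in the middle?).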

I'm importing the columns from a CSV file. This is what the table looks like:

    Child  Inches  Weight Twin  Siblings
0     A      53     100    Y         3
1     B      54     110    N         4
2     C      56     120    Y         2
3     D      58     165    Y         1
4     E      60     150    N         1
5     F      62     160    N         1
6     H      65     165    N         3

Thanks in advance.
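For reference, a minimal sketch of the "middle of the Venn diagram" count, assuming the sample table above is rebuilt in memory rather than read from the CSV. Each category becomes a boolean mask, `&` intersects the two masks, and `.sum()` counts the `True` values. The height/weight thresholds are borrowed from the answers below; the twin/sibling condition (twins with at least two siblings) is a hypothetical choice for illustration.

```python
import pandas as pd

# Sample table from the question, built in memory instead of read from the CSV
df = pd.DataFrame({
    "Child": ["A", "B", "C", "D", "E", "F", "H"],
    "Inches": [53, 54, 56, 58, 60, 62, 65],
    "Weight": [100, 110, 120, 165, 150, 160, 165],
    "Twin": ["Y", "N", "Y", "Y", "N", "N", "N"],
    "Siblings": [3, 4, 2, 1, 1, 1, 3],
})

# Category 1: height/weight condition (thresholds borrowed from the answers)
cat1 = (df["Inches"] <= 60) & (df["Weight"] <= 150)
# Category 2: hypothetical twin/sibling condition (twins with 2+ siblings)
cat2 = (df["Twin"] == "Y") & (df["Siblings"] >= 2)

both = int((cat1 & cat2).sum())        # middle of the Venn diagram
only_cat1 = int((cat1 & ~cat2).sum())  # category 1 but not category 2
only_cat2 = int((~cat1 & cat2).sum())  # category 2 but not category 1

print(both, only_cat1, only_cat2)
print(df.loc[cat1 & cat2, "Child"].tolist())
```

With the sample data this prints `2 2 0` and `['A', 'C']`.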

I'm not sure what you're trying to do with `i` and the loop, but this should work:

import pandas as pd

# Read the CSV (pandas expands the ~ in the path)
file_data = pd.read_csv(r'~/Downloads/Test3 CVS_Sheet1.csv')

# area1: children at most 60 inches tall with a weight of at most 150
area1 = file_data[file_data['Inches'] <= 60]
area1 = area1[area1['Weight'] <= 150]

# Within area1: 2+ siblings and a twin
group_a = area1[area1['Siblings'] >= 2]
group_a = group_a[group_a['Twin'] == 'Y']

# Within area1: 2+ siblings, not a twin
group_b = area1[area1['Siblings'] >= 2]
group_b = group_b[group_b['Twin'] == 'N']

# Within area1: exactly 1 sibling and a twin
group_c = area1[area1['Siblings'] == 1]
group_c = group_c[group_c['Twin'] == 'Y']

# Within area1: exactly 1 sibling, not a twin
group_d = area1[area1['Siblings'] == 1]
group_d = group_d[group_d['Twin'] == 'N']

# The number of rows left in each filtered frame is the overlap count
print("in area1 there are", len(group_a.index), "children in group_a")
print("in area1 there are", len(group_b.index), "children in group_b")
print("in area1 there are", len(group_c.index), "children in group_c")
print("in area1 there are", len(group_d.index), "children in group_d")
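Since the question mentions a loop, here is a sketch of the same four counts computed in a single loop, assuming the sample table is built in memory. The `groups` dictionary is a hypothetical name; each entry is a boolean mask, and `area1 & mask` keeps only the children who satisfy both conditions.

```python
import pandas as pd

# Sample table from the question, built in memory instead of read from the CSV
df = pd.DataFrame({
    "Child": ["A", "B", "C", "D", "E", "F", "H"],
    "Inches": [53, 54, 56, 58, 60, 62, 65],
    "Weight": [100, 110, 120, 165, 150, 160, 165],
    "Twin": ["Y", "N", "Y", "Y", "N", "N", "N"],
    "Siblings": [3, 4, 2, 1, 1, 1, 3],
})

area1 = (df["Inches"] <= 60) & (df["Weight"] <= 150)

# Hypothetical mapping of group names to their twin/sibling masks
groups = {
    "group_a": (df["Siblings"] >= 2) & (df["Twin"] == "Y"),
    "group_b": (df["Siblings"] >= 2) & (df["Twin"] == "N"),
    "group_c": (df["Siblings"] == 1) & (df["Twin"] == "Y"),
    "group_d": (df["Siblings"] == 1) & (df["Twin"] == "N"),
}

counts = {}
for name, mask in groups.items():
    # Count the rows that are True in both masks
    counts[name] = int((area1 & mask).sum())
    print("in area1 there are", counts[name], "children in", name)
```

With the sample data the counts are 2, 1, 0 and 1 for `group_a` through `group_d`. Using masks avoids creating an intermediate DataFrame for every filter step.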

For your example, I would use a slightly different design. You can do this:

import numpy as np
import pandas as pd
df = pd.read_csv(r'~/Downloads/Test3 CVS_Sheet1.csv')
# Indicator columns: 1 if the row meets the condition, else 0
df['area1'] = np.where((df.Inches <= 60) & (df.Weight <= 150), 1, 0)
df['group_a'] = np.where((df.Siblings >= 2) & (df.Twin == 'Y'), 1, 0)
df['group_b'] = np.where((df.Siblings >= 2) & (df.Twin == 'N'), 1, 0)
df['group_c'] = np.where((df.Siblings == 1) & (df.Twin == 'Y'), 1, 0)
df['group_d'] = np.where((df.Siblings == 1) & (df.Twin == 'N'), 1, 0)

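Each row that meets a condition gets a 1 in the corresponding column. You can then use sum or count to summarize the table however you like.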
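To make the "sum or count" remark concrete, a sketch assuming the sample table is built in memory and the indicator columns are created the same way as in this answer. On 1/0 columns, `sum()` counts the 1s (children in both area1 and the group), whereas `count()` counts every non-null cell, zeros included.

```python
import numpy as np
import pandas as pd

# Sample table from the question, built in memory instead of read from the CSV
df = pd.DataFrame({
    "Child": ["A", "B", "C", "D", "E", "F", "H"],
    "Inches": [53, 54, 56, 58, 60, 62, 65],
    "Weight": [100, 110, 120, 165, 150, 160, 165],
    "Twin": ["Y", "N", "Y", "Y", "N", "N", "N"],
    "Siblings": [3, 4, 2, 1, 1, 1, 3],
})

# Indicator columns: 1 if the row meets the condition, else 0
df["area1"] = np.where((df.Inches <= 60) & (df.Weight <= 150), 1, 0)
df["group_a"] = np.where((df.Siblings >= 2) & (df.Twin == "Y"), 1, 0)
df["group_b"] = np.where((df.Siblings >= 2) & (df.Twin == "N"), 1, 0)
df["group_c"] = np.where((df.Siblings == 1) & (df.Twin == "Y"), 1, 0)
df["group_d"] = np.where((df.Siblings == 1) & (df.Twin == "N"), 1, 0)

group_cols = ["group_a", "group_b", "group_c", "group_d"]
in_area1 = df.loc[df["area1"] == 1, group_cols]

sums = in_area1.sum()    # number of 1s: children in area1 AND in each group
rows = in_area1.count()  # non-null cells: every child in area1, 0s included
print(sums)
print(rows)
```

Here `sums` gives 2, 1, 0, 1 and `rows` gives 4 for every column, since four children fall in area1.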

Finally:

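# The group columns start at position 6 (Child, Inches, Weight, Twin, Siblings, area1 come first)
for col in df.columns[6:]:
    r = df.groupby(['area1'])[col].sum()[1]
    print("in area1 there are", r, 'children in', col)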

This produces:

in area1 there are 2 children in group_a
in area1 there are 1 children in group_b
in area1 there are 0 children in group_c
in area1 there are 1 children in group_d
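A sketch of what the `groupby(...)[col].sum()[1]` step returns, assuming the in-memory sample table. The grouped sum is a Series indexed by the `area1` values 0 and 1, so `[1]` is a label lookup for `area1 == 1`. If no child met the area1 condition, label 1 would be missing and `[1]` would raise `KeyError`; `Series.get` with a default avoids that. The `nobody` frame is a hypothetical case constructed only to show the missing label.

```python
import numpy as np
import pandas as pd

# Sample table from the question, built in memory instead of read from the CSV
df = pd.DataFrame({
    "Child": ["A", "B", "C", "D", "E", "F", "H"],
    "Inches": [53, 54, 56, 58, 60, 62, 65],
    "Weight": [100, 110, 120, 165, 150, 160, 165],
    "Twin": ["Y", "N", "Y", "Y", "N", "N", "N"],
    "Siblings": [3, 4, 2, 1, 1, 1, 3],
})
df["area1"] = np.where((df.Inches <= 60) & (df.Weight <= 150), 1, 0)
df["group_a"] = np.where((df.Siblings >= 2) & (df.Twin == "Y"), 1, 0)

# Series indexed by area1: label 0 = rows outside area1, label 1 = rows inside it
per_area = df.groupby("area1")["group_a"].sum()
print(per_area)

r = per_area[1]              # label lookup for area1 == 1
r_safe = per_area.get(1, 0)  # same value, but falls back to 0 if label 1 is absent

# Hypothetical case where nobody meets the area1 condition: label 1 does not exist
nobody = df.assign(area1=0).groupby("area1")["group_a"].sum()
r_missing = nobody.get(1, 0)
```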

Comments:

Please provide the full error traceback, as it helps diagnose your issue. (G.Anderson)

@G.Anderson
    File "/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 705, in runfile
      execfile(filename, namespace)
    File "/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
      exec(compile(f.read(), filename, 'exec'), namespace)
    File "~box_test1.py", line 54
      if group_a == True:
    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
What does this mean?

I'm not very familiar with pandas. So how did you edit the CSV table directly?

@madrose After you read your file with df = pd.read_csv(r'~/Downloads/Test3 CVS_Sheet1.csv'), you can write the additions I provided in my answer. (You call it file, I call it df.)

OK, I see how this works for this set. Thanks! I'll now try the same approach on my actual 100 x 1200 CSV table. Quick question: what is the purpose of the [1] in r = df.groupby(['area1'])[col].sum()[1]?

There is a for loop over your DataFrame, so it produces r (the result) for each group. The groups start at column 6, hence [6:] (don't forget we count from 0). [1] selects the second item of the grouped sum: item 0 holds the total for rows where area1 is 0, and item 1 the total for rows where area1 is 1.

@mardose Cool. Pandas is great; there is a lot to learn, but once you learn it, you'll be a ninja :-)
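The `ValueError` in the traceback comes from `if group_a == True:`. A sketch, assuming `group_a` is a 1/0 array produced by `np.where` as the question describes (the arrays below are rebuilt from the sample table): the comparison is elementwise, so it yields an array of booleans that `if` cannot reduce to a single truth value. Calling `.any()`, `.all()` or `.sum()` states the intended question explicitly.

```python
import numpy as np

siblings = np.array([3, 4, 2, 1, 1, 1, 3])
twin = np.array(["Y", "N", "Y", "Y", "N", "N", "N"])

# A category built with np.where: one 1/0 flag per child
group_a = np.where((siblings >= 2) & (twin == "Y"), 1, 0)

message = None
try:
    if group_a == True:  # elementwise comparison yields an array, not one bool
        pass
except ValueError as err:
    message = str(err)

print(message)
print(group_a.any())       # True if at least one child is in group_a
print(group_a.all())       # True only if every child is in group_a
print(int(group_a.sum()))  # how many children are in group_a
```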