Python 3.x: How do I count the number of records in each group and add it to the main dataset?
Given that I have a dataset like the following:
import pandas as pd
import numpy as np
dt = {
"facility":["Ann Arbor","Ann Arbor","Detriot","Detriot","Detriot"],
"patient_ID":[4388,4388,9086,9086,9086],
"year":[2004,2007,2007,2008,2011],
"month":[8,9,9,6,2],
"Nr_Small":[0,0,5,12,10],
"Nr_Medium":[3,1,1,4,3],
"Nr_Large":[2,0,0,0,0]
}
dt = pd.DataFrame(dt)
dt.head()
I need to add a column that shows the number of records in each patient group. Here is what I am doing:
dt["NumberOfVisits"] = dt.groupby(['patient_ID']).size()
I also tried a variant of this, but it adds a column of NaN values to my dataset.
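Use transform here. groupby(...).size() returns one row per group, indexed by patient_ID, so assigning it directly aligns on index labels (4388, 9086) that do not match the row labels 0 to 4, and every row ends up NaN. transform returns a result with the same index as the original DataFrame, so each row gets its group's count: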
dt["NumberOfVisits"] = dt.groupby(['patient_ID'])['patient_ID'].transform('size')
print(dt)
    facility  patient_ID  year  month  Nr_Small  Nr_Medium  Nr_Large  \
0  Ann Arbor        4388  2004      8         0          3         2
1  Ann Arbor        4388  2007      9         0          1         0
2    Detriot        9086  2007      9         5          1         0
3    Detriot        9086  2008      6        12          4         0
4    Detriot        9086  2011      2        10          3         0

   NumberOfVisits
0               2
1               2
2               3
3               3
4               3
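To make the index-alignment issue concrete, here is a minimal sketch using a reduced version of the question's data. It assumes the default RangeIndex on dt. The names sizes, misaligned and via_transform are illustrative and not from the original post. It also shows Series.map as one possible alternative to transform, which should give the same counts here.

```python
import pandas as pd

dt = pd.DataFrame({
    "facility": ["Ann Arbor", "Ann Arbor", "Detriot", "Detriot", "Detriot"],
    "patient_ID": [4388, 4388, 9086, 9086, 9086],
})

# size() returns one row per group, indexed by patient_ID (4388, 9086).
sizes = dt.groupby("patient_ID").size()

# Column assignment aligns on index labels. Reindexing sizes to dt's row
# labels (0..4) shows what the direct assignment effectively stores:
# none of those labels exist in sizes.index, so every value is NaN.
misaligned = sizes.reindex(dt.index)

# Alternative to transform: look up each row's patient_ID in sizes.
dt["NumberOfVisits"] = dt["patient_ID"].map(sizes)

# transform('size') broadcasts the group size back to the original rows.
via_transform = dt.groupby("patient_ID")["patient_ID"].transform("size")

print(misaligned)
print(dt)
```

Both map and transform keep the original row index, which is why they can be assigned to a new column without producing NaN.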