Python: Adding new rows to subsets of a DataFrame with Pandas


python pandas dataframe


I have the following DataFrame:

Customer ProductID Count

John     1         25
John     6         50
Mary     2         15
Mary     3         35

I want my output to look like this:

Customer ProductID Count

John     1         25
John     2         0
John     3         0
John     6         50
Mary     1         0
Mary     2         15
Mary     3         35
Mary     6         0

What I tried first was to find the unique ProductID values in the DataFrame:

unique_ID = pd.unique(df['ProductID'])
print(unique_ID)
[1 6 2 3]

Since customer John has no rows for ProductID 2 and 3, I split the DataFrame by customer name:

df1 = df[df['Customer']=='John']
df2 = df[df['Customer']=='Mary']

print(df1)

Customer  ProductID  Count
John      1          25
John      6          50

print(df2)

Customer  ProductID  Count
Mary      2          15
Mary      3          35

I want to add ProductID 2 and 3 for John and ProductID 1 and 6 for Mary, with Count set to 0 for those new rows, as shown in the desired output above.
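The per-customer split above can be generalized with `groupby` instead of filtering each name by hand. This is a minimal sketch under stated assumptions, not code from the question or the answer: it assumes the sample data shown above, and the variable names `all_ids`, `pieces` and `filled` are illustrative. Each customer's rows are indexed by `ProductID` and reindexed against the full set of IDs, so the missing IDs appear with `Count` 0.

```python
import pandas as pd

# Sample data from the question (assumed to be the whole DataFrame).
df = pd.DataFrame({
    "Customer": ["John", "John", "Mary", "Mary"],
    "ProductID": [1, 6, 2, 3],
    "Count": [25, 50, 15, 35],
})

# Every ProductID seen for any customer, sorted. Naming the Index means
# reset_index() below produces a "ProductID" column.
all_ids = pd.Index(sorted(df["ProductID"].unique()), name="ProductID")

pieces = []
for customer, group in df.groupby("Customer"):
    filled = (
        group.set_index("ProductID")["Count"]
             .reindex(all_ids, fill_value=0)  # missing IDs get Count 0
             .reset_index()                   # back to ProductID/Count columns
    )
    filled.insert(0, "Customer", customer)   # put the customer name back
    pieces.append(filled)

result = pd.concat(pieces, ignore_index=True)
print(result)
```

The loop keeps the spirit of the original df1/df2 split, but it works for any number of customers without hard-coding their names.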

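I think you can reshape df so that every Customer/ProductID combination that does not occur shows up as a NaN value, replace those NaN values with 0, and finally bring the result back to the original long shape of df.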
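The reshape, fill and restore idea can be sketched as follows. The specific functions are an assumption, not necessarily the ones the answer had in mind: `DataFrame.pivot` produces the wide table with NaN gaps, `fillna(0)` fills them, and `melt` returns to the long layout. The sample data is assumed to be the question's.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Customer": ["John", "John", "Mary", "Mary"],
    "ProductID": [1, 6, 2, 3],
    "Count": [25, 50, 15, 35],
})

# Wide layout: one row per Customer, one column per ProductID.
# Customer/ProductID pairs that do not occur become NaN.
wide = df.pivot(index="Customer", columns="ProductID", values="Count")

# Fill the gaps with 0; the NaNs forced a float dtype, so cast back to int.
wide = wide.fillna(0).astype(int)

# Return to the original long shape and order rows like the desired output.
result = (
    wide.reset_index()
        .melt(id_vars="Customer", var_name="ProductID", value_name="Count")
        .astype({"ProductID": int})
        .sort_values(["Customer", "ProductID"], ignore_index=True)
)
print(result)
```

Note that `pivot` requires each Customer/ProductID pair to be unique; if the real data can contain duplicate pairs, an aggregating variant such as `pivot_table` would be needed instead.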

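Another solution: first get the unique values of the Customer column and the sorted unique values of the ProductID column, then build a MultiIndex from them with MultiIndex.from_product and reindex df by it, filling the new rows with 0: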
a = df.Customer.unique()
b = df.ProductID.sort_values().unique()

print (a)
['John' 'Mary']
print (b)
[1 2 3 6]

m = pd.MultiIndex.from_product([a,b])
print (m)
MultiIndex(levels=[['John', 'Mary'], [1, 2, 3, 6]],
           labels=[[0, 0, 0, 0, 1, 1, 1, 1], [0, 1, 2, 3, 0, 1, 2, 3]])

df1 = df.set_index(['Customer','ProductID']).reindex(m, fill_value=0).reset_index()
df1.columns = ['Customer','ProductID','Count']
print (df1)
  Customer  ProductID  Count
0     John          1     25
1     John          2      0
2     John          3      0
3     John          6     50
4     Mary          1      0
5     Mary          2     15
6     Mary          3     35
7     Mary          6      0
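Here is a self-contained variant of the MultiIndex solution, assuming the question's sample data. Passing `names=` to `MultiIndex.from_product` is a small tweak on the answer's code: with named levels, `reset_index()` already yields the `Customer` and `ProductID` columns, so the separate `df1.columns = ...` renaming step is not needed.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Customer": ["John", "John", "Mary", "Mary"],
    "ProductID": [1, 6, 2, 3],
    "Count": [25, 50, 15, 35],
})

customers = df["Customer"].unique()
product_ids = df["ProductID"].sort_values().unique()

# Cartesian product of all customers and all product IDs, with named levels.
full_index = pd.MultiIndex.from_product(
    [customers, product_ids], names=["Customer", "ProductID"]
)

# Rows missing from df are created by reindex and get Count 0.
result = (
    df.set_index(["Customer", "ProductID"])
      .reindex(full_index, fill_value=0)
      .reset_index()
)
print(result)
```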
