Pandas or Python equivalent of tidyr complete


I have data like the following:

library("tidyverse")

df <- tibble(user = c(1, 1, 2, 3, 3, 3), x = c("a", "b", "a", "a", "c", "d"), y = 1)
df

#    user     x     y
# 1     1     a     1
# 2     1     b     1
# 3     2     a     1
# 4     3     a     1
# 5     3     c     1
# 6     3     d     1
I would like to "complete" the data frame so that every user has a record for every possible x, with y filled with a default of 0.

This is fairly trivial in R (tidyverse/tidyr):

df %>% 
    complete(nesting(user), x = c("a", "b", "c", "d"), fill = list(y = 0))

#    user     x     y
# 1     1     a     1
# 2     1     b     1
# 3     1     c     0
# 4     1     d     0
# 5     2     a     1
# 6     2     b     0
# 7     2     c     0
# 8     2     d     0
# 9     3     a     1
# 10    3     b     0
# 11    3     c     1
# 12    3     d     1

Is there an equivalent of complete in pandas/Python that produces the same result?

# The same data as a pandas DataFrame
import pandas as pd
df = pd.DataFrame({'user':[1, 1, 2, 3, 3, 3], 'x':['a', 'b', 'a', 'a', 'c', 'd'], 'y':1})

You can use reindex with MultiIndex.from_product:

# Index by the key columns, then reindex against every (user, x) combination
df = df.set_index(['user','x'])
mux = pd.MultiIndex.from_product([df.index.levels[0], df.index.levels[1]],names=['user','x'])
# Combinations missing from the data get y = 0
df = df.reindex(mux, fill_value=0).reset_index()
print (df)
    user  x  y
0      1  a  1
1      1  b  1
2      1  c  0
3      1  d  0
4      2  a  1
5      2  b  0
6      2  c  0
7      2  d  0
8      3  a  1
9      3  b  0
10     3  c  1
11     3  d  1

Or set_index + unstack + stack:

# Pivot x into columns (filling the gaps with 0), then stack back to long format
df = df.set_index(['user','x'])['y'].unstack(fill_value=0).stack().reset_index(name='y')
print (df)
    user  x  y
0      1  a  1
1      1  b  1
2      1  c  0
3      1  d  0
4      2  a  1
5      2  b  0
6      2  c  0
7      2  d  0
8      3  a  1
9      3  b  0
10     3  c  1
11     3  d  1

With datar, it is now easy to use these dplyr/tidyr APIs in Python:

>>> from datar.all import f, c, tibble, complete, nesting
>>> df = tibble(user=c(1, 1, 2, 3, 3, 3), x=c("a", "b", "a", "a", "c", "d"), y=1)
>>> df >> complete(nesting(f.user), x=c("a", "b", "c", "d"), fill={'y': 0})
    user  x    y
0      1  a  1.0
1      1  b  1.0
2      1  c  0.0
3      1  d  0.0
4      2  a  1.0
5      2  b  0.0
6      2  c  0.0
7      2  d  0.0
8      3  a  1.0
9      3  b  0.0
10     3  c  1.0
11     3  d  1.0

I am the author of the package. Feel free to submit issues if you have any questions.
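
If this pattern is needed more than once, the reindex approach could be wrapped in a small helper. The sketch below is only an illustration: the name complete and its parameters are hypothetical (it is not a pandas function), and it assumes that each key combination appears at most once in the input, since reindex requires a unique index.

```python
import pandas as pd


def complete(df, keys, fill_value=0):
    """Return df with one row per combination of the unique values in `keys`.

    Hypothetical helper, not part of pandas. Assumes the key combinations
    already present in `df` are unique.
    """
    # Every combination of the observed values of each key column
    full_index = pd.MultiIndex.from_product(
        [df[k].unique() for k in keys], names=keys
    )
    # Rows missing from df are created with fill_value in all other columns
    return df.set_index(keys).reindex(full_index, fill_value=fill_value).reset_index()


df = pd.DataFrame({'user': [1, 1, 2, 3, 3, 3],
                   'x': ['a', 'b', 'a', 'a', 'c', 'd'],
                   'y': 1})
result = complete(df, ['user', 'x'])
print(result)
```

Unlike df.index.levels, which is sorted, unique() keeps the values in order of first appearance; for this data the two orders coincide.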