
How to change the table format of a pandas DataFrame in Python 3?


I have a pandas DataFrame that I group and count by specific columns:

import pandas as pd
df = pd.DataFrame({'x':list('aaabbbbbccccc'),'y':list('2225555577777'), 'z':list('1312223224432')})
#
df.groupby(['x','y','z'])['z'].count()
# or
df.groupby(['x','y','z'])['z'].agg(['count'])
# or
df.groupby(['x','y','z'])['z'].count().reset_index(name='counts')
The result (shown for the last variant) is

   x  y  z  counts
0  a  2  1       2
1  a  2  3       1
2  b  5  2       4
3  b  5  3       1
4  c  7  2       2
5  c  7  3       1
6  c  7  4       2
How can I convert the result into the following form?

   x  y 1 2 3 4
0  a  2 2 0 1 0
1  b  5 0 4 1 0
2  c  7 0 2 1 2

You need unstack + reset_index:

(df.groupby(['x','y','z'])['z']
   .count()
   .unstack(-1, fill_value=0)
   .reset_index()
   .rename_axis(None, axis=1)
)

   x  y  1  2  3  4
0  a  2  2  0  1  0
1  b  5  0  4  1  0
2  c  7  0  2  1  2
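
To see what each step of the chain contributes, here is a minimal sketch that splits it up, using the question's own data. The variable names counts, wide_nan and wide are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'x': list('aaabbbbbccccc'),
                   'y': list('2225555577777'),
                   'z': list('1312223224432')})

# Long form: one count per (x, y, z) combination that actually occurs.
counts = df.groupby(['x', 'y', 'z'])['z'].count()

# Moving the last index level (z) into the columns leaves gaps
# for combinations that never occur; without fill_value they are NaN.
wide_nan = counts.unstack(-1)

# fill_value=0 fills those gaps with zero counts instead.
wide = counts.unstack(-1, fill_value=0)

print(wide_nan)
print(wide)
```

reset_index() then turns the x and y index levels back into ordinary columns, and rename_axis(None, axis=1) removes the leftover z label from the column axis.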

Note that you can replace df.groupby(['x','y','z'])['z'].count() with df.groupby(['x','y','z']).size() for compactness, but be aware that size also counts NaNs.
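
The difference between count and size can be sketched on a small made-up frame. The frame demo below is hypothetical and not part of the question; it uses a counted column that is not one of the grouping keys, which is the case where the two results diverge.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: group 'a' has one missing value in 'z'.
demo = pd.DataFrame({'x': ['a', 'a', 'b'],
                     'z': [1.0, np.nan, 2.0]})

# count() skips missing values in the selected column.
counted = demo.groupby('x')['z'].count()

# size() counts every row in the group, missing or not.
sized = demo.groupby('x').size()

print(counted.to_dict())
print(sized.to_dict())
```

For the question's data there are no missing values, so both give the same table.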

Similarly, with crosstab:

pd.crosstab([df.x,df.y],df.z).reset_index()
Out[81]: 
z  x  y  1  2  3  4
0  a  2  2  0  1  0
1  b  5  0  4  1  0
2  c  7  0  2  1  2
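
The z in the header row of the crosstab output is the name of the column axis. If it is unwanted, one option (a sketch that reuses the rename_axis call from the unstack answer, with the question's data) is:

```python
import pandas as pd

df = pd.DataFrame({'x': list('aaabbbbbccccc'),
                   'y': list('2225555577777'),
                   'z': list('1312223224432')})

out = (pd.crosstab([df.x, df.y], df.z)
         .reset_index()
         .rename_axis(None, axis=1))  # drop the 'z' label on the column axis

print(out)
```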

PROJECT/KILL
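You can replace count with size, and the ['z'] selection after groupby then becomes unnecessary.

@piRSquared Oops, sorry, I only just saw this thread. Yes, thanks. Will edit.

Thank you for your answers and comments. I am accepting this answer because it is about 8 times faster than Wen's crosstab and about 4 times faster than piRSquared's method.

@Sezen Thanks for your fair arbitration.

This also lists strategies for performing this task. It is not exactly the same, but it should prove useful.
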
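import numpy as np

# Factorize the (x, y) pairs into row codes i and the z values into column codes j.
tups = list(zip(df.x, df.y))
i, r = pd.factorize(tups)
j, c = pd.factorize(df.z)
n, m = len(r), len(c)
# Count each (row, column) pair through its flattened index i * m + j.
b = np.bincount(i * m + j, minlength=n * m).reshape(n, m)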

pd.DataFrame(
    np.column_stack([r.tolist(), b]),
    columns=['x', 'y'] + c.tolist()
)

   x  y  1  3  2  4
0  a  2  2  1  0  0
1  b  5  0  1  4  0
2  c  7  0  1  2  2

# Same approach, but sort=True puts the z columns in sorted order.
tups = list(zip(df.x, df.y))
i, r = pd.factorize(tups)
j, c = pd.factorize(df.z, sort=True)
n, m = len(r), len(c)
b = np.bincount(i * m + j, minlength=n * m).reshape(n, m)

pd.DataFrame(
    np.column_stack([r.tolist(), b]),
    columns=['x', 'y'] + c.tolist()
)

   x  y  1  2  3  4
0  a  2  2  0  1  0
1  b  5  0  4  1  0
2  c  7  0  2  1  2
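
The counting step in the code above relies on a flattened index: each (row code, column code) pair maps to the single integer i * m + j, np.bincount tallies those integers, and reshape folds the tallies back into an n-by-m table. A minimal sketch with made-up integer codes (the arrays below are hypothetical, not derived from the question's data):

```python
import numpy as np

# Hypothetical codes: i is the row group of each record, j its column category.
i = np.array([0, 0, 1, 1, 1, 2])
j = np.array([0, 1, 1, 1, 0, 2])
n, m = 3, 3  # number of distinct row groups and column categories

# Each (i, j) pair gets a unique position in a flat array of length n * m.
flat = i * m + j

# minlength makes sure trailing cells with zero occurrences are still present.
table = np.bincount(flat, minlength=n * m).reshape(n, m)

print(table)
```

Here table[1, 1] is 2 because the pair (1, 1) appears twice in the codes.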