Python: Get the count of duplicate rows while keeping the original index

I need to find the duplicate rows in a DataFrame and then add an extra column holding the count. Suppose we have this DataFrame:

>>> print(df)

+----+-----+-----+-----+-----+-----+-----+-----+-----+
|    |   2 |   3 |   4 |   5 |   6 |   7 |   8 |   9 |
|----+-----+-----+-----+-----+-----+-----+-----+-----|
|  0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |
|  1 |   2 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |
|  2 |   2 |   4 |   3 |   4 |   1 |   1 |   4 |   4 |
|  3 |   4 |   3 |   4 |   0 |   0 |   0 |   0 |   0 |
|  4 |   2 |   3 |   4 |   3 |   4 |   0 |   0 |   0 |
|  5 |   5 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |
|  6 |   4 |   5 |   0 |   0 |   0 |   0 |   0 |   0 |
|  7 |   1 |   1 |   4 |   0 |   0 |   0 |   0 |   0 |
|  8 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |
|  9 |   4 |   3 |   4 |   0 |   0 |   0 |   0 |   0 |
| 10 |   3 |   3 |   4 |   3 |   5 |   5 |   5 |   0 |
| 11 |   5 |   4 |   0 |   0 |   0 |   0 |   0 |   0 |
| 12 |   5 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |
| 13 |   0 |   4 |   0 |   0 |   0 |   0 |   0 |   0 |
| 14 |   2 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |
| 15 |   1 |   3 |   5 |   0 |   0 |   0 |   0 |   0 |
| 16 |   4 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |
| 17 |   3 |   3 |   4 |   4 |   0 |   0 |   0 |   0 |
| 18 |   5 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |
+----+-----+-----+-----+-----+-----+-----+-----+-----+
The frame above should then become the one below, with an additional column holding the count. As you can see, the original index is still kept:

+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|    |   2 |   3 |   4 |   5 |   6 |   7 |   8 |   9 |  10 |
|----+-----+-----+-----+-----+-----+-----+-----+-----+-----|
|  0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   2 |
|  1 |   2 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   2 |
|  2 |   2 |   4 |   3 |   4 |   1 |   1 |   4 |   4 |   1 |
|  3 |   4 |   3 |   4 |   0 |   0 |   0 |   0 |   0 |   2 |
|  4 |   2 |   3 |   4 |   3 |   4 |   0 |   0 |   0 |   1 |
|  5 |   5 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   3 |
|  6 |   4 |   5 |   0 |   0 |   0 |   0 |   0 |   0 |   1 |
|  7 |   1 |   1 |   4 |   0 |   0 |   0 |   0 |   0 |   1 |
| 10 |   3 |   3 |   4 |   3 |   5 |   5 |   5 |   0 |   1 |
| 11 |   5 |   4 |   0 |   0 |   0 |   0 |   0 |   0 |   1 |
| 13 |   0 |   4 |   0 |   0 |   0 |   0 |   0 |   0 |   1 |
| 15 |   1 |   3 |   5 |   0 |   0 |   0 |   0 |   0 |   1 |
| 16 |   4 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   1 |
| 17 |   3 |   3 |   4 |   4 |   0 |   0 |   0 |   0 |   1 |
+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
I have seen other solutions, such as:

 df.groupby(list(df.columns.values)).size()
But this returns a result with gaps and without the original index.
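To make the problem concrete, here is a minimal sketch of what that `groupby(...).size()` call returns on the sample data. It assumes the column labels are the strings '2' through '9' (the question does not say whether they are strings or integers) and rebuilds the frame by hand from the table above.

```python
import pandas as pd

# Sample data copied from the question; string column labels are an assumption.
rows = [
    [0, 0, 0, 0, 0, 0, 0, 0], [2, 0, 0, 0, 0, 0, 0, 0],
    [2, 4, 3, 4, 1, 1, 4, 4], [4, 3, 4, 0, 0, 0, 0, 0],
    [2, 3, 4, 3, 4, 0, 0, 0], [5, 0, 0, 0, 0, 0, 0, 0],
    [4, 5, 0, 0, 0, 0, 0, 0], [1, 1, 4, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0], [4, 3, 4, 0, 0, 0, 0, 0],
    [3, 3, 4, 3, 5, 5, 5, 0], [5, 4, 0, 0, 0, 0, 0, 0],
    [5, 0, 0, 0, 0, 0, 0, 0], [0, 4, 0, 0, 0, 0, 0, 0],
    [2, 0, 0, 0, 0, 0, 0, 0], [1, 3, 5, 0, 0, 0, 0, 0],
    [4, 0, 0, 0, 0, 0, 0, 0], [3, 3, 4, 4, 0, 0, 0, 0],
    [5, 0, 0, 0, 0, 0, 0, 0],
]
df = pd.DataFrame(rows, columns=[str(c) for c in range(2, 10)])

# One entry per distinct row, keyed by the row's values rather than by
# the original index labels.
sizes = df.groupby(list(df.columns.values)).size()
print(sizes)
print(sizes.index.nlevels)  # one index level per grouping column
```

The result is a Series whose index is a MultiIndex built from the column values, so the original row labels (0, 1, 2, ...) are gone. The "gaps" are most likely the way pandas prints a MultiIndex, leaving repeated outer labels blank.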

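You can first convert the index to a column and then aggregate that column with `first` and `len` (the code below uses `size`, which yields the same count).

Also, if you need to group by all columns, exclude the `index` column with `difference`.

If necessary, `rename` the count column so that it becomes the next column, `10`: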

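# Label for the new count column: one past the largest existing label
# (converted to str, if necessary, to match the other column labels)
last_col = str(df.columns.astype(int).max() + 1)
print(last_col)
10

print(df.reset_index()                  # original index becomes column 'index'
        .groupby(df.columns.difference(['index']).tolist())['index']
        .agg(['first', 'size'])         # first original label and group size
        .reset_index()                  # grouping keys back to columns
        .set_index(['first'])           # restore the original index labels
        .sort_index()
        .rename_axis(None)              # drop the index name 'first'
        .rename(columns={'size': last_col}))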

    2  3  4  5  6  7  8  9  10
0   0  0  0  0  0  0  0  0   2
1   2  0  0  0  0  0  0  0   2
2   2  4  3  4  1  1  4  4   1
3   4  3  4  0  0  0  0  0   2
4   2  3  4  3  4  0  0  0   1
5   5  0  0  0  0  0  0  0   3
6   4  5  0  0  0  0  0  0   1
7   1  1  4  0  0  0  0  0   1
10  3  3  4  3  5  5  5  0   1
11  5  4  0  0  0  0  0  0   1
13  0  4  0  0  0  0  0  0   1
15  1  3  5  0  0  0  0  0   1
16  4  0  0  0  0  0  0  0   1
17  3  3  4  4  0  0  0  0   1
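As a cross-check, the same table can be reached by a different route: attach the group size to every row with `transform('size')` and then drop the repeated rows. This is only a sketch of an alternative, not the approach used in the answer above. It again assumes string column labels '2' through '9' and hard-codes '10' as the name of the count column.

```python
import pandas as pd

# Sample data copied from the question; string column labels are an assumption.
rows = [
    [0, 0, 0, 0, 0, 0, 0, 0], [2, 0, 0, 0, 0, 0, 0, 0],
    [2, 4, 3, 4, 1, 1, 4, 4], [4, 3, 4, 0, 0, 0, 0, 0],
    [2, 3, 4, 3, 4, 0, 0, 0], [5, 0, 0, 0, 0, 0, 0, 0],
    [4, 5, 0, 0, 0, 0, 0, 0], [1, 1, 4, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0], [4, 3, 4, 0, 0, 0, 0, 0],
    [3, 3, 4, 3, 5, 5, 5, 0], [5, 4, 0, 0, 0, 0, 0, 0],
    [5, 0, 0, 0, 0, 0, 0, 0], [0, 4, 0, 0, 0, 0, 0, 0],
    [2, 0, 0, 0, 0, 0, 0, 0], [1, 3, 5, 0, 0, 0, 0, 0],
    [4, 0, 0, 0, 0, 0, 0, 0], [3, 3, 4, 4, 0, 0, 0, 0],
    [5, 0, 0, 0, 0, 0, 0, 0],
]
df = pd.DataFrame(rows, columns=[str(c) for c in range(2, 10)])
cols = df.columns.tolist()

# transform('size') returns one value per original row, aligned with the
# original index, so no index bookkeeping is needed.
counts = df.groupby(cols)[cols[0]].transform('size')

# Keep only the first occurrence of each distinct row; its index label survives.
result = df.assign(**{'10': counts}).drop_duplicates(subset=cols)
print(result)
```

Because `drop_duplicates` keeps the first occurrence by default, each surviving row carries the index label of the first time it appeared, which matches the expected output in the question.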
