Python Pandas: Create a new DataFrame that averages duplicates from another DataFrame


python pandas


Suppose I have a DataFrame my_df with duplicate columns, e.g.

foo bar foo hello
0   1   1   5
1   1   2   5
2   1   3   5

I want to create another DataFrame that averages the duplicates:

foo bar hello
0.5   1   5
1.5   1   5
2.5   1   5

How can I do this in pandas?

So far I have managed to identify the duplicates:

import collections

my_columns = my_df.columns
my_duplicates = [x for x, y in collections.Counter(my_columns).items() if y > 1]

But I don't know how to get pandas to average them.
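As a side note on the detection step, pandas can flag repeated labels itself with Index.duplicated(), so collections.Counter is not strictly needed. A minimal sketch, assuming my_df is built from the example table (the construction below is an assumption, not taken from the question):

```python
import pandas as pd

# Rebuild the example frame; an explicit column list may contain repeated labels.
my_df = pd.DataFrame(
    [[0, 1, 1, 5], [1, 1, 2, 5], [2, 1, 3, 5]],
    columns=["foo", "bar", "foo", "hello"],
)

# Index.duplicated() marks every occurrence after the first as True,
# so this selects the repeated labels; unique() collapses triplicates to one entry.
my_duplicates = my_df.columns[my_df.columns.duplicated()].unique().tolist()
print(my_duplicates)  # ['foo']
```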

You can group by the column index and take the mean:
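A minimal sketch of grouping on the column labels, assuming the example my_df from the question. It transposes instead of calling groupby(level=0, axis=1), because the axis argument of DataFrame.groupby is deprecated as of pandas 2.1; sort=False keeps the columns in the order they first appear.

```python
import pandas as pd

my_df = pd.DataFrame(
    [[0, 1, 1, 5], [1, 1, 2, 5], [2, 1, 3, 5]],
    columns=["foo", "bar", "foo", "hello"],
)

# Transpose so column labels become the row index, group identical labels,
# average each group, then transpose back to the original orientation.
averaged = my_df.T.groupby(level=0, sort=False).mean().T
print(averaged)
```

The result has columns foo, bar, hello with foo equal to 0.5, 1.5, 2.5, matching the desired output (bar and hello become floats because mean always returns floats).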

A trickier case is when there is a non-numeric column:

In [21]: df
Out[21]:
   foo  bar  foo hello
0    0    1    1     a
1    1    1    2     a
2    2    1    3     a

The above will raise:

DataError: No numeric types to aggregate

This certainly won't win any prizes for efficiency, but here is a general approach for this case:

In [22]: dupes = df.columns[df.columns.duplicated()].unique().tolist()  # Index.get_duplicates() was removed in pandas 1.0

In [23]: dupes
Out[23]: ['foo']

In [24]: pd.DataFrame({d: df[d] for d in df.columns if d not in dupes})
Out[24]:
   bar hello
0    1     a
1    1     a
2    1     a

In [25]: pd.concat([df.xs(d, axis=1) for d in dupes], axis=1).T.groupby(level=0).mean().T
Out[25]:
   foo
0  0.5
1  1.5
2  2.5

In [26]: pd.concat([Out[25], Out[24]], axis=1)
Out[26]:
   foo  bar hello
0  0.5    1     a
1  1.5    1     a
2  2.5    1     a
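The steps above can be wrapped in a single function. This is a sketch, not the answer's own code: average_duplicate_columns is a hypothetical name, and it assumes every duplicated label holds numeric data, while unique columns (such as hello) may be of any dtype. It also preserves the original column order.

```python
import pandas as pd


def average_duplicate_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: average same-named columns, keep unique ones unchanged."""
    result = {}
    for label in df.columns.unique():  # unique() keeps first-appearance order
        # Boolean mask selects every column carrying this label as a DataFrame.
        block = df.loc[:, df.columns == label]
        if block.shape[1] > 1:
            # Row-wise mean across the duplicate copies (assumed numeric).
            result[label] = block.mean(axis=1)
        else:
            # Unique column, possibly non-numeric: pass it through as-is.
            result[label] = block.iloc[:, 0]
    return pd.DataFrame(result, index=df.index)


df = pd.DataFrame(
    [[0, 1, 1, "a"], [1, 1, 2, "a"], [2, 1, 3, "a"]],
    columns=["foo", "bar", "foo", "hello"],
)
result = average_duplicate_columns(df)
print(result)
```

Looping over labels is slower than a single groupby, but it never asks pandas to aggregate the string column, which is what triggers the error above.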

I think the lesson here is to avoid duplicate columns... or perhaps I don't know what I'm doing.

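Thanks, but I get "No numeric types to aggregate", even though all my columns (except the first, which contains object strings) are float64. Any idea what could be causing this? It also turns out I can't call mydf.drop() to remove the str column, because I get: "Reindexing only valid with uniquely valued Index objects"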
@user815423426 hmmm, that does make it a bit trickier (I was sure mean etc. used to ignore non-numeric columns...)
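For the situation in the comment above (a leading string column followed by float64 columns with repeated labels), one possible workaround is to remove the string column positionally, so no label-based reindexing happens. A sketch under stated assumptions: the data and the column name "name" are hypothetical stand-ins for mydf.

```python
import pandas as pd

# Hypothetical data shaped like the comment describes: a string column first,
# then float64 columns, two of which share the label "foo".
mydf = pd.DataFrame(
    [["x", 0.0, 1.0, 1.0], ["y", 1.0, 1.0, 2.0], ["z", 2.0, 1.0, 3.0]],
    columns=["name", "foo", "bar", "foo"],
)

# Option 1: drop the first column by position; iloc never looks up labels.
by_position = mydf.iloc[:, 1:]

# Option 2: keep numeric columns via a positional boolean mask built from dtypes.
is_numeric = [pd.api.types.is_numeric_dtype(dtype) for dtype in mydf.dtypes]
by_dtype = mydf.loc[:, is_numeric]

# The numeric-only frame can then have its duplicate labels averaged.
averaged = by_dtype.T.groupby(level=0, sort=False).mean().T
print(averaged)
```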