Python: How do I compute the mean of a comma-separated column with pandas?

Consider the following CSV file, test.csv:

"x","y","A","B"
8000000000,"0,1","0.113948,0.113689",0.114042
8000000000,"0,1","0.114063,0.113823",0.114175
8000000000,"0,1","0.114405,0.114366",0.114524
8000000000,"0,1,2,3","0.167543,0.172369,0.419197,0.427285",0.427576
8000000000,"0,1,2,3","0.167784,0.172145,0.418624,0.426492",0.428736
8000000000,"0,1,2,3","0.168121,0.172729,0.419768,0.427467",0.428578

My goal is to group the rows by columns "x" and "y" and compute the arithmetic mean of columns "A" and "B".

My first approach was to combine groupby() and mean() in pandas:

import pandas

if __name__ == "__main__":
    data = pandas.read_csv("test.csv", header=0)
    data = data.groupby(["x", "y"], as_index=False).mean()
    print(data)
Running this script produces the following output:

            x        y         B
0  8000000000      0,1  0.114247
1  8000000000  0,1,2,3  0.428297
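As we can see, the goal is easy to achieve for the single-valued column "B". Column "A", however, is dropped. Instead, I would like column "A" to hold a string containing the arithmetic mean of each comma-separated position. The desired output looks like this: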

            x        y                                    A         B
0  8000000000      0,1                    0.114139,0.113959  0.114247
1  8000000000  0,1,2,3  0.167816,0.172414,0.419196,0.427081  0.428297

Does anyone know how to do this?

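You can write a custom aggregation function that parses these strings into lists, computes the mean of each position, and formats the result back into a string: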

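import numpy as np

def string_mean(rows):
    # Parse each comma-separated string into a list of floats
    data_list = []
    for row in rows:
        data_list.append([float(item) for item in row.split(",")])
    data = np.array(data_list)
    # Column-wise mean; six decimals chosen to match the output shown below
    return ",".join("{:.6f}".format(value) for value in data.mean(axis=0))

data.groupby(["x", "y"], as_index=False).agg({"A": string_mean, "B": "mean"})

This returns: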

            x        y                                    A         B
0  8000000000      0,1                    0.114139,0.113959  0.114247
1  8000000000  0,1,2,3  0.167816,0.172414,0.419196,0.427081  0.428297
Note that this raises an error if the strings within a single group contain different numbers of values.
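To make that failure easier to diagnose, one option is to check the lengths explicitly before averaging. The following is a minimal sketch, not part of the original answer; the name string_mean_checked and the error message are hypothetical.

```python
import numpy as np


def string_mean_checked(values):
    """Position-wise mean of comma-separated float strings (hypothetical helper)."""
    parsed = [[float(item) for item in value.split(",")] for value in values]
    # Every string in the group must contain the same number of values
    lengths = {len(row) for row in parsed}
    if len(lengths) != 1:
        raise ValueError(
            f"inconsistent number of values in group: {sorted(lengths)}"
        )
    # Average each position across the rows and format to six decimals
    return ",".join(f"{mean:.6f}" for mean in np.mean(parsed, axis=0))
```

It can be passed to agg() in place of string_mean, for example agg({"A": string_mean_checked, "B": "mean"}).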

By the way, you could probably clean up my function above.
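One possible cleanup, as a sketch only: the loop can be collapsed into a list comprehension. The example below embeds the CSV from the question in a string (CSV_TEXT is an assumption made so the snippet is self-contained) and keeps the six-decimal formatting that matches the desired output.

```python
import io

import numpy as np
import pandas as pd

# The CSV from the question, embedded so the example runs on its own
CSV_TEXT = '''"x","y","A","B"
8000000000,"0,1","0.113948,0.113689",0.114042
8000000000,"0,1","0.114063,0.113823",0.114175
8000000000,"0,1","0.114405,0.114366",0.114524
8000000000,"0,1,2,3","0.167543,0.172369,0.419197,0.427285",0.427576
8000000000,"0,1,2,3","0.167784,0.172145,0.418624,0.426492",0.428736
8000000000,"0,1,2,3","0.168121,0.172729,0.419768,0.427467",0.428578
'''


def string_mean(values):
    """Position-wise mean of comma-separated float strings, returned as a string."""
    parsed = np.array(
        [[float(item) for item in value.split(",")] for value in values]
    )
    return ",".join(f"{mean:.6f}" for mean in parsed.mean(axis=0))


df = pd.read_csv(io.StringIO(CSV_TEXT))
result = df.groupby(["x", "y"], as_index=False).agg(
    {"A": string_mean, "B": "mean"}
)
print(result)
```

Like the original function, this still assumes that every string within a group has the same number of values.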