
Removing certain values from a mean calculation in Python


I am trying to compute the mean of the following sample dataset:

bogie-n bypass-n    0.00304367004111
flask-n bypass-n    0.00298246799918
faggot-n    sprayer-n   0.00507314183347
bypass-n    sprayer-n   0.00136494481917
sprayer-n   sprayer-n   1.0
I want to exclude any value equal to 1 or equal to 0 from the mean calculation. To do this, I wrote the following code:

with open(infile) as f:
    cols = [float(row.split("\t")[2]) for row in f.readlines()]
    for col in cols:
        if col == 1 or col == 0:
            pass
        else:
            normalizedDataEuc = float(sum(cols))/float(len(cols))
            output = infile + "\t" + str(normalizedDataEuc) + "\n"
            print output
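This code successfully computes the mean of the whole dataset (0.202492845 for the sample data above), but it fails to compute the mean with the value 1 excluded: the result is still 0.202492845.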


I tried to implement a double condition that the col variable has to satisfy, but it does not seem to work. Any suggestions?

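Your double condition works fine (you can see this by putting a few print statements inside it). The problem is that on every iteration of the for loop you compute the mean of the entire cols list.

Instead, filter the cols list to remove the 1 and 0 values first, then compute the mean of the filtered list. That takes only two lines and no loop.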
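
A minimal sketch of that filter-then-average approach, assuming the same tab-separated layout as the question. The helper name mean_excluding, its excluded parameter, and the inline sample_rows list are illustrative and not taken from the original answer:

```python
def mean_excluding(values, excluded=(0.0, 1.0)):
    """Return the mean of values, skipping entries equal to anything in excluded."""
    # Filter first, so the unwanted values never reach sum() or len().
    kept = [v for v in values if v not in excluded]
    if not kept:
        raise ValueError("no values left after filtering")
    return float(sum(kept)) / len(kept)


# Sample rows standing in for the lines of the input file (assumed tab-separated).
sample_rows = [
    "bogie-n\tbypass-n\t0.00304367004111",
    "flask-n\tbypass-n\t0.00298246799918",
    "faggot-n\tsprayer-n\t0.00507314183347",
    "bypass-n\tsprayer-n\t0.00136494481917",
    "sprayer-n\tsprayer-n\t1.0",
]

# Same column extraction as in the question: third tab-separated field as float.
cols = [float(row.split("\t")[2]) for row in sample_rows]
result = mean_excluding(cols)
```

With the real file, cols would come from the question's own list comprehension over f.readlines(). The mean is computed once, after the filtering, rather than inside the loop.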

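"Wes McKinney is a genius. If you are implementing anything that Wes McKinney has already put in his library, stop. His code is faster, more robust, and more likely to be correct than anything you are going to write. Want a rolling window aggregator? Use pandas. Need to handle missing data? Use pandas. Are you writing an incredibly ugly hack that tries to implement join and group-by on NumPy arrays, but actually spends 3 hours computing a subtly wrong result? (I have done this.) For god's sake, stop and use pandas."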

I agree with this. Just use Pandas. There is no point in redoing work that has already been done by more people and optimized for performance.

Solution:

import pandas as pd

s = """bogie-n bypass-n    0.00304367004111
flask-n bypass-n    0.00298246799918
faggot-n    sprayer-n   0.00507314183347
bypass-n    sprayer-n   0.00136494481917
sprayer-n   sprayer-n   1.0"""

lines = [x.split() for x in s.split('\n')]

lines
Out[152]:
[['bogie-n', 'bypass-n', '0.00304367004111'],
 ['flask-n', 'bypass-n', '0.00298246799918'],
 ['faggot-n', 'sprayer-n', '0.00507314183347'],
 ['bypass-n', 'sprayer-n', '0.00136494481917'],
 ['sprayer-n', 'sprayer-n', '1.0']]

df = pd.DataFrame(lines)

df
Out[154]:
           0          1                 2
0    bogie-n   bypass-n  0.00304367004111
1    flask-n   bypass-n  0.00298246799918
2   faggot-n  sprayer-n  0.00507314183347
3   bypass-n  sprayer-n  0.00136494481917
4  sprayer-n  sprayer-n               1.0



df[2] = df[2].astype(float)

df
Out[163]:
           0          1         2
0    bogie-n   bypass-n  0.003044
1    flask-n   bypass-n  0.002982
2   faggot-n  sprayer-n  0.005073
3   bypass-n  sprayer-n  0.001365
4  sprayer-n  sprayer-n  1.000000



df[df[2] != 1.0]
Out[164]:
          0          1         2
0   bogie-n   bypass-n  0.003044
1   flask-n   bypass-n  0.002982
2  faggot-n  sprayer-n  0.005073
3  bypass-n  sprayer-n  0.001365
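
The session above stops at the filtered frame and only drops rows equal to 1.0. One possible completion, assuming the same three-column layout, also excludes 0 using Series.isin and then takes the mean of column 2. The names rows, filtered and mean_value are illustrative:

```python
import pandas as pd

# Same sample rows as in the session above, already split into fields.
rows = [
    ["bogie-n", "bypass-n", "0.00304367004111"],
    ["flask-n", "bypass-n", "0.00298246799918"],
    ["faggot-n", "sprayer-n", "0.00507314183347"],
    ["bypass-n", "sprayer-n", "0.00136494481917"],
    ["sprayer-n", "sprayer-n", "1.0"],
]

df = pd.DataFrame(rows)
df[2] = df[2].astype(float)  # column 2 arrives as strings

# Keep only rows whose value is neither 0 nor 1, then average what is left.
filtered = df[~df[2].isin([0.0, 1.0])]
mean_value = filtered[2].mean()
```

Using isin with a list keeps the exclusion set in one place, so further values can be excluded without chaining more != comparisons.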

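Is this an answer to the question? If so, at least give a hint about what in the Pandas module already provides the "mean of values that are not 1 or zero" functionality.
@tobias_k: Just a plain DataFrame. This is Pandas 101; any tutorial covers this kind of thing. Anyway, I have added a (draft) solution above.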