Removing certain values from a mean calculation in Python
I am trying to compute the mean of the following sample dataset:
bogie-n bypass-n 0.00304367004111
flask-n bypass-n 0.00298246799918
faggot-n sprayer-n 0.00507314183347
bypass-n sprayer-n 0.00136494481917
sprayer-n sprayer-n 1.0
I want to exclude any value equal to 1 or equal to 0 from the mean calculation.
To do this, I wrote the following code:
with open(infile) as f:
    cols = [float(row.split("\t")[2]) for row in f.readlines()]
    for col in cols:
        if col == 1 or col == 0:
            pass
        else:
            normalizedDataEuc = float(sum(cols))/float(len(cols))
    output = infile + "\t" + str(normalizedDataEuc) + "\n"
    print output
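This code successfully computes the mean of the whole dataset (0.202492845 for the sample data above), but it fails to compute the mean with the value 1 excluded: it still returns 0.202492845 instead of the expected 0.003116056.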
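I tried to implement a double condition that the col variable must satisfy, but it does not seem to work. Any suggestions?

Your double condition works fine (you can see this by putting some print statements inside it); the problem is that on every iteration of the for loop you compute the mean of the entire cols list.
Instead, you should filter the cols list to remove the 1 and 0 values, and then compute the mean of the filtered list (two lines, with no loop).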
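A minimal sketch of that filter-then-average step, assuming cols already holds the third-column floats from the sample data. The names filtered and mean are illustrative and do not come from the original post, and the empty-list guard is an added precaution.

```python
# Third-column values from the sample data in the question.
cols = [0.00304367004111, 0.00298246799918, 0.00507314183347,
        0.00136494481917, 1.0]

# Keep only the values that are neither 0 nor 1.
filtered = [col for col in cols if col not in (0, 1)]

# Average the filtered list; fall back to NaN if nothing is left,
# so the division can never raise ZeroDivisionError.
mean = sum(filtered) / float(len(filtered)) if filtered else float("nan")
print(mean)
```

Because the filter runs once before the division, the excluded values affect neither the sum nor the count, which is what the loop in the question failed to achieve.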
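"Wes McKinney is a genius. If you are implementing anything that Wes McKinney has already put in his library, stop. His code is faster, more robust, and more likely to be correct than anything you are going to write. Want a rolling window aggregator? Use pandas. Need to handle missing data? Use pandas. Are you writing an incredibly ugly hack that tries to implement join and group by on NumPy arrays, but actually spent 3 hours computing a subtly wrong result? (I have done this.) For heaven's sake, stop and use pandas."
I agree with this. Just use Pandas. There is no point in redoing work that has been done before, by more people, and optimized for performance.
Solution: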
import pandas as pd
s = """bogie-n bypass-n 0.00304367004111
flask-n bypass-n 0.00298246799918
faggot-n sprayer-n 0.00507314183347
bypass-n sprayer-n 0.00136494481917
sprayer-n sprayer-n 1.0"""
lines = [x.split() for x in s.split('\n')]
lines

Out[152]:
[['bogie-n', 'bypass-n', '0.00304367004111'],
 ['flask-n', 'bypass-n', '0.00298246799918'],
 ['faggot-n', 'sprayer-n', '0.00507314183347'],
 ['bypass-n', 'sprayer-n', '0.00136494481917'],
 ['sprayer-n', 'sprayer-n', '1.0']]

df = pd.DataFrame(lines)
df

Out[154]:
           0          1                 2
0    bogie-n   bypass-n  0.00304367004111
1    flask-n   bypass-n  0.00298246799918
2   faggot-n  sprayer-n  0.00507314183347
3   bypass-n  sprayer-n  0.00136494481917
4  sprayer-n  sprayer-n               1.0

df[2] = df[2].astype(float)
df

Out[163]:
           0          1         2
0    bogie-n   bypass-n  0.003044
1    flask-n   bypass-n  0.002982
2   faggot-n  sprayer-n  0.005073
3   bypass-n  sprayer-n  0.001365
4  sprayer-n  sprayer-n  1.000000

df[df[2] != 1.0]

Out[164]:
           0          1         2
0    bogie-n   bypass-n  0.003044
1    flask-n   bypass-n  0.002982
2   faggot-n  sprayer-n  0.005073
3   bypass-n  sprayer-n  0.001365
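The session above stops at filtering out the 1.0 row. As a hedged sketch of the remaining step, the following excludes both 0 and 1 and then takes the mean of the numeric column. It assumes a DataFrame whose column 2 already holds floats, as after the astype(float) call; the names kept and mean are illustrative.

```python
import pandas as pd

# Only the numeric column from the sample data, stored under label 2
# to mirror the DataFrame built in the session above.
df = pd.DataFrame({2: [0.00304367004111, 0.00298246799918,
                       0.00507314183347, 0.00136494481917, 1.0]})

# Boolean mask: True for rows whose value is neither 0 nor 1.
kept = df[~df[2].isin([0.0, 1.0])]

# Mean of the remaining values in column 2.
mean = kept[2].mean()
print(mean)
```

Using isin with a list keeps the exclusion set in one place, so further sentinel values could be added without chaining more comparisons.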
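Is this meant to be an answer to the question? If so, at least give a hint as to which part of Pandas already provides the "mean of values that are not 1 or zero" functionality.
@tobias_k: Just plain DataFrames. This is Pandas 101; any tutorial covers this kind of thing. Anyway, I have added a (draft) solution above.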