Python: trimmed/winsorized standard deviation

What is an efficient way to compute a trimmed or winsorized mean or standard deviation of a list?


I don't mind using numpy, but if I have to make a separate copy of the list, it will be quite slow.

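This is what generator functions are for.

The SD requires two passes over the data, plus a count. So you need to "tee" some iterators over the base collection.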
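A minimal sketch of the tee idea described above, assuming the trimming bounds are given as values `low` and `high`. The function name `trimmed_std_iter` is hypothetical, and the population form of the SD (dividing by n) is an assumption.

```python
import itertools
import math

def trimmed_std_iter(values, low, high):
    """Population SD of the items with low <= x < high (hypothetical helper)."""
    # Lazily select the items to keep; nothing is copied at this point.
    trimmed = (x for x in values if low <= x < high)
    # One iterator each for the count, the sum, and the squared deviations.
    count_iter, sum_iter, var_iter = itertools.tee(trimmed, 3)
    n = sum(1 for _ in count_iter)
    if n == 0:
        raise ValueError("no values fall inside [low, high)")
    mean = sum(sum_iter) / n
    return math.sqrt(sum((x - mean) ** 2 for x in var_iter) / n)

print(trimmed_std_iter([1, 2, 3, 4, 100], 0, 10))
```

Note that `itertools.tee` buffers every item that one iterator has consumed and the others have not, so exhausting `count_iter` first ends up holding all the kept values in memory anyway. It avoids an explicit copy in the code, not the memory cost.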

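So, starting from a generator over the values to keep:

trimmed = (x for x in the_list if low <= x < high)

The following makes two copies, but you should give it a try, because it should be very fast.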

import numpy as np

def trimmed_std(data, low, high):
    # keep only the values in [low, high), then take the population std
    tmp = np.asarray(data)
    return tmp[(low <= tmp) & (tmp < high)].std()

Obviously you can do this without numpy, but even including the time to convert the list to an array, using numpy is faster than anything else I could come up with.
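One way to check the speed claim on your own machine is a small timing harness like the one below. It is only a sketch: the function names, the sample size, and the bounds are assumptions chosen for illustration, and the timings will vary with hardware and data.

```python
import math
import random
import timeit

import numpy as np

def trimmed_std_numpy(data, low, high):
    # Convert to an array, mask the values in [low, high), take the population std.
    arr = np.asarray(data)
    return arr[(low <= arr) & (arr < high)].std()

def trimmed_std_python(data, low, high):
    # Same computation in pure Python, for comparison.
    kept = [x for x in data if low <= x < high]
    mean = sum(kept) / len(kept)
    return math.sqrt(sum((x - mean) ** 2 for x in kept) / len(kept))

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(20_000)]

for fn in (trimmed_std_numpy, trimmed_std_python):
    seconds = timeit.timeit(lambda: fn(sample, -2.0, 2.0), number=3)
    print(f"{fn.__name__}: {seconds / 3:.4f} s per call")
```

The numpy timing here includes the list-to-array conversion, which is the cost the question was worried about.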

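To get an unbiased trimmed mean, you have to account for fractional items when the number of items to trim from each end is not a whole number. I wrote a function to do this: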

from math import modf

def percent_tmean( data, pcent ):
   # pcent is the percentage to trim from EACH end of the sorted data
   # make sure data is a list
   dc = list( data )
   # find the number of items
   n = len(dc)
   # sort the list
   dc.sort()
   # get the proportion to trim
   p = pcent / 100.0
   k = n*p
   # get the decimal and integer parts of k
   dec_part, int_part = modf( k )
   # get an index we can use
   index = int(int_part)
   # trim down the list (n - index rather than -index, so index == 0 keeps everything)
   dc = dc[ index: n - index ]
   # deal with the case of trimming fractional items
   if dec_part != 0.0:
       if len( dc ) == 1:
           # a single remaining item is trimmed from both sides
           dc[ 0 ] = dc[ 0 ] * (1 - 2.0*dec_part)
       else:
           # deal with the first remaining item
           dc[ 0 ] = dc[ 0 ] * (1 - dec_part)
           # deal with last remaining item
           dc[ -1 ] = dc[ -1 ] * (1 - dec_part)
   return sum( dc ) / ( n - 2.0*k )
I also put together a demo.

My function will probably be slower than the ones already posted, but it gives unbiased results.
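To make the fractional weighting concrete, here is a compact, self-contained sketch of the same idea with a worked example. The name `fractional_trimmed_mean` and the guard against trimming everything are assumptions added for illustration, not part of the answer above.

```python
from math import modf

def fractional_trimmed_mean(data, pcent):
    """Trim pcent percent from each end, down-weighting partially trimmed items."""
    values = sorted(data)
    n = len(values)
    k = n * pcent / 100.0              # items to trim per end, possibly fractional
    if 2 * k >= n:
        raise ValueError("pcent trims away all of the data")
    frac, whole = modf(k)
    whole = int(whole)
    kept = values[whole:n - whole]     # drop the fully trimmed items
    if frac:
        if len(kept) == 1:
            kept[0] *= 1 - 2 * frac    # a single item loses weight from both sides
        else:
            kept[0] *= 1 - frac        # partially trimmed lowest item
            kept[-1] *= 1 - frac       # partially trimmed highest item
    return sum(kept) / (n - 2.0 * k)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(sum(data) / len(data))               # plain mean: 14.5
print(fractional_trimmed_mean(data, 25))   # trimmed mean: 5.5
```

With n = 10 and 25% trimmed per end, k = 2.5: two items are dropped from each end, and the next item on each side (3 and 8) counts with weight 0.5. The weighted sum is 1.5 + 4 + 5 + 6 + 7 + 4 = 27.5, divided by the effective count n - 2k = 5.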

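Yes, I need to do rank-order (percentile) trimming. That is why I am especially concerned about time: I need another pass to determine which values of the original list to keep.

That's true, but somehow it is 8 times slower than copying to a numpy array :( I guess the regular Python overhead is a lot slower than the copy...

8 times slower! That shows the value of numpy very clearly.
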
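import numpy as np

def trimmed_std(data, percentile):
    # percentile is the total fraction (0 to 1) to trim, split evenly between both ends
    data = np.array(data)   # np.array copies, so the caller's data is not sorted in place
    data.sort()
    percentile = percentile / 2.
    low = int(percentile * len(data))
    high = int((1. - percentile) * len(data))
    # ddof=0 gives the population standard deviation
    return data[low:high].std(ddof=0)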
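
The thread title also mentions the winsorized variant, where extreme values are clamped to the cut-off instead of being dropped. A possible sketch with numpy follows; the function name `winsorized_std` is hypothetical, and taking the cut-offs from `np.percentile` (which interpolates between data points by default) is an assumption that can differ slightly from strict rank-order winsorizing.

```python
import numpy as np

def winsorized_std(data, percentile):
    """Population std after clamping both tails.

    percentile is the total fraction (0 to 1) to winsorize,
    split evenly between the low and high ends.
    """
    arr = np.asarray(data, dtype=float)
    tail = 100.0 * percentile / 2.0
    # Cut-off values for the lower and upper tails.
    lo, hi = np.percentile(arr, [tail, 100.0 - tail])
    # Clamp values outside [lo, hi] to the nearest cut-off, then take the std.
    return np.clip(arr, lo, hi).std(ddof=0)

print(winsorized_std([1, 2, 3, 4, 100], 0.5))
```

In this example the cut-offs are 2 and 4, so the data become [2, 2, 3, 4, 4] before the standard deviation is taken.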