Python: Entropy-based histogram for selectivity in databases

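I'm trying to implement an entropy-based histogram based on this paper (PDF warning!), but the pseudocode for the Maximum Entropy algorithm is not clear at all.

They compute the "area" of a bucket as ai = fi*si (section 4.2), where fi is the frequency of value i and si is its "spread". The first problem is that I'm not sure what the spread is, but according to reference [16] it should be (v_(i+1) - v_i), meaning "the distance to the next item".

However, in the algorithm they use Ab as a list of areas in line 7 (for example) and as a frequency in line 14 (for example). So it's not clear whether Ab is a list of areas, a list of frequencies, or whether they just swap the two at random.

Can you help me make sense of the pseudocode? I wrote an implementation in Python, but it doesn't work: I get negative values for localMinH: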
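To make that reading of "spread" concrete, here is a minimal Python 3 sketch, not the paper's code. The function name `spreads_and_areas` is hypothetical, and giving the last value a spread of 1 is an assumption, since it has no "next item".

```python
def spreads_and_areas(frequency):
    """Return (values, spreads, areas) for a {value: frequency} dict."""
    values = sorted(frequency)
    if not values:
        return [], [], []
    # Spread of v_i is the distance to the next distinct value: v_(i+1) - v_i.
    # Assumption: the last value has no successor, so its spread is set to 1.
    spreads = [nxt - cur for cur, nxt in zip(values, values[1:])] + [1]
    # Area of value i is a_i = f_i * s_i.
    areas = [frequency[v] * s for v, s in zip(values, spreads)]
    return values, spreads, areas


values, spreads, areas = spreads_and_areas({1: 4, 2: 1, 5: 2, 9: 3})
# spreads == [1, 3, 4, 1] and areas == [4, 3, 8, 3]
```

Sorting the keys explicitly matters here, because a plain dict does not guarantee value order in the Python 2 code shown in this question.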

def build_struct(self):
    self.conn = sqlite3.connect(self.db)
    self.cursor = self.conn.cursor()
    self.calculateFrequency()
    self.calculateArea()
    self.splits = []
    self.entropies = {}
    minHeap = [(self.H(self.frequency.keys()), 0, len(self.frequency), 0)]
    while len(self.splits) < self.parameter:
        previous = minHeap
        minHeap = []
        for bucket in previous:
            a = bucket[1]
            b = bucket[2]
            wb = sum(self.areas[a:b])
            if wb > 1:
                tr = sum(self.frequency.keys()[a:b])
                locCutPos = tl = hl = 0
                localMinH = -1
                hr = ho = self.H(self.frequency.keys()[a:b])
                for j in xrange(len(self.areas[a:b]) - 1):
                    x = self.frequency[self.frequency.keys()[j+a]]
                    tl += x
                    tr -= x
                    hl = self.H2(x/(x+tl), tl/(x+tl)) + (tl/(tl+x))*hl
                    #print a,b,x,tr
                    hr = (hr - self.H2(x/tr, (tr-x)/tr))* (tr/(tr-x))
                    hmenos = ho - (hl + hr)
                    if (localMinH == -1) or (hmenos < localMinH):
                        locCutPos = j
                        localMinH = hmenos
                print wb, localMinH
                heapq.heappush(minHeap, (wb*localMinH, a, a+locCutPos, locCutPos))
                heapq.heappush(minHeap, (wb*localMinH, a+locCutPos, b, locCutPos))
        bucket = minHeap[0]
        self.splits.append(bucket[1] + bucket[3])
        self.entropies[bucket[1] + bucket[3]] = bucket[0]
self.frequency is a dict with the values as keys and their frequencies as values. self.areas is just a list of areas.
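One way to sanity-check the negative numbers is to compute the entropies by brute force on a small frequency dict like the one described above. The sketch below is written in Python 3 and is not the paper's algorithm: `entropy` and `split_gains` are hypothetical helper names. It relies only on the grouping property of Shannon entropy: the whole-bucket entropy minus the weighted sum of the two sub-bucket entropies is never negative, while the unweighted difference (the shape of `hmenos` in the code above) can be. Separately, if the frequencies are integers, Python 2 evaluates expressions such as `x/(x+tl)` with truncating division, which may also distort the results.

```python
import math


def entropy(weights):
    """Shannon entropy in bits of the distribution given by normalizing weights."""
    total = float(sum(weights))
    if total <= 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)


def split_gains(weights, cut):
    """Return (unweighted, weighted) entropy differences for a cut at index cut.

    Assumes 0 < cut < len(weights) and a positive total weight.
    """
    left, right = weights[:cut], weights[cut:]
    total = float(sum(weights))
    h_all, h_left, h_right = entropy(weights), entropy(left), entropy(right)
    p_left, p_right = sum(left) / total, sum(right) / total
    # Same shape as hmenos = ho - (hl + hr); this can go below zero.
    unweighted = h_all - (h_left + h_right)
    # Weighted by each side's share of the total; this is always >= 0.
    weighted = h_all - (p_left * h_left + p_right * h_right)
    return unweighted, weighted


frequency = {v: 1 for v in range(1, 9)}            # eight values, frequency 1 each
weights = [frequency[v] for v in sorted(frequency)]
unweighted, weighted = split_gains(weights, 4)
# unweighted == -1.0 and weighted == 1.0
```

If the incremental `hl` and `hr` updates are correct, they should agree with `entropy(left)` and `entropy(right)` at every cut position, so this brute-force version can serve as a reference while debugging.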