Machine learning: What is an intuitive explanation of the Expectation Maximization technique?


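Expectation Maximization (EM) is a kind of probabilistic method to classify data. Please correct me if I am wrong and it is not a classifier.

What is an intuitive explanation of this EM technique? What is the expectation here, and what is being maximized?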

EM is used to maximize the likelihood of a model Q with latent variables Z.

It's an iterative optimization.

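theta <- initial guess for the unknown parameters
while not converged:
    # E-step: expectation over the latent Z, given the data X and the current theta
    Q(theta'|theta) = E_{Z|X,theta}[log L(theta'; X, Z)]
    # M-step: pick the parameters that maximize that expectation
    theta <- argmax_theta' Q(theta'|theta)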

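EM is an algorithm for maximizing a likelihood function when some of the variables in your model are unobserved (i.e. when you have latent variables).
You might fairly ask: if we're just trying to maximize a function, why not use the existing machinery for maximizing a function? If you try to maximize it by taking derivatives and setting them to zero, you find that in many cases the first-order conditions have no solution. There is a chicken-and-egg problem: to solve for the model parameters you need to know the distribution of the unobserved data, but the distribution of the unobserved data is itself a function of the model parameters.
E-M gets around this by iteratively guessing a distribution for the unobserved data, then estimating the model parameters by maximizing something that is a lower bound on the actual likelihood function, and repeating until convergence: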
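The EM algorithm
Start with a guess for the values of your model parameters.
E-step: For each data point that has missing values, use your model equation to solve for the distribution of the missing data, given your current guess of the model parameters and the observed data (note that you are solving for a distribution for each missing value, not for its expected value). Now that we have a distribution for each missing value, we can calculate the expectation of the likelihood function with respect to the unobserved variables. If our guess for the model parameters was correct, this expected likelihood is the actual likelihood of the observed data; if the parameters were not correct, it is only a lower bound.
M-step: Now that we have an expected likelihood function with no unobserved variables in it, maximize that function as you would in the fully observed case, to get a new estimate of your model parameters.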
Repeat until convergence.

The listing below shows successive estimates from an EM run on the two-coin example discussed later in this thread (initial guess thetaA = 0.60, thetaB = 0.50):
thetaA = 0.71301, thetaB = 0.58134
thetaA = 0.74529, thetaB = 0.56926
thetaA = 0.76810, thetaB = 0.54954
thetaA = 0.78316, thetaB = 0.53462
thetaA = 0.79106, thetaB = 0.52628
thetaA = 0.79453, thetaB = 0.52239
thetaA = 0.79593, thetaB = 0.52073
thetaA = 0.79647, thetaB = 0.52005
thetaA = 0.79667, thetaB = 0.51977
thetaA = 0.79674, thetaB = 0.51966
thetaA = 0.79677, thetaB = 0.51961
thetaA = 0.79678, thetaB = 0.51960
thetaA = 0.79679, thetaB = 0.51959
Final result:
thetaA = 0.79678, thetaB = 0.51960
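The lower bound mentioned above can be written out with Jensen's inequality. This is the standard textbook form rather than anything specific to this answer; the symbols X (observed data), Z (unobserved data), \theta (model parameters) and q (any distribution over Z) are notation assumed here for illustration.

```latex
\log p(X \mid \theta)
  = \log \sum_{Z} q(Z)\,\frac{p(X, Z \mid \theta)}{q(Z)}
  \;\ge\; \sum_{Z} q(Z)\,\log \frac{p(X, Z \mid \theta)}{q(Z)}
```

The bound holds with equality when q(Z) = p(Z \mid X, \theta), which is the distribution the E-step computes for the current parameter guess. The M-step then maximizes the right-hand side over \theta with q held fixed.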
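Technically the term "EM" is a bit underspecified, but I assume you are referring to the Gaussian Mixture Modelling cluster analysis technique, which is one instance of the general EM principle.
Actually, EM cluster analysis is not a classifier. I know that some people consider clustering to be "unsupervised classification", but cluster analysis is really something quite different.
The key difference, and the big misunderstanding people have about cluster analysis, is this: in cluster analysis there is no "correct solution". It is a knowledge discovery method; it is actually meant to find something new! This makes evaluation very tricky. It is often evaluated using a known classification as reference, but that is not always appropriate: the classification you have may or may not reflect what is in the data.
Let me give you an example: you have a large data set of customers, including gender data. A method that splits this data set into "male" and "female" is optimal when you compare it with the existing classes. From a "prediction" point of view this is good, because for new users you can now predict their gender. From a "knowledge discovery" point of view it is actually bad, because you wanted to discover some new structure in the data. A method that splits the data into, say, elderly people and kids would score about as badly as possible against the male/female classes. Yet that would be an excellent clustering result (if age was not given).
Now back to EM. Essentially, it assumes that your data is composed of multiple multivariate normal distributions (note that this is a very strong assumption, in particular when you fix the number of clusters!). It then tries to find a locally optimal model by alternately improving the model and the assignment of objects to the model.
For best results in a classification context, choose a number of clusters larger than the number of classes, or even apply the clustering to single classes only (to find out whether there is some structure within the class!).
Say you want to train a classifier to tell apart "cars", "bikes" and "trucks". There is little use in assuming that the data consists of exactly 3 normal distributions. However, you may assume that there is more than one type of car (and of truck and bike). So instead of training a classifier for these three classes, you cluster cars, trucks and bikes into 10 clusters each (or maybe 10 for cars, 3 for trucks and 3 for bikes, whatever), then train a classifier to tell apart these 30 classes, and then merge the class results back into the original classes. You may also discover that there is one cluster that is particularly hard to classify, for example trikes. They are somewhat cars and somewhat bikes. Or delivery trucks, which are more like oversized cars than trucks.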
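To make the evaluation problem concrete, here is a small sketch that scores two clusterings against the known gender classes. The `agreement` helper (a purity score) and the eight toy customers are made up for illustration; they are not part of the original answer.

```python
from collections import Counter

def agreement(clusters, labels):
    """Purity: fraction of points whose label equals the majority label of their cluster."""
    by_cluster = {}
    for c, y in zip(clusters, labels):
        by_cluster.setdefault(c, []).append(y)
    matched = sum(Counter(ys).most_common(1)[0][1] for ys in by_cluster.values())
    return matched / len(labels)

# Hypothetical toy customers: gender is the known class, age group is hidden structure.
gender = ["m", "f", "m", "f", "m", "f", "m", "f"]
age_group = ["kid", "kid", "kid", "kid", "elderly", "elderly", "elderly", "elderly"]

by_gender = agreement(gender, gender)    # clustering that just reproduces the known classes
by_age = agreement(age_group, gender)    # clustering that found the age groups instead
print(by_gender, by_age)
```

The age-based clustering scores no better than chance against the gender reference even though it found real structure, which is why a known classification is not always a fair yardstick.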
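The alternation described above can be sketched for the simplest case: two normal distributions in one dimension. This is a minimal illustration under stated assumptions, not the answerer's code; `fit_gmm_1d`, its crude initialisation at the data extremes, and the synthetic data are all assumptions made here.

```python
import numpy as np

def fit_gmm_1d(x, n_iter=200, tol=1e-8):
    """EM for a two-component 1-D Gaussian mixture (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    # Crude initial guess: means at the data extremes, both spreads equal to the overall spread.
    mu = np.array([x.min(), x.max()])
    sigma = np.array([x.std(), x.std()])
    weight = np.array([0.5, 0.5])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: soft assignment (responsibility) of every point to each component.
        dens = weight * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        total = dens.sum(axis=1, keepdims=True)
        resp = dens / total
        # M-step: re-estimate weights, means and spreads from the weighted points.
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        weight = nk / len(x)
        # Log-likelihood under the parameters used in this E-step; stop once it stalls.
        ll = np.log(total).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return mu, sigma, weight

# Synthetic data from two known normals, so the fit can be compared with the truth.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(5.0, 1.0, 300)])
mu, sigma, weight = fit_gmm_1d(x)
print(mu, sigma, weight)
```

With a poor starting guess or heavily overlapping components the same loop can settle in a different local optimum, which is the "local" caveat in the paragraph above.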
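A rough sketch of the cluster-within-class idea follows. To stay dependency-free it swaps in plain k-means for EM clustering and a nearest-centroid rule for the classifier, so it only illustrates the "split into sub-classes, then merge back" step. All function names and the toy data (cars forming two groups with bikes sitting in between) are hypothetical.

```python
import numpy as np

def kmeans(points, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[np.argmax(d)])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):  # leave an empty cluster's centroid where it is
                centroids[j] = points[assign == j].mean(axis=0)
    return centroids

def fit_subclass_model(data_by_class, clusters_per_class):
    """Cluster each class separately and remember which class owns each sub-cluster."""
    centroids, owners = [], []
    for name, points in data_by_class.items():
        for c in kmeans(points, clusters_per_class[name]):
            centroids.append(c)
            owners.append(name)
    return np.array(centroids), owners

def predict(model, point):
    """Pick the nearest sub-cluster, then merge back to its original class."""
    centroids, owners = model
    return owners[int(np.linalg.norm(centroids - point, axis=1).argmin())]

rng = np.random.default_rng(0)
data = {
    "car": np.vstack([rng.normal([0, 0], 0.5, (50, 2)), rng.normal([10, 0], 0.5, (50, 2))]),
    "bike": rng.normal([5, 0], 0.5, (50, 2)),
}
model = fit_subclass_model(data, {"car": 2, "bike": 1})
print(predict(model, np.array([0.2, 0.1])), predict(model, np.array([5.1, -0.2])))
```

In this toy layout a single centre for "car" would land right on top of the bikes, so modelling each class with one blob would fail where two sub-clusters for cars separate the classes cleanly.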
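Here is a straightforward recipe for understanding the Expectation Maximization algorithm:
1- Read the EM tutorial paper by Do and Batzoglou.
2- You may have question marks in your head; have a look at the explanations on this Mathematics Stack Exchange page.
3- Look at this code that I wrote in Python, which works through the example in the EM tutorial paper of item 1:
Warning: the code may be messy/suboptimal, since I am not a Python developer. But it does the job.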
import numpy as np
import math

#### E-M Coin Toss Example as given in the EM tutorial paper by Do and Batzoglou* #### 

def get_mn_log_likelihood(obs,probs):
    """ Return the (log)likelihood of obs, given the probs"""
    # Multinomial Distribution Log PMF
    # pmf           =          multinomial coeff               *   product of probabilities
    # ln[f(x|n, p)] = [ln(n!) - (ln(x1!)+ln(x2!)+...+ln(xk!))] + [x1*ln(p1)+x2*ln(p2)+...+xk*ln(pk)]

    multinomial_coeff_denom= 0
    prod_probs = 0
    for x in range(0,len(obs)): # loop through state counts in each observation
        multinomial_coeff_denom = multinomial_coeff_denom + math.log(math.factorial(obs[x]))
        prod_probs = prod_probs + obs[x]*math.log(probs[x])

    multinomial_coeff = math.log(math.factorial(sum(obs))) -  multinomial_coeff_denom
    likelihood = multinomial_coeff + prod_probs
    return likelihood

# 1st:  Coin B, {HTTTHHTHTH}, 5H,5T
# 2nd:  Coin A, {HHHHTHHHHH}, 9H,1T
# 3rd:  Coin A, {HTHHHHHTHH}, 8H,2T
# 4th:  Coin B, {HTHTTTHHTT}, 4H,6T
# 5th:  Coin A, {THHHTHHHTH}, 7H,3T
# so, from MLE: pA(heads) = 0.80 and pB(heads)=0.45

# represent the experiments
head_counts = np.array([5,9,8,4,7])
tail_counts = 10-head_counts
experiments = list(zip(head_counts,tail_counts)) # list() so it supports len() and indexing on Python 3

# initialise the pA(heads) and pB(heads)
pA_heads = np.zeros(100); pA_heads[0] = 0.60
pB_heads = np.zeros(100); pB_heads[0] = 0.50

# E-M begins!
delta = 0.001  
j = 0 # iteration counter
improvement = float('inf')
while (improvement>delta):
    expectation_A = np.zeros((5,2), dtype=float) 
    expectation_B = np.zeros((5,2), dtype=float)
    for i in range(0,len(experiments)):
        e = experiments[i] # i'th experiment
        ll_A = get_mn_log_likelihood(e,np.array([pA_heads[j],1-pA_heads[j]])) # loglikelihood of e given coin A
        ll_B = get_mn_log_likelihood(e,np.array([pB_heads[j],1-pB_heads[j]])) # loglikelihood of e given coin B

        weightA = math.exp(ll_A) / ( math.exp(ll_A) + math.exp(ll_B) ) # corresponding weight of A proportional to likelihood of A 
        weightB = math.exp(ll_B) / ( math.exp(ll_A) + math.exp(ll_B) ) # corresponding weight of B proportional to likelihood of B                            

        expectation_A[i] = np.dot(weightA, e) 
        expectation_B[i] = np.dot(weightB, e)

    pA_heads[j+1] = expectation_A[:, 0].sum() / expectation_A.sum() # expected heads / expected tosses for coin A
    pB_heads[j+1] = expectation_B[:, 0].sum() / expectation_B.sum() # expected heads / expected tosses for coin B

    improvement = max( abs(np.array([pA_heads[j+1],pB_heads[j+1]]) - np.array([pA_heads[j],pB_heads[j]]) ))
    j = j+1

Using the same article by Do and Batzoglou cited in Zhubarb's answer, I implemented EM.
1st:  {H,T,T,T,H,H,T,H,T,H} 5 Heads, 5 Tails; Did coin A or B generate me?
2nd:  {H,H,H,H,T,H,H,H,H,H} 9 Heads, 1 Tails
3rd:  {H,T,H,H,H,H,H,T,H,H} 8 Heads, 2 Tails
4th:  {H,T,H,T,T,T,H,H,T,T} 4 Heads, 6 Tails
5th:  {T,H,H,H,T,H,H,H,T,H} 7 Heads, 3 Tails

Two possible coins, A & B are used to generate these distributions.
A & B have an unknown parameter: their bias towards heads.

We don't know the biases, but we can simply start with a guess: A=60% heads, B=50% heads.
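The setup above can be turned into a short EM loop. The following is a minimal Python sketch, not the implementation this answer refers to; `em_step`, the convergence threshold of 1e-6 and the iteration cap are assumptions made for illustration. The head counts and the starting guess are the ones stated above.

```python
# Heads out of 10 tosses in each of the five experiments listed above.
heads = [5, 9, 8, 4, 7]
tosses = 10

def em_step(theta_a, theta_b):
    """One E-step followed by one M-step for the two-coin problem."""
    heads_a = tails_a = heads_b = tails_b = 0.0
    for h in heads:
        t = tosses - h
        # E-step: how plausible is each coin for this experiment?
        # (The binomial coefficient is identical for both coins, so it cancels.)
        like_a = theta_a ** h * (1 - theta_a) ** t
        like_b = theta_b ** h * (1 - theta_b) ** t
        w_a = like_a / (like_a + like_b)
        w_b = 1.0 - w_a
        # Credit the observed heads and tails to each coin in proportion to its weight.
        heads_a += w_a * h
        tails_a += w_a * t
        heads_b += w_b * h
        tails_b += w_b * t
    # M-step: maximum-likelihood bias computed from the expected counts.
    return heads_a / (heads_a + tails_a), heads_b / (heads_b + tails_b)

theta_a, theta_b = 0.60, 0.50  # the starting guess stated above
first_a, first_b = em_step(theta_a, theta_b)
for _ in range(1000):
    new_a, new_b = em_step(theta_a, theta_b)
    converged = abs(new_a - theta_a) < 1e-6 and abs(new_b - theta_b) < 1e-6
    theta_a, theta_b = new_a, new_b
    if converged:
        break
print(f"thetaA = {theta_a:.5f}, thetaB = {theta_b:.5f}")
```

The "expectation" is the weighted head and tail counts per coin; the "maximization" is the ratio taken in the last line of `em_step`, which is the same estimate you would use if you knew which coin produced each experiment.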

          | EM guess | Actual |  Delta
----------+----------+--------+-------
Red mean  |    2.910 |  2.802 |  0.108
Red std   |    0.854 |  0.871 | -0.017
Blue mean |    6.838 |  6.932 | -0.094
Blue std  |    2.227 |  2.195 |  0.032