R: Visual comparison of regression & PCA

Tags: r, regression, linear-regression, pca

I am trying to perfect a method for comparing regression and PCA, inspired by a blog post that also discussed the question from a different angle. Before I forget, many thanks to JD Long and Josh Ulrich for much of the core of this. I will be using it in a course next semester. Sorry this is long!

UPDATE: I found a different approach which almost works (please fix it if you can!). It is posted at the bottom; a much smarter and shorter approach than I was able to come up with.

I basically followed the earlier scheme up to a point: generate random data, find the line of best fit, and draw the residuals. This is shown in the second code chunk below. But I also dug around and wrote some functions to draw lines normal to a line through an arbitrary point (here, the data points). I believe these work fine; they are shown in the first code chunk, along with a demonstration that they work.
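For reference, the foot of the perpendicular from a point (px, py) to the line y = intercept + slope * x also has a simple closed form, obtained by minimizing the squared distance to the line. A minimal sketch (perpFoot is a hypothetical helper name, not one of the functions from the code chunks):

# Closed-form foot of the perpendicular from (px, py) to
# y = intercept + slope * x; px and py may be vectors.
# Minimizing (x - px)^2 + (intercept + slope*x - py)^2 over x gives:
perpFoot <- function(px, py, slope, intercept) {
    x <- (px + slope * (py - intercept)) / (1 + slope^2)
    y <- intercept + slope * x
    cbind(x = x, y = y)
}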

Now, the second code chunk shows the whole thing in action using the same scheme as @JDLong, and I am adding an image of the resulting plot. Data in black, the regression in red with its residuals in pink, the 1st PC in blue, and light blue should be the normals, but obviously they are not. The functions in the first code chunk that draw those normals seem fine, but something is not right in the demonstration: I think I must be misunderstanding something or passing the wrong values. My normals come in horizontal, which seems like a useful clue (but so far, not to me). Can anyone see what is wrong here?

Thanks, this has been vexing me for a while...

First code chunk (the functions for drawing the normals, and a demonstration that they work):

###### The functions below are based very loosely on the references at the end

(The functions pointOnLineNearPoint and pointOnLine are reproduced in full at the end of the self-answer below.)

Answer: Try changing this line of code:

res <- pointOnLineNearPoint(x, y, yx2.lm$coef[2], yx2.lm$coef[1])

to

res <- pointOnLineNearPoint(x, new.y, yx2.lm$coef[2], yx2.lm$coef[1])

Answer: OK, I have to answer my own question! After further reading and comparing the methods people have posted on the internet, I have solved the problem. I am not sure I can clearly state what I "fixed", since I went through quite a few iterations. Anyway, here are the plot and the code (an MWE). The helper functions are at the end, for clarity.

# Comparison of Linear Regression & PCA
# Generate sample data

set.seed(39) # gives a decent-looking example
np <- 10 # number of data points
x <- -np:np
e <- rnorm(length(x), 0, 10)
y <- rnorm(1, 0, 2) * x + 3*rnorm(1, 0, 2) + e

# Plot the main data & residuals

plot(x, y, main = "Regression minimizes the y-residuals & PCA the normals", asp = 1)
yx.lm <- lm(y ~ x)
lines(x, predict(yx.lm), col = "red", lwd = 2)
segments(x, y, x, fitted(yx.lm), col = "pink")

# Now the PCA using built-in functions
# rotation = loadings = eigenvectors

r <- prcomp(cbind(x,y), retx = TRUE)
b <- r$rotation[2,1] / r$rotation[1,1] # gets slope of loading/eigenvector 1
a <- r$center[2] - b * r$center[1] # intercept: the PC line passes through the centroid (r$center)
abline(a, b, col = "blue") # Plot 1st PC

# Plot normals to 1st PC

X <- pointOnLineNearPoint(x, y, b, a)
segments( x, y, X[,1], X[,2], col = "lightblue1")

###### Needed Functions

pointOnLineNearPoint <- function(Px, Py, slope, intercept) {
    # Px, Py is the point to test, can be a vector.
    # slope, intercept is the line to check distance.
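    # Ax/Bx are placed 10 data-ranges below/above each point, so the
    # segment AB behaves like the full (infinite) line when projecting.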

    Ax <- Px-10*diff(range(Px))
    Bx <- Px+10*diff(range(Px))
    Ay <- Ax * slope + intercept
    By <- Bx * slope + intercept
    pointOnLine(Px, Py, Ax, Ay, Bx, By)
    }

pointOnLine <- function(Px, Py, Ax, Ay, Bx, By) {

    # This approach based upon comingstorm's answer on
    # stackoverflow.com/questions/3120357/get-closest-point-to-a-line
    # Vectorized by Bryan
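    # For each point, k_raw is its projection parameter along the
    # segment from B to A; it is clamped to [0, 1] below so the
    # nearest point stays on the segment.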

    PB <- data.frame(x = Px - Bx, y = Py - By)
    AB <- data.frame(x = Ax - Bx, y = Ay - By)
    PB <- as.matrix(PB)
    AB <- as.matrix(AB)
    k_raw <- k <- numeric(nrow(PB))
    for (n in 1:nrow(PB)) {
        k_raw[n] <- (PB[n,] %*% AB[n,])/(AB[n,] %*% AB[n,])
        if (k_raw[n] < 0)  { k[n] <- 0
            } else { if (k_raw[n] > 1) k[n] <- 1
                else k[n] <- k_raw[n] }
        }
    x = (k * Ax + (1 - k)* Bx)
    y = (k * Ay + (1 - k)* By)
    ans <- data.frame(x, y)
    ans
    }
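
If perpFoot from the earlier sketch is defined, its closed-form feet should match the output of pointOnLineNearPoint, since Ax and Bx sit far outside the data and the clamping of k never fires:

X2 <- perpFoot(x, y, b, a)                   # closed-form feet of the normals
all.equal(unname(as.matrix(X)), unname(X2))  # expect TRUE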

Comments:

Ah, I may not have been clear. The light blue lines should be perpendicular (normal) to the blue line, and should start from the original data points (the black open circles).

Thanks, that's interesting.

Thanks very much. I believe Vincent's code used to work; I wonder how the problem crept in. He must have posted a draft of the code, not the version behind the final figures.

Here is the almost-working approach mentioned in the UPDATE (Vincent's code):
set.seed(1)
x <- rnorm(20)
y <- x + rnorm(20)
plot(y~x, asp = 1)
r <- lm(y~x)
abline(r, col='red')

r <- princomp(cbind(x,y))
b <- r$loadings[2,1] / r$loadings[1,1]
a <- r$center[2] - b * r$center[1]
abline(a, b, col = "blue")
title(main='Appears to use the reflection of PC1')

u <- r$loadings
# Projection onto the first axis
p <- matrix( c(1,0,0,0), nrow=2 )
X <- rbind(x,y)
X <- r$center + solve(u, p %*% u %*% (X - r$center))
segments( x, y, X[1,], X[2,] , col = "lightblue1")
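
The title above hints at what may be wrong: since u is orthogonal, solve(u, p %*% u %*% (X - r$center)) equals t(u) %*% p %*% u %*% (X - r$center), which projects onto a reflection of PC1 rather than PC1 itself. A possible repair (my assumption, not something confirmed in the thread) builds the projector the other way around:

# Hypothetical fix: orthogonal projection onto the PC1 line,
# using the projector u %*% p %*% t(u) instead of its transposed form.
X <- rbind(x, y)
X <- r$center + u %*% p %*% t(u) %*% (X - r$center)
segments(x, y, X[1,], X[2,], col = "lightblue1")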