Multiple linear regression on election/census data in R & resulting error


I have the following data:

library(tidyverse)

df <- tibble(
  "racecmb" = c("White", "White", "White", "White", "White", "White",
                "White", "White", "Black", "White", "Mixed",
                "Black", "White", "White", "White"),
  "age" = c(77, 74, 55, 62, 60, 59, 32, 91, 75, 73, 43, 67, 58, 18, 57),
  "income" = c("10 to under $20,000", "100 to under $150,000",
               "75 to under $100,000", "75 to under $100,000",
               "10 to under $20,000", "20 to under $30,000",
               "100 to under $150,000", "20 to under $30,000",
               "100 to under $150,000", "20 to under $30,000",
               "100 to under $150,000", "Less than $10,000",
               "$150,000 or more", "30 to under $40,000",
               "50 to under $75,000"),
  "party" = c("Independent", "Independent", "Independent", "Democrat",
              "Independent", "Republican", "Independent",
              "Independent", "Democrat", "Republican", "Republican",
              "Democrat", "Democrat", "Independent", "Independent"),
  "ideology" = c("Moderate", "Moderate", "Conservative", "Moderate",
                 "Moderate", "Very conservative", "Moderate",
                 "Conservative", "Conservative", "Moderate", "Conservative",
                 "Very conservative", "Liberal", "Moderate", "Conservative")
)
My goal is to explain how people vote (their party), but I don't know how to code the data effectively for my model.


Many thanks for any input/suggestions...

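First, using lm() with a categorical response variable isn't ideal. What you want instead is rpart(), which can return its output as a category or class; alternatively, you can use multinomial logit/probit regression, which returns the probability of each outcome given the predictors.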
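A minimal sketch of both options, using the party, ideology and age columns from the question. Assumptions: the multinomial logit is fitted with nnet::multinom() (the answer doesn't name a package for it; nnet is my choice, and the probit variant is not shown), the tree with rpart(..., method = "class"), and rpart's minsplit is lowered only because the toy data has 15 rows. With so few observations the estimates are not meaningful; the point is the type of model.

```r
# Assumes the recommended packages nnet and rpart are installed
# (both ship with standard R distributions).
library(nnet)
library(rpart)

df <- data.frame(
  age = c(77, 74, 55, 62, 60, 59, 32, 91, 75, 73, 43, 67, 58, 18, 57),
  party = c("Independent", "Independent", "Independent", "Democrat",
            "Independent", "Republican", "Independent",
            "Independent", "Democrat", "Republican", "Republican",
            "Democrat", "Democrat", "Independent", "Independent"),
  ideology = c("Moderate", "Moderate", "Conservative", "Moderate",
               "Moderate", "Very conservative", "Moderate",
               "Conservative", "Conservative", "Moderate", "Conservative",
               "Very conservative", "Liberal", "Moderate", "Conservative"),
  stringsAsFactors = TRUE
)

# Multinomial logit: one set of coefficients per non-baseline party,
# each relative to the first factor level ("Democrat").
mlogit <- multinom(party ~ ideology + age, data = df, trace = FALSE)

# Fitted probability of each party for each respondent (rows sum to 1).
probs <- predict(mlogit, type = "probs")
round(head(probs), 3)

# Classification tree: predicts a party label directly.
# minsplit is lowered only because the toy data is tiny.
tree <- rpart(party ~ ideology + age, data = df, method = "class",
              control = rpart.control(minsplit = 5))
classes <- predict(tree, type = "class")
table(predicted = classes, observed = df$party)
```

The multinomial model answers "how likely is each party given these predictors", while the tree gives a single predicted class per person.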

Packages to install: rpart and statisticalModeling

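If your response variable isn't categorical, you can convert the categorical predictors into dummy variables and then run a regression that includes them (remember to leave one level out as the baseline).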
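To illustrate what that conversion produces, base R's model.matrix() builds the dummy columns and drops the first factor level as the baseline. This sketch uses the ideology column from the question and, purely as a placeholder numeric response, age; it is not a substantive model.

```r
# Ideology labels from the question; factor levels sort alphabetically,
# so "Conservative" becomes the baseline level.
df <- data.frame(
  age = c(77, 74, 55, 62, 60, 59, 32, 91, 75, 73, 43, 67, 58, 18, 57),
  ideology = factor(c("Moderate", "Moderate", "Conservative", "Moderate",
                      "Moderate", "Very conservative", "Moderate",
                      "Conservative", "Conservative", "Moderate",
                      "Conservative", "Very conservative", "Liberal",
                      "Moderate", "Conservative"))
)

# Intercept plus one 0/1 column per non-baseline level.
X <- model.matrix(~ ideology, data = df)
colnames(X)

# lm() performs the same expansion internally for factor predictors,
# so its coefficient names match the dummy columns above.
# (age is only a placeholder numeric response here.)
fit <- lm(age ~ ideology, data = df)
coef(fit)
```

Because lm() and glm() already dummy-code factors this way, building the columns by hand is mainly useful when another tool needs an explicit numeric matrix.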

This can be done quickly with the fastDummies package:


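Example:
df

You seem to be trying to fit a linear model with a categorical response. That doesn't make much sense. Can you describe what you're trying to do? Also, make sure to share sample data in a form that can be copy/pasted into R for testing; the spaces in the example data make that difficult.

So should I use glm() instead of lm()? I think that would make more sense, and I'll make the example reproducible. I can code interactions in the formula with …, right? Thanks for your help! Also, when I try the dummy conversion, it just gives an error:
Error in stopifnot(is.null(select_columns) | is.character(select_columns), : object 'ideology' not found
Is this because the variable is a factor?

I seem to have made a mistake in my code. The column names should be wrapped in quotes! So it should be:
df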
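A sketch of the corrected call, assuming the fastDummies package is installed (the guard simply skips the example otherwise). select_columns expects a character vector of column names, which is why the unquoted ideology failed: R tried to evaluate it as an object, not as a column name. remove_first_dummy = TRUE drops one level to serve as the baseline, as the answer recommends.

```r
df <- data.frame(
  ideology = c("Moderate", "Moderate", "Conservative", "Moderate",
               "Moderate", "Very conservative", "Moderate",
               "Conservative", "Conservative", "Moderate", "Conservative",
               "Very conservative", "Liberal", "Moderate", "Conservative"),
  stringsAsFactors = FALSE
)

if (requireNamespace("fastDummies", quietly = TRUE)) {
  # Quoted column name: select_columns takes a character vector.
  # select_columns = ideology (unquoted) makes R look for an object
  # called ideology, hence "object 'ideology' not found".
  dummies <- fastDummies::dummy_cols(df,
                                     select_columns = "ideology",
                                     remove_first_dummy = TRUE)
  print(names(dummies))
}
```

The result keeps the original ideology column and adds one 0/1 column for each of the three non-baseline levels.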
regression <- lm(party ~ income + ideology + age, data = df) %>%
  summary()
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
NA/NaN/Inf in 'y'
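The error comes from the response, not the predictors: party is a character column, lm() needs a numeric response, so every label is coerced to NA (R also warns "NAs introduced by coercion") and lm.fit() then stops. The sketch below reproduces that on the question's data and, as one hedged alternative in the spirit of the glm() question in the comments, fits a binary logistic regression after recoding party as Democrat vs. not (an illustrative recoding of my own, not from the thread). In an R formula, age * ideology expands to age + ideology + age:ideology. With 15 rows, glm() will likely warn about fitted probabilities of 0 or 1; the example is about syntax, not estimates.

```r
df <- data.frame(
  age = c(77, 74, 55, 62, 60, 59, 32, 91, 75, 73, 43, 67, 58, 18, 57),
  party = c("Independent", "Independent", "Independent", "Democrat",
            "Independent", "Republican", "Independent",
            "Independent", "Democrat", "Republican", "Republican",
            "Democrat", "Democrat", "Independent", "Independent"),
  ideology = c("Moderate", "Moderate", "Conservative", "Moderate",
               "Moderate", "Very conservative", "Moderate",
               "Conservative", "Conservative", "Moderate", "Conservative",
               "Very conservative", "Liberal", "Moderate", "Conservative"),
  stringsAsFactors = FALSE
)

# Reproduce the error: the character response is coerced to all NA.
# suppressWarnings() only hides the coercion warning; the error is caught.
err <- tryCatch(
  suppressWarnings(lm(party ~ age, data = df)),
  error = function(e) conditionMessage(e)
)
print(err)

# glm(family = binomial) needs a two-category outcome, so party is
# recoded here (illustrative choice): 1 = Democrat, 0 = anything else.
df$democrat <- as.integer(df$party == "Democrat")

# age * ideology = age + ideology + age:ideology (main effects + interaction).
attr(terms(democrat ~ age * ideology), "term.labels")

fit <- glm(democrat ~ age * ideology, data = df, family = binomial)
coef(fit)
```

To model all three parties at once, a multinomial model is still the natural choice; the binary recoding only shows how glm() and interaction terms are written.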