How can I use ICD10 codes in a regression model in R? [r, icd]


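I am trying to find the ICD10 codes that lead to a particular disease. But ICD10 uses an alphanumeric classification, e.g. A00.00. There are 1000s of such classifications, and I don't know how to use them in my regression model. Any suggestions?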

Data:

Patient  ICD10   Diabetes present (Y)
P1       A00.10  1
P2       A00.20  0
P1       C00.1   1
P3       Z01     1

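You will probably need to recode the ICD10 column into a variable with one or more levels. One approach is to generate a variable such as dat$diabetes with the levels 0 (no disease) and 1 (disease). One way to do this is with grepl. By the way, a common pattern for diabetes in ICD10 codes is E08 (please check), whereas A00 is cholera.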

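# Flag rows whose ICD10 code matches the diabetes pattern (1 = match, 0 = no match).
# Replace "E08" with whatever pattern is common to the codes you are interested in.
dat$diabetes <- as.integer(grepl(pattern = "E08", x = dat$ICD10))
# Only needed if the flag was stored as a factor or character; a no-op here
dat$diabetes <- as.numeric(as.character(dat$diabetes))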


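If there are several different patterns, repeat this procedure for each pattern to generate new variables (e.g. dat$diabetes1, dat$diabetes2) and then combine them. For example:

dat$diabetes_final <- 0
dat$diabetes_final[which(dat$diabetes1 == 1 | dat$diabetes2 == 1)] <- 1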
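
As a variation on the steps above, several prefixes can be folded into one anchored regular expression, and the row-level flag can then be collapsed to one value per patient before modeling. This is only a sketch: the data frame, the column names `patient` and `ICD10`, and the choice of E08, E09, E10, E11 and E13 as the diabetes prefixes are assumptions for illustration, so check them against the code set you actually use.

```r
# Toy data: one row per patient-diagnosis pair (column names are assumptions)
dat <- data.frame(
  patient = c("P1", "P1", "P2", "P3", "P3", "P4"),
  ICD10   = c("E11.9", "A00.1", "C00.1", "E08.00", "Z01", "A00.0"),
  stringsAsFactors = FALSE
)

# One anchored pattern covering several prefixes at once.
# The prefix list is an assumption; verify it for your coding system.
diabetes_pattern <- "^E(08|09|10|11|13)"
dat$diabetes <- as.integer(grepl(diabetes_pattern, dat$ICD10))

# Collapse to one row per patient: 1 if any of the patient's codes matched
per_patient <- aggregate(diabetes ~ patient, data = dat, FUN = max)
per_patient
```

Anchoring the pattern with `^` avoids accidental matches in the middle of a code, and the per-patient table is the shape most regression functions expect.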

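I would suggest setting "healthy" as the reference level of the factor variable that holds the diagnosis, because this gives you coefficients that show how your dependent variable changes when you compare healthy patients with patients who have a given disease. Of course, you can group the diseases as Jean-Claude Arbaut suggested.
This could look like this: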
# your vector with the diagnosis
diagnosis <- c("healthy", "P1 A00.10 1", "P2 A00.20 0", "P1 C00.1 1", "P3 Z01 1")

# grouping your vector. I have no idea about ICD10 groups, so this is only to show how this would work in R
diagnosis[diagnosis %in% c("P1 A00.10 1", "P2 A00.20 0")] <- "diabetes"
diagnosis[diagnosis %in% c("P1 C00.1 1", "P3 Z01 1")] <- "cancer"

# make the vector a factor with healthy as the reference
diagnosis <- factor(diagnosis)
diagnosis <- relevel(diagnosis, ref = "healthy")

# now you can use the variable in a regression
set.seed(1) # making it reproducible
dv <- rnorm(length(diagnosis)) # generating a dependent variable
summary(lm(dv ~ diagnosis)) # linear regression

# the coefficients look like this
...
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)        -0.6265     0.8126  -0.771    0.521
diagnosiscancer     1.5888     0.9952   1.597    0.251
diagnosisdiabetes   0.3005     0.9952   0.302    0.791
...
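
The grouping step above is done by hand. A rough sketch of deriving the groups from the codes themselves is shown below, using only the first letter of each code. The lookup table, the group labels, and the choice of Z codes as the reference level are assumptions made for illustration; they are not an official ICD10 grouping.

```r
# Hypothetical ICD10 codes, one per patient
icd10 <- c("A00.1", "C00.1", "E11.9", "Z01", "B20", "C50.9", "E08.00", "Z00.0")

# Coarse grouping by first letter; this lookup is illustrative, not an official map
chapter_lookup <- c(A = "infectious", B = "infectious",
                    C = "neoplasm", E = "endocrine", Z = "health_contact")
group <- unname(chapter_lookup[substr(icd10, 1, 1)])

# Factor with an explicit reference level for the regression contrasts
group <- relevel(factor(group), ref = "health_contact")

set.seed(1)
dv <- rnorm(length(group))   # simulated outcome, for illustration only
fit <- lm(dv ~ group)
names(coef(fit))
```

Each coefficient other than the intercept then compares one group with the reference level, just as in the hand-grouped example.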

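An efficient approach is to use the concept of comorbidities. My R package icd provides this for standardized sets of diseases such as "diabetes", "cancer" and "heart disease". There are several comorbidity maps to choose from, so you can pick one that matches your interest: for example, the PCCC maps in icd can be used for pediatrics, while others are aimed at adults and span a variety of disease states.
For example (the codes below are actually ICD-9 codes, but you can use ICD-10 as well):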
patients <- data.frame(
   visit_id = c(1000, 1000, 1000, 1000, 1001, 1001, 1002),
   icd9 = c("40201", "2258", "7208", "25001", "34400", "4011", "4011"),
   poa = c("Y", NA, "N", "Y", "X", "Y", "E"),
   stringsAsFactors = FALSE
   )
patients

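In the AHRQ comorbidity categories, "DM" stands for diabetes mellitus and "DMcx" for diabetes with complications such as retinopathy or renal failure. This is the US AHRQ modification of the standard Elixhauser categories.
Once you have binary flags for the disease states, you can use them in any statistical or machine learning model.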
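
To make the idea of a comorbidity map concrete, here is a minimal base-R sketch of turning long-format codes into per-visit binary flags. It is not the icd package's API or one of its maps: the data frame `visits`, the list `comorbidity_map`, the helper `flag_condition`, and the tiny prefix lists are all hypothetical and far simpler than a real map.

```r
# Long-format toy data: several ICD10 codes per visit (names are assumptions)
visits <- data.frame(
  visit_id = c(1, 1, 2, 3, 3),
  icd10    = c("E11.9", "I10", "C50.9", "E10.2", "I50.9"),
  stringsAsFactors = FALSE
)

# Hypothetical, heavily simplified map: condition name -> three-character prefixes
comorbidity_map <- list(
  diabetes = c("E10", "E11"),
  cancer   = c("C50"),
  heart    = c("I50")
)

ids <- sort(unique(visits$visit_id))

# For one condition, flag each visit that has at least one code with a listed prefix
flag_condition <- function(prefixes) {
  hit <- substr(visits$icd10, 1, 3) %in% prefixes
  as.vector(tapply(hit, visits$visit_id, any))
}

# One row per visit, one logical column per condition
flags <- data.frame(visit_id = ids, lapply(comorbidity_map, flag_condition))
flags
```

The resulting columns can be used directly as predictors in a model formula, for instance `outcome ~ diabetes + cancer + heart`, where `outcome` would be a variable you supply.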
Hi, thanks for your reply. The data is just a sample I provided to help explain the problem.
Many thanks for your help with the approach.