
R: error when using the C50 model for a decision tree


My dataset has NA values:

> str(ICU)
tibble [32,992 x 11] (S3: tbl_df/tbl/data.frame)
 $ Date-of-error               : Factor w/ 91 levels "01/01/2000","01/02/2000",..: 3 3 3 3 2 2 1 2 2 6 ...
 $ Time-of-error               : Factor w/ 484 levels "00:00","00:01",..: 1 1 264 264 396 396 336 220 220 82 ...
 $ Day-of-week                 : Factor w/ 7 levels "Friday","Monday",..: 2 2 2 2 4 4 3 4 4 5 ...
 $ Type-of-error               : Factor w/ 697 levels "(infused too fast)",..: 425 425 425 425 596 596 NA 425 425 596 ...
 $ Cause-of-error              : Factor w/ 569 levels "(physician unsure of doses/freq of these prior to admission)",..: 59 88 59 88 413 413 100 413 446 413 ...
 $ Contributing-factor         : Factor w/ 388 levels "'Tired'","1sou",..: NA NA NA NA NA NA NA NA NA NA ...
 $ Location-of-error           : Factor w/ 29 levels "Cardiac Catheterization Laboratory",..: 16 16 16 16 7 7 23 16 16 7 ...
 $ Description-of-error        : Factor w/ 7733 levels "\r\r\nfolic Acid 1 mg ordered for 15 days.  Pharmacy failed to put stop date on MARS..Also missed by 11-7 MAR c"| __truncated__,..: 366 366 355 355 6211 6211 2988 3980 3980 1869 ...
 $ Medication-process-node     : Factor w/ 6 levels "Administering",..: 6 6 3 3 1 1 2 3 3 1 ...
 $ Staff-type-initiated-error  : Factor w/ 133 levels "(unknown)","1 South",..: 86 86 86 86 61 61 81 61 61 61 ...
 $ Staff-type-perpetuated-error: Factor w/ 79 levels "& LVN","(question whether order reached pharmacy)",..: NA NA NA NA NA NA NA 79 79 NA ...
I first set a seed and split the dataset:

> set.seed(12345)
> ICUrand <- ICU[order(runif(10000)),]
> ICUtrain <- ICUrand[1:7000,]
> ICUtest <- ICUrand[7001:10000,]
Then I loaded C50:

library(C50)
Then I got this error:

> icumodel <- C5.0(ICUtrain[-11],ICUtrain$`Type-of-error`)
c50 code called exit with value 1
> icumodel

Call:
C5.0.default(x = ICUtrain[-11], y = ICUtrain$`Type-of-error`)

Classification Tree
Number of samples: 7000 
Number of predictors: 10 

Tree size: 0 

Non-standard options: attempt to group attributes

What am I doing wrong? Do decision trees work with columns that contain NA values?

Are you including ICUtrain$`Type-of-error` among the predictors while also using it as the target? Maybe try
C5.0(ICUtrain[c(-4,-11)], ICUtrain$`Type-of-error`)
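
The suggestion above can be sketched on a small made-up data frame. This is only an illustration under assumptions: the `toy` data and its values are invented, the column names are borrowed from the question, and dropping rows with a missing outcome is an extra step the comment does not mention. Selecting predictors by name rather than by position makes it harder for the outcome column to slip into the predictor set. Whether this removes the "exit with value 1" message on the real data is not confirmed.

```r
# Toy stand-in for the ICU data: column names mirror the question, values are made up.
toy <- data.frame(
  `Day-of-week` = factor(c("Monday", "Tuesday", "Monday", "Friday", "Friday", "Tuesday")),
  `Medication-process-node` = factor(c("Administering", "Prescribing", "Administering",
                                       "Dispensing", "Prescribing", "Dispensing")),
  `Type-of-error` = factor(c("wrong dose", "omission", NA, "wrong dose", "omission", "wrong dose")),
  check.names = FALSE
)

target <- "Type-of-error"

# Keep only rows where the outcome is known, then drop levels that no longer occur.
known <- toy[!is.na(toy[[target]]), ]
y <- droplevels(known[[target]])

# Select predictors by name so the outcome cannot be included by a positional index.
x <- known[, setdiff(names(known), target), drop = FALSE]

# Fit only if the C50 package is installed.
if (requireNamespace("C50", quietly = TRUE)) {
  fit <- C50::C5.0(x, y)
  print(fit)
}
```

On the real data the same pattern would use ICUtrain in place of `toy`.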
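Yes, I want the model to predict which staff initiated this type of error. I tried that code but still got the same "c50 code called exit with value 1" message. Are the NA values preventing this? Should I recode the NA values as "missing"?

I have never used this package, but the ?C5.0.default help page says it takes an na.action argument, "a function to indicate the behavior when the data contain NA values. The default is to include missing values since the model can accommodate them." So according to the help page, missing values are not the problem. I do wonder about your time and date variables both being coded as factors: it looks like you would want to use the information that 9:00 is close to 9:30 but far from 1:30, and that requires time to be numeric. Likewise for date, I am not sure what you expect the date to add, in theory, beyond the day of the week. I don't know whether that is related to your problem, though. A good way to isolate the problem is to see whether you can fit a simpler model, and then add to it until the problem appears.

This may also be related to factor levels that are not present. Your training data has 7000 rows, but Description-of-error has 7733 unique levels, so clearly there can be description values that never occur in the training data. Leaving that column out, or grouping it into a more reasonable number of levels, might be a good idea.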
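
The two ideas in these comments (numeric time, and dropping factors with too many levels) can be sketched as follows. This is a minimal illustration under assumptions: the `toy` data is invented, the "more levels than half the rows" cutoff is an arbitrary hypothetical threshold, and the `minutes` column name is made up for the example.

```r
# Toy data: column names follow the question, values are made up.
toy <- data.frame(
  `Time-of-error` = factor(c("00:00", "09:00", "09:30", "13:30", "09:00", "22:15")),
  `Day-of-week` = factor(c("Monday", "Tuesday", "Monday", "Friday", "Friday", "Tuesday")),
  `Description-of-error` = factor(paste("free text", 1:6)),
  check.names = FALSE
)

# Count factor levels per column to spot near-unique, free-text-like columns.
n_levels <- vapply(toy, nlevels, integer(1))

# Hypothetical cutoff: flag factors with more levels than half the number of rows.
too_many <- names(n_levels)[n_levels > nrow(toy) / 2]

# Convert "HH:MM" to minutes after midnight so that 09:00 is numerically close to 09:30.
parts <- strsplit(as.character(toy[["Time-of-error"]]), ":", fixed = TRUE)
toy[["minutes"]] <- vapply(parts,
                           function(p) 60 * as.numeric(p[1]) + as.numeric(p[2]),
                           numeric(1))

# Drop the flagged columns and the original time factor, then remove unused levels.
keep <- setdiff(names(toy), c(too_many, "Time-of-error"))
reduced <- droplevels(toy[, keep, drop = FALSE])
```

Calling droplevels() on the training subset after splitting also removes levels that were defined on the full data but never occur in the 7000 training rows.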