How to make a great R reproducible example

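When discussing performance with colleagues, teaching, sending a bug report, or searching for guidance on mailing lists and on StackOverflow, a reproducible example is often asked for and is always helpful.

What are your tips for creating an excellent example? How do you paste data structures from R in a text format? What other information should you include?

Are there other tricks in addition to using dput(), dump(), or structure()? When should you include library() or require() statements? Which reserved words should one avoid, in addition to c, df, data, etc.?

How does one make a great reproducible example?

Here is a good example.

The most important point: just make sure that you write a small piece of code that we can run to see what the problem is. A useful function for this is dput(), but if you have very large data, you may want to make a small sample data set or only use the first 10 rows or so.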
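
As a rough sketch of that advice, the snippet below cuts a large data frame down to its first 10 rows and prints them with dput(). The object big_data and its columns are hypothetical; it is simulated here only so that the snippet runs on its own.

```r
# Hypothetical stand-in for a large data set; replace it with your own object.
set.seed(1)
big_data <- data.frame(id = 1:10000,
                       value = rnorm(10000),
                       group = sample(c("a", "b", "c"), 10000, replace = TRUE))

# Keep only the first 10 rows ...
small_data <- head(big_data, 10)

# ... and print R code that recreates those rows.
dput(small_data)
```

The printed structure(...) call is what would go into the question, so that readers can paste it straight into their own session.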

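EDIT:

Also make sure that you have identified where the problem is yourself. The example should not be an entire R script with "there is an error on line 200". If you use the debugging tools in R (I love browser()) and Google, you should be able to pin down where the problem really is and reproduce a small example in which the same thing goes wrong.

Personally, I prefer "one"-liners. Something along these lines: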

my.df <- data.frame(col1 = sample(c(1,2), 10, replace = TRUE),
        col2 = as.factor(sample(10)), col3 = letters[1:10],
        col4 = sample(c(TRUE, FALSE), 10, replace = TRUE))
my.list <- list(list1 = my.df, list2 = my.df[3], list3 = letters)
Don't forget to mention any special packages you might be using.

If you are trying to demonstrate something on larger objects, you can try

my.df2 <- data.frame(a = sample(10e6), b = sample(letters, 10e6, replace = TRUE))
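If you need some spatial object as implemented in sp, you can get some data sets via external files (such as an ESRI shapefile) in the "spatial" packages (see the Spatial view in Task Views).

library(rgdal)
ogrDrivers()
dsn

Basically, a minimal reproducible example (MRE) should enable others to exactly reproduce your issue on their machines.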

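An MRE consists of the following items:

  • a minimal data set, necessary to demonstrate the problem
  • the minimal runnable code necessary to reproduce the error, which can be run on the given data set
  • all necessary information on the packages used, the R version, and the operating system it is run on
  • in the case of random processes, a seed (set by set.seed()) for reproducibility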
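
The four items above could be laid out roughly as follows. This is only a sketch: the toy data frame and the lm() call are placeholders chosen for illustration and are not taken from the original answer.

```r
# 1. Packages: load everything the code needs (only base R is used here).

# 2. Seed: fix the random number generator so the results repeat.
set.seed(42)

# 3. Minimal data: a small toy data set with made-up values.
toy <- data.frame(x = 1:5, y = rnorm(5))

# 4. Minimal code: the shortest call that shows the behaviour in question.
fit <- lm(y ~ x, data = toy)
coef(fit)

# 5. Environment: R version, operating system, and attached packages.
sessionInfo()
```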
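For examples of good MREs, see the "Examples" section at the bottom of the help files of the functions you are using. Simply type, e.g., help(mean), or ?mean for short, into your R console.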
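
If you would rather read those examples as plain text than scroll through the help page, one option is example() with give.lines = TRUE, which returns the example code without running it. A minimal sketch, using mean only because it is the function named above:

```r
# Return the "Examples" section of ?mean as a character vector
# instead of executing it.
ex_lines <- example(mean, give.lines = TRUE)

# Print the example code line by line.
cat(ex_lines, sep = "\n")
```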

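Providing a minimal data set

Usually, sharing huge data sets is not necessary and may rather discourage others from reading your question. Therefore, it is better to use built-in data sets or to create a small "toy" example that resembles your original data, which is actually what is meant by minimal. If for some reason you really need to share your original data, you should use a method, such as dput(), that allows others to get an exact copy of your data.

Built-in data sets

You can use one of the built-in data sets. A comprehensive list of built-in data sets can be seen with data(). There is a short description of every data set, and more information can be obtained, e.g. with ?iris for the "iris" data set that comes with R. Installed packages might contain additional data sets.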
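
A short sketch of how this might look in practice. It lists the data sets of the datasets package programmatically (the variable name available is just an illustrative choice) and then inspects iris:

```r
# Names of the data sets shipped in the 'datasets' package.
available <- data(package = "datasets")$results[, "Item"]
head(available)

# 'iris' is available in every R session, so it can be used directly.
head(iris, 3)
str(iris)
```

Because everyone has these data sets, an example built on them needs no data attached to the question at all.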

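Creating example data sets

Preliminary note: sometimes you may need special formats (i.e. classes), such as factors, dates, or time series. For these, make use of functions like as.factor, as.Date, as.xts, ... Example:

Vectors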

x <- rnorm(10)  ## random vector normal distributed
x <- runif(10)  ## random vector uniformly distributed    
x <- sample(1:100, 10)  ## 10 random draws out of 1, 2, ..., 100    
x <- sample(LETTERS, 10)  ## 10 random draws out of built-in latin alphabet

Matrices

m <- matrix(1:12, 3, 4, dimnames=list(LETTERS[1:3], LETTERS[1:4]))
m
#   A B C  D
# A 1 4 7 10
# B 2 5 8 11
# C 3 6 9 12

Data frames

set.seed(42)  ## for sake of reproducibility
n <- 6
dat <- data.frame(id=1:n, 
                  date=seq.Date(as.Date("2020-12-26"), as.Date("2020-12-31"), "day"),
                  group=rep(LETTERS[1:2], n/2),
                  age=sample(18:30, n, replace=TRUE),
                  type=factor(paste("type", 1:n)),
                  x=rnorm(n))
dat
#   id       date group age   type         x
# 1  1 2020-12-26     A  27 type 1 0.0356312
# 2  2 2020-12-27     B  19 type 2 1.3149588
# 3  3 2020-12-28     A  20 type 3 0.9781675
# 4  4 2020-12-29     B  26 type 4 0.8817912
# 5  5 2020-12-30     A  26 type 5 0.4822047
# 6  6 2020-12-31     B  28 type 6 0.9657529
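Subsetting data

To share a subset, use head(), subset(), or the indices iris[1:4, ]. Then wrap it in dput() to give others something that can be put into R immediately, for example dput(iris[1:4, ]) for the first four rows of the iris data set. The console output is what you share in your question.

When using dput, you may also want to include only the relevant columns, e.g. dput(mtcars[1:3, c(2, 5, 6)])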
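
To see why dput() output is convenient for readers, here is a small round-trip sketch. Writing to a temporary file and reading it back with dget() stands in for the copy and paste that would normally happen through the question text; the variable names are illustrative.

```r
# Keep three rows and three relevant columns of the built-in mtcars data.
small <- mtcars[1:3, c(2, 5, 6)]

# Write the dput() output to a temporary file
# (in a real question you would paste this text instead).
tmp <- tempfile(fileext = ".R")
dput(small, file = tmp)

# Anyone can rebuild the object from that text, here via dget().
rebuilt <- dget(tmp)
all.equal(small, rebuilt)

# Clean up the temporary file.
unlink(tmp)
```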

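Note: if your data frame has a factor with many levels, the dput output can be unwieldy because it will still list all the possible factor levels, even if they are not present in the subset of your data. To solve this issue, you can use the droplevels() function. Note that Species then becomes a factor with only one level, e.g. dput(droplevels(iris[1:4, ]))

One other caveat for dput is that it will not work for keyed data.table objects or for a grouped tbl_df (class grouped_df) from the tidyverse. In these cases, you can convert back to a regular data frame before sharing, dput(as.data.frame(my_data))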
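
The effect of droplevels() is easy to check on the iris subset mentioned above. A minimal sketch (the names sub and sub_clean are illustrative):

```r
# The first four rows of iris contain only the species "setosa" ...
sub <- iris[1:4, ]

# ... but the factor still carries all three levels.
levels(sub$Species)

# droplevels() removes the unused levels, which shortens the dput() output.
sub_clean <- droplevels(sub)
levels(sub_clean$Species)

dput(sub_clean)
```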

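Producing minimal code

Combined with the minimal data (see above), your code should exactly reproduce the problem on another machine by simply copying and pasting it.

This should be the easy part, but often it isn't. What you should not do:

  • show all kinds of data conversions; make sure the provided data is already in the correct format (unless that is the problem, of course)
  • copy-paste a whole script that gives an error somewhere. Try to find the lines that cause the error. More often than not, you will find out what the problem is yourself.

What you should do:

  • add the packages you use (with library())
  • test-run your code in a fresh R session to make sure the code is runnable. People should be able to copy-paste your data and your code into the console and get the same result as you
  • if you open connections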
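
One possible way to test-run a snippet in a fresh session without leaving your current one is to hand it to Rscript --vanilla. The following is only a sketch under that assumption: the three-line script is a made-up placeholder for your own example code.

```r
# Hypothetical example code, written to a temporary script file.
script <- tempfile(fileext = ".R")
writeLines(c(
  "set.seed(42)",
  "x <- rnorm(3)",
  "print(round(mean(x), 4))"
), script)

# Run the script in a new R process that ignores saved workspaces and profiles.
rscript <- file.path(R.home("bin"), "Rscript")
out <- system2(rscript, c("--vanilla", shQuote(script)), stdout = TRUE)
out

# Remove the temporary file afterwards.
unlink(script)
```

If the script fails here because of a missing object or package, it would fail for readers of your question as well.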
    d <- as.Date("2020-12-30")
    
    class(d)
    # [1] "Date"
    
      id       date group age   type         x
    1  1 2020-12-26     A  27 type 1 0.0356312
    2  2 2020-12-27     B  19 type 2 1.3149588
    3  3 2020-12-28     A  20 type 3 0.9781675
    
    dput(iris[1:4, ]) # first four rows of the iris data set
    
    structure(list(Sepal.Length = c(5.1, 4.9, 4.7, 4.6), Sepal.Width = c(3.5, 
    3, 3.2, 3.1), Petal.Length = c(1.4, 1.4, 1.3, 1.5), Petal.Width = c(0.2, 
    0.2, 0.2, 0.2), Species = structure(c(1L, 1L, 1L, 1L), .Label = c("setosa", 
    "versicolor", "virginica"), class = "factor")), row.names = c(NA, 
    4L), class = "data.frame")
    
    set.seed(42)
    rnorm(3)
    # [1]  1.3709584 -0.5646982  0.3631284
    
    set.seed(42)
    rnorm(3)
    # [1]  1.3709584 -0.5646982  0.3631284
    
      > x <- matrix(1:8, nrow=4, ncol=2,
                    dimnames=list(c("A","B","C","D"), c("x","y")))
      > x
        x y
      A 1 5
      B 2 6
      C 3 7
      D 4 8
      >
    
      > x.df
         row col value
      1    A   x      1
    
      > x.df <- reshape(data.frame(row=rownames(x), x), direction="long",
                        varying=list(colnames(x)), times=colnames(x),
                        v.names="value", timevar="col", idvar="row")
    
     df <- read.table(header=TRUE, 
      text="Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    1          5.1         3.5          1.4         0.2  setosa
    2          4.9         3.0          1.4         0.2  setosa
    3          4.7         3.2          1.3         0.2  setosa
    4          4.6         3.1          1.5         0.2  setosa
    5          5.0         3.6          1.4         0.2  setosa
    6          5.4         3.9          1.7         0.4  setosa
    ") 
    
    code
    code
    code
    code
    code (40 or so lines of it)
    
    data(mtcars)
    
    names(mtcars)
    your problem demonstrated on the mtcars data set
    
    dput(read.table("clipboard",sep="\t",header=TRUE))
    
    dput(read.table("clipboard",sep="",header=TRUE))
    
    install.packages("devtools")
    library(devtools)
    source_url("https://raw.github.com/rsaporta/pubR/gitbranch/reproduce.R")
    
    reproduce(myData)
    
    # sample data
    DF <- data.frame(id=rep(LETTERS, each=4)[1:100], replicate(100, sample(1001, 100)), Class=sample(c("Yes", "No"), 100, TRUE))
    
    reproduce(DF, cols=c("id", "X1", "X73", "Class"))  # I could also specify the column number. 
    
    This is what the sample looks like: 
    
        id  X1 X73 Class
    1    A 266 960   Yes
    2    A 373 315    No            Notice the selection split 
    3    A 573 208    No           (which can be turned off)
    4    A 907 850   Yes
    5    B 202  46   Yes         
    6    B 895 969   Yes   <~~~ 70 % of selection is from the top rows
    7    B 940 928    No
    98   Y 371 171   Yes          
    99   Y 733 364   Yes   <~~~ 30 % of selection is from the bottom rows.  
    100  Y 546 641    No        
    
    
        ==X==============================================================X==
             Copy+Paste this part. (If on a Mac, it is already copied!)
        ==X==============================================================X==
    
     DF <- structure(list(id = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 25L, 25L, 25L), .Label = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y"), class = "factor"), X1 = c(266L, 373L, 573L, 907L, 202L, 895L, 940L, 371L, 733L, 546L), X73 = c(960L, 315L, 208L, 850L, 46L, 969L, 928L, 171L, 364L, 641L), Class = structure(c(2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L), .Label = c("No", "Yes"), class = "factor")), .Names = c("id", "X1", "X73", "Class"), class = "data.frame", row.names = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 98L, 99L, 100L)) 
    
        ==X==============================================================X==
    
        ==X==============================================================X==
             Copy+Paste this part. (If on a Mac, it is already copied!)
        ==X==============================================================X==
    
     DF <- structure(list(id = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 25L,25L, 25L), .Label
          = c("A", "B", "C", "D", "E", "F", "G", "H","I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U","V", "W", "X", "Y"), class = "factor"),
          X1 = c(809L, 81L, 862L,747L, 224L, 721L, 310L, 53L, 853L, 642L),
          X2 = c(926L, 409L,825L, 702L, 803L, 63L, 319L, 941L, 598L, 830L),
          X16 = c(447L,164L, 8L, 775L, 471L, 196L, 30L, 420L, 47L, 327L),
          X22 = c(335L,164L, 503L, 407L, 662L, 139L, 111L, 721L, 340L, 178L)), .Names = c("id","X1",
          "X2", "X16", "X22"), class = "data.frame", row.names = c(1L,2L, 3L, 4L, 5L, 6L, 7L, 98L, 99L, 100L))
    
        ==X==============================================================X==
    
    d <- read.table("http://pastebin.com/raw.php?i=m1ZJuKLH")
    
    mydata <- data.frame(a=character(0), b=numeric(0),  c=numeric(0), d=numeric(0))
    
    >fix(mydata)
    
    install.packages("SciencesPo")
    
    dt <- data.frame(
        Z = sample(LETTERS,10),
        X = sample(1:10),
        Y = sample(c("yes", "no"), 10, replace = TRUE)
    )
    
    > dt
       Z  X   Y
    1  D  8  no
    2  T  1 yes
    3  J  7  no
    4  K  6  no
    5  U  2  no
    6  A 10 yes
    7  Y  5  no
    8  M  9 yes
    9  X  4 yes
    10 Z  3  no
    
    > anonymize(dt)
         Z    X  Y
    1   b2  2.5 c1
    2   b6 -4.5 c2
    3   b3  1.5 c1
    4   b4  0.5 c1
    5   b7 -3.5 c1
    6   b1  4.5 c2
    7   b9 -0.5 c1
    8   b5  3.5 c2
    9   b8 -1.5 c2
    10 b10 -2.5 c1
    
        # sample two variables without replacement
    > anonymize(sample.df(dt,5,vars=c("Y","X")))
       Y    X
    1 a1 -0.4
    2 a1  0.6
    3 a2 -2.4
    4 a1 -1.4
    5 a2  3.6
    
    dput(droplevels(head(mydata)))
    
    set.seed(1)  # important to make random data reproducible
    myData <- data.frame(a=sample(letters[1:5], 20, replace=TRUE), b=runif(20))
    
       cyl   mean.hp
    1:   6 122.28571
    2:   4  82.63636
    3:   8 209.21429
    
    if (!require("pacman")) install.packages("pacman")
    pacman::p_load_gh("trinker/wakefield")
    
    r_data_frame(
        n = 500,
        id,
        race,
        age,
        sex,
        hour,
        iq,
        height,
        died
    )
    
        ID  Race Age    Sex     Hour  IQ Height  Died
    1  001 White  33   Male 00:00:00 104     74  TRUE
    2  002 White  24   Male 00:00:00  78     69 FALSE
    3  003 Asian  34 Female 00:00:00 113     66  TRUE
    4  004 White  22   Male 00:00:00 124     73  TRUE
    5  005 White  25 Female 00:00:00  95     72  TRUE
    6  006 White  26 Female 00:00:00 104     69  TRUE
    7  007 Black  30 Female 00:00:00 111     71 FALSE
    8  008 Black  29 Female 00:00:00 100     64  TRUE
    9  009 Asian  25   Male 00:30:00 106     70 FALSE
    10 010 White  27   Male 00:30:00 121     68 FALSE
    .. ...   ... ...    ...      ... ...    ...   ...
    
    mydf1<- matrix(rnorm(20),nrow=20,ncol=5)
    
    class(mydf1)
    # this shows the type of the data you have 
    dim(mydf1)
    # this shows the dimension of your data
    
    # Found based on the following:
    typeof(mydf1)      # what it is
    length(mydf1)      # how many elements it contains
    attributes(mydf1)  # additional arbitrary metadata
    
    # If you cannot share your original data, you can use str() to give an idea of its structure
    str(mydf1)
    
    If I have a matrix x as follows:
    > x <- matrix(1:8, nrow=4, ncol=2,
                dimnames=list(c("A","B","C","D"), c("x","y")))
    > x
      x y
    A 1 5
    B 2 6
    C 3 7
    D 4 8
    >
    
    How can I turn it into a dataframe with 8 rows, and three
    columns named `row`, `col`, and `value`, which have the
    dimension names as the values of `row` and `col`, like this:
    > x.df
        row col value
    1    A   x      1
    ...
    (To which the answer might be:
    > x.df <- reshape(data.frame(row=rownames(x), x), direction="long",
    +                varying=list(colnames(x)), times=colnames(x),
    +                v.names="value", timevar="col", idvar="row")
    )
    
    #If I have a matrix x as follows:
    x <- matrix(1:8, nrow=4, ncol=2,
                dimnames=list(c("A","B","C","D"), c("x","y")))
    x
    #  x y
    #A 1 5
    #B 2 6
    #C 3 7
    #D 4 8
    
    # How can I turn it into a dataframe with 8 rows, and three
    # columns named `row`, `col`, and `value`, which have the
    # dimension names as the values of `row` and `col`, like this:
    
    #x.df
    #    row col value
    #1    A   x      1
    #...
    #To which the answer might be:
    
    x.df <- reshape(data.frame(row=rownames(x), x), direction="long",
                    varying=list(colnames(x)), times=colnames(x),
                    v.names="value", timevar="col", idvar="row")
    
    library(testthat)
    # code defining x and y
    if (y >= 10) {
        expect_equal(x, 1.23)
    } else {
        expect_equal(x, 3.21)
    }
    
    library(reprex)
    y <- 1:4
    mean(y)
    reprex()