
Storing doubles in a data frame in R


I have a data frame as follows:

fitnorm <- data.frame(dataset=0,mean=0,sd=0,normopl=0)
fitnorm
Shouldn't a data frame be able to hold objects of type double? What am I doing wrong?

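It can, but that's not what you're doing.

In the line

## I've assumed i <- 1
fitnorm[i,1] <- normdat

you are trying to put all 25 values of normdat into a single cell of the data frame.

Update

Based on your comment: you can't store a vector in a single element of an ordinary (atomic) data.frame column, so you need to use a list, as follows: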

lst <- list(dataset = normdat,
            mean = mean(normdat),
            sd = sd(normdat),
            ## use the sample's own mean and sd rather than fitnorm[i,2] and fitnorm[i,3]
            normopl = qnorm(1-(400/1000), mean=mean(normdat), sd=sd(normdat)))

## Which gives
lst
$dataset
 [1] 33.43470 28.66693 29.41060 32.95761 32.66531 29.86056 31.61961 29.32424 28.07063 31.80155
[11] 32.88489 31.90562 31.81625 24.62625 31.19141 27.41913 31.43993 29.60108 29.73310 23.77482
[21] 28.50347 27.22960 24.65698 27.13001 35.85981

$mean
[1] 29.82336

$sd
[1] 2.981638

$normopl
[1] 30.57875
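
Strictly speaking, base R does allow a data.frame column to be a list, so a single row can carry a whole vector. The following is a minimal sketch of that alternative, not something taken from the thread; the names `samples` and `fit` and the use of two simulated samples are assumptions made for illustration.

```r
set.seed(1)  # only to make the sketch reproducible

# two simulated samples of 25 values each (illustrative data)
samples <- list(rnorm(25, mean = 30, sd = 3),
                rnorm(25, mean = 30, sd = 3))

# one row per sample; the scalar summaries are ordinary numeric columns
fit <- data.frame(mean = sapply(samples, mean),
                  sd   = sapply(samples, sd))

# a list whose length equals nrow(fit) becomes a list column: one vector per row
fit$dataset <- samples

# qnorm is vectorised over mean and sd, so this fills every row at once
fit$normopl <- qnorm(1 - 400/1000, mean = fit$mean, sd = fit$sd)

length(fit$dataset[[1]])  # number of values stored in the first row
```

List columns print awkwardly and many functions expect atomic columns, which is one reason a plain list, as in the answer, is often the simpler choice.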
Edit by OP

The code above works. However, since the list has to be filled iteratively, I modified it slightly:

fitnorm <- list(dataset=list(),mean=list(),sd=list(),normopl=list())

for (i in 1:5000){
    normdat <- rnorm(25, mean = 30, sd = sqrt(9))
    fitnorm$dataset[[i]] <- normdat
    fitnorm$mean[[i]]<- mean(normdat)
    fitnorm$sd[[i]] <- sd(normdat)
    fitnorm$normopl[[i]] <- qnorm(1-(400/1000), mean=fitnorm$mean[[i]], sd=fitnorm$sd[[i]])
    }

fitnorm$dataset[1]
[[1]]
 [1] 33.43470 28.66693 29.41060 32.95761 32.66531 29.86056 31.61961 29.32424 28.07063 31.80155
[11] 32.88489 31.90562 31.81625 24.62625 31.19141 27.41913 31.43993 29.60108 29.73310 23.77482
[21] 28.50347 27.22960 24.65698 27.13001 35.85981

fitnorm$mean[1]
[[1]]
[1] 29.82336

fitnorm$sd[1]
[[1]]
[1] 2.981638

fitnorm$normopl[1]
[[1]]
[1] 30.57875
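
Since mean, sd and normopl each hold a single number per iteration, they could also be plain numeric vectors, with everything preallocated before the loop. This is a sketch under that assumption rather than the OP's code; `n` is reduced from 5000 to keep it quick.

```r
n <- 100  # the post uses 5000

# preallocate: a list for the 25-value samples, numeric vectors for the scalars
fitnorm <- list(dataset = vector("list", n),
                mean    = numeric(n),
                sd      = numeric(n),
                normopl = numeric(n))

for (i in seq_len(n)) {
  normdat <- rnorm(25, mean = 30, sd = 3)
  fitnorm$dataset[[i]] <- normdat          # whole vector goes into list slot i
  fitnorm$mean[i]      <- mean(normdat)
  fitnorm$sd[i]        <- sd(normdat)
  fitnorm$normopl[i]   <- qnorm(1 - 400/1000,
                                mean = fitnorm$mean[i],
                                sd   = fitnorm$sd[i])
}
```

With numeric vectors, `fitnorm$mean` can be passed straight to functions such as `hist()` or `summary()` without unlisting first.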
And a quick benchmark of the lapply version against the for loop:

Unit: milliseconds
           expr      min       lq     mean   median       uq      max neval
   fun_lapply() 220.2830 236.1661 252.7315 249.1904 267.1123 337.0799   100
 fun_for_loop() 373.5972 399.8972 427.1629 421.7407 442.4626 593.7227   100
Ultimately, the gain in this example is marginal, but it is worth keeping in mind.
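
The thread does not show how fun_lapply() and fun_for_loop() were defined. The output format matches the microbenchmark package, so a comparison along the following lines is one plausible reconstruction. The function bodies, the helper `one_fit()`, and the reduced sizes are all assumptions.

```r
# hypothetical helper: one simulated sample together with its summaries
one_fit <- function() {
  normdat <- rnorm(25, mean = 30, sd = 3)
  list(dataset = normdat,
       mean    = mean(normdat),
       sd      = sd(normdat),
       normopl = qnorm(1 - 400/1000, mean = mean(normdat), sd = sd(normdat)))
}

# assumed shape of the lapply variant
fun_lapply <- function(n = 500) {
  lapply(seq_len(n), function(x) one_fit())
}

# assumed shape of the loop variant; the list grows inside the loop
fun_for_loop <- function(n = 500) {
  out <- list()
  for (i in seq_len(n)) out[[i]] <- one_fit()
  out
}

# run the timing only if the microbenchmark package is available
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(fun_lapply(), fun_for_loop(), times = 20))
}
```

Timings depend on the machine and on n, so the numbers will not match the table above.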

Update - SymbolX 2

If you would rather work with them, you can also create a single data.frame. Here I have used the data.table package for the speed it offers.

library(data.table)
lst <- lapply(1:5000, function(x){

  normdat <- rnorm(25, mean = 30, sd = sqrt(9))
  data.table(id = x,
             dataset = normdat,
             mean = mean(normdat),
             sd = sd(normdat),
             normopl = qnorm(1-(400/1000), mean=mean(normdat), sd=sd(normdat)))
})

##lst is now a list of data.tables, so we can 'rbind' them together
dt <- rbindlist(lst)

## now we have one data.table, and the 'id' column indicates
## which dataset each row belongs to
dt
# id  dataset     mean       sd  normopl
# 1:    1 24.09486 29.46829 3.261638 30.29462
# 2:    1 26.30732 29.46829 3.261638 30.29462
# 3:    1 31.42603 29.46829 3.261638 30.29462
# 4:    1 29.69081 29.46829 3.261638 30.29462
# 5:    1 30.01235 29.46829 3.261638 30.29462
# ---                                         
# 124996: 5000 28.13584 30.39716 2.591752 31.05377
# 124997: 5000 27.44665 30.39716 2.591752 31.05377
# 124998: 5000 29.79728 30.39716 2.591752 31.05377
# 124999: 5000 28.73398 30.39716 2.591752 31.05377
# 125000: 5000 27.83779 30.39716 2.591752 31.05377
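
Because mean, sd and normopl repeat on every row of a dataset, a one-row-per-dataset summary can be pulled back out of the long table. This is a small sketch, assuming the data.table package is installed and using only 10 datasets; the name `summ` is illustrative.

```r
library(data.table)

# long table: 25 rows per dataset, as in the answer but with fewer datasets
dt <- rbindlist(lapply(1:10, function(x) {
  normdat <- rnorm(25, mean = 30, sd = 3)
  data.table(id = x,
             dataset = normdat,
             mean = mean(normdat),
             sd = sd(normdat))
}))

# collapse to one row per dataset, recomputing the summaries by group
summ <- dt[, .(mean = mean(dataset), sd = sd(dataset), n = .N), by = id]

# add the quantile column by reference
summ[, normopl := qnorm(1 - 400/1000, mean = mean, sd = sd)]

# the raw values of a single dataset
dt[id == 3, dataset]
```

The long format costs some memory for the repeated summaries, but it keeps everything in one table that can be filtered and grouped by id.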
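I'm trying to save all 25 values in a single element.

Then you need to use a list, not a data.frame.

That's what I was trying to avoid :) Thanks.

@krthkskmr - I've updated my answer to show what I mean. At the end of the day, lapply is an "iterative" process too, but I prefer it to for loops in R.

@Symbolox No, I'll take your advice and use a list; that is certainly a more efficient approach.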
For comparison, building a data.frame directly from normdat gives one row per value, with the summaries repeated on every row:

fitnorm <- data.frame(dataset = normdat,
                  mean = mean(normdat),
                  sd = sd(normdat),
                  normopl = qnorm(1-(400/1000), mean=mean(normdat), sd=sd(normdat)))

head(fitnorm)
#   dataset     mean       sd  normopl
#1 33.43470 29.82336 2.981638 30.57875
#2 28.66693 29.82336 2.981638 30.57875
#3 29.41060 29.82336 2.981638 30.57875
#4 32.95761 29.82336 2.981638 30.57875
#5 32.66531 29.82336 2.981638 30.57875
#6 29.86056 29.82336 2.981638 30.57875

The same simulation written with lapply instead of a for loop:

lst <- lapply(1:5000, function(x){

    normdat <- rnorm(25, mean = 30, sd = sqrt(9))

    list(fitnorm = list(dataset = normdat,
                        mean = mean(normdat),
                        sd = sd(normdat),
                        normopl = qnorm(1-(400/1000), mean = mean(normdat), sd = sd(normdat))
    ))
})