R merge function fails to find shared matches between data frames

Hi, I have the following two data frames:

# dataframe 1 --> clst1_trimmed

> head(clst1_trimmed)
# A tibble: 6 x 2
  GeneName Clst.1
  <fct>     <dbl>
1 Cd74      1.20 
2 Lyz2      1.02 
3 Malat1    0.196
4 Ftl1      0.577
5 H2-Ab1    1.04 
6 B2m       0.639

# dataframe2 --> immgen_trimmed
> head(immgen_trimmed)
# A tibble: 6 x 6
  ProbeSetID GeneName Description                                      Cell.A Cell.B Cell.C
       <int> <fct>    <fct>                                             <dbl>  <dbl>  <dbl>
1   10344620 Cd74     " predicted gene 10568"                            15.6   15.3   17.2
2   10344622 Cd74     " predicted gene 10568"                           240.   255.   224. 
3   10344624 Lyz2     " lysophospholipase 1"                            421.   474.   349. 
4   10344633 Malat1   " transcription elongation factor A (SII) 1"      802.   950.   864. 
5   10344637 Flt1     " ATPase H+ transporting lysosomal V1 subunit H"  199.   262.   167. 
6   10344653 Cd3e     " opioid receptor kappa 1"                         14.8   12.8   18.0
However, merging the two large data frames with the same approach fails:

> dim(sel_clst)
[1] 984   2
> dim(immgen_log2)
[1] 24922   212

> merge2 <- merge(sel_clst, immgen_log2)
> str(merge2)
'data.frame':   0 obs. of  213 variables:
 $ GeneName                      : Factor w/ 984 levels "0610012G03Rik",..: 
 $ Cluster.1.Log2.Fold.Change    : num 
 $ ProbeSetID                    : int 
 $ Description                   : Factor w/ 21246 levels " "," 1-acylglycerol-3-phosphate O-acyltransferase 1 (lysophosphatidic acid acyltransferase alpha)",..: 
 $ X.proB_CLP_BM.                : num 
 $ X.proB_CLP_FL.                : num 
 $ X.proB_FrA_BM.                : num 
Any idea why this fails?

Try this (after backing up these data frames):
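A minimal sketch of this kind of fix on toy data, assuming the mismatch comes from stray whitespace in the factor levels of the key column. The data frames a and b and their values are hypothetical stand-ins, not the asker's data, and stringsAsFactors is set explicitly so the keys are factors regardless of R version.

```r
# Toy data (hypothetical): the key column in `b` carries stray whitespace.
a <- data.frame(GeneName = c("Cd74", "Lyz2", "Malat1"),
                Clst.1   = c(1.20, 1.02, 0.196),
                stringsAsFactors = TRUE)
b <- data.frame(GeneName = c(" Cd74", "Lyz2 ", " Malat1 "),
                Cell.A   = c(15.6, 421, 802),
                stringsAsFactors = TRUE)

# Without cleaning, no key values are equal, so the merge has zero rows.
n_before <- nrow(merge(a, b, by = "GeneName"))
n_before

# Trim whitespace in the factor levels only; the per-row integer codes
# are left untouched.
levels(a$GeneName) <- trimws(levels(a$GeneName))
levels(b$GeneName) <- trimws(levels(b$GeneName))

# After trimming, the keys line up and the merge returns the shared rows.
merged <- merge(a, b, by = "GeneName")
nrow(merged)
```

Working on levels() rather than on the column itself touches each distinct value once, which may matter for a key column with many repeated values.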

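There is an options parameter, accessible via default.stringsAsFactors(), that can prevent a lot of beginner confusion about factor creation, but there is no default setting that can be adjusted for strip.white.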
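Since strip.white has no global default, it has to be passed on each call. The sketch below compares the two settings on a small inline CSV; the variable names raw and clean are illustrative, and stringsAsFactors is set explicitly so the result does not depend on the R version.

```r
csv_text <- "hd1 , hd2, hd3\n 1, a ,   c\n1,b,d\n"

# Default behaviour: whitespace around unquoted character fields is kept.
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# strip.white = TRUE removes leading and trailing whitespace from
# unquoted character fields while reading.
clean <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                  strip.white = TRUE)

raw$hd2    # values still padded
clean$hd2  # values trimmed
```

Numeric columns such as hd1 are unaffected either way, because whitespace around numbers is ignored when they are parsed.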

See this transcript:

> dat <- read.csv(text= "hd1 , hd2, hd3\n 1, a ,   c\n1,b,d\n")
> dat
  hd1 hd2  hd3
1   1  a     c
2   1   b    d
> dput(dat)
structure(list(hd1 = c(1L, 1L), hd2 = structure(1:2, .Label = c(" a ", 
"b"), class = "factor"), hd3 = structure(1:2, .Label = c("   c", 
"d"), class = "factor")), .Names = c("hd1", "hd2", "hd3"), class = "data.frame", row.names = c(NA, 
-2L))
> dat <- data.frame(
             lapply(read.csv(text= "hd1 , hd2, hd3\n 1, a ,   c\n1,b,d\n"), 
                    trimws)
                    )
# could also have used a two step process starting with the original `dat` 
# dat[] <- lapply(dat, trimws)   .... the `[]` preserves structure

> dat
  hd1 hd2 hd3
1   1   a   c
2   1   b   d
> dput(dat)
structure(list(hd1 = structure(c(1L, 1L), .Label = "1", class = "factor"), 
    hd2 = structure(1:2, .Label = c("a", "b"), class = "factor"), 
    hd3 = structure(1:2, .Label = c("c", "d"), class = "factor")), .Names = c("hd1", 
"hd2", "hd3"), row.names = c(NA, -2L), class = "data.frame")
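The transcript wraps lapply() in data.frame() and mentions the `[]` form as an alternative. The sketch below illustrates the difference, assuming character columns rather than factors; the names as_list, rebuilt, in_place and only_chr are hypothetical.

```r
dat <- read.csv(text = "hd1 , hd2, hd3\n 1, a ,   c\n1,b,d\n",
                stringsAsFactors = FALSE)

# lapply() returns a plain list, so the data.frame class is lost.
as_list <- lapply(dat, trimws)
class(as_list)

# Wrapping the list in data.frame() rebuilds a data frame.
rebuilt <- data.frame(as_list, stringsAsFactors = FALSE)
class(rebuilt)

# Assigning into `[]` replaces the columns but keeps the existing frame.
in_place <- dat
in_place[] <- lapply(in_place, trimws)
class(in_place)

# trimws() returns character, so the integer column hd1 is converted too.
# To leave non-character columns alone, trim only the character ones.
only_chr <- dat
chr <- vapply(only_chr, is.character, logical(1))
only_chr[chr] <- lapply(only_chr[chr], trimws)
sapply(only_chr, class)
```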

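Did you notice the leading spaces in the variable values? Neither " Cd74" nor "Cd74 " will match "Cd74". I have a function named trim that removes leading and trailing whitespace. I suggest first coercing all key columns to "character" and then trimming the values before retrying the match. It may also be worth fixing the data import command upstream.

Or use levels(df$var)

@Moody_Mudskipper: I'm always a bit wary of using levels

Yep, the spaces were the problem. I hadn't caught that. I suppose that with this realization the question is obsolete.

Can you pass trimws as an argument to my code here: immgen_dat

Maybe: immgen_dat

Can you explain in a few sentences why the result needs to be wrapped in data.frame for this to work?

lapply just returns a list without any other class attribute. Using data.frame or as_tibble restores the "data.frame" class attribute.

I suggest you look into fread. It is faster and safer.

You can use the argument strip.white=TRUE in read.table, and also in read.csv.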
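The advice in the comments (coerce the key columns to character, trim, then retry the match) can be sketched as follows. The data frames x and y are hypothetical toy data, and the names padded and shared are illustrative.

```r
x <- data.frame(GeneName = c("Cd74", "Lyz2", "B2m"),
                stringsAsFactors = TRUE)
y <- data.frame(GeneName = c(" Cd74", "Lyz2 ", "Cd3e"),
                stringsAsFactors = TRUE)

# Coerce the key columns to character so comparisons act on the strings.
x$GeneName <- as.character(x$GeneName)
y$GeneName <- as.character(y$GeneName)

# Keys that change when trimmed are the ones carrying stray whitespace.
padded <- y$GeneName[y$GeneName != trimws(y$GeneName)]
padded

# Trim both key columns before matching.
x$GeneName <- trimws(x$GeneName)
y$GeneName <- trimws(y$GeneName)

# Keys present in both frames after cleaning; merge() can only match these.
shared <- intersect(x$GeneName, y$GeneName)
shared
```

Listing the padded keys first makes it easy to confirm that whitespace, rather than a genuinely missing gene, explains an empty merge.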
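# Check whether the key value is present in each data frame
> "Cd74" %in% sel_clst$GeneName
[1] TRUE
> "Cd74" %in% immgen_log2$GeneName
[1] FALSE

# Trim whitespace from the factor levels of the key column, then retry the merge
levels(sel_clst$GeneName) <- trimws(levels(sel_clst$GeneName))
levels(immgen_log2$GeneName) <- trimws(levels(immgen_log2$GeneName))
merge2 <- merge(sel_clst, immgen_log2)

# Wrapper around utils::read.csv that always sets strip.white = TRUE
read.csv <- function(...) {
  utils::read.csv(..., strip.white = TRUE)
}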