Gathering columns of unequal length with the same name in R


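I want to gather experimental data from a data frame into columns. The data are in the following format:

I would like to arrange the data in the format given below.

Is there a simple way to do this in R/RStudio? I tried tidyr, rbind and cbind, as suggested in various examples, but I could not get them to work and found that none of them quite matched my case. It would be great if someone could help me understand.

Thanks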

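I created a similar dataset and applied the following code based on binding, since you mentioned it in your question. It may look verbose, but it gets you the desired output: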

library(dplyr)

df <- tibble(
  runs = c(1, 2, 3, 4),
  col1 = c(3, 4, 5, 5),
  col2 = c(5, 3, 1, 4), 
  col3 = c(6, 4, 9, 2),
  col1 = c(0, 2, 2, 1),
  col2 = c(2, 3, 1, 7), 
  col3 = c(2, 4, 9, 9),
  col1 = c(3, 4, 5, 7),
  col2 = c(3, 3, 1, 4), 
  col3 = c(3, 2, NA, NA), .name_repair = "minimal")

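# Stack the three column blocks; column 1 (runs) is kept with every block
# so that the final select() can find it
df %>%
  select(1:4) %>% 
  bind_rows(df %>%
              select(1, 5:7)) %>%
  bind_rows(df %>% 
              select(1, 8:10)) %>%
  select(runs, col1:col3)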

The base R code was written by @, for which I am very grateful.
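The base R code referred to here is not reproduced in this thread. As a rough sketch only, and not necessarily what was originally posted, the same stacking can be done in base R by splitting the column positions into blocks and row-binding them. The toy values are taken from the dplyr example above; the names `n_vars`, `blocks` and `long` are chosen purely for illustration.

```r
# Toy data with repeated column names (same values as the dplyr example)
df <- data.frame(
  1:4,
  c(3, 4, 5, 5), c(5, 3, 1, 4), c(6, 4, 9, 2),
  c(0, 2, 2, 1), c(2, 3, 1, 7), c(2, 4, 9, 9),
  c(3, 4, 5, 7), c(3, 3, 1, 4), c(3, 2, NA, NA)
)
names(df) <- c("runs", rep(c("col1", "col2", "col3"), 3))

# Assumption: every repetition has the same number of measured variables
n_vars <- 3

# Column positions of each repetition: 2:4, 5:7, 8:10
block_id <- rep(seq_len((ncol(df) - 1) / n_vars), each = n_vars)
blocks <- split(2:ncol(df), block_id)

# Take the run column plus one block at a time, then stack the pieces.
# Selecting by position avoids any ambiguity from the repeated names.
long <- do.call(rbind, lapply(blocks, function(idx) df[, c(1, idx)]))
rownames(long) <- NULL

long
```

Because each piece carries the same four column names, `rbind()` lines them up by name, and the trailing `NA` values of the shorter columns are simply kept.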

Using the data.table function melt with patterns() in measure.vars, you can try the following:
library(data.table)
#Some data in the same format as you have
df <- read.table(text="
1 0.8525099 0.5598105 0.4242143 0 0.06016425 0.678719492 0.4852765  0.4970301 0.1657070
2 0.1237982 0.2853534 0.8281460 0.42586728 0.31214568 0.647306659 0.5445816  0.4250520 0.9975251
3 0.4907858 0.4925835 0.6689135 0.06042183 0.47391134 0.002571686 0.5267215  0.4291427 NA
4 0.8524778 0.1091856 0.6529887 0.24606198 0.44869099 0.540201766 0.6263992  0.1448730 NA
")
#assigning names to columns
colnames(df) <- c("Run", rep(c("Logging", "Salinity","surface"),3))
setDT(df) #converting df into a data.table

df #Similar as your initial data frame
   Run   Logging  Salinity   surface    Logging   Salinity     surface   Logging  Salinity   surface
1:   1 0.8525099 0.5598105 0.4242143 0.00000000 0.06016425 0.678719492 0.4852765 0.4970301 0.1657070
2:   2 0.1237982 0.2853534 0.8281460 0.42586728 0.31214568 0.647306659 0.5445816 0.4250520 0.9975251
3:   3 0.4907858 0.4925835 0.6689135 0.06042183 0.47391134 0.002571686 0.5267215 0.4291427        NA
4:   4 0.8524778 0.1091856 0.6529887 0.24606198 0.44869099 0.540201766 0.6263992 0.1448730        NA

df2 <- melt(df, #melting data, converting from wide to long
        id.vars = 1, # here we attempt to fix the first column "Runs"
        measure.vars = patterns(Logging="Logging", # here we look up for a pattern of column names to convert into measure
                                Salinity= "Salinity",
                                surface="surface")
        )


#Output
df2
    Run variable    Logging   Salinity     surface
 1:   1        1 0.85250990 0.55981050 0.424214300
 2:   2        1 0.12379820 0.28535340 0.828146000
 3:   3        1 0.49078580 0.49258350 0.668913500
 4:   4        1 0.85247780 0.10918560 0.652988700
 5:   1        2 0.00000000 0.06016425 0.678719492
 6:   2        2 0.42586728 0.31214568 0.647306659
 7:   3        2 0.06042183 0.47391134 0.002571686
 8:   4        2 0.24606198 0.44869099 0.540201766
 9:   1        3 0.48527650 0.49703010 0.165707000
10:   2        3 0.54458160 0.42505200 0.997525100
11:   3        3 0.52672150 0.42914270          NA
12:   4        3 0.62639920 0.14487300          NA

You can also use tidyr::pivot_longer, here with @Chriss-Paul's data:
tidyr::pivot_longer(df, cols = -Run, names_to = '.value')

#     Run Logging Salinity  surface
#   <int>   <dbl>    <dbl>    <dbl>
# 1     1  0.853    0.560   0.424  
# 2     1  0        0.0602  0.679  
# 3     1  0.485    0.497   0.166  
# 4     2  0.124    0.285   0.828  
# 5     2  0.426    0.312   0.647  
# 6     2  0.545    0.425   0.998  
# 7     3  0.491    0.493   0.669  
# 8     3  0.0604   0.474   0.00257
# 9     3  0.527    0.429  NA      
#10     4  0.852    0.109   0.653  
#11     4  0.246    0.449   0.540  
#12     4  0.626    0.145  NA      


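PS - Having data with duplicated column names is not recommended.

Hello dear Sadaf. It would be great if you could share reproducible data or a small excerpt of your dataset. It would also help to know what you ultimately want to do with the second data format, so we can work out which approach to use.

@SleepyMiles, the data come from different runs of the experiment. I want to see how the variables evolve, and try plotting Z-scores or some time series.

Sure @Sadaf, your question is interesting. Next time, try to include a sample of your data as part of the question. Best!

Totally agree, @Ronak Shah. In fact, I had to force those names, because R initially made them different when reading the data. I just wanted to stay as close as possible to Sadaf's question.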
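Following up on the note about duplicated column names: one way to avoid them is to give each repetition a numeric suffix and let tidyr split the names again. This is a minimal sketch with toy values; the suffix scheme (`_1`, `_2`, `_3`) and the column name `rep` are assumptions made for illustration, not part of the original data.

```r
library(tidyr)

# Toy data that starts out with repeated names, as in the question
df <- data.frame(
  1:4,
  c(3, 4, 5, 5), c(5, 3, 1, 4), c(6, 4, 9, 2),
  c(0, 2, 2, 1), c(2, 3, 1, 7), c(2, 4, 9, 9),
  c(3, 4, 5, 7), c(3, 3, 1, 4), c(3, 2, NA, NA)
)
names(df) <- c("Run", rep(c("Logging", "Salinity", "surface"), 3))

# Make the names unique: Logging_1, Salinity_1, surface_1, Logging_2, ...
names(df)[-1] <- paste(names(df)[-1], rep(1:3, each = 3), sep = "_")

# ".value" keeps the part before "_" as the output column name;
# the part after "_" goes into a new column called "rep"
long <- pivot_longer(
  df,
  cols = -Run,
  names_to = c(".value", "rep"),
  names_sep = "_"
)

long
```

The extra `rep` column records which repetition each row came from, which may be useful for the time-series plots mentioned above; it can be dropped if it is not needed.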
# Continuing the data.table answer: dropping the helper column `variable`
# (the second column of df2) gives the desired result
df2[, -2]
    Run    Logging   Salinity     surface
 1:   1 0.85250990 0.55981050 0.424214300
 2:   2 0.12379820 0.28535340 0.828146000
 3:   3 0.49078580 0.49258350 0.668913500
 4:   4 0.85247780 0.10918560 0.652988700
 5:   1 0.00000000 0.06016425 0.678719492
 6:   2 0.42586728 0.31214568 0.647306659
 7:   3 0.06042183 0.47391134 0.002571686
 8:   4 0.24606198 0.44869099 0.540201766
 9:   1 0.48527650 0.49703010 0.165707000
10:   2 0.54458160 0.42505200 0.997525100
11:   3 0.52672150 0.42914270          NA
12:   4 0.62639920 0.14487300          NA
