Gathering columns of unequal length that share the same name in R
I want to gather experimental data from a data frame into columns. The data format is as follows: I want to arrange the data in the format given below. Is there a simple way to do this in R/RStudio? I tried tidyr, rbind, and cbind, as suggested in various examples, but I could not get them to work and found that they were not very relevant to my case. It would be great if someone could help me understand. Thanks.
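I created a similar dataset and applied the following code based on binding, since you mentioned it in your question. It may look verbose, but it gets you the desired output: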
library(dplyr)
df <- tibble(
runs = c(1, 2, 3, 4),
col1 = c(3, 4, 5, 5),
col2 = c(5, 3, 1, 4),
col3 = c(6, 4, 9, 2),
col1 = c(0, 2, 2, 1),
col2 = c(2, 3, 1, 7),
col3 = c(2, 4, 9, 9),
col1 = c(3, 4, 5, 7),
col2 = c(3, 3, 1, 4),
col3 = c(3, 2, NA, NA), .name_repair = "minimal")
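# Select the id column plus one block of three columns at a time (by position,
# because the names repeat), then stack the three blocks on top of each other.
df %>%
  select(1:4) %>%
  bind_rows(df %>%
    select(1, 5:7)) %>%
  bind_rows(df %>%
    select(1, 8:10))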
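The same block-by-block stacking can be sketched in base R without any packages. This is only an illustration under the assumption that the repeated blocks are always three columns wide and sit directly after the id column; the names `wide`, `block_starts`, `pieces`, and `long` are hypothetical and not part of the original answer.

```r
# Wide data with repeated column names; check.names = FALSE keeps the duplicates
wide <- data.frame(
  runs = 1:4,
  col1 = c(3, 4, 5, 5), col2 = c(5, 3, 1, 4), col3 = c(6, 4, 9, 2),
  col1 = c(0, 2, 2, 1), col2 = c(2, 3, 1, 7), col3 = c(2, 4, 9, 9),
  col1 = c(3, 4, 5, 7), col2 = c(3, 3, 1, 4), col3 = c(3, 2, NA, NA),
  check.names = FALSE
)

# First column position of each repeated block: 2, 5, 8
block_starts <- seq(2, ncol(wide), by = 3)

# Take the id column plus one block at a time (by position, since names repeat)
pieces <- lapply(block_starts, function(start) wide[, c(1, start:(start + 2))])

# Stack the pieces; every piece has the names runs, col1, col2, col3
long <- do.call(rbind, pieces)
long
```

Selecting by position avoids any ambiguity from the duplicated names, and each piece ends up with unique names so `rbind` can match them up.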
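The base R code was written by another user, for which I am very grateful. Using the data.table function melt with patterns in measure.vars, you can try the following: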
library(data.table)
#Some data in the same format as you have
df <- read.table(text="
1 0.8525099 0.5598105 0.4242143 0 0.06016425 0.678719492 0.4852765 0.4970301 0.1657070
2 0.1237982 0.2853534 0.8281460 0.42586728 0.31214568 0.647306659 0.5445816 0.4250520 0.9975251
3 0.4907858 0.4925835 0.6689135 0.06042183 0.47391134 0.002571686 0.5267215 0.4291427 NA
4 0.8524778 0.1091856 0.6529887 0.24606198 0.44869099 0.540201766 0.6263992 0.1448730 NA
")
#assigning names to columns
colnames(df) <- c("Run", rep(c("Logging", "Salinity","surface"),3))
setDT(df) #converting df into a data.table
df #Similar as your initial data frame
Run Logging Salinity surface Logging Salinity surface Logging Salinity surface
1: 1 0.8525099 0.5598105 0.4242143 0.00000000 0.06016425 0.678719492 0.4852765 0.4970301 0.1657070
2: 2 0.1237982 0.2853534 0.8281460 0.42586728 0.31214568 0.647306659 0.5445816 0.4250520 0.9975251
3: 3 0.4907858 0.4925835 0.6689135 0.06042183 0.47391134 0.002571686 0.5267215 0.4291427 NA
4: 4 0.8524778 0.1091856 0.6529887 0.24606198 0.44869099 0.540201766 0.6263992 0.1448730 NA
df2 <- melt(df, # melt the data, converting it from wide to long
            id.vars = 1, # keep the first column ("Run") as the id column
            measure.vars = patterns(Logging = "Logging", # each pattern collects all columns whose name matches it into one output column
                                    Salinity = "Salinity",
                                    surface = "surface")
)
#Output
df2
Run variable Logging Salinity surface
1: 1 1 0.85250990 0.55981050 0.424214300
2: 2 1 0.12379820 0.28535340 0.828146000
3: 3 1 0.49078580 0.49258350 0.668913500
4: 4 1 0.85247780 0.10918560 0.652988700
5: 1 2 0.00000000 0.06016425 0.678719492
6: 2 2 0.42586728 0.31214568 0.647306659
7: 3 2 0.06042183 0.47391134 0.002571686
8: 4 2 0.24606198 0.44869099 0.540201766
9: 1 3 0.48527650 0.49703010 0.165707000
10: 2 3 0.54458160 0.42505200 0.997525100
11: 3 3 0.52672150 0.42914270 NA
12: 4 3 0.62639920 0.14487300 NA
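To make it clearer what the pattern matching does, here is a rough base R sketch of the same idea: group the column positions by name, then concatenate the columns in each group. It is not how data.table implements melt, only an illustration; the names `wide`, `value_cols`, `stacked`, and `long` are hypothetical.

```r
# Same example data as above, read into a plain data frame
wide <- read.table(text = "
1 0.8525099 0.5598105 0.4242143 0 0.06016425 0.678719492 0.4852765 0.4970301 0.1657070
2 0.1237982 0.2853534 0.8281460 0.42586728 0.31214568 0.647306659 0.5445816 0.4250520 0.9975251
3 0.4907858 0.4925835 0.6689135 0.06042183 0.47391134 0.002571686 0.5267215 0.4291427 NA
4 0.8524778 0.1091856 0.6529887 0.24606198 0.44869099 0.540201766 0.6263992 0.1448730 NA
")
colnames(wide) <- c("Run", rep(c("Logging", "Salinity", "surface"), 3))

# Group the positions of the measurement columns (2..10) by their name,
# e.g. Logging -> 2, 5, 8
value_cols <- split(seq_along(wide)[-1], names(wide)[-1])

# For each name, concatenate the matching columns one after another
stacked <- lapply(value_cols, function(idx) {
  unlist(lapply(idx, function(i) wide[[i]]))
})

# Repeat the id column once per block and attach the stacked measurements
n_blocks <- length(value_cols[[1]])
long <- data.frame(Run = rep(wide$Run, times = n_blocks), as.data.frame(stacked))
long
```

Columns are accessed by position with `[[` so the duplicated names never need to be resolved.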
You can also use tidyr::pivot_longer. Using the data from @Chriss-Paul:
tidyr::pivot_longer(df, cols = -Run, names_to = '.value')
# Run Logging Salinity surface
# <int> <dbl> <dbl> <dbl>
# 1 1 0.853 0.560 0.424
# 2 1 0 0.0602 0.679
# 3 1 0.485 0.497 0.166
# 4 2 0.124 0.285 0.828
# 5 2 0.426 0.312 0.647
# 6 2 0.545 0.425 0.998
# 7 3 0.491 0.493 0.669
# 8 3 0.0604 0.474 0.00257
# 9 3 0.527 0.429 NA
#10 4 0.852 0.109 0.653
#11 4 0.246 0.449 0.540
#12 4 0.626 0.145 NA
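Note that this output is ordered by Run, with the three repeats of each run next to each other, whereas stacking the blocks gives all of repeat 1 first, then repeat 2, and so on. A small base R sketch of moving between the two orders, using one measurement column with values rounded from the example data; the `rep_id` column and the object names are hypothetical additions for illustration.

```r
# Long data in "block" order: all of repeat 1, then repeat 2, then repeat 3
long <- data.frame(
  Run = rep(1:4, times = 3),
  rep_id = rep(1:3, each = 4), # hypothetical index of the repeated block
  Logging = c(0.85, 0.12, 0.49, 0.85,
              0.00, 0.43, 0.06, 0.25,
              0.49, 0.54, 0.53, 0.63)
)

# Reorder so that the repeats of each run sit together
by_run <- long[order(long$Run, long$rep_id), ]
by_run
```

Keeping an explicit repeat index makes it possible to tell the rows of one run apart after the columns have been gathered.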
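PS: Having data with duplicate column names is not recommended.

Hello, dear Sadaf. It would be great if you could share reproducible data or a small excerpt of your dataset. It would also help to know what you ultimately want to do with the second data format, so that we can figure out which approach to use.

@SleepyMiles, the data come from different runs of the experiment. I want to see how the variables evolve over time, and try plotting Z-scores or some time series.

Sure, @Sadaf, your question is interesting. Next time, try to include a sample of your data as part of the question. Best!

Fully agree, @Ronak Shah. In fact, I had to force those names, because R originally made them distinct when reading the data. I just wanted to stay as close as possible to Sadaf's question.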
# Continuing the data.table answer: dropping the helper column `variable` (the second column of df2) gives the requested layout
df2[, -2]
Run Logging Salinity surface
1: 1 0.85250990 0.55981050 0.424214300
2: 2 0.12379820 0.28535340 0.828146000
3: 3 0.49078580 0.49258350 0.668913500
4: 4 0.85247780 0.10918560 0.652988700
5: 1 0.00000000 0.06016425 0.678719492
6: 2 0.42586728 0.31214568 0.647306659
7: 3 0.06042183 0.47391134 0.002571686
8: 4 0.24606198 0.44869099 0.540201766
9: 1 0.48527650 0.49703010 0.165707000
10: 2 0.54458160 0.42505200 0.997525100
11: 3 0.52672150 0.42914270 NA
12: 4 0.62639920 0.14487300 NA