R: create a table from a hierarchy
I have a hierarchy, and I want to create a table that captures the last value, the previous value, and the column number. I don't know where to start. I can count the number of columns as a new column, and then I think I need to melt the data, but I can't work out on which variable, and I don't know what to search for. My input looks like this:
input = structure(list(V1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = "ASIA PACIFIC", class = "factor"), V2 = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "AUSTRALIA", class = "factor"),
V3 = structure(c(1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("ACT",
"NEW SOUTH WALES"), class = "factor"), V4 = structure(c(1L,
3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 4L), .Label = c("CANBERRA",
"NEWCASTLE", "SYDNEY", "WOLLONGONG"), class = "factor"),
V5 = structure(c(9L, 2L, 6L, 4L, 7L, 3L, 5L, 8L, 10L, 1L), .Label = c("###",
"BONDI", "CAMPBELLTOWN", "GEORGE ST", "MAIN ST", "NEWTOWN",
"PITT ST", "POKOLBIN", "SMITH ST", "STRANGE PDE"), class = "factor"),
V6 = structure(c(1L, 2L, 3L, 1L, 1L, 5L, 1L, 4L, 1L, 1L), .Label = c("###",
"CHARLES AVE", "FRANCIS ST", "TOM ST", "TONY LANE"), class = "factor"),
V7 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "###", class = "factor")), class = "data.frame", row.names = c(NA,
-10L))
I would like to create the following output:
output =
structure(list(V1 = structure(c(10L, 3L, 4L, 5L, 9L, 14L, 6L,
13L, 11L, 15L, 2L, 12L, 8L, 7L, 1L), .Label = c("AUSTRALIA",
"CANBERRA", "CHARLES AVE", "FRANCIS ST", "GEORGE ST", "MAIN ST",
"NEW SOUTH WALES", "NEWCASTLE", "PITT ST", "SMITH ST", "STRANGE PDE",
"SYDNEY", "TOM ST", "TONY LANE", "WOLLONGONG"), class = "factor"),
V2 = structure(c(6L, 4L, 9L, 11L, 11L, 5L, 8L, 10L, 8L, 7L,
1L, 7L, 7L, 3L, 2L), .Label = c("ACT", "ASIA PACIFIC", "AUSTRALIA",
"BONDI", "CAMPBELLTOWN", "CANBERRA", "NEW SOUTH WALES", "NEWCASTLE",
"NEWTOWN", "POKOLBIN", "SYDNEY"), class = "factor"), V3 = c(5L,
6L, 6L, 5L, 5L, 6L, 5L, 6L, 5L, 4L, 3L, 3L, 3L, 2L, 1L)), class = "data.frame", row.names = c(NA,
-15L))
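I don't know what I should search for to get started. If anyone could suggest where to begin, I'd appreciate it.

Missing values like these are usually represented by NA, but here they are represented by "###". For each row we can remove those values, select the last two remaining values with tail, and also return the length of new_x (the row after removal). A tidyverse approach along the same lines could be: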
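library(tidyverse)

input %>%
  mutate(row = row_number()) %>%            # remember which row each value came from
  gather(col, value, -row) %>%              # long format: one line per (row, column)
  filter(value != "###") %>%                # drop the placeholder entries
  group_by(row) %>%
  mutate(last_value = row_number()) %>%     # position of each value within its row
  slice(c(n(), n() - 1)) %>%                # keep the last value, then the one before it
  mutate(col = c("col1", "col2"),           # col1 = last value, col2 = previous value
         last_value = max(last_value)) %>%  # number of non-placeholder values in the row
  spread(col, value) %>%
  ungroup() %>%
  select(-row)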
# A tibble: 9 x 3
# last_value col1 col2
# <int> <chr> <chr>
#1 6 CHARLES AVE BONDI
#2 6 FRANCIS ST NEWTOWN
#3 5 GEORGE ST SYDNEY
#4 5 PITT ST SYDNEY
#5 6 TONY LANE CAMPBELLTOWN
#6 5 MAIN ST NEWCASTLE
#7 6 TOM ST POKOLBIN
#8 5 STRANGE PDE NEWCASTLE
#9 4 WOLLONGONG NEW SOUTH WALES
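The base R route described in this answer (drop the "###" entries from each row, take the last two values with tail, and return the length of new_x) might be sketched roughly as below. This is only a sketch under assumptions: the data frame is a four-row subset of the question's input stored as character columns, the helper name summarise_row and the output column names are made up for illustration, and every row is assumed to keep at least two real values.

```r
# Four rows taken from the question's input (character columns for brevity).
input <- data.frame(
  V1 = rep("ASIA PACIFIC", 4),
  V2 = rep("AUSTRALIA", 4),
  V3 = c("ACT", "NEW SOUTH WALES", "NEW SOUTH WALES", "NEW SOUTH WALES"),
  V4 = c("CANBERRA", "SYDNEY", "SYDNEY", "WOLLONGONG"),
  V5 = c("SMITH ST", "BONDI", "GEORGE ST", "###"),
  V6 = c("###", "CHARLES AVE", "###", "###"),
  V7 = rep("###", 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for one row, drop the placeholders, then return the
# last value, the value before it, and the number of values that remain.
# Assumes at least two non-placeholder values per row.
summarise_row <- function(x, missing = "###") {
  new_x <- unname(x[x != missing])
  c(rev(tail(new_x, 2)), length(new_x))
}

# apply() hands each row over as a character vector; t() puts rows back as rows.
res <- t(apply(input, 1, summarise_row))

result <- data.frame(
  last_value     = res[, 1],
  previous_value = res[, 2],
  n_cols         = as.integer(res[, 3]),
  stringsAsFactors = FALSE
)
result
```

Because apply() converts the data frame to a character matrix first, the same call should also work when the columns are factors, as they are in the question.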
Thanks. I don't think I followed the example. If I wanted to extend that all the way to the top (I've edited the output, as I can't post it here), how would I avoid duplicates?

What is the first value? SMITH ST, CANBERRA, which are the column names of the data frame? Why aren't the last two values of the columns selected?

Sorry @RonakShah, I stuffed up and forgot to add header=F to the input.

@nicshah If you want to avoid duplicates, why are some values repeated in your output, such as NEW SOUTH WALES, AUSTRALIA, SYDNEY?
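On the question of extending this all the way to the top without duplicates, one possible reading is to pair every column with the column to its left and keep each distinct child/parent/column combination once. The sketch below takes that reading as an assumption. It reuses a four-row subset of the question's input, the names pairs, edges, child, parent and col are made up for illustration, and the result is not guaranteed to match the posted output row for row.

```r
# Four rows taken from the question's input (character columns for brevity).
input <- data.frame(
  V1 = rep("ASIA PACIFIC", 4),
  V2 = rep("AUSTRALIA", 4),
  V3 = c("ACT", "NEW SOUTH WALES", "NEW SOUTH WALES", "NEW SOUTH WALES"),
  V4 = c("CANBERRA", "SYDNEY", "SYDNEY", "WOLLONGONG"),
  V5 = c("SMITH ST", "BONDI", "GEORGE ST", "###"),
  V6 = c("###", "CHARLES AVE", "###", "###"),
  V7 = rep("###", 4),
  stringsAsFactors = FALSE
)

m <- as.matrix(input)

# Pair each column k with column k - 1: child = column k, parent = column k - 1.
# col records the column index the child was found in (an assumed convention).
pairs <- do.call(rbind, lapply(2:ncol(m), function(k) {
  data.frame(child  = unname(m[, k]),
             parent = unname(m[, k - 1]),
             col    = k,
             stringsAsFactors = FALSE)
}))

# Drop rows whose child is the placeholder, then keep each combination once.
pairs <- pairs[pairs$child != "###", ]
edges <- unique(pairs)
edges
```

With this convention, repeated ancestors such as AUSTRALIA under ASIA PACIFIC appear once as a child, although a value can still show up several times in the parent column when it has several children.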