tidyr: problem with duplicates


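I am trying to use pivot_wider to reduce the number of rows in my data and add new columns. However, the number of columns increases while the number of rows stays the same. Ideally, each "Indicator" should be a single observation wherever the DataYear, Company, Market, Country, etc. columns are identical. I think the problem may be caused by duplicate observations, but I don't understand how an indicator column would solve it.

An example of my data: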

    LongTest <- structure(list(DataYear = c(2018L, 2017L, 2016L, 2018L, 2017L, 
2016L, 2018L, 2017L, 2016L), Company = structure(c(1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L), .Label = "One", class = "factor"), Market = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Total", class = "factor"), 
    Country = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "ALL", class = "factor"), 
    ISO = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "ALL", class = "factor"), 
    Sector = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Insurance", class = "factor"), 
    Division = c(NA, NA, NA, NA, NA, NA, NA, NA, NA), Furtherdetails1 = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA), Furtherdetails2 = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA), Indicator = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("Tax Avoidance", 
    "Turnover"), class = "factor"), IndicatorID = c(20L, 20L, 
    20L, 20L, 20L, 20L, 26L, 26L, 26L), InputName = structure(c(3L, 
    3L, 3L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("Number of employees", 
    "Profit before tax (Attributable to shareholder profit)", 
    "Tax Paid"), class = "factor"), InputCode = structure(c(2L, 
    2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("InputA", "InputB"
    ), class = "factor"), UnitRequired = structure(c(2L, 2L, 
    2L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("#", "GBP"), class = "factor"), 
    Value = c(4.47e+08, 6.2e+08, 6.47e+08, 2.129e+09, 2.003e+09, 
    1.193e+09, 37628, 42431, 39833.44), UniqueID = 1:9), class = "data.frame", row.names = c(NA, 
-9L))
The ideal output is as follows:

    structure(list(DataYear = c(2018L, 2017L, 2016L, 2018L, 2017L, 
2016L), Company = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "One", class = "factor"), 
    Market = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "Total", class = "factor"), 
    Country = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "ALL", class = "factor"), 
    ISO = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "ALL", class = "factor"), 
    Sector = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "Insurance", class = "factor"), 
    Division = c(NA, NA, NA, NA, NA, NA), Furtherdetails1 = c(NA, 
    NA, NA, NA, NA, NA), Furtherdetails2 = c(NA, NA, NA, NA, 
    NA, NA), Indicator = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("Tax Avoidance", 
    "Turnover"), class = "factor"), IndicatorID = c(20L, 20L, 
    20L, 26L, 26L, 26L), Value_InputA = c(2129000000L, 2003000000L, 
    1193000000L, NA, NA, NA), InputName_InputA = structure(c(2L, 
    2L, 2L, 1L, 1L, 1L), .Label = c("", "Profit before tax (Attributable to shareholder profit)"
    ), class = "factor"), UnitRequired_InputA = structure(c(2L, 
    2L, 2L, 1L, 1L, 1L), .Label = c("", "GBP"), class = "factor"), 
    Value_InputB = c(4.47e+08, 6.2e+08, 6.47e+08, 37628, 42431, 
    39833.44), InputName_InputB = structure(c(2L, 2L, 2L, 1L, 
    1L, 1L), .Label = c("Number of employees", "Tax Paid"), class = "factor"), 
    UnitRequired_InputB = structure(c(2L, 2L, 2L, 1L, 1L, 1L), .Label = c("#", 
    "GBP"), class = "factor")), class = "data.frame", row.names = c(NA, 
-6L))
Any help would be greatly appreciated.
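One likely reason the row count does not shrink: pivot_wider treats every column that is not named in names_from or values_from as an identifier, and UniqueID is different on every row. The sketch below illustrates this on a hypothetical four-row miniature of the data (the `toy` tibble and its values are made up for illustration, not taken from LongTest as a whole).

```r
library(dplyr)
library(tidyr)

# Hypothetical miniature of the data: one indicator, two inputs, two years.
toy <- tibble(
  DataYear  = c(2018L, 2017L, 2018L, 2017L),
  Indicator = "Tax Avoidance",
  InputCode = c("InputB", "InputB", "InputA", "InputA"),
  Value     = c(4.47e8, 6.2e8, 2.129e9, 2.003e9),
  UniqueID  = 1:4
)

# UniqueID is not in names_from or values_from, so it is used as an id column.
# Each row has its own UniqueID, so no rows can be merged.
wide_with_id <- toy %>%
  pivot_wider(names_from = InputCode, values_from = Value)

# Dropping the per-row identifier first lets rows that share
# DataYear and Indicator collapse into one.
wide_without_id <- toy %>%
  select(-UniqueID) %>%
  pivot_wider(names_from = InputCode, values_from = Value)

nrow(wide_with_id)     # same number of rows as the input
nrow(wide_without_id)  # one row per DataYear/Indicator pair
```

In the full data, InputName and UnitRequired also vary by row, so they would need to be either pivoted (listed in values_from) or dropped for the rows to collapse.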


Thanks to @Ronak Shah's suggestion of creating a row column, the following seems to do the trick. I added a second grouping column, Indicator:

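library(tidyverse)

LongTest %>%
  # number the rows within each InputCode/Indicator combination
  group_by(InputCode, Indicator) %>% 
  mutate(row = row_number()) %>%
  # row + Indicator identify an output row; all other columns are dropped
  pivot_wider(id_cols = c(row, Indicator),
              names_from = InputCode, 
              values_from = c(Value, UnitRequired, InputName)) %>%
  # the helper counter is no longer needed after pivoting
  select(-row)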
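
As a possible variation, and only a sketch: if each DataYear/Indicator/InputCode combination occurs once (as it does in the sample data), the year itself can serve as the identifier instead of a row counter, which also keeps DataYear in the result. The `toy` tibble below is a hypothetical cut-down of LongTest, with shortened input names.

```r
library(dplyr)
library(tidyr)

# Hypothetical cut-down of LongTest: two years, two indicators.
toy <- tibble(
  DataYear     = c(2018L, 2017L, 2018L, 2017L, 2018L, 2017L),
  Indicator    = c(rep("Tax Avoidance", 4), rep("Turnover", 2)),
  InputCode    = c("InputB", "InputB", "InputA", "InputA", "InputB", "InputB"),
  InputName    = c("Tax Paid", "Tax Paid",
                   "Profit before tax", "Profit before tax",
                   "Number of employees", "Number of employees"),
  UnitRequired = c("GBP", "GBP", "GBP", "GBP", "#", "#"),
  Value        = c(4.47e8, 6.2e8, 2.129e9, 2.003e9, 37628, 42431),
  UniqueID     = 1:6
)

# id_cols names the columns that define one output row; columns not listed
# in id_cols, names_from, or values_from (here UniqueID) are dropped.
wide <- toy %>%
  pivot_wider(
    id_cols     = c(DataYear, Indicator),
    names_from  = InputCode,
    values_from = c(Value, InputName, UnitRequired)
  )

wide
```

Indicators that have no InputA row (Turnover here) get NA in the *_InputA columns. On the full data, the remaining descriptive columns (Company, Market, Country, and so on) would have to be added to id_cols to be kept.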

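Can you show your expected output? Do you need
LongTest %>% group_by(InputCode) %>% mutate(row = row_number()) %>% pivot_wider(names_from = InputCode, values_from = c(Value, UnitRequired, InputName))
?

The answer from @Rui Barradas works, but duplicates are split across multiple rows. If you want to keep the duplicated values and collapse each unique identifier into a single row, then the approach from the other post worked for me. I don't fully understand what that code is doing, but it works like a charm. Also, I know this is a very late answer, but I thought I'd include it for anyone else looking for a solution to this problem who hasn't seen the other posts.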
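
The code that comment refers to is not shown here. As a hedged illustration of one way to keep duplicated values while still getting one row per identifier, pivot_wider accepts a values_fn argument that summarises the values landing in the same cell. The data and the "; " separator below are made up for the example.

```r
library(dplyr)
library(tidyr)

# Hypothetical data with a genuine duplicate: two InputA values
# for the same indicator.
toy <- tibble(
  Indicator = c("Tax Avoidance", "Tax Avoidance", "Turnover"),
  InputCode = c("InputA", "InputA", "InputB"),
  Value     = c(10, 20, 5)
)

# values_fn is applied to the values that fall into each output cell;
# here duplicates are pasted together, so the result is a character column.
collapsed <- toy %>%
  pivot_wider(
    names_from  = InputCode,
    values_from = Value,
    values_fn   = list(Value = function(x) paste(x, collapse = "; "))
  )

collapsed
```

Pasting turns numbers into text, so this suits display more than further calculation; a summary function such as sum or mean could be supplied instead if a single numeric value per cell is wanted.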