R: pivoting a data frame without aggregation
The goal is to transform a data frame that represents a one-to-many relationship (one computer to many monitors) into a wider representation. For an abridged version of the data frame, one solution could be:
library(tidyverse)

df %>%
  group_by(CPU_ID) %>%
  filter(row_number() == 1) %>%
  ungroup() %>%
  rename_with(~ paste0("monitor1_", .x), .cols = !CPU_ID) %>%
  left_join(
    df %>%
      group_by(CPU_ID) %>%
      filter(row_number() == 2) %>%
      ungroup() %>%
      rename_with(~ paste0("monitor2_", .x), .cols = !CPU_ID),
    by = "CPU_ID"
  )
#> # A tibble: 8 x 13
#>   CPU_ID monitor1_ID monitor1_CONFIG~ monitor1_NAME monitor1_Alloca~ monitor1_Model monitor1_Vendor
#> 1 182434 195251      101142000825     COMP000572    2014-04-10       HP ELITE DISP~ Hewlett-Packard
#> 2 182436 183607      101142000008     COMP000008    2014-04-18       HP ELITE DISP~ Hewlett-Packard
#> 3 182437 228469      1142006861       COMP020117    2018-03-05       S22C45KBW      Samsung
#> 4 182438 205930      101142001009     COMP050002    2019-05-20       S22C45KBW      Samsung
#> 5 182439 240546      1142008622       COMP050131    2016-09-16       SAMSUNG SYNCM~ SAMSUNG
#> 6 182462 184114      101142000515     COMP000515    2019-08-27       HP ELITE DISP~ Hewlett-Packard
#> 7 182463 184113      101142000514     COMP000514    2019-08-28       HP ELITE DISP~ Hewlett-Packard
#> 8 182464 184106      101142000507     COMP000507    2019-08-27       HP ELITE DISP~ Hewlett-Packard
#> # ... with 6 more variables: monitor2_ID, monitor2_CONFIGITEM_NUMBER,
#> #   monitor2_NAME, monitor2_AllocationDate, monitor2_Model, monitor2_Vendor
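However, in the real data frame some computers have more than two monitors, so this approach would need many left joins.
I tried to write an alternative, such as: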
df %>%
  group_by(CPU_ID) %>%
  mutate(monitor_n = row_number()) %>%
  ungroup() %>%
  pivot_wider(
    id_cols = CPU_ID,
    names_from = monitor_n,
    values_from = !CPU_ID
  ) %>%
  select(-starts_with("monitor_n")) %>%
  rename_with(function(colname)
    str_replace(colname, "^(.*)_(\\d)$", "monitor\\2_\\1"),
    .cols = !CPU_ID)
#> # A tibble: 8 x 13
#>   CPU_ID monitor1_ID monitor2_ID monitor1_CONFIG~ monitor2_CONFIG~ monitor1_NAME monitor2_NAME
#> 1 182434 195251      405022      101142000825     1142027261       COMP000572    COMP030500
#> 2 182436 183607      NA          101142000008     NA               COMP000008    NA
#> 3 182437 228469      341806      1142006861       1142019822       COMP020117    COMP050244
#> 4 182438 205930      NA          101142001009     NA               COMP050002    NA
#> 5 182439 240546      NA          1142008622       NA               COMP050131    NA
#> 6 182462 184114      NA          101142000515     NA               COMP000515    NA
#> 7 182463 184113      NA          101142000514     NA               COMP000514    NA
#> 8 182464 184106      NA          101142000507     NA               COMP000507    NA
#> # ... with 6 more variables: monitor1_AllocationDate, monitor2_AllocationDate,
#> #   monitor1_Model, monitor2_Model, monitor1_Vendor, monitor2_Vendor
But I need to keep the columns in the same order as in the original data frame.
Can you suggest a simpler (tidier) alternative?
Maybe something like:
df %>%
group_by(CPU_ID) %>%
mutate(rowno = row_number()) %>%
ungroup %>%
gather(var, val, -CPU_ID, -rowno) %>%
mutate(newcolname = paste0("monitor", rowno, "_", var)) %>%
select(-c(var, rowno)) %>%
pivot_wider(names_from = newcolname, values_from = val)
# A tibble: 8 x 13
CPU_ID monitor1_ID monitor2_ID monitor1_CONFIG~ monitor2_CONFIG~ monitor1_NAME monitor2_NAME monitor1_Alloca~ monitor2_Alloca~ monitor1_Model
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 182434 195251 405022 101142000825 1142027261 COMP000572 COMP030500 2014-04-10 2020-12-02 HP ELITE DISP~
2 182436 183607 NA 101142000008 NA COMP000008 NA 2014-04-18 NA HP ELITE DISP~
3 182437 228469 341806 1142006861 1142019822 COMP020117 COMP050244 2018-03-05 2019-01-09 S22C45KBW
4 182438 205930 NA 101142001009 NA COMP050002 NA 2019-05-20 NA S22C45KBW
5 182439 240546 NA 1142008622 NA COMP050131 NA 2016-09-16 NA SAMSUNG SYNCM~
6 182462 184114 NA 101142000515 NA COMP000515 NA 2019-08-27 NA HP ELITE DISP~
7 182463 184113 NA 101142000514 NA COMP000514 NA 2019-08-28 NA HP ELITE DISP~
8 182464 184106 NA 101142000507 NA COMP000507 NA 2019-08-27 NA HP ELITE DISP~
# ... with 3 more variables: monitor2_Model <chr>, monitor1_Vendor <chr>, monitor2_Vendor <chr>
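As a side note on the column-order requirement: with tidyr 1.2.0 or later, pivot_wider() has a names_vary argument that should be able to produce the "all of monitor 1, then all of monitor 2" layout directly, without converting every column to character. A minimal sketch on a made-up three-row data frame (the toy df and the helper column monitor_n are assumptions for illustration, not the real data):

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: computer 1 has two monitors, computer 2 has one.
df <- tibble(
  CPU_ID = c(1, 1, 2),
  ID     = c(10, 11, 20),
  NAME   = c("A", "B", "C")
)

wide <- df %>%
  group_by(CPU_ID) %>%
  mutate(monitor_n = row_number()) %>%  # number the monitors per computer
  ungroup() %>%
  pivot_wider(
    id_cols     = CPU_ID,
    names_from  = monitor_n,
    values_from = c(ID, NAME),
    names_glue  = "monitor{monitor_n}_{.value}",
    names_vary  = "slowest"  # keep each monitor's columns together
  )

names(wide)
# "CPU_ID" "monitor1_ID" "monitor1_NAME" "monitor2_ID" "monitor2_NAME"
```

Because no pivot_longer() step is involved, each column keeps its original type; monitors that do not exist are filled with NA.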
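Similar to @Lenny's second solution, I would suggest pivoting longer first and then pivoting wider. One potential drawback is that you need to make all the columns the same type, at least temporarily (e.g. character), but you can convert any of them back at the end if needed.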
df %>%
pivot_longer(cols = -CPU_ID, names_to = "variable", values_to = "value",
values_transform = list(value = as.character)) %>%
group_by(CPU_ID, variable) %>%
mutate(variable = paste(variable, row_number(), sep = "_")) %>%
ungroup() %>%
pivot_wider(names_from = variable, values_from = value)
# A tibble: 8 x 13
CPU_ID ID_1 CONFIGITEM_NUMBER… NAME_1 AllocationDate_1 Model_1 Vendor_1 ID_2 CONFIGITEM_NUMBE… NAME_2 AllocationDate_2 Model_2 Vendor_2
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 182434 195251 101142000825 COMP000… 2014-04-10 HP ELITE DISP… Hewlett-Pa… 4050… 1142027261 COMP03… 2020-12-02 V173A ACER
2 182436 183607 101142000008 COMP000… 2014-04-18 HP ELITE DISP… Hewlett-Pa… NA NA NA NA NA NA
3 182437 228469 1142006861 COMP020… 2018-03-05 S22C45KBW Samsung 3418… 1142019822 COMP05… 2019-01-09 L1940T HP
4 182438 205930 101142001009 COMP050… 2019-05-20 S22C45KBW Samsung NA NA NA NA NA NA
5 182439 240546 1142008622 COMP050… 2016-09-16 SAMSUNG SYNCM… SAMSUNG NA NA NA NA NA NA
6 182462 184114 101142000515 COMP000… 2019-08-27 HP ELITE DISP… Hewlett-Pa… NA NA NA NA NA NA
7 182463 184113 101142000514 COMP000… 2019-08-28 HP ELITE DISP… Hewlett-Pa… NA NA NA NA NA NA
8 182464 184106 101142000507 COMP000… 2019-08-27 HP ELITE DISP… Hewlett-Pa… NA NA NA NA NA NA
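The conversion back to the original types mentioned above could look roughly like this. It is only a sketch on made-up data, assuming that ID was numeric and AllocationDate was a Date in the original (the real column types are not shown in the question):

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with mixed column types.
df <- tibble(
  CPU_ID         = c(1, 1, 2),
  ID             = c(10, 11, 20),
  AllocationDate = as.Date(c("2014-04-10", "2020-12-02", "2014-04-18"))
)

wide <- df %>%
  pivot_longer(cols = -CPU_ID, names_to = "variable", values_to = "value",
               values_transform = list(value = as.character)) %>%
  group_by(CPU_ID, variable) %>%
  mutate(variable = paste(variable, row_number(), sep = "_")) %>%
  ungroup() %>%
  pivot_wider(names_from = variable, values_from = value) %>%
  # Everything is character at this point; restore the assumed types.
  mutate(
    across(starts_with("ID_"), as.numeric),
    across(starts_with("AllocationDate_"), as.Date)
  )
```

Selecting by name prefix with starts_with() means the same two across() calls keep working when a computer has three or more monitors.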
What I ended up using (based on the answers) is:
df %>%
  pivot_longer(
    cols = !CPU_ID,
    names_to = "variable",
    values_to = "value",
    values_transform = list(value = as.character)
  ) %>%
  group_by(CPU_ID, variable) %>%
  mutate(variable = paste0("monitor", row_number(), "_", variable)) %>%
  ungroup() %>%
P
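The listing above breaks off after the last pipe. A minimal sketch of how such a pipeline could be completed, assuming the missing final step is the same pivot_wider() call used in the answer it is based on (the toy df is made up for illustration):

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: computer 1 has two monitors, computer 2 has one.
df <- tibble(
  CPU_ID = c(1, 1, 2),
  ID     = c(10, 11, 20),
  NAME   = c("A", "B", "C")
)

wide <- df %>%
  pivot_longer(
    cols = !CPU_ID,
    names_to = "variable",
    values_to = "value",
    values_transform = list(value = as.character)
  ) %>%
  group_by(CPU_ID, variable) %>%
  # Prefix each variable with the per-computer monitor number.
  mutate(variable = paste0("monitor", row_number(), "_", variable)) %>%
  ungroup() %>%
  # Assumed final step; the original listing is cut off here.
  pivot_wider(names_from = variable, values_from = value)

names(wide)
# "CPU_ID" "monitor1_ID" "monitor1_NAME" "monitor2_ID" "monitor2_NAME"
```

The new columns appear in the order in which the long rows are first encountered, which is why each monitor's columns stay together and follow the original column order.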