R: How to separate my dataset into different categories
I'm still new to R. Can someone help me figure out how to separate my dataset into different categories? Here is what I have in R:
Values that have an "Item Code" are broad categories, and the other values belong to subcategories (the actual items). Now I want to pull out the subcategory values (i.e., the rows whose code is NA) and put them in a third column, so that column 1 is "Item Code", column 2 is "Broad Category" and column 3 is "Specific Item".
More specifically, I want the final result to look like this:
I was thinking of using the spread() command, but it doesn't seem to work. Can anyone give me some advice on the next steps?
(My idea was to make the broad category one variable and the subcategory another variable, and then spread the table, but I'm not sure.)
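Here is a `tidyverse` solution you might consider. I'll add sample data so that others can offer alternatives.
library(tidyverse)
df %>%
  # carry each broad-category code down to the NA rows beneath it
  fill(Item.Code) %>%
  group_by(Item.Code) %>%
  # the first row of each group holds the broad-category label
  mutate(Category = first(Item)) %>%
  # drop that header row; note 2:n() keeps the header of a one-row group, slice(-1) would not
  slice(2:n())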
Output
# A tibble: 12 x 3
# Groups: Item.Code [3]
Item.Code Item Category
<dbl> <fct> <fct>
1 221 Prunus amygdalus Almonds, with shell
2 221 Almond (Prunus dulcis or Amygdalus communis Almonds, with shell
3 711 Pimpinella anisum (aniseed) Anise, badian, fennel, coriander
4 711 Illicium verum (star anise) Anise, badian, fennel, coriander
5 711 Carum carvi Anise, badian, fennel, coriander
6 711 Coriandrum sativum (coriander Anise, badian, fennel, coriander
7 711 Cuminum cyminum (cumin) Anise, badian, fennel, coriander
8 711 Foeniculum vulgare (fennel) Anise, badian, fennel, coriander
9 711 Juniperus communis (common juniper) Anise, badian, fennel, coriander
10 800 Agave Agave fibres nes
11 800 Agave fourcroydes (Henequen) Agave fibres nes
12 800 Agave americana (century plant) Agave fibres nes
Data
df <- data.frame(
Item.Code = c(800, NA, NA, NA, 221, NA, NA, 711, NA, NA, NA, NA, NA, NA, NA),
Item = c("Agave fibres nes", "Agave", "Agave fourcroydes (Henequen)", "Agave americana (century plant)", "Almonds, with shell",
"Prunus amygdalus", "Almond (Prunus dulcis or Amygdalus communis", "Anise, badian, fennel, coriander",
"Pimpinella anisum (aniseed)", "Illicium verum (star anise)", "Carum carvi", "Coriandrum sativum (coriander",
"Cuminum cyminum (cumin)", "Foeniculum vulgare (fennel)", "Juniperus communis (common juniper)")
)
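For comparison, here is a minimal base R sketch of the same fill-down idea. It is not part of the answer above: it swaps `fill()` for `cumsum()` over a "header row" flag, and the names `is_header`, `block`, `header_rows` and `result` are illustrative. It assumes the first row of the data is a broad-category row with a non-NA code.

```r
# Sample data from the answer; stringsAsFactors = FALSE is set explicitly so the
# sketch behaves the same before and after R 4.0.0
df <- data.frame(
  Item.Code = c(800, NA, NA, NA, 221, NA, NA, 711, NA, NA, NA, NA, NA, NA, NA),
  Item = c("Agave fibres nes", "Agave", "Agave fourcroydes (Henequen)",
           "Agave americana (century plant)", "Almonds, with shell",
           "Prunus amygdalus", "Almond (Prunus dulcis or Amygdalus communis",
           "Anise, badian, fennel, coriander", "Pimpinella anisum (aniseed)",
           "Illicium verum (star anise)", "Carum carvi",
           "Coriandrum sativum (coriander", "Cuminum cyminum (cumin)",
           "Foeniculum vulgare (fennel)", "Juniperus communis (common juniper)"),
  stringsAsFactors = FALSE
)

# A row with a non-NA code starts a new block; cumsum() numbers the blocks 1, 2, 3, ...
is_header <- !is.na(df$Item.Code)
block <- cumsum(is_header)

# Look up each block's code and broad-category label from its header row
header_rows <- which(is_header)
result <- data.frame(
  Item.Code = df$Item.Code[header_rows][block],
  Category  = df$Item[header_rows][block],
  Item      = df$Item,
  stringsAsFactors = FALSE
)

# Keep only the specific items (drop the header rows themselves)
result <- result[!is_header, ]
rownames(result) <- NULL
result
```

Unlike the grouped dplyr output above, this keeps the original row order (800, 221, 711) and returns an ordinary, ungrouped data frame.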
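We can also use `data.table`:
library(data.table)
library(zoo)
# na.locf0() (from zoo) carries the last non-NA Item.Code forward to form the groups;
# in each group, .SD[-1] drops the broad-category row and first(Item) is its label
setDT(df)[, c(.SD[-1], .(Category = first(Item))), .(Item.Code = na.locf0(Item.Code))]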
# Item.Code Item Category
# 1: 800 Agave Agave fibres nes
# 2: 800 Agave fourcroydes (Henequen) Agave fibres nes
# 3: 800 Agave americana (century plant) Agave fibres nes
# 4: 221 Prunus amygdalus Almonds, with shell
# 5: 221 Almond (Prunus dulcis or Amygdalus communis Almonds, with shell
# 6: 711 Pimpinella anisum (aniseed) Anise, badian, fennel, coriander
# 7: 711 Illicium verum (star anise) Anise, badian, fennel, coriander
# 8: 711 Carum carvi Anise, badian, fennel, coriander
# 9: 711 Coriandrum sativum (coriander Anise, badian, fennel, coriander
#10: 711 Cuminum cyminum (cumin) Anise, badian, fennel, coriander
#11: 711 Foeniculum vulgare (fennel) Anise, badian, fennel, coriander
#12: 711 Juniperus communis (common juniper) Anise, badian, fennel, coriander
Data: the same `df` as above.
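If you would rather not load zoo, a possible variant uses data.table's own `nafill()` (available in recent data.table versions; it only fills numeric columns, which works here because `Item.Code` is numeric). This sketch also builds the table with `data.table()` instead of calling `setDT(df)`: `setDT()` converts `df` to a data.table by reference, so the original object changes class. The names `dt` and `res` are illustrative.

```r
library(data.table)

dt <- data.table(
  Item.Code = c(800, NA, NA, NA, 221, NA, NA, 711, NA, NA, NA, NA, NA, NA, NA),
  Item = c("Agave fibres nes", "Agave", "Agave fourcroydes (Henequen)",
           "Agave americana (century plant)", "Almonds, with shell",
           "Prunus amygdalus", "Almond (Prunus dulcis or Amygdalus communis",
           "Anise, badian, fennel, coriander", "Pimpinella anisum (aniseed)",
           "Illicium verum (star anise)", "Carum carvi",
           "Coriandrum sativum (coriander", "Cuminum cyminum (cumin)",
           "Foeniculum vulgare (fennel)", "Juniperus communis (common juniper)")
)

# nafill(type = "locf") carries the last non-NA code forward to define the groups;
# Item[-1] drops each group's header row and Item[1] (its label) is recycled
res <- dt[, .(Item = Item[-1], Category = Item[1]),
          by = .(Item.Code = nafill(Item.Code, type = "locf"))]
res
```

Groups come out in order of first appearance (800, 221, 711), matching the data.table output above.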
Welcome to Stack Overflow! To help others help you, note that pasting images of data isn't very helpful: if your data is available as text, we can reproduce your problem and help you solve it faster. One way to do this is to include the output of the R command dput(head(df)) in your question (replace df with the actual name of your data frame).
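A small illustration of what the comment suggests, using a hypothetical three-row stand-in for the asker's data: `dput()` prints an R expression that recreates the object, so others can paste it into their own session and get the same data frame.

```r
# Hypothetical stand-in for the asker's data; use your own data frame's name instead of df
df <- data.frame(
  Item.Code = c(800, NA, NA),
  Item = c("Agave fibres nes", "Agave", "Agave fourcroydes (Henequen)"),
  stringsAsFactors = FALSE
)

# Prints a structure(...) expression that can be copied into a question
dput(head(df))

# Pasting that text back into R rebuilds an equivalent data frame
txt <- paste(capture.output(dput(head(df))), collapse = "\n")
rebuilt <- eval(parse(text = txt))
```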
It works! Great, thank you so much! Based on your code, I also found another way that makes the result look nicer. Here is what I got:
crop.sorted %>%
  # assumes Item is character; if Item is a factor, ifelse() returns its integer codes
  mutate(Category = ifelse(!is.na(code), Item, code)) %>%
  fill(code, Category) %>%
  group_by(code) %>%
  slice(2:n()) %>%
  select(code, Category, Item) -> crop.sorted
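One caveat with the `ifelse()` step: before R 4.0.0, `data.frame()` turned strings into factors by default (the `<fct>` columns in the output above suggest that was the case here), and `ifelse()` does not keep factor labels. A minimal base R sketch with hypothetical `item` and `code` vectors shows the difference:

```r
# Hypothetical mini example: Item stored as a factor, and a code that is NA for sub-items
item <- factor(c("Agave fibres nes", "Agave", "Almonds, with shell"))
code <- c(800, NA, 221)

# ifelse() returns the factor's underlying integer codes, not its labels
bad <- ifelse(!is.na(code), item, NA)
bad

# Converting to character first keeps the labels
good <- ifelse(!is.na(code), as.character(item), NA)
good
```

With character columns (the default since R 4.0.0) the original pipeline works as written.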