How to create a new column in an R data frame from the sorted values of several columns

I have a data frame in R that looks like this:

df <-
  data.frame(
    "first_col" = c("apple", "apple", "banana", "banana", "cacao", "dough"),
    "second_col" = c("apple", "apple", "banana", "banana", "apple", "dough"),
    "third_col" = c("banana", "apple", "banana", "banana", "banana", "apple"),
    stringsAsFactors = FALSE
  )
  first_col second_col third_col
1     apple      apple    banana
2     apple      apple     apple
3    banana     banana    banana
4    banana     banana    banana
5     cacao      apple    banana
6     dough      dough     apple

I tried:

df$label <- paste(sort(df$first_col,
                       df$second_col,
                       df$third_col),
                  sep = " - ")
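Obviously I'm doing something wrong. From the documentation, sort() seems to expect a single vector, so I tried combining the columns into one vector like this: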

df$label <- paste(sort(c(df$first_col,
                         df$second_col,
                         df$third_col)),
                  sep = " - ")
I'd like to get something like this:

  first_col second_col third_col                       label
1     apple      apple    banana      apple - apple - banana
2     apple      apple     apple       apple - apple - apple
3    banana     banana    banana    banana - banana - banana
4    banana     banana    banana    banana - banana - banana
5     cacao      apple    banana      apple - banana - cacao
6     dough      dough     apple       apple - dough - dough

You can see that the values within each row are sorted by looking at rows 5 and 6.
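Why both attempts fail can be checked directly in base R. In this small sketch (which rebuilds the same df), sort() only sorts its first argument: its second positional argument is decreasing, so the other columns are matched to options instead of being sorted. In the second attempt, c() pools all 18 values from the three columns into one vector, so sort() orders them across rows rather than within each row, and sep has no effect on a single vector (collapse is the argument that joins the elements into one string).

```r
# Same data as in the question
df <- data.frame(
  first_col  = c("apple", "apple", "banana", "banana", "cacao", "dough"),
  second_col = c("apple", "apple", "banana", "banana", "apple", "dough"),
  third_col  = c("banana", "apple", "banana", "banana", "banana", "apple"),
  stringsAsFactors = FALSE
)

# Attempt 1: the second positional argument of sort() is 'decreasing',
# not another vector to sort
print(names(formals(sort)))  # "x" "decreasing" "..."

# Attempt 2: c() pools every value of the three columns into one vector,
# so sort() orders them across rows instead of within each row
pooled <- sort(c(df$first_col, df$second_col, df$third_col))
print(length(pooled))  # 18 values, while df has only 6 rows

# sep does nothing for a single vector; collapse joins its elements
print(paste(c("b", "a"), sep = " - "))       # "b" "a"
print(paste(c("b", "a"), collapse = " - "))  # "b - a"
```

So the sort and the paste(..., collapse = ...) have to be applied to each row separately, which is what the answers below do.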

Using base R:

df$combined <- apply(df, 1, function(x) paste(sort(x), collapse = "-"))
df
  first_col second_col third_col               combined
1     apple      apple    banana   apple-apple-banana
2     apple      apple     apple    apple-apple-apple
3    banana     banana    banana banana-banana-banana
4    banana     banana    banana banana-banana-banana
5     cacao      apple    banana   apple-banana-cacao
6     dough      dough     apple    apple-dough-dough
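apply(df, 1, ...) converts the whole data frame to a character matrix and sorts every column of each row, so any column added earlier (for example a label column from a previous attempt) would be sorted in as well. As a hedged alternative, the sketch below uses base R's mapply() to walk only the three named columns in parallel. mapply() is a swapped-in technique, not the answer's method, and the " - " separator is an assumption chosen to match the expected output in the question.

```r
df <- data.frame(
  first_col  = c("apple", "apple", "banana", "banana", "cacao", "dough"),
  second_col = c("apple", "apple", "banana", "banana", "apple", "dough"),
  third_col  = c("banana", "apple", "banana", "banana", "banana", "apple"),
  stringsAsFactors = FALSE
)

# Sketch using mapply() (swapped-in technique): x, y, z are the values of one row,
# taken from the three named columns only
df$label <- mapply(
  function(x, y, z) paste(sort(c(x, y, z)), collapse = " - "),
  df$first_col, df$second_col, df$third_col,
  USE.NAMES = FALSE  # otherwise the result would be named after first_col
)
print(df)
```

Because each call returns a single string, mapply() simplifies the results to a plain character vector, which can be assigned directly as a column.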

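Using mutate() from dplyr and pmap() from purrr:

library(dplyr)
library(purrr)

df %>%
  mutate(label = pmap(list(first_col, second_col, third_col),
                      function(x, y, z) paste(sort(c(x, y, z)), collapse = "-")))
#   first_col second_col third_col                label
# 1     apple      apple    banana   apple-apple-banana
# 2     apple      apple     apple    apple-apple-apple
# 3    banana     banana    banana banana-banana-banana
# 4    banana     banana    banana banana-banana-banana
# 5     cacao      apple    banana   apple-banana-cacao
# 6     dough      dough     apple    apple-dough-dough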

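Could you add your expected output? @NelsonGon I updated the question; hopefully it's clearer now.

Thanks, that works. What if I only want to use two of my three columns to build the new column? Your solution uses all of them. I could select the columns first and then apply the method, but that doesn't seem like the best approach.

If you only want to use columns 1 and 2, you can subset them like this: df[c(1,2)]. Note that in the answer, df is named df3, because I have some other df objects that I use for other answers.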
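Following the suggestion in the comments, a minimal sketch for using only some of the columns: pass the subset df[c(1, 2)] to apply() instead of the whole data frame. Selecting by name gives the same result and keeps working if the columns are reordered. The " - " separator and the extra column name label_by_name are illustrative assumptions, not part of the original answer.

```r
df <- data.frame(
  first_col  = c("apple", "apple", "banana", "banana", "cacao", "dough"),
  second_col = c("apple", "apple", "banana", "banana", "apple", "dough"),
  third_col  = c("banana", "apple", "banana", "banana", "banana", "apple"),
  stringsAsFactors = FALSE
)

# Only the first two columns take part in the row-wise sort
df$label <- apply(df[c(1, 2)], 1, function(x) paste(sort(x), collapse = " - "))

# Same selection by column name (label_by_name is a hypothetical column name)
df$label_by_name <- apply(df[c("first_col", "second_col")], 1,
                          function(x) paste(sort(x), collapse = " - "))
print(df)
```

Subsetting before apply() is cheap here, since apply() converts its input to a matrix anyway; the subset just controls which columns end up in that matrix.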
df <- structure(list(first_col = c("apple", "apple", "banana", "banana", 
"cacao", "dough"), second_col = c("apple", "apple", "banana", 
"banana", "apple", "dough"), third_col = c("banana", "apple", 
"banana", "banana", "banana", "apple"), sorted = c("apple-apple-banana", 
"apple-apple-apple", "banana-banana-banana", "banana-banana-banana", 
"apple-banana-cacao", "apple-dough-dough")), row.names = c(NA, 
-6L), class = "data.frame")
library(dplyr)
library(purrr)

df <-
  data.frame(
    "first_col" = c("apple", "apple", "banana", "banana", "cacao", "dough"),
    "second_col" = c("apple", "apple", "banana", "banana", "apple", "dough"),
    "third_col" = c("banana", "apple", "banana", "banana", "banana", "apple"),
    stringsAsFactors = FALSE
  )

# pmap_chr() returns a plain character column; pmap() would return a list column
df %>%
  mutate(label = pmap_chr(list(first_col, second_col, third_col),
                          function(x, y, z) paste(sort(c(x, y, z)), collapse = " - ")))

# first_col second_col third_col                    label
# 1     apple      apple    banana   apple - apple - banana
# 2     apple      apple     apple    apple - apple - apple
# 3    banana     banana    banana banana - banana - banana
# 4    banana     banana    banana banana - banana - banana
# 5     cacao      apple    banana   apple - banana - cacao
# 6     dough      dough     apple    apple - dough - dough