R: Combine a list of data frames into a single data frame, or avoid doing so entirely


I have a dataset like this:

Company,Product,Users
MSFT,Office,1000
MSFT,VS,4000
GOOG,gmail,3203
GOOG,appengine,45454
MSFT,Windows,1500
APPL,iOS,6000
APPL,iCloud,3442

I am writing a function that returns a data frame containing the nth product of each company, ranked by Users, so the output of rankcompany(1) should be:

     Company   Product Users
APPL    APPL       iOS  6000
GOOG    GOOG appengine 45454
MSFT    MSFT        VS  4000

The function looks like this:

rankcompany <- function(num=1){

    #Read data file
    company_data <- read.csv("company.csv",stringsAsFactors = FALSE)

    #split by company
    split_data <- split(company_data, company_data$Company)

    #sort and select the nth row
    selected <- lapply(split_data, function(df) {
        df <- df[order(-df$Users, df$Product),]
        df[num,]
    })

    #compose output data frame
    #this part needs to be smarter??
    len <- length(selected)
    selected_df <- data.frame(Company=character(len),Product=character(len), Users=integer(len),stringsAsFactors = FALSE)
    row.names(selected_df) <- names(selected)


    for (n in names(selected)){
        print(str(selected[[n]]))
        selected_df[n,] <- selected[[n]][1,]

    }

    selected_df
}

With dplyr, you can do this in a much simpler way:

library(dplyr)

rankcompany <- function(d, num=1) {
   # sort by Users (descending), then keep the num-th row within each company
   d %>% group_by(Company) %>% arrange(desc(Users)) %>% slice(num)
}
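As a quick check of the dplyr approach, the sketch below builds the question's sample data inline and calls a variant of the function. The name rank_company_tb and the tie-break on Product are assumptions added for illustration (the tie-break mirrors the ordering in the question's own code); treat it as one possible way to write this, not the only one.

```r
library(dplyr)

# Sample data from the question, built inline so the example is self-contained
company_data <- data.frame(
  Company = c("MSFT", "MSFT", "GOOG", "GOOG", "MSFT", "APPL", "APPL"),
  Product = c("Office", "VS", "gmail", "appengine", "Windows", "iOS", "iCloud"),
  Users   = c(1000L, 4000L, 3203L, 45454L, 1500L, 6000L, 3442L),
  stringsAsFactors = FALSE
)

# Hypothetical variant: sort by Users (descending), break ties by Product,
# then keep the num-th row of each company
rank_company_tb <- function(d, num = 1) {
  d %>%
    arrange(desc(Users), Product) %>%
    group_by(Company) %>%
    slice(num) %>%
    ungroup()
}

top    <- rank_company_tb(company_data, 1)  # most-used product per company
second <- rank_company_tb(company_data, 2)  # second most-used product per company
```

Because slice() silently ignores positions beyond the number of rows in a group, a company with fewer than num products simply drops out of the result instead of producing a row of NA values.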


Based on @DMT's comment, I replaced the merging code with:

    selected_df <- rbindlist(selected)
    selected_df <- as.data.frame(selected_df)
    row.names(selected_df) <- names(selected)
    selected_df
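rbindlist does not carry row names over, which is why the code above reassigns them from names(selected). As an alternative sketch, the idcol argument of rbindlist (available in reasonably recent data.table releases) stores the list names in a column instead. The toy list selected and the column name Group below are made up for illustration.

```r
library(data.table)

# Toy stand-in for the list produced by split() + lapply() in the question
selected <- list(
  APPL = data.frame(Company = "APPL", Product = "iOS", Users = 6000L,
                    stringsAsFactors = FALSE),
  GOOG = data.frame(Company = "GOOG", Product = "appengine", Users = 45454L,
                    stringsAsFactors = FALSE)
)

# idcol adds a column holding the name of the list element each row came from
selected_dt <- rbindlist(selected, idcol = "Group")

# Convert back to a data.frame and restore row names, if they are really needed
selected_df <- as.data.frame(selected_dt)
row.names(selected_df) <- selected_df$Group
```

Keeping the identifier as an ordinary column is usually easier to work with than row names, since it survives later joins and reshaping steps.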

If you like the clarity of split and lapply, you can use a shorter version of the function:

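rankcompany <- function(N, data = company_data){
    # split the rows by company
    byCompany <- split(data, data$Company)
    # within each company, keep the row whose Users rank (descending) equals N
    ranks <- lapply(byCompany,
             function(x)
             {
               r <- which(rank(-x$Users)==N)
               x[r,]
             })
    # stack the pieces; a one-row piece keeps its list name as its row name
    do.call("rbind", ranks)
}

> rankcompany(1)
     Company   Product Users
APPL    APPL       iOS  6000
GOOG    GOOG appengine 45454
MSFT    MSFT        VS  4000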
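One caveat with rank(): under its default ties.method = "average", tied Users values receive fractional ranks, so no row may have a rank exactly equal to N. The following is a minimal base R sketch that sorts with order() instead and breaks ties by Product, as the question's code does. The function name rank_company_base and the decision to drop companies with fewer than n products are assumptions for illustration.

```r
# Sample data from the question, built inline so the example is self-contained
company_data <- data.frame(
  Company = c("MSFT", "MSFT", "GOOG", "GOOG", "MSFT", "APPL", "APPL"),
  Product = c("Office", "VS", "gmail", "appengine", "Windows", "iOS", "iCloud"),
  Users   = c(1000L, 4000L, 3203L, 45454L, 1500L, 6000L, 3442L),
  stringsAsFactors = FALSE
)

# Why rank() can miss rows: tied values share an averaged rank
rank(-c(5, 5, 1))  # 1.5 1.5 3.0, so nothing has rank 1

# Hypothetical helper: n-th product per company, ties broken by Product
rank_company_base <- function(data, n = 1) {
  pieces <- lapply(split(data, data$Company), function(x) {
    x <- x[order(-x$Users, x$Product), ]
    # keep the n-th row, or an empty data frame if the company is too small
    if (n <= nrow(x)) x[n, ] else x[0, ]
  })
  do.call(rbind, pieces)
}

first <- rank_company_base(company_data, 1)
third <- rank_company_base(company_data, 3)  # only MSFT has three products
```

Returning x[0, ] rather than x[n, ] for short groups avoids the all-NA rows that out-of-range indexing would otherwise produce; whether that is the desired behaviour depends on the use case.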

Using data.table:

library(data.table) ## 1.9.2+
n <- 1L
setDT(company_data)[order(-Users), .SD[n], keyby=Company]
#   Company   Product Users
#1:    APPL       iOS  6000
#2:    GOOG appengine 45454
#3:    MSFT        VS  4000

If you are using rbindlist, you may not need to convert to a data.frame before doing this:

DT <- rbindlist(selected)
DT[order(-Users), .SD[n], keyby=Company]

But the previous solution is a more efficient and easier way to solve the problem.

Data

company_data <-  structure(list(Company = c("MSFT", "MSFT", "GOOG", "GOOG", "MSFT", 
"APPL", "APPL"), Product = c("Office", "VS", "gmail", "appengine", 
"Windows", "iOS", "iCloud"), Users = c(1000L, 4000L, 3203L, 45454L, 
1500L, 6000L, 3442L)), .Names = c("Company", "Product", "Users"
), class = "data.frame", row.names = c(NA, -7L))

Comments

Look at rbindlist in the data.table package to replace all the code after lapply.

@DMT: Tried it. It works, but the row names seem to be lost in the output. For example, the rows start with an index instead of "APPL".