R: Combine a list of data frames into a single data frame, or avoid it altogether


I have a dataset like this:

Company,Product,Users
MSFT,Office,1000
MSFT,VS,4000
GOOG,gmail,3203
GOOG,appengine,45454
MSFT,Windows,1500
APPL,iOS,6000
APPL,iCloud,3442

I am writing a function that returns a data frame containing the nth product of each company, ranked by "Users", so the output of rankcompany(1) should be:

     Company   Product Users
APPL    APPL       iOS  6000
GOOG    GOOG appengine 45454
MSFT    MSFT        VS  4000

The function looks like this:

rankcompany <- function(num=1){

    #Read data file
    company_data <- read.csv("company.csv",stringsAsFactors = FALSE)

    #split by company
    split_data <- split(company_data, company_data$Company)

    #sort and select the nth row
    selected <- lapply(split_data, function(df) {
                                                df <- df[order(-df$Users, df$Product),]
                                                df[num,]
                                                 })

    #compose output data frame
    #this part needs to be smarter??
    len <- length(selected)
    selected_df <- data.frame(Company=character(len),Product=character(len), Users=integer(len),stringsAsFactors = FALSE)
    row.names(selected_df) <- names(selected)


    for (n in names(selected)){
        print(str(selected[[n]]))
        selected_df[n,] <- selected[[n]][1,]

    }

    selected_df
}

Using dplyr, you can accomplish this in a much simpler way:
library(dplyr)

# d: the data frame from the question; num: which rank to keep per company
rankcompany <- function(d, num=1) {
   d %>% group_by(Company) %>% arrange(desc(Users)) %>% slice(num)
}
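
For readers who would rather not add a dependency, the same arrange-then-slice idea can be sketched in base R. This is only an illustration and not part of the original answer: the function name nth_per_company is hypothetical, and the data frame is rebuilt from the question's sample rows so that the block runs on its own.

```r
# Sample data copied from the question
company_data <- data.frame(
  Company = c("MSFT", "MSFT", "GOOG", "GOOG", "MSFT", "APPL", "APPL"),
  Product = c("Office", "VS", "gmail", "appengine", "Windows", "iOS", "iCloud"),
  Users   = c(1000L, 4000L, 3203L, 45454L, 1500L, 6000L, 3442L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the num-th product per company, ranked by Users
nth_per_company <- function(d, num = 1L) {
  # Sort by company, then by descending user count
  ord <- d[order(d$Company, -d$Users), ]
  # Number the rows 1, 2, 3, ... within each company
  pos <- ave(seq_len(nrow(ord)), ord$Company, FUN = seq_along)
  # Companies with fewer than num products simply contribute no row
  ord[pos == num, ]
}

top1 <- nth_per_company(company_data, 1L)
```

Because no list of per-company data frames is ever built, there is nothing to combine afterwards. The result keeps the original row numbers as row names; Company is an ordinary column.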

Based on @DMT's comment, I replaced the code that composes the output data frame with:
    selected_df <- rbindlist(selected)
    selected_df <- as.data.frame(selected_df)
    row.names(selected_df) <- names(selected)
    selected_df

If you like the clarity of split and lapply, you can use a shorter version of the function:
# company_data is the data frame from the question
rankcompany <- function(N){
    byCompany <- split(company_data, company_data$Company)
    ranks <- lapply(byCompany,
             function(x)
             {
               # row whose Users count has rank N within the company
               r <- which(rank(-x$Users)==N)
               x[r,]
             })
    do.call("rbind", ranks)
}

> rankcompany(1)
     Company   Product Users
APPL    APPL       iOS  6000
GOOG    GOOG appengine 45454
MSFT    MSFT        VS  4000
Alternatively, with data.table:

library(data.table) ## 1.9.2+
n <- 1L
setDT(company_data)[order(-Users), .SD[n], keyby=Company]
#   Company   Product Users
#1:    APPL       iOS  6000
#2:    GOOG appengine 45454
#3:    MSFT        VS  4000

If you are using rbindlist, you may not need to convert to a data.frame before doing this:

DT <- rbindlist(selected)
DT[order(-Users), .SD[n], keyby=Company]

But the previous solution is the more efficient and simpler way to solve the problem.

Data

company_data <-  structure(list(Company = c("MSFT", "MSFT", "GOOG", "GOOG", "MSFT", 
"APPL", "APPL"), Product = c("Office", "VS", "gmail", "appengine", 
"Windows", "iOS", "iCloud"), Users = c(1000L, 4000L, 3203L, 45454L, 
1500L, 6000L, 3442L)), .Names = c("Company", "Product", "Users"
), class = "data.frame", row.names = c(NA, -7L))

Take a look at rbindlist in package data.table to replace all the code after the lapply.

@DMT: Tried it. It works, but the row names seem to be lost in the output. For example, it starts with an index instead of "APPL".