R: Combine a list of data frames into a single data frame, or avoid it altogether
I have a dataset like this:
Company,Product,Users
MSFT,Office,1000
MSFT,VS,4000
GOOG,gmail,3203
GOOG,appengine,45454
MSFT,Windows,1500
APPL,iOS,6000
APPL,iCloud,3442
I am writing a function that returns a data frame containing the nth product of each company, ranked by "Users" (highest first), so the output of rankcompany(1) should be:
     Company   Product Users
APPL    APPL       iOS  6000
GOOG    GOOG appengine 45454
MSFT    MSFT        VS  4000
The function looks like this:
rankcompany <- function(num=1){
#Read data file
company_data <- read.csv("company.csv",stringsAsFactors = FALSE)
#split by company
split_data <- split(company_data, company_data$Company)
#sort and select the nth row
selected <- lapply(split_data, function(df) {
df <- df[order(-df$Users, df$Product),]
df[num,]
})
#compose output data frame
#this part needs to be smarter??
len <- length(selected)
selected_df <- data.frame(Company=character(len), Product=character(len), Users=integer(len), stringsAsFactors = FALSE)
row.names(selected_df) <- names(selected)
for (n in names(selected)){
selected_df[n,] <- selected[[n]][1,]
}
selected_df
}
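For the "needs to be smarter" step, base R can usually bind the whole list in one call, `do.call(rbind, selected)`. For a named list of one-row data frames, `rbind` uses the list names as row names, so `APPL`, `GOOG` and `MSFT` survive without a preallocated frame or a loop. A minimal sketch, assuming the data is built inline instead of being read from `company.csv`; `rankcompany_base` is a hypothetical name:

```r
# Assumption: the question's data, built inline rather than read from company.csv
company_data <- data.frame(
  Company = c("MSFT", "MSFT", "GOOG", "GOOG", "MSFT", "APPL", "APPL"),
  Product = c("Office", "VS", "gmail", "appengine", "Windows", "iOS", "iCloud"),
  Users   = c(1000L, 4000L, 3203L, 45454L, 1500L, 6000L, 3442L),
  stringsAsFactors = FALSE
)

# Hypothetical name: same split/lapply steps as the question's rankcompany()
rankcompany_base <- function(company_data, num = 1) {
  split_data <- split(company_data, company_data$Company)
  selected <- lapply(split_data, function(df) {
    # Users descending, ties broken by Product
    df <- df[order(-df$Users, df$Product), ]
    df[num, ]
  })
  # Bind all one-row data frames at once; with a named list of
  # one-row data frames, the list names become the row names
  do.call(rbind, selected)
}

rankcompany_base(company_data, 1)
```

Because the columns come from the data itself, their names and types no longer have to be repeated by hand in a `data.frame()` template.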
Using dplyr, you can do this in a much simpler way:
library(dplyr)
rankcompany <- function(d, num=1) {
d %>% group_by(Company) %>% arrange(desc(Users)) %>% slice(num)
}
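If you want to avoid building a list at all without adding a dependency, the same idea (sort, then take the nth row per company) can be sketched in base R with `ave()`, which numbers the rows within each company after sorting. This is a swapped-in base-R technique, not part of the dplyr answer; `rankcompany_nosplit` is a hypothetical name and the data is assumed to be built inline:

```r
# Assumption: the question's data, built inline
company_data <- data.frame(
  Company = c("MSFT", "MSFT", "GOOG", "GOOG", "MSFT", "APPL", "APPL"),
  Product = c("Office", "VS", "gmail", "appengine", "Windows", "iOS", "iCloud"),
  Users   = c(1000L, 4000L, 3203L, 45454L, 1500L, 6000L, 3442L),
  stringsAsFactors = FALSE
)

# Hypothetical base-R counterpart of
# group_by(Company) %>% arrange(desc(Users)) %>% slice(num)
rankcompany_nosplit <- function(d, num = 1) {
  # Sort by company, then by Users descending (ties broken by Product)
  d <- d[order(d$Company, -d$Users, d$Product), ]
  # Position of each row within its company after sorting: 1, 2, 3, ...
  pos <- ave(seq_len(nrow(d)), d$Company, FUN = seq_along)
  out <- d[pos == num, ]
  # One row per company, so the company names are unique row names
  rownames(out) <- out$Company
  out
}

rankcompany_nosplit(company_data, 1)
rankcompany_nosplit(company_data, 2)
```

As with `slice()`, a company with fewer than `num` products simply drops out of the result.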
Based on @DMT's comment, I replaced the combining code with:
library(data.table)
selected_df <- rbindlist(selected)
selected_df <- as.data.frame(selected_df)
row.names(selected_df) <- names(selected)
selected_df
If you like the clarity of split and lapply, you can use a shorter version of the function:
rankcompany <- function(N){
# split the data by company
byCompany <- split(company_data, company_data$Company)
ranks <- lapply(byCompany,
function(x)
{
# row whose rank by descending Users equals N
r <- which(rank(-x$Users)==N)
x[r,]
})
do.call("rbind", ranks)
}
rankcompany(1)
> rankcompany(1)
     Company   Product Users
APPL    APPL       iOS  6000
GOOG    GOOG appengine 45454
MSFT    MSFT        VS  4000
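One caveat with `rank(-x$Users) == N`: `rank()` averages tied values by default, so two products with the same user count both get rank 1.5 and that company silently disappears for N = 1. Passing `ties.method = "first"` guarantees integer ranks. A sketch with hypothetical tied data (not from the question) and a hypothetical helper `rank_nth`:

```r
# Hypothetical data with a tie in Users for MSFT (not from the question)
tied <- data.frame(
  Company = c("MSFT", "MSFT", "GOOG", "GOOG"),
  Product = c("Office", "VS", "gmail", "appengine"),
  Users   = c(4000L, 4000L, 3203L, 45454L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: same split/lapply/rbind approach, with a
# configurable ties.method passed through to rank()
rank_nth <- function(d, N, ties = "average") {
  ranks <- lapply(split(d, d$Company), function(x) {
    x[which(rank(-x$Users, ties.method = ties) == N), ]
  })
  do.call(rbind, ranks)
}

rank_nth(tied, 1)                  # MSFT is missing: both products rank 1.5
rank_nth(tied, 1, ties = "first")  # MSFT keeps its first listed product
```

`ties.method = "min"` would instead return every tied product for that rank.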
Using data.table:
library(data.table) ## 1.9.2+
n <- 1L
setDT(company_data)[order(-Users), .SD[n], keyby=Company]
#    Company   Product Users
# 1:    APPL       iOS  6000
# 2:    GOOG appengine 45454
# 3:    MSFT        VS  4000
If you're using rbindlist, you may not need to convert to a data.frame at all; its result is already a data.table, so you can do this directly:
DT <- rbindlist(selected)
DT[order(-Users), .SD[n], keyby=Company]
But the previous solution is a more efficient and easier way to solve the problem.
Data
company_data <- structure(list(Company = c("MSFT", "MSFT", "GOOG", "GOOG", "MSFT",
"APPL", "APPL"), Product = c("Office", "VS", "gmail", "appengine",
"Windows", "iOS", "iCloud"), Users = c(1000L, 4000L, 3203L, 45454L,
1500L, 6000L, 3442L)), .Names = c("Company", "Product", "Users"
), class = "data.frame", row.names = c(NA, -7L))
Comments on the question:
Look at rbindlist in package data.table to replace all the code after lapply.
@DMT: Tried it. It works, but the row names seem to be lost in the output; e.g. it starts with an index instead of "APPL".
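Because the question's function reads `company.csv`, one way to reproduce it without the file is to parse the CSV text directly with `read.csv(text = ...)` and compare the result with the `dput()` structure in the Data section. A sketch, assuming the CSV shown in the question is the full file content:

```r
# Assumption: this string is the full content of the question's company.csv
csv_text <- "Company,Product,Users
MSFT,Office,1000
MSFT,VS,4000
GOOG,gmail,3203
GOOG,appengine,45454
MSFT,Windows,1500
APPL,iOS,6000
APPL,iCloud,3442"

# Parse the CSV from a string instead of a file
from_csv <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# The dput() output from the Data section
company_data <- structure(list(Company = c("MSFT", "MSFT", "GOOG", "GOOG", "MSFT",
"APPL", "APPL"), Product = c("Office", "VS", "gmail", "appengine",
"Windows", "iOS", "iCloud"), Users = c(1000L, 4000L, 3203L, 45454L,
1500L, 6000L, 3442L)), .Names = c("Company", "Product", "Users"
), class = "data.frame", row.names = c(NA, -7L))

# TRUE when both sources describe the same data frame
all.equal(from_csv, company_data)
```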