Subsetting a data frame with ddply, then applying an adply function to each subset in R

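I'm having some trouble working out the logic of my code with plyr. My problem involves two large data frames of different lengths; a sample of the first is shown below: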

dfSample <-
 structure(list(Type = structure(c(8L, 100L, 86L, 86L, 86L, 86L, 
 33L, 8L, 105L, 44L, 36L, 107L, 107L, 78L, 33L, 105L, 99L, 10L, 
 16L, 75L), .Label = c("Alumni Services", "Anti-Virus and Malware", 
 "Application Integration", "Application Monitoring", "Application Testing", 
 "Audio Visual Support", "Audio Visual Support - CLS", "Audio Visual Support - Non-CLS", 
 "Backup Services", "Banner", "Bus and Law", "Business Analysis", 
 "Careers", "Common Learning Spaces", "Communication and Marketing", 
 "Computer Aided Assessment", "Conference Accounts", "Content Management", 
 "Database Services", "Datacentre", "Desktop Monitoring", "Desktop Software", 
 "Document Management", "Email", "Email Programs", "Encryption", 
 "Eng and the Enviro", "Equipment Disposal", "Estates and Facilities", 
 "Examination Papers", "Faculty Engagement", "Filestore Support Services", 
 "Finance Services", "General Admin Services", "General InfoSec Advice", 
 "Generic Accounts", "Grid Accounts (HPC)", "Health Sciences", 
 "High Performance Computing (HPC)", "Hosted webspace (LAMP/IIS)", 
 "HR and Payroll Services", "HR General", "HR Recruitment", "HR Systems", 
 "Hub Rooms", "Humanities", "ICT Facilities", "ID Card Services", 
 "Identity Management (User accounts)", "Identity Services", "Information Policy Breaches", 
 "Information Risk Analysis", "iSolutions Admin Services", "iSolutions Administration", 
 "IT Training and Development", "Large File Transfer", "Lecture Capture", 
 "Lecture Capture - CLS", "Lecture Capture - Non-CLS", "Legacy Corporate Systems", 
 "Library Services", "Licence Management", "Managed Print Service", 
 "Management Servers", "Media Asset Management", "Media Support", 
 "Medicine", "Meet and Greet", "Misuse and Security Incidents", 
 "Misuse Of Systems", "Mobile Apps", "Mobile Devices", "Natural and Enviro Sci", 
 "Network Access Services", "Network Services", "OS Builds", "Other Learning Systems", 
 "Personal Filestore", "Personal web pages", "Phys and Applied", 
 "Printing (Managed)", "Printing (Not MPS)", "Project Management and Resourcing", 
 "Repair", "Reporting Services", "Request for Software", "Research Filestore", 
 "Research Governance", "Research Management", "Research Output", 
  "Resource Filestore", "Risk Analysis and Assessment", "Security", 
 "Self Service Help", "Server Monitoring", "Service Hosting", 
 "ServiceLine", "Soc and Human Sci", "Software Configuration Management", 
 "Software Licensing and Management", "Software Services", "SportRec", 
 "Staff Accounts", "Staff Desktop Deployment", "Staff Desktop Services", 
 "Staff Desktop Services (Not UoS Build)", "Student Accounts", 
 "Student Admin Services", "Student Personal Workstations", "SUSSED", 
 "Switchboard", "Switchboard Infrastructure", "System Access Request", 
 "Telephony", "University Admin Services", "Unmanaged Printing", 
 "Videoconferencing", "Videoconferencing - CLS", "Videoconferencing - Non-CLS", 
 "Virtual Learning Environment (VLE)", "Visitor Accounts", "Web Statistics", 
 "Windows Core Environment"), class = "factor"), Tkt.Category = structure(c(19L, 
 17L, 17L, 17L, 17L, 17L, 2L, 19L, 5L, 2L, 9L, 9L, 9L, 4L, 2L, 
 5L, 20L, 2L, 19L, 20L), .Label = c("Communication and Collaboration", 
 "Corporate Services", "Data Centre", "Data Storage Services", 
 "Desktop IT", "Faculty IT", "Help Services", "HR", "Identity Management (User accounts)", 
 "Information Security", "Logistics", "Programmes and Projects", 
 "Quality and Testing", "Research Services", "Security", "SLO Corporate Services", 
 "Software", "Standard", "Teaching Services", "Underpinning Services", 
 "Web Services"), class = "factor"), `CreateDateTime` = structure(c(1370087940, 
 1370156160, 1370162340, 1370178840, 1370190000, 1370240400, 1370242920, 
 1370243040, 1370243040, 1370243280, 1370243280, 1370243520, 1370243580, 
 1370243880, 1370243880, 1370244000, 1370244120, 1370244240, 1370244300, 
 1370244360), class = c("POSIXct", "POSIXt")), `ClosingDateTime` = structure(c(1374501300, 
 1372068300, 1379062020, 1390487100, 1379062080, 1375090560, 1373984760, 
 1370856420, 1370440140, 1370508240, 1370338080, 1370243820, 1370243700, 
 1370255520, 1370341440, 1370248680, 1370353560, 1370338800, 1370257140, 
 1374222600), class = c("POSIXct", "POSIXt"))), .Names = c("Type", 
 "Tkt.Category", "CreateDateTime", "ClosingDateTime"
 ), row.names = c(NA, 20L), class = "data.frame")

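library(plyr)

# For one subset m of tickets, add to every row of DF2 the number of
# tickets in m that were open at that row's DateTime.
QCalc <- function(m) {
  adply(DF2, 1, transform, q=as.character(
                               nrow(subset(m, CreateDateTime <= DateTime &
                                              ClosingDateTime >= DateTime))))
}

# Apply QCalc to each Tkt.Category subset of dfSample.
ServiceQueue <- ddply(dfSample, .(Tkt.Category), QCalc)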

When I use ddply, I get an error message saying that an object cannot be found. Can anyone point me in the right direction to solve this?

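I think we can find a simpler way to solve this if we restate your problem. For each ticket category and each timestamp in your list, you want to count the tickets of that category whose creation time is at or before the timestamp and whose closing time is at or after it. In SQL, we would write something like this: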

SELECT "Tkt.Category", DateTime, COUNT(*)
FROM dfSample
JOIN DF2
  ON CreateDateTime <= DateTime
 AND ClosingDateTime >= DateTime
GROUP BY "Tkt.Category", DateTime

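But this isn't SQL (although perhaps it should be; are you pulling this data from a relational database?). It's R, and base R doesn't let us merge on inequalities. So we can do a little trick with merge and avoid plyr altogether: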

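# Give every ticket an id so there is a column to count.
dfSample$id <- rownames(dfSample)
# Assuming the frames share no column names, merge() returns every ticket/timestamp pair.
DFc <- merge(dfSample, DF2)
# Keep only the pairs where the ticket was open at DateTime.
DFlimited <- DFc[DFc$CreateDateTime <= DFc$DateTime & DFc$ClosingDateTime >= DFc$DateTime, ]
# Count the open tickets for each category and timestamp.
DFagg <- aggregate(id ~ Tkt.Category + DateTime, data = DFlimited, length)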
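
To see what the merge trick does, here is a minimal sketch on toy data. The small `tickets` and `stamps` frames are made up for illustration (the question does not show DF2, so a single DateTime column is assumed), and `by = NULL` is only there to make the Cartesian product explicit.

```r
# Toy stand-ins for dfSample and DF2 (hypothetical data, not from the question).
tz <- "UTC"
tickets <- data.frame(
  Tkt.Category    = c("Software", "Software", "Desktop IT"),
  CreateDateTime  = as.POSIXct(c("2013-06-01 09:00", "2013-06-01 11:00",
                                 "2013-06-01 10:00"), tz = tz),
  ClosingDateTime = as.POSIXct(c("2013-06-01 12:00", "2013-06-01 13:00",
                                 "2013-06-01 10:30"), tz = tz)
)
stamps <- data.frame(
  DateTime = as.POSIXct(c("2013-06-01 10:15", "2013-06-01 11:30"), tz = tz)
)

tickets$id <- rownames(tickets)

# by = NULL asks merge() for every ticket/timestamp pair (3 x 2 = 6 rows).
pairs <- merge(tickets, stamps, by = NULL)

# Keep the pairs where the ticket was open at the timestamp.
open <- pairs[pairs$CreateDateTime <= pairs$DateTime &
              pairs$ClosingDateTime >= pairs$DateTime, ]

# Count open tickets per category and timestamp.
queue <- aggregate(id ~ Tkt.Category + DateTime, data = open, FUN = length)
names(queue)[names(queue) == "id"] <- "n"
print(queue)
```

On this toy input the result has three rows: one open Desktop IT ticket and one open Software ticket at 10:15, and two open Software tickets at 11:30. Category and timestamp combinations with no open ticket do not appear at all, because they are filtered out before the aggregation.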

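When I merge the two data frames I run into a problem, because they have different lengths (one has 70816 rows and the other has 2921). I have tried using all=TRUE, but it keeps freezing my computer. Is there another way? I would like to use plyr, because next I will have to group the counts by Type and Tkt.Category.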
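
One possible way around this, sketched under assumptions rather than taken from the answer: a full cross join of 70816 and 2921 rows would have about 207 million rows, which is probably what exhausts memory, and all=TRUE does not make it smaller. The sketch below counts open tickets one timestamp at a time within each group, so the full ticket/timestamp table is never built. The helper names `count_open` and `queue_by_group` and the toy data are hypothetical, and DF2 is assumed to supply a vector of timestamps.

```r
# Hypothetical helper: for one group of tickets, count how many were open
# at each timestamp, without materialising every ticket/timestamp pair.
count_open <- function(grp, stamps) {
  vapply(seq_along(stamps), function(i) {
    sum(grp$CreateDateTime <= stamps[i] & grp$ClosingDateTime >= stamps[i])
  }, integer(1))
}

# Hypothetical helper: split the tickets by the key columns and build one
# small result frame (one row per timestamp) for each group.
queue_by_group <- function(tickets, stamps, keys) {
  groups <- split(tickets, tickets[keys], drop = TRUE)
  out <- lapply(groups, function(grp) {
    key_vals <- grp[rep(1, length(stamps)), keys, drop = FALSE]
    data.frame(key_vals, DateTime = stamps,
               n = count_open(grp, stamps), row.names = NULL)
  })
  res <- do.call(rbind, unname(out))
  rownames(res) <- NULL
  res
}

# Toy data (made up for illustration).
tz <- "UTC"
tickets <- data.frame(
  Type            = c("Email", "Email", "Repair"),
  Tkt.Category    = c("Software", "Software", "Desktop IT"),
  CreateDateTime  = as.POSIXct(c("2013-06-01 09:00", "2013-06-01 11:00",
                                 "2013-06-01 10:00"), tz = tz),
  ClosingDateTime = as.POSIXct(c("2013-06-01 12:00", "2013-06-01 13:00",
                                 "2013-06-01 10:30"), tz = tz)
)
stamps <- as.POSIXct(c("2013-06-01 10:15", "2013-06-01 11:30"), tz = tz)

queue <- queue_by_group(tickets, stamps, keys = c("Type", "Tkt.Category"))
print(queue)
```

Unlike the merge approach, this keeps a row with n = 0 for timestamps at which a group has no open ticket. If you prefer plyr, the same idea should carry over by passing ddply a plain function that receives the subset as its argument, for example `ddply(dfSample, .(Type, Tkt.Category), function(g) data.frame(DateTime = DF2$DateTime, n = count_open(g, DF2$DateTime)))`, again assuming DF2 has a DateTime column.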