R: Checking the row format of a CSV

r data-structures formatting

I am trying to import some data (below) and check that it has the appropriate number of rows for later analysis.

repexample <- structure(list(QueueName = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
), .Label = c(" Overall", "CCM4.usci_retention_eng", "usci_helpdesk"
), class = "factor"), X8Tile = structure(c(1L, 2L, 3L, 4L, 5L, 
6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 
9L), .Label = c(" Average", "1", "2", "3", "4", "5", "6", "7", 
"8"), class = "factor"), Actual = c(508.1821504, 334.6994838, 
404.9048759, 469.4068667, 489.2800416, 516.5744106, 551.7966176, 
601.5103783, 720.9810622, 262.4622533, 250.2777778, 264.8281938, 
272.2807882, 535.2466968, 278.25, 409.9285714, 511.6635101, 553, 
641, 676.1111111, 778.5517241, 886.3666667), Calls = c(54948L, 
6896L, 8831L, 7825L, 5768L, 7943L, 5796L, 8698L, 3191L, 1220L, 
360L, 454L, 406L, 248L, 11L, 9L, 94L, 1L, 65L, 9L, 29L, 30L), 
Pop = c(41L, 6L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 3L, 1L, 1L, 
1L, 11L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L)), .Names = c("QueueName", 
"X8Tile", "Actual", "Calls", "Pop"), class = "data.frame", row.names = c(NA, 
-22L))

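Priority 1: identify the values of QueueName that do not have enough rows.
Priority 2: remove the rows whose QueueName does not have enough rows.
Both priorities can be solved easily with the Split-Apply-Combine paradigm implemented in the plyr package (the ddply calls further down).
Here is another approach, which makes some assumptions about how the data are sorted. If those assumptions do not hold, it can be modified (or the data can be re-sorted):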
## Paste together the values from your "X8tile" column
##   If all is in order, you should have "Average12345678"
##   If anything is missing, you won't....
myMatch <- names(
  which(with(repexample, tapply(X8Tile, QueueName, FUN=function(x) 
    gsub("^\\s+|\\s+$", "", paste(x, collapse = "")))) 
        == "Average12345678"))

## Use that to subset...
repexample[repexample$QueueName %in% myMatch, ]
#                  QueueName   X8Tile   Actual Calls Pop
# 1                  Overall  Average 508.1822 54948  41
# 2                  Overall        1 334.6995  6896   6
# 3                  Overall        2 404.9049  8831   5
# 4                  Overall        3 469.4069  7825   5
# 5                  Overall        4 489.2800  5768   5
# 6                  Overall        5 516.5744  7943   5
# 7                  Overall        6 551.7966  5796   5
# 8                  Overall        7 601.5104  8698   5
# 9                  Overall        8 720.9811  3191   5
# 14 CCM4.usci_retention_eng  Average 535.2467   248  11
# 15 CCM4.usci_retention_eng        1 278.2500    11   2
# 16 CCM4.usci_retention_eng        2 409.9286     9   2
# 17 CCM4.usci_retention_eng        3 511.6635    94   2
# 18 CCM4.usci_retention_eng        4 553.0000     1   1
# 19 CCM4.usci_retention_eng        5 641.0000    65   1
# 20 CCM4.usci_retention_eng        6 676.1111     9   1
# 21 CCM4.usci_retention_eng        7 778.5517    29   1
# 22 CCM4.usci_retention_eng        8 886.3667    30   1
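
The paste-based check above depends on the rows of each queue arriving in the order Average, 1, ..., 8. As a hedged alternative, the sketch below compares each queue's X8Tile values to the expected set, so row order does not matter. The names `tiles`, `expected`, `complete`, `keep` and `result` are illustrative and not from the original post, and the data frame is a compact stand-in for `repexample` that keeps only the two columns the check needs.

```r
# Compact stand-in for repexample: same QueueName / X8Tile layout,
# numeric columns omitted (assumption: only these two columns matter here)
tiles <- c(" Average", as.character(1:8))
repexample <- data.frame(
  QueueName = factor(c(rep(" Overall", 9), rep("usci_helpdesk", 4),
                       rep("CCM4.usci_retention_eng", 9))),
  X8Tile = factor(c(tiles, tiles[1:4], tiles), levels = tiles)
)

# The labels every complete queue should contain, after trimming whitespace
expected <- c("Average", as.character(1:8))

# TRUE for a queue only if it has exactly the expected labels, in any order
complete <- tapply(
  trimws(as.character(repexample$X8Tile)), repexample$QueueName,
  function(x) length(x) == length(expected) && setequal(x, expected)
)

# Keep only the rows that belong to complete queues
keep <- names(which(complete))
result <- repexample[repexample$QueueName %in% keep, ]
```

Checking the length as well as the set guards against a queue that has nine rows but repeats one label and misses another.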

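library(plyr)
## Priority 1: row count per QueueName (rowSummary was undefined here; this definition is assumed)
rowSummary <- ddply(repexample, .(QueueName), summarise, numRows = length(QueueName))
rowSummary[rowSummary$numRows != 9, ]

## Priority 2: attach the group size to every row, then keep only complete groups
repexample2 <- ddply(repexample, .(QueueName), transform, numRows = length(QueueName))
repexampleEdit <- repexample2[repexample2$numRows == 9, ]
print(repexampleEdit)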
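
For readers who would rather not load plyr, the same two steps can probably be done in base R. This is a minimal sketch, not the original answer's method: `table()` gives the row count per queue, and `ave()` repeats each group's size on every row. The names `counts` and `numRows` are illustrative, the data frame is a compact stand-in for `repexample`, and the threshold of 9 is taken from the plyr calls above.

```r
# Compact stand-in for repexample (assumption: only the grouping layout matters)
tiles <- c(" Average", as.character(1:8))
repexample <- data.frame(
  QueueName = factor(c(rep(" Overall", 9), rep("usci_helpdesk", 4),
                       rep("CCM4.usci_retention_eng", 9))),
  X8Tile = factor(c(tiles, tiles[1:4], tiles), levels = tiles)
)

# Priority 1: rows per queue, then the queues that do not have 9 rows
counts <- table(repexample$QueueName)
incomplete <- counts[counts != 9]
print(incomplete)

# Priority 2: size of each row's group, aligned with the rows of repexample
numRows <- ave(seq_len(nrow(repexample)), repexample$QueueName, FUN = length)
repexampleEdit <- repexample[numRows == 9, ]
print(repexampleEdit)
```

Unlike the `transform` call, this keeps the original row names and row order, because nothing is split and recombined.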
