R: Split specific column values into multiple columns - Fatal编程技术网

Dataset:

# Sample data from the question
ID<-c(1,2,3,4,5,6,7)
method<-c("cheque","DD","DD","Cheque","NetBank","NetBank","Cash")
type<-c("Type1","Type1","Type2","Type2","Type3","Type3","Type4")    
aid<-c("A1","A1","A2","A2","A3","A3","A4")  
month<-c("JAN","JAN","FEB","FEB","MAR","MAR","APR")
year<-c(2016,2016,2015,2015,2017,2017,2018)
Outcome<-c("Positive","Positive","Negative","Negative","Medium","Medium","Neutral")
ser_no<-c("A00001","A00001","A00002","A00002","A00003","A00003","A00004")
Units<-c(100,200,300,400,500,600,700)
amt<-c(1000,1500,2000,3000,4000,2500,6000)
user_cnt<-c(20,20,15,15,32,32,44)

data<-data.frame(ID=ID,type=type,aid=aid,month=month,year=year,Outcome=Outcome,ser_no=ser_no,Units=Units,amt=amt,user_cnt=user_cnt,method=method)  
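  • I want to add the method column from data to the output.

    The method column can only have four values: 1. Cheque 2. DD 3. NetBank 4. blank, meaning Cash.

I want to add the method values to the output below (see the last four columns). Is it possible to do this without using sqldf? I am trying to count the occurrences of each method value within a group.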

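Example: under the GROUP BY clause, the first row has one Cheque and one DD value, so each of those counts is 1; there are no NetBank or Cash values, so those counts are 0. The third row has two NetBank values, so the NetBank count is 2; there are no Cheque, DD or Cash values, so those counts are 0.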
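Without sqldf, the per-group counting described in this example can be sketched in base R with table(). This is a minimal illustration, not one of the answers below: it simplifies the grouping to type only, lower-cases method so that "cheque" and "Cheque" match, maps a blank value to "cash" as the question states, and fixes the factor levels so that a method absent from a group still appears with a count of 0.

```r
# Minimal base-R sketch: count payment methods per type (grouping simplified to type only)
type   <- c("Type1", "Type1", "Type2", "Type2", "Type3", "Type3", "Type4")
method <- c("cheque", "DD", "DD", "Cheque", "NetBank", "NetBank", "Cash")

method_lc <- tolower(method)             # "Cheque" and "cheque" become the same value
method_lc[method_lc == ""] <- "cash"     # per the question, a blank method means Cash

# Fixed factor levels guarantee one column per method, including zero counts
counts <- table(type = type,
                method = factor(method_lc, levels = c("cheque", "dd", "netbank", "cash")))
print(counts)
```

Each cell of counts is the number of rows with that type and method, which is the same quantity the last four columns of the expected output hold.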

 type aid month year  Outcome ser_no members entries UNITS  amt LowestAmt HighestAmount Mean user_cnt Suggestion    Cheque      DD  Netbank     Cash
 Type1  A1   JAN 2016 Positive A00001       2       2   300 2500      1000          1500 1250       20 0.10000000      1         1      0         0
 Type2  A2   FEB 2015 Negative A00002       2       2   700 5000      2000          3000 2500       15 0.13333333      1         1      0         0
 Type3  A3   MAR 2017   Medium A00003       2       2  1100 6500      2500          4000 3250       32 0.06250000      0         0      2         0
 Type4  A4   APR 2018  Neutral A00004       1       1   700 6000      6000          6000 6000       44 0.02272727      0         0      0         1
Using data.table:

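library(data.table)
DT <- setDT(data)
DT[, method := tolower(method)] # to avoid different counts for upper and lower case
# count rows per type/method pair, then spread the methods into columns (missing pairs become 0)
plouf <- dcast(DT[, .N, by = .(type, method)], type ~ method, value.var = "N", fill = 0)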

    type cash cheque dd netbank
1: Type1   0     1    1      0
2: Type2   0     1    1      0
3: Type3   0     0    0      2
4: Type4   1     0    0      0

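I could not solve the case problem with "Cheque" using R's tolower(), because R functions do not work inside an sqldf query. SQLite's own lower() function is used in the query instead, so that both spellings ("cheque" and "Cheque") are counted.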

library(sqldf)

sqldf("select type
,aid
,month
,year
,Outcome
,ser_no
,count(distinct ID) as members
,count(type) as entries
,sum(UNITS) as UNITS
,sum(amt) as amt
,min(amt) as LowestAmt
,max(amt) as HighestAmount
,AVG(amt) as Mean
,user_cnt
,cast (count(distinct ID) as real)/user_cnt as Suggestion
,count(case when lower(method) = 'cheque' then method end) as Cheque
,count(case when lower(method) = 'dd' then method end) as DD
,count(case when lower(method) = 'netbank' then method end) as NetBank
,count(case when lower(method) in ('cash', '') then method end) as Cash
from data
group by type,aid,month,year,Outcome,ser_no")
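Another way around the case problem is to normalise method in R before the data reaches sqldf, so that the SQL can use plain equality tests. A minimal sketch; normalize_method() is a hypothetical helper, and treating a blank value as Cash follows the question's description rather than anything in the answer.

```r
# Hypothetical helper: canonical lower-case method labels, with a blank mapped to "cash"
normalize_method <- function(x) {
  x <- tolower(trimws(as.character(x)))  # as.character() also handles factor columns
  x[x == ""] <- "cash"
  x
}

normalize_method(c("cheque", "Cheque", "DD", "NetBank", ""))
```

After data$method <- normalize_method(data$method), the query can compare with method = 'cheque', method = 'dd' and so on, without calling lower() in SQL.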

The whole aggregation can be done in a single statement using data.table:

library(data.table)
setDT(data)[
  # inside .(), amt still refers to the original column, so min/max/mean use the row values
  , .(members = uniqueN(ID), entries = .N, UNITS = sum(Units), amt = sum(amt),
      LowestAmt = min(amt), HighestAmount = max(amt), Mean = mean(amt),
      user_cnt = first(user_cnt), Suggestion = uniqueN(ID) / first(user_cnt),
      # one count column per payment method, ignoring case; a blank counts as Cash
      Cheque = sum(tolower(method) == "cheque"), DD = sum(tolower(method) == "dd"),
      NetBank = sum(tolower(method) == "netbank"),
      Cash = sum(tolower(method) %in% c("cash", ""))),
  by = .(type, aid, month, year, Outcome, ser_no)]

If there are more than 4 distinct values in method, I would suggest a different approach, such as dcast() plus a join.
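A sketch of that dcast()-plus-join approach, assuming the question's data and the same six grouping columns. Only a few of the numeric aggregates are reproduced, blank values are mapped to "cash" as the question describes, and the names agg, wide and result are illustrative.

```r
library(data.table)

# The question's sample data, reduced to the columns this sketch uses
DT <- data.table(
  ID      = 1:7,
  type    = c("Type1", "Type1", "Type2", "Type2", "Type3", "Type3", "Type4"),
  aid     = c("A1", "A1", "A2", "A2", "A3", "A3", "A4"),
  month   = c("JAN", "JAN", "FEB", "FEB", "MAR", "MAR", "APR"),
  year    = c(2016, 2016, 2015, 2015, 2017, 2017, 2018),
  Outcome = c("Positive", "Positive", "Negative", "Negative", "Medium", "Medium", "Neutral"),
  ser_no  = c("A00001", "A00001", "A00002", "A00002", "A00003", "A00003", "A00004"),
  amt     = c(1000, 1500, 2000, 3000, 4000, 2500, 6000),
  method  = c("cheque", "DD", "DD", "Cheque", "NetBank", "NetBank", "Cash")
)
keys <- c("type", "aid", "month", "year", "Outcome", "ser_no")

# Normalise method once; a blank value counts as cash, as the question states
DT[, method := tolower(method)]
DT[method == "", method := "cash"]

# 1) Numeric aggregates per group
agg <- DT[, .(members = uniqueN(ID), entries = .N, amt = sum(amt)), by = keys]

# 2) One count column per distinct method value; absent combinations get length() of nothing, i.e. 0
wide <- dcast(DT, type + aid + month + year + Outcome + ser_no ~ method,
              value.var = "method", fun.aggregate = length)

# 3) Join the counts onto the aggregates by the six grouping columns
result <- agg[wide, on = keys]
print(result)
```

Because dcast() creates one column per distinct value, a fifth payment method would add a fifth count column without any change to the code.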

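Is there a way to get the expected output in a single query instead of appending it? I am grouping on six columns (type, aid, month, year, Outcome and ser_no), not just type.
dcast(data, type + month + year + Outcome + aid + ser_no ~ method, fun.aggregate = length)
You can use the same code by creating a grouping variable:
DT[, grp := paste(type, aid, month, year, Outcome, ser_no, sep = "")]
and then
dcast(DT[, .N, by = .(grp, method)], grp ~ method)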
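With sep = "" the pasted key cannot be split back into its original columns. A sketch, assuming "|" never occurs in the key columns, that keeps a recoverable separator and restores the columns afterwards with data.table's tstrsplit(); only type and aid are used as keys here for brevity, and grp and wide are illustrative names.

```r
library(data.table)

DT <- data.table(
  type   = c("Type1", "Type1", "Type2", "Type2", "Type3", "Type3", "Type4"),
  aid    = c("A1", "A1", "A2", "A2", "A3", "A3", "A4"),
  method = c("cheque", "DD", "DD", "Cheque", "NetBank", "NetBank", "Cash")
)
DT[, method := tolower(method)]   # avoid separate "cheque" and "Cheque" columns

# Composite key with a separator assumed not to occur in the data
DT[, grp := paste(type, aid, sep = "|")]

wide <- dcast(DT[, .N, by = .(grp, method)], grp ~ method, value.var = "N", fill = 0)

# Split the key back into its original columns (fixed = TRUE: "|" is literal, not a regex)
wide[, c("type", "aid") := tstrsplit(grp, "|", fixed = TRUE)]
print(wide)
```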
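@Uwe On a dataset of 1.4 million rows, this is much slower than PIG's answer: PIG's answer takes 17 seconds, whereas the data.table one takes 1 minute 14 seconds.

Output of the single-statement data.table aggregation: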
    type aid month year  Outcome ser_no members entries UNITS  amt LowestAmt HighestAmount Mean user_cnt Suggestion Cheque DD NetBank Cash
1: Type1  A1   JAN 2016 Positive A00001       2       2   300 2500      1000          1500 1250       20 0.10000000      1  1       0    0
2: Type2  A2   FEB 2015 Negative A00002       2       2   700 5000      2000          3000 2500       15 0.13333333      1  1       0    0
3: Type3  A3   MAR 2017   Medium A00003       2       2  1100 6500      2500          4000 3250       32 0.06250000      0  0       2    0
4: Type4  A4   APR 2018  Neutral A00004       1       1   700 6000      6000          6000 6000       44 0.02272727      0  0       0    1
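Regarding the timing on 1.4 million rows: one possible, untested contributor is that the single-statement version calls tolower(method) four times inside every group. A sketch that lower-cases the column once up front instead; method_lc is a hypothetical helper column, the grouping is cut down to type and aid for brevity, and whether this closes the gap has not been measured.

```r
library(data.table)

DT <- data.table(
  type   = c("Type1", "Type1", "Type2", "Type2", "Type3", "Type3", "Type4"),
  aid    = c("A1", "A1", "A2", "A2", "A3", "A3", "A4"),
  method = c("cheque", "DD", "DD", "Cheque", "NetBank", "NetBank", "Cash")
)

# Lower-case the whole column once instead of calling tolower() four times per group
DT[, method_lc := tolower(method)]

res <- DT[, .(Cheque  = sum(method_lc == "cheque"),
              DD      = sum(method_lc == "dd"),
              NetBank = sum(method_lc == "netbank"),
              Cash    = sum(method_lc %in% c("cash", ""))),
          by = .(type, aid)]
print(res)
```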