r-将一个数据帧强制转换为另一个数据帧的结构
我希望根据某些标准,强制一个数据帧与另一个数据帧的结构相匹配 示例数据r-将一个数据帧强制转换为另一个数据帧的结构,r,R,我希望根据某些标准,强制一个数据帧与另一个数据帧的结构相匹配 示例数据 ## to be populated: df_final <- data.frame("a"=numeric(), "b"=numeric(), "c"=numeric(), "l"=integer(), "m"=integer(), "n"=integer(), "x"=numeric(), "y"=numeric(),
## to be populated:
df_final <- data.frame("a"=numeric(), "b"=numeric(), "c"=numeric(),
"l"=integer(), "m"=integer(), "n"=integer(),
"x"=numeric(), "y"=numeric(), "z"=numeric())
> df_final
[1] a b c l m n x y z
<0 rows> (or 0-length row.names)
## data to coerce into df_final
df_data <- data.frame(col1=c(21.3,23.1,22.2),
col2=c(23.22,64.2,46.2),
col3=c(NA_integer_,2L,3L),
col4=c(23.2, 90.2,9.1))
> df_data
col1 col2 col3 col4
1 21.3 23.22 NA 23.2
2 23.1 64.20 2 90.2
3 22.2 46.20 3 9.1
我不确定这样做的最佳方式;目前我正在考虑在每一行上使用正则表达式,在“整数”之前查找所有“十进制”numer,然后查找所有整数,然后在整数之后查找所有“小数”,但目前这似乎过于复杂,我希望有一种更简单的方法我忽略了 此解决方案仅依赖于R能够识别
df_数据中的整数列
。如果其中一列未被读取为整数(例如,如果其中满是NAs),则可能会失败
在我看来,用NAs预先分配
df_final
,然后索引分配df_data
中的列是最有意义的。唯一的技巧是确定需要分配哪些列
我看到您想要右对齐(可以说)列集中的列。因此,该要求相当于我所说的df_final
的反转列类型中df_data
的反转列类型的“累积匹配”。换句话说,您需要从右向左遍历df_data
和df_final
的列类型,并找到下一个匹配项(从向右方向)
我知道R中有各种非累积/累积的函数对,例如sum()
/cumsum()
,prod()
/cumprod()
,min()
/cummin()
和max()
(实际上我认为只有这些函数),但似乎没有任何类型的函数“累积匹配”函数。所以我写了自己的:
cummatch <- function(small,big) {
cur <- 1L;
res <- integer();
biglen <- length(big);
for (s in small) {
if (cur > biglen) break;
rescur <- match(s,big[cur:biglen])+cur-1L;
if (is.na(rescur)) break;
res[length(res)+1L] <- rescur;
cur <- rescur+1L;
};
length(res) <- length(small);
return(res);
};
您如何知道缺少哪些列?例如,如果中间没有整数列,您将如何分配它们?此外,您没有将
l
、m
、n
设置为整数列。也没有将col3
设置为整数列。它应该是col3=c(NA_integer\uu2l,3L)
@DavidArenburg修复了整数。在数字十进制列之间始终至少有一个整数列(但可能有NA
s)。我感谢您的思考和努力!
nr <- nrow(df_data)
# Define rows corresponding to sets 1,2,3
set2 <- which(sapply(df_data, class) == "integer")
set1 <- 1:(min(set2)-1)
set3 <- (max(set2)+1):length(df_data)
# Build the three components of df_final
part1 <- cbind(matrix(NA_real_, nrow=nr, ncol=3-length(set1)), df_data[,set1])
part2 <- cbind(matrix(NA_integer_, nrow=nr, ncol=3-length(set2)), df_data[,set2])
part3 <- cbind(matrix(NA_integer_, nrow=nr, ncol=3-length(set3)), df_data[,set3])
# Put it together and save column names
df_final <- data.frame(part1, part2, part3)
colnames(df_final) <- c("a","b","c","l","m","n","x","y","z")
> df_final
a b c l m n x y z
1 NA 21.3 23.22 NA NA NA NA NA 23.2
2 NA 23.1 64.20 NA NA 2 NA NA 90.2
3 NA 22.2 46.20 NA NA 3 NA NA 9.1
cummatch <- function(small,big) {
cur <- 1L;
res <- integer();
biglen <- length(big);
for (s in small) {
if (cur > biglen) break;
rescur <- match(s,big[cur:biglen])+cur-1L;
if (is.na(rescur)) break;
res[length(res)+1L] <- rescur;
cur <- rescur+1L;
};
length(res) <- length(small);
return(res);
};
cis <- ncol(df_final)+1L-rev(cummatch(rev(sapply(df_data,typeof)),rev(sapply(df_final,typeof))));
cis;
## [1] 2 3 6 9
df_final[nrow(df_data),1] <- NA; ## preallocate rows of NA
df_final;
## a b c l m n x y z
## 1 NA NA NA NA NA NA NA NA NA
## 2 NA NA NA NA NA NA NA NA NA
## 3 NA NA NA NA NA NA NA NA NA
df_final[cis] <- df_data;
df_final;
## a b c l m n x y z
## 1 NA 21.3 23.22 NA NA NA NA NA 23.2
## 2 NA 23.1 64.20 NA NA 2 NA NA 90.2
## 3 NA 22.2 46.20 NA NA 3 NA NA 9.1
cppFunction('
using namespace Rcpp;
#define ___RCPP_HANDLE_CASE___2( ___RTYPE___ , ___FUN___ , ___OBJECT___1 , ___OBJECT___2 , ___RCPPTYPE___ ) \\
case ___RTYPE___ : \\
return ___FUN___( ::Rcpp::___RCPPTYPE___< ___RTYPE___ >( ___OBJECT___1 ), ::Rcpp::___RCPPTYPE___< ___RTYPE___ >( ___OBJECT___2 ) ) ;
#define ___RCPP_RETURN___2( __FUN__, __SEXP__1 , __SEXP__2, __RCPPTYPE__ ) \\
SEXP __TMP__1 = __SEXP__1 ; \\
SEXP __TMP__2 = __SEXP__2 ; \\
unsigned int __TMP__1_TYPE = TYPEOF( __TMP__1 ); \\
unsigned int __TMP__2_TYPE = TYPEOF( __TMP__2 ); \\
unsigned int __TMP__TYPE = __TMP__1_TYPE == RAWSXP ? __TMP__2_TYPE : __TMP__2_TYPE == RAWSXP ? __TMP__1_TYPE : std::max(__TMP__1_TYPE,__TMP__2_TYPE); /* note: the SEXPTYPE enumeration order *almost* aligns with the R type promotion rules; only raw is out-of-order, so we can test for that first, then use std::max() */ \\
if (__TMP__1_TYPE < LGLSXP || __TMP__2_TYPE < LGLSXP) __TMP__TYPE = 0; \\
switch( __TMP__TYPE ) { \\
___RCPP_HANDLE_CASE___2( INTSXP , __FUN__ , __TMP__1 , __TMP__2 , __RCPPTYPE__ ) \\
___RCPP_HANDLE_CASE___2( REALSXP , __FUN__ , __TMP__1 , __TMP__2 , __RCPPTYPE__ ) \\
___RCPP_HANDLE_CASE___2( RAWSXP , __FUN__ , __TMP__1 , __TMP__2 , __RCPPTYPE__ ) \\
___RCPP_HANDLE_CASE___2( LGLSXP , __FUN__ , __TMP__1 , __TMP__2 , __RCPPTYPE__ ) \\
___RCPP_HANDLE_CASE___2( CPLXSXP , __FUN__ , __TMP__1 , __TMP__2 , __RCPPTYPE__ ) \\
___RCPP_HANDLE_CASE___2( STRSXP , __FUN__ , __TMP__1 , __TMP__2 , __RCPPTYPE__ ) \\
/* no == for generic ___RCPP_HANDLE_CASE___2( VECSXP , __FUN__ , __TMP__1 , __TMP__2 , __RCPPTYPE__ ) */ \\
/* no == for expression ___RCPP_HANDLE_CASE___2( EXPRSXP , __FUN__ , __TMP__1 , __TMP__2 , __RCPPTYPE__ ) */ \\
default: \\
throw std::range_error( "not a vector" ) ; \\
}
#define RCPP_RETURN_VECTOR2( _FUN_, _SEXP_1, _SEXP_2 ) ___RCPP_RETURN___2( _FUN_, _SEXP_1, _SEXP_2, Vector )
#define RCPP_RETURN_MATRIX2( _FUN_, _SEXP_1, _SEXP_2 ) ___RCPP_RETURN___2( _FUN_, _SEXP_1, _SEXP_2, Matrix )
template<typename T> IntegerVector cummatch_impl(T small, T big ) {
int smalllen = LENGTH(small);
IntegerVector res(smalllen,NA_INTEGER);
int cur = 0;
int biglen = LENGTH(big);
for (int si = 0; si < smalllen; ++si) {
int rescur = NA_INTEGER;
for (int bi = cur; bi < biglen; ++bi) {
if (small(si) == big(bi)) {
rescur = bi;
break;
}
}
if (rescur == NA_INTEGER) break;
res(si) = rescur+1;
cur = rescur+1;
}
return res;
}
// [[Rcpp::export]]
IntegerVector cummatch(SEXP small, SEXP big ) { RCPP_RETURN_VECTOR2(cummatch_impl,small,big); }
');