How to use sdf_pivot() in sparklyr and concatenate strings?

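I am trying to use the sdf_pivot() function in sparklyr to "gather" a long-format data frame into a wide format. The values of the variable are strings that I would like to concatenate.

Here is a simple example that I think should work, but doesn't: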

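library(sparklyr)
library(dplyr)
# Assumption: a local Spark connection; the question does not show how sc was created
sc <- spark_connect(master = "local")
d <- data.frame(id=c("1", "1", "2", "2", "1", "2"), 
                 x=c("200", "200", "200", "201", "201", "201"), 
                 y=c("This", "That", "The", "Other", "End", "End"))
d_sdf <- copy_to(sc, d, "d")
sdf_pivot(d_sdf, id ~ x, paste)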
Unfortunately, this gives me an error stating:

Error in as.vector(x, "character") : 
  cannot coerce type 'environment' to vector of type 'character'
I also tried using "collect_list", which resulted in the following error:

Error: java.lang.IllegalArgumentException: invalid method collect_list 
 for object 641

Is there any way to do what I am trying to do?

I dug into the tests for sdf_pivot, and it seems you can use invoke inside a custom fun.aggregate function to access the collect_list function:

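fun.aggregate <- function(gdf) {
  # Build a Spark Column from a SQL expression string
  expr <- invoke_static(
    sc,
    "org.apache.spark.sql.functions",
    "expr",
    "collect_list(y)" # "y" is your own value column
  )

  # Call agg() on the grouped dataset with that expression
  gdf %>% invoke("agg", expr, list())
}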
This does the trick:

> d_sdf_wide
Source:     table<sparklyr_tmp_69c14424c5a4> [?? x 3]
Database:   spark connection master=local[8] app=sparklyr local=TRUE

     id      `200`      `201`
  <chr>     <list>     <list>
1     1 <list [2]> <list [1]>
2     2 <list [1]> <list [2]>
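
If the goal is a plain string in each cell rather than a list, one possible variation is to do the concatenation inside the same Spark SQL expression. The sketch below is untested here and rests on a few assumptions: `fun.aggregate.concat` is a hypothetical name, `sc` is the connection from the question, `y` is the value column, and a single space is the separator. `concat_ws` and `collect_list` are Spark SQL built-ins; note that the order of the elements returned by `collect_list` is not guaranteed.

```r
# Hypothetical variant of fun.aggregate: collect the strings of each cell
# and join them with a space in one Spark SQL expression.
# Assumes `sc` is an open sparklyr connection, as in the question.
fun.aggregate.concat <- function(gdf) {
  expr <- sparklyr::invoke_static(
    sc,
    "org.apache.spark.sql.functions",
    "expr",
    "concat_ws(' ', collect_list(y))" # "y" is the value column
  )

  # Apply the aggregation to the grouped (pivoted) dataset
  sparklyr::invoke(gdf, "agg", expr, list())
}
```

It would be passed to sdf_pivot in the same way as before, for example `sdf_pivot(d_sdf, id ~ x, fun.aggregate.concat)`.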

(Alternatively, you could write a complicated SQL query, but I haven't tried that.)
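
For reference, a SQL version might look roughly like the sketch below. It is not from the original answer and has not been run here. It assumes the table was registered as "d" (as in the `copy_to` call in the question), that the only values of `x` are "200" and "201", and that a space is the desired separator. `pivot_query` and `run_pivot_query` are hypothetical names; `DBI::dbGetQuery` is the standard DBI call, which sparklyr connections support.

```r
# Hypothetical SQL alternative: one conditional aggregate per target column.
# collect_list() skips NULLs, so rows with a different x do not contribute.
pivot_query <- "
  SELECT id,
         concat_ws(' ', collect_list(CASE WHEN x = '200' THEN y END)) AS `200`,
         concat_ws(' ', collect_list(CASE WHEN x = '201' THEN y END)) AS `201`
  FROM d
  GROUP BY id
"

# Runs the query on an open sparklyr connection and returns an R data frame
run_pivot_query <- function(sc) {
  DBI::dbGetQuery(sc, pivot_query)
}
```

The drawback compared with sdf_pivot is that every value of `x` has to be written out by hand in the query.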

Could you help me answer this question:
d_sdf_wide <- sdf_pivot(d_sdf, id ~ x, fun.aggregate)
> d_sdf_wide
Source:     table<sparklyr_tmp_69c14424c5a4> [?? x 3]
Database:   spark connection master=local[8] app=sparklyr local=TRUE

     id      `200`      `201`
  <chr>     <list>     <list>
1     1 <list [2]> <list [1]>
2     2 <list [1]> <list [2]>
d_sdf_wide %>% mutate(liststring = paste(`200`))

     id      `200`      `201` liststring
  <chr>     <list>     <list>      <chr>
1     1 <list [2]> <list [1]>  This That
2     2 <list [1]> <list [2]>        The
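
The same `paste()` call can presumably be applied to each pivoted column in turn. The sketch below wraps that in a small helper; `collapse_pivot_lists`, `str_200` and `str_201` are hypothetical names, and the column names `200` and `201` are taken from the output above. It relies on the behaviour shown above, where `paste()` on a list column yields a space-separated string.

```r
# Hypothetical helper: turn both list columns of the pivoted table into strings.
# `tbl` is expected to be the Spark table returned by sdf_pivot() above.
collapse_pivot_lists <- function(tbl) {
  dplyr::mutate(
    tbl,
    str_200 = paste(`200`), # joined contents of the `200` column
    str_201 = paste(`201`)  # joined contents of the `201` column
  )
}
```

It would be called as `collapse_pivot_lists(d_sdf_wide)`.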