Spark Scala Typesafe config: safely iterate over the values of a specific column name

Tags: arrays, scala, apache-spark, typesafe

I found similar posts on Stack Overflow. However, I couldn't solve my problem with them, which is why I am writing this post.

Goal

The goal is to perform a column projection [projection = filter columns] when loading a SQL table (I use SQL Server).

According to the Scala cookbook, this is the way to filter columns [using an Array]:

sqlContext.read.jdbc(url, "person", Array("gender='M'"), prop)
However, I don't want to hard-code the Array("col1", "col2", ...) in the Scala code, which is why I use a configuration file with Typesafe (see below).

Configuration file

dataset {
    type = sql
    sql{
        url = "jdbc://host:port:user:name:password"
        tablename = "ClientShampooBusinesLimited"
        driver = "driver"
        other = "i have a lot of other single string elements in the config file..."
        columnList = [
        {
            colname = "id"
            colAlias = "identifient"
        }
        {
            colname = "name"
            colAlias = "nom client"
        }
        {
            colname = "age"
            colAlias = "âge client"
        }
        ]
    }
}

Let's focus on "columnList": the names of the SQL columns correspond exactly to "colname"; "colAlias" is a field that I will use later.

data.scala file

lazy val columnList = configFromFile.getList("dataset.sql.columnList")
// url, tablename and driver are single strings in the config, so getString (not getList) is the right accessor
lazy val dbUrl = configFromFile.getString("dataset.sql.url")
lazy val DbTableName = configFromFile.getString("dataset.sql.tablename")
lazy val DriverName = configFromFile.getString("dataset.sql.driver")

configFromFile is something I created myself in another custom class, but that doesn't really matter. columnList has the type "ConfigList"; this type comes from Typesafe.
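
For context, a minimal sketch of how such a configFromFile could be obtained with Typesafe Config - this is an assumption about the custom class, not code from the question (it assumes the HOCON above lives in an application.conf on the classpath):

import com.typesafe.config.{Config, ConfigFactory}

// Assumption: load application.conf / reference.conf from the classpath
val configFromFile: Config = ConfigFactory.load()

// getList returns a ConfigList; the accepted answer below uses
// getConfigList instead, which views each element as a Config
val columnList = configFromFile.getList("dataset.sql.columnList")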

Main file

def loadDataSQL(): DataFrame = {

  val url = datasetConfig.dbUrl
  val dbTablename = datasetConfig.DbTableName
  val dbDriver = datasetConfig.DriverName
  val columns = // I need help to solve this

  /* EDIT 2 March 2017
     This code should not be used. Have a look at the accepted answer.
  */
  sparkSession.read.format("jdbc").options(
    Map("url" -> url,
        "dbtable" -> dbTablename,
        "predicates" -> columns,
        "driver" -> dbDriver))
    .load()
}
So my problem is simply to extract the "colname" values so that I can put them into a suitable Array. Can someone help me write the right "val columns"? Thanks in advance.


If you are looking for a way to read the list of "colname" values into a Scala Array - I think this does it:

import scala.collection.JavaConverters._

val columnList = configFromFile.getConfigList("dataset.sql.columnList")
val colNames: Array[String] = columnList.asScala.map(_.getString("colname")).toArray
For the provided file, this would result in Array(id, name, age).

EDIT: As for your actual goal - I'm actually not aware of any option named "predicates" (using Spark 2.0.2, I also couldn't find any evidence of it in the source code).
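
As a side note (an observation added here, not part of the original answer): in the cookbook snippet quoted in the question, Array("gender='M'") is passed as the predicates parameter of DataFrameReader.jdbc. Predicates become WHERE clauses that filter (and partition) rows; they do not select columns. A minimal sketch of that call form, reusing the question's url, dbTablename and dbDriver values:

import java.util.Properties

val prop = new Properties()
prop.setProperty("driver", dbDriver)

// Each predicate string becomes the WHERE clause of one partition's
// query - this filters rows, it does not project columns
val df = sparkSession.read.jdbc(url, dbTablename, Array("gender='M'"), prop)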

The JDBC data source performs "projection pushdown" based on the actual columns selected in the query used. In other words - only the selected columns will be read from the DB, so you can use the colNames array in a select right after creating the DataFrame, e.g.:

import org.apache.spark.sql.functions._

sparkSession.read
  .format("jdbc")
  .options(Map("url" -> url, "dbtable" -> dbTablename, "driver" -> dbDriver))
  .load()
  .select(colNames.map(col): _*) // selecting only desired columns
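
Since the question mentions that "colAlias" will be used later, here is a hypothetical extension (an assumption, not part of the accepted answer) that applies the aliases in the same select; the pushdown still reads only the selected source columns, and the aliases are applied on top:

import scala.collection.JavaConverters._
import org.apache.spark.sql.functions.col

// Hypothetical: pair each colname with its colAlias from the config
val aliased = configFromFile
  .getConfigList("dataset.sql.columnList").asScala
  .map(c => col(c.getString("colname")).as(c.getString("colAlias")))

sparkSession.read
  .format("jdbc")
  .options(Map("url" -> url, "dbtable" -> dbTablename, "driver" -> dbDriver))
  .load()
  .select(aliased: _*)  // e.g. id AS identifient, name AS `nom client`, ...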

Dear Tzach Zohar, this is exactly what I was looking for. Thank you very much for your help. However, I get an error on "prediction" -> columns; it says something about an "overloaded" method. Do you have any idea what the problem is? Thanks

Not sure which error you are referring to, but I've updated my answer - hopefully it helps you achieve your actual goal of reading only the selected columns from the DB

Hello, sorry for the confusion - it isn't "prediction" but "predicates"; I will edit the post. Anyway, the solution you provided is really good; it works like a charm. Thank you very much for the information about "projection pushdown". At first glance I was reluctant, because I assumed it would load all the columns first and only then do the projection. Now, however, I understand the concept of "projection pushdown". Regards