
Inserting a fixed-width file into Hive using Scala Spark


I have a sample record from a file, like this:

2018-01-1509.05.540000000000001000000751111EMAIL@AAA.BB.CL
The record above comes from a fixed-length file, and I want to split it according to the field lengths. When I split it, I get a list like the one below.

ListBuffer(2018-01-15, 09.05.54, 00000000000010000007, 5, 1111, EMAIL@AAA.BB.CL)
So far everything looks fine, but I don't understand why an extra space ends up in every field of the list except the first one.

My splitting logic is shown below.

import scala.collection.mutable.ListBuffer

val lengths = List("10", "8", "20", "1", "4", "15")

// Logic to split the line based on the lengths
def splitLineBasedOnLengths(line: String, lengths: List[String]): ListBuffer[Any] = {
  var splittedLine = line
  var split = new ListBuffer[Any]()
  for (i <- lengths) yield {
    var c = i.toInt
    var fi = splittedLine.take(c)
    split += fi
    splittedLine = splittedLine.drop(c)
  }
  split
}

Can anyone help me understand why each field ends up with an extra space in front of it after splitting?
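For what it's worth, one way to narrow this down is to print each split field between delimiters together with its length, which makes a stray space visible. A minimal sketch, reusing the sample record and the lengths list from the question:

// Check whether the split itself adds a leading space to any field.
val sampleLine = "2018-01-1509.05.540000000000001000000751111EMAIL@AAA.BB.CL"

splitLineBasedOnLengths(sampleLine, lengths).foreach { f =>
  val s = f.toString
  // Printing between brackets makes a leading/trailing space visible;
  // the length shows whether any character was added to the field.
  println(s"[$s] length=${s.length}")
}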

This does not give me the spaces, and uses slightly more idiomatic Scala:

def splitThis(line: String, lengths: List[String]): List[String] = {
  def loop(l: String, ls: List[Int], acc: Seq[String]): Seq[String] =
    if (l.isEmpty || ls.isEmpty) acc
    else loop(l.drop(ls.head), ls.tail, acc :+ l.take(ls.head))
  loop(line, lengths.map(_.toInt), Seq.empty).toList
}
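For the Hive part of the question, one possible way to wire splitThis in is to read the fixed-width file as plain text, split each line, turn the result into a DataFrame and write it to a Hive table. This is only a sketch: the file path, column names and table name are illustrative, and it assumes a SparkSession built with Hive support.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("fixed-width-to-hive")
  .enableHiveSupport()          // requires a Spark build configured for Hive
  .getOrCreate()
import spark.implicits._

val fieldLengths = List("10", "8", "20", "1", "4", "15")

// The path, column names and table name below are placeholders, not from the question.
val rows = spark.read.textFile("/path/to/fixed_width_file.txt")
  .map { line =>
    val f = splitThis(line, fieldLengths)
    (f(0), f(1), f(2), f(3), f(4), f(5))
  }
  .toDF("record_date", "record_time", "account", "code", "pin", "email")

rows.write.mode("append").saveAsTable("mydb.fixed_width_table")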

The problem is with your data: in your data there are extra spaces between the ',' (see the expected output below).

When we insert into Hive, because of this issue, every column except the first ends up one character longer.

When I use length(COLUMN NAME) it shows one extra character, i.e. a space, for every column.
With the splitThis function above the fields come out without the extra space; the expected output is:

ListBuffer(2018-01-15,09.05.54,00000000000010000007,5,1111,EMAIL@AAA.BB.CL)
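If stray spaces do make it into the columns, they can be diagnosed and stripped before the insert with Spark SQL's length and trim functions. A minimal sketch, reusing the illustrative rows DataFrame and table name from the sketch above:

import org.apache.spark.sql.functions.{col, length, trim}

// Show the length of every column to spot fields that are one character too long.
rows.select(rows.columns.map(c => length(col(c)).alias(s"len_$c")): _*).show(5, truncate = false)

// Trim leading/trailing whitespace from every column before writing to Hive.
val trimmed = rows.columns.foldLeft(rows) { (df, c) => df.withColumn(c, trim(col(c))) }
trimmed.write.mode("append").saveAsTable("mydb.fixed_width_table")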