
Scala on Eclipse: reading a CSV into a DataFrame throws java.lang.ArrayIndexOutOfBoundsException


Trying to read a simple CSV file and load it into a DataFrame throws a
java.lang.ArrayIndexOutOfBoundsException.

Since I'm new to Scala I may have missed something trivial, but thorough searching on Google and Stack Overflow turned up nothing.

Here is the code:

    import org.apache.spark.sql.SparkSession

    object TransformInitial {
      def main(args: Array[String]): Unit = {

        val session = SparkSession.builder.master("local").appName("test").getOrCreate()

        val df = session.read
          .format("csv")
          .option("header", "true")
          .option("inferSchema", "true")
          .option("delimiter", ",")
          .load("data_sets/small_test.csv")

        df.show()
      }
    }
small_test.csv is as simple as it gets:

v1,v2,v3
0,1,2
3,4,5
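
For reference, if the read succeeds, df.show() on this file (header row plus inferSchema) should print a table along these lines:

    +---+---+---+
    | v1| v2| v3|
    +---+---+---+
    |  0|  1|  2|
    |  3|  4|  5|
    +---+---+---+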
Here is the actual pom of the Maven project:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>Scala_tests</groupId>
  <artifactId>Scala_tests</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <build>
    <sourceDirectory>src</sourceDirectory>
    <resources>
      <resource>
        <directory>src</directory>
        <excludes>
          <exclude>**/*.java</exclude>
        </excludes>
      </resource>
    </resources>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.8.0</version>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>
      <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->


    </plugins>
  </build>
  <dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>2.4.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>2.4.0</version>
    </dependency>

  </dependencies>
</project>
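
One thing worth noting about this pom: it only configures maven-compiler-plugin for Java sources, so a plain mvn build would presumably not compile the Scala code at all (inside Eclipse, the Scala IDE does the compiling). If the project should also build from the command line, a Scala compiler plugin belongs in the plugins section; a minimal sketch, assuming the commonly used net.alchim31.maven plugin:

      <!-- Sketch only: compiles Scala sources during a plain mvn build. -->
      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.4.4</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>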

Executing this code throws the following java.lang.ArrayIndexOutOfBoundsException:

18/11/09 12:03:31 INFO FileSourceStrategy: Pruning directories with: 
18/11/09 12:03:31 INFO FileSourceStrategy: Post-Scan Filters: (length(trim(value#0, None)) > 0)
18/11/09 12:03:31 INFO FileSourceStrategy: Output Data Schema: struct<value: string>
18/11/09 12:03:31 INFO FileSourceScanExec: Pushed Filters: 
18/11/09 12:03:31 INFO CodeGenerator: Code generated in 413.859722 ms
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 10582
    at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.accept(BytecodeReadingParanamer.java:563)
    at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.access$200(BytecodeReadingParanamer.java:338)
    at com.thoughtworks.paranamer.BytecodeReadingParanamer.lookupParameterNames(BytecodeReadingParanamer.java:103)
    at com.thoughtworks.paranamer.CachingParanamer.lookupParameterNames(CachingParanamer.java:90)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.getCtorParams(BeanIntrospector.scala:44)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$1(BeanIntrospector.scala:58)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$1$adapted(BeanIntrospector.scala:58)
    at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:241)
    at scala.collection.Iterator.foreach(Iterator.scala:929)
    at scala.collection.Iterator.foreach$(Iterator.scala:929)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1417)
    at scala.collection.IterableLike.foreach(IterableLike.scala:71)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:70)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at scala.collection.TraversableLike.flatMap(TraversableLike.scala:241)
    at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:238)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.findConstructorParam$1(BeanIntrospector.scala:58)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$19(BeanIntrospector.scala:176)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:234)
    at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32)
    at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:191)
    at scala.collection.TraversableLike.map(TraversableLike.scala:234)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:227)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:191)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$14(BeanIntrospector.scala:170)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$14$adapted(BeanIntrospector.scala:169)
    at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:241)
    at scala.collection.immutable.List.foreach(List.scala:389)
    at scala.collection.TraversableLike.flatMap(TraversableLike.scala:241)
    at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:238)
    at scala.collection.immutable.List.flatMap(List.scala:352)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.apply(BeanIntrospector.scala:169)
    at com.fasterxml.jackson.module.scala.introspect.ScalaAnnotationIntrospector$._descriptorFor(ScalaAnnotationIntrospectorModule.scala:22)
    at com.fasterxml.jackson.module.scala.introspect.ScalaAnnotationIntrospector$.fieldName(ScalaAnnotationIntrospectorModule.scala:30)
    at com.fasterxml.jackson.module.scala.introspect.ScalaAnnotationIntrospector$.findImplicitPropertyName(ScalaAnnotationIntrospectorModule.scala:78)
    at com.fasterxml.jackson.databind.introspect.AnnotationIntrospectorPair.findImplicitPropertyName(AnnotationIntrospectorPair.java:467)
    at com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector._addFields(POJOPropertiesCollector.java:351)
    at com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector.collectAll(POJOPropertiesCollector.java:283)
    at com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector.getJsonValueMethod(POJOPropertiesCollector.java:169)
    at com.fasterxml.jackson.databind.introspect.BasicBeanDescription.findJsonValueMethod(BasicBeanDescription.java:223)
    at com.fasterxml.jackson.databind.ser.BasicSerializerFactory.findSerializerByAnnotations(BasicSerializerFactory.java:348)
    at com.fasterxml.jackson.databind.ser.BeanSerializerFactory._createSerializer2(BeanSerializerFactory.java:210)
    at com.fasterxml.jackson.databind.ser.BeanSerializerFactory.createSerializer(BeanSerializerFactory.java:153)
    at com.fasterxml.jackson.databind.SerializerProvider._createUntypedSerializer(SerializerProvider.java:1203)
    at com.fasterxml.jackson.databind.SerializerProvider._createAndCacheUntypedSerializer(SerializerProvider.java:1157)
    at com.fasterxml.jackson.databind.SerializerProvider.findValueSerializer(SerializerProvider.java:481)
    at com.fasterxml.jackson.databind.SerializerProvider.findTypedValueSerializer(SerializerProvider.java:679)
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:107)
    at com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:3559)
    at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2927)
    at org.apache.spark.rdd.RDDOperationScope.toJson(RDDOperationScope.scala:52)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:142)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:339)
    at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
    at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3384)
    at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:2545)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:3365)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:78)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3365)
    at org.apache.spark.sql.Dataset.head(Dataset.scala:2545)
    at org.apache.spark.sql.Dataset.take(Dataset.scala:2759)
    at org.apache.spark.sql.execution.datasources.csv.TextInputCSVDataSource$.infer(CSVDataSource.scala:232)
    at org.apache.spark.sql.execution.datasources.csv.CSVDataSource.inferSchema(CSVDataSource.scala:68)
    at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.inferSchema(CSVFileFormat.scala:63)
    at org.apache.spark.sql.execution.datasources.DataSource.$anonfun$getOrInferFileFormatSchema$12(DataSource.scala:183)
    at scala.Option.orElse(Option.scala:289)
    at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:180)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
    at TransformInitial$.main(TransformInitial.scala:9)
    at TransformInitial.main(TransformInitial.scala)
For what it's worth, a cleaned-up version of the same code runs without error when launched via

$ sbt run

import org.apache.spark.sql.SparkSession

// Extending App is more idiomatic than writing a "main" function.
object TransformInitial extends App {

  val session = SparkSession.builder.master("local").appName("test").getOrCreate()

  // As of Spark 2.0, it's easier to read CSV files: the reader has a dedicated csv method.
  val df = session.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("data_sets/small_test.csv")

  df.show()

  // Shut down gracefully.
  session.stop()
}
Another fix: downgrading the Scala version of the Spark dependencies in the pom from 2.12 to 2.11 makes the exception go away:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.4.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.4.0</version>
    </dependency>
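
Both remedies are consistent with the stack trace: the failure sits in paranamer's bytecode reader, not in the CSV code itself, so either avoiding the Scala 2.12 artifacts or forcing a newer paranamer (as sketched after the stack trace above) sidesteps the broken code path.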