Converting a CSV-backed DataFrame to a Hive table loses multiline support

Tags: sql, scala, csv, apache-spark, hive

I have a CSV file containing Name and long-form Message data. The Message data is multiline, with newlines embedded in the field values. The fields are quoted, and I have successfully parsed the file into a Spark DataFrame as follows:

scala> val df = spark.read.option("parserLib", "univocity")
.option("multiLine", true)
.option("header", true)
.option("inferSchema", true)
.option("quoteAll", true)
.csv("/data.csv");
df: org.apache.spark.sql.DataFrame = [Name: string, Message: string ... 17 more fields]
This produces the expected data in the Name column:

scala> df.limit(10).select("Name").show
+-------+
|   Name|
+-------+
| foobar|
| foobar|
| foobar|
| foobar|
| foobar|
| foobar|
| foobar|
| foobar|
| foobar|
| foobar|
+-------+
The problem occurs when I try to convert it to a Hive table:

scala> df.createOrReplaceTempView("events")
scala> sqlContext.sql("create table s_events as select * from events");
res52: org.apache.spark.sql.DataFrame = []
scala> sqlContext.sql("select Name from s_events limit 10").show();
+--------------------+
|                Name|
+--------------------+
|              foobar|
|              foobar|
|              foobar|
|              foobar|
|              foobar|
|Sent: Tuesday, 30...|
|To: 'personxyz   ...|
|Subject: RE: ABSD...|
|                    |
|     Hello Person,  |
+--------------------+
The data now shows that the CSV parser is no longer escaping the embedded newlines; instead, it is treating them as row separators.

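A plausible explanation (an assumption on my part, not something the output above confirms): a plain `create table ... as select` in Hive defaults to a newline-delimited TEXTFILE table, so the embedded newlines the Spark CSV parser preserved are reinterpreted as record boundaries when the table is read back. A hedged workaround sketch is to store the table in a binary columnar format instead, so row boundaries no longer depend on newlines:

```scala
// Sketch, assuming the Spark session and the temp view "events"
// registered below. Parquet (or ORC) storage is not newline-delimited,
// so embedded newlines in Message should survive the round trip.
// The table name s_events_pq is hypothetical.
sqlContext.sql("create table s_events_pq stored as parquet as select * from events")

// Equivalent DataFrame API form:
df.write.format("parquet").saveAsTable("s_events_pq")
```

This is a sketch of a storage-format change, not a fix to the text SerDe itself.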
Previously, I tried loading the file directly in Hive, and Hive produced the same result. Some googling told me that the workaround for Hive not supporting multiline records in CSV is to sideload via Spark as above, but that does not seem to work either.


Is there a way to convince Hive that the newlines inside quoted regions are not row separators, or do I need to clean up the data before attempting to load it?
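If the table must remain in a text format, another option (a sketch, under the assumption that the embedded newlines are not needed downstream) is to replace them before loading; the column name Message and the view name events_clean are taken from or modeled on the session above:

```scala
import org.apache.spark.sql.functions.regexp_replace

// Replace CR/LF inside the Message column with spaces so a
// newline-delimited text table cannot split a record mid-field.
val cleaned = df.withColumn("Message",
  regexp_replace(df("Message"), "[\\r\\n]+", " "))

cleaned.createOrReplaceTempView("events_clean")
sqlContext.sql("create table s_events as select * from events_clean")
```

This trades away the multiline content in exchange for compatibility with Hive's default text storage.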

This is usually caused by a formatting issue in the source file. Could you share a sample dataset or a snapshot of the source file?