
Apache Spark: comma inside a column value


I am using Spark 3, and below is the code I use to read a CSV file:

package spark.ny.project;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MainApp {

    public static void main(String[] args) {
        // TODO Auto-generated method stub
        
        SparkSession session = SparkSession.builder().appName("Sample App").master("local[*]").getOrCreate();
        
        session.sparkContext().setLogLevel("ERROR");
        
        // read every CSV in the folder, using '*' as the quote character and ',' as the separator
        Dataset<Row> df = session.read().format("com.databricks.spark.csv").option("quote", "*").option("sep", ",")
                .load("/home/deepak/sample_dataset/*.csv");
        df.printSchema();
        df.show(false);
        
        

    }

}
Below is the output I get:

root
 |-- CallType : string (nullable = true)
 |--  Methods: string (nullable = true)

+---------+---------------+
|CallType | Methods       |
+---------+---------------+
|Internal | *ApplyChanges |
+---------+---------------+
The second value in the Methods column is missing. If I refer to the documentation, when the quote option is used, separators inside the quotes should be ignored.
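For example (a made-up record, not from my actual file): with * as the quote character, a line such as

Internal,*ApplyChanges, SomeOtherMethod*

should, per that documentation, parse into CallType = Internal and Methods = ApplyChanges, SomeOtherMethod, because the comma between the two * characters sits inside a quoted value.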


I don't understand why I get this output. How can I tell Spark not to treat commas inside a column value as separators?

I suspect this is because of the whitespace: the quote character must immediately follow the separator (the comma). You can work around it by setting ignoreLeadingWhiteSpace and ignoreTrailingWhiteSpace:

Dataset<Row> df = session.read().format("com.databricks.spark.csv")
                         .option("quote", "*")
                         .option("sep", ",")
                         .option("ignoreLeadingWhiteSpace", "true")
                         .option("ignoreTrailingWhiteSpace", "true")
                         .load("/home/deepak/sample_dataset/*.csv");
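As a side note, on Spark 3 the CSV source is built in, so the com.databricks.spark.csv name is only kept for backward compatibility; the same options work with the plain csv format. A functionally equivalent sketch:

Dataset<Row> df = session.read().format("csv")   // built-in CSV source in Spark 2.x/3.x
                         .option("quote", "*")
                         .option("sep", ",")
                         .option("ignoreLeadingWhiteSpace", "true")
                         .option("ignoreTrailingWhiteSpace", "true")
                         .load("/home/deepak/sample_dataset/*.csv");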