Scala: Replace double quotes with single quotes

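How do I replace double quotes with single quotes in Scala? I have a data file in which some records contain "abc" (in double quotes). I need to replace those quotes with single quotes and then convert the data into a DataFrame.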

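import org.apache.spark.sql.types.{StructType, StructField, StringType}

val customSchema_1 =
  StructType(Array(
    StructField("ID", StringType, true),
    StructField("KEY", StringType, true),
    StructField("CODE", StringType, true)))

// Read the "¦"-delimited file with the spark-csv package (Spark 1.6)
val df_1 = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("delimiter", "¦")
  .schema(customSchema_1)
  .load("example")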
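A sketch of a DataFrame-only alternative (not taken from the answers below): Spark's built-in `regexp_replace` function (available since Spark 1.5) can rewrite the quotes column by column without a UDF. The object name `ReplaceQuotesInColumns` and the demo data are hypothetical, and the demo uses `SparkSession` (Spark 2.x and later); the `replaceDoubleQuotes` function itself only relies on DataFrame calls that also exist in 1.6.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, regexp_replace}
import org.apache.spark.sql.types.StringType

object ReplaceQuotesInColumns {
  // Rewrite every StringType column, replacing each " with '.
  // Columns of other types are passed through unchanged.
  // regexp_replace returns null for a null input, so null cells stay null.
  def replaceDoubleQuotes(df: DataFrame): DataFrame = {
    val cols = df.schema.fields.map { f =>
      if (f.dataType == StringType) regexp_replace(col(f.name), "\"", "'").as(f.name)
      else col(f.name)
    }
    df.select(cols: _*)
  }

  def main(args: Array[String]): Unit = {
    // Local session for the demo only; in spark-shell the existing session can be used
    val spark = SparkSession.builder().master("local[1]").appName("replace-quotes").getOrCreate()
    import spark.implicits._
    val df = Seq(("1", "\"abc\"", "x"), ("2", "def", "\"y\"")).toDF("ID", "KEY", "CODE")
    replaceDoubleQuotes(df).show()
    spark.stop()
  }
}
```

Applied to the question's data, this would be `val cleaned = ReplaceQuotesInColumns.replaceDoubleQuotes(df_1)`. Only string columns are touched, which covers every field in the schema above since all of them are `StringType`.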

Read your file line by line and apply the following to each line:

val text: String = """Here is a lot of text and "quotes" so you may think that everything is ok until you see something "special" or "weird"
"""

text.replaceAll("\"", "'")

This gives you a new String value in which every double quote has been replaced by a single quote.
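The line-by-line approach can be sketched in plain Scala as follows, assuming the input is a local UTF-8 text file and that the cleaned copy is written to a second file before Spark reads it. `RewriteQuotes`, `fixLine`, and `rewriteFile` are hypothetical names. The sketch uses `String.replace`, which substitutes a literal substring; `replaceAll` as used above gives the same result here because a double quote has no special meaning in a regular expression.

```scala
import java.io.PrintWriter
import java.nio.file.Path
import scala.io.Source

object RewriteQuotes {
  // Replace every double quote in one line with a single quote.
  // String.replace does a literal (non-regex) substitution of every occurrence.
  def fixLine(line: String): String = line.replace("\"", "'")

  // Read `in` line by line, fix each line, and write the result to `out`.
  // UTF-8 is assumed for both files (the "¦" delimiter is not ASCII).
  def rewriteFile(in: Path, out: Path): Unit = {
    val source = Source.fromFile(in.toFile, "UTF-8")
    try {
      val writer = new PrintWriter(out.toFile, "UTF-8")
      try {
        source.getLines().foreach(line => writer.println(fixLine(line)))
      } finally {
        writer.close()
      }
    } finally {
      source.close()
    }
  }
}
```

Inside Spark, the same per-line fix could instead be applied while reading, for example `sc.textFile("example").map(_.replace("\"", "'"))`, which yields an `RDD[String]` that still has to be split on the `¦` delimiter.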

You can create a simple UDF that replaces double quotes with single quotes.

Here is a simple example:

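import org.apache.spark.sql.functions.udf
import sqlContext.implicits._ // enables the $"colName" syntax (Spark 1.6)

// Replace every double quote in a string value with a single quote
val removeDoubleQuotes = udf((x: String) => x.replace("\"", "'"))

// If df is the DataFrame, apply the UDF to colName to replace " with '
df.withColumn("colName", removeDoubleQuotes($"colName"))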

Hope this helps.
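The UDF in this answer calls `replace` directly on its argument, so a null cell (possible here, since every field in the question's schema is nullable) would raise a `NullPointerException`. A minimal null-safe variant follows; the object name and sample data are hypothetical, and the demo assumes a Spark 2.x+ `SparkSession`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object NullSafeQuoteUdf {
  // Option(x) turns a null cell into None, so nulls pass through as null
  // instead of throwing NullPointerException inside the UDF.
  val removeDoubleQuotes = udf((x: String) => Option(x).map(_.replace("\"", "'")).orNull)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("udf-quotes").getOrCreate()
    import spark.implicits._
    // Second row has a null value in colName
    val df = Seq[(String, String)](("1", "\"abc\""), ("2", null)).toDF("ID", "colName")
    df.withColumn("colName", removeDoubleQuotes(col("colName"))).show()
    spark.stop()
  }
}
```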

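Which column contains the double quotes? Which Spark version are you using?

I am using spark-core 1.6.0. The quoted values are scattered throughout the data: some values in a column have quotes and others do not.

This sounds like a problem that may be easier to solve with a bash script, but basically you need a regular expression that finds every double quote (within your column strings) and replaces it with a single quote. Here is an example using sed:

Thanks for the suggestion! How would you do this if you are working with a DataFrame? Is there a DataFrame function that allows this?

How can I do the same thing in PySpark, especially

val removeDoubleQuotes = udf((x:String) => s.replace("\"","'"))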
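The regex idea from the comments can also be expressed in Scala instead of sed. This sketch assumes the data may contain typographic quotes (U+201C and U+201D) in addition to the ASCII double quote; drop them from the character class if the file only uses plain quotes. `QuoteRegex` and `toSingleQuotes` are hypothetical names.

```scala
import scala.util.matching.Regex

object QuoteRegex {
  // Character class: ASCII double quote plus left/right typographic quotes.
  // Including the typographic quotes is an assumption about the input data.
  val DoubleQuotes: Regex = "[\"\u201C\u201D]".r

  // Replace every matched quote character with a single quote.
  def toSingleQuotes(s: String): String = DoubleQuotes.replaceAllIn(s, "'")
}
```

The same pattern string could be passed to `regexp_replace` on a DataFrame column, since that function also takes a Java regular expression.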