Scala: how to filter the contents of a DataFrame with a case-insensitive string contains?
I have a JSON file in the following format:
{Continent: Asia, Description: Biggest continent in the WORLD},
{Continent: Europe, Description: Coldest countries in the world}
{Continent: Africa, Description: Second continent in the WorLD}
{Continent: Australia, Description: The only country & continent in the world}
I am trying to filter the rows of the DataFrame that contain the string world.
I wrote the following code for this:
val continents = spark.read.json("path/to/input.json")
Filter the rows whose Description column contains the word world:
continents.filter($"Description".contains("world"))
The line above only keeps rows in which world is written entirely in lowercase.
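Is there a way to apply the filter while ignoring case?

For the comparison, you can convert the whole description to either upper case or lower case. In the example below, the description is converted to lowercase before comparing: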
import org.apache.spark.sql.functions.lower
continents.filter(lower($"Description").contains("world"))
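To see the difference on the sample data, here is a minimal sketch. It assumes a local SparkSession and rebuilds the four rows from the question in memory instead of reading path/to/input.json. The object name IgnoreCaseFilters and both helper functions are hypothetical names introduced for illustration. The rlike variant with the (?i) regex flag is an alternative that does not appear in the answer above.

```scala
import java.util.Locale
import java.util.regex.Pattern
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lower}

// Hypothetical helper object, not part of Spark.
object IgnoreCaseFilters {
  // Lowercase the column and the search term, then do a plain substring test.
  def containsIgnoreCase(df: DataFrame, column: String, needle: String): DataFrame =
    df.filter(lower(col(column)).contains(needle.toLowerCase(Locale.ROOT)))

  // Alternative: Java regex with the (?i) flag. Pattern.quote keeps the
  // search term literal so regex metacharacters in it are not interpreted.
  def rlikeIgnoreCase(df: DataFrame, column: String, needle: String): DataFrame =
    df.filter(col(column).rlike("(?i)" + Pattern.quote(needle)))

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, only for trying the filters out.
    val spark = SparkSession.builder()
      .appName("ignore-case-filter")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // The four rows from the question, built in memory.
    val continents = Seq(
      ("Asia", "Biggest continent in the WORLD"),
      ("Europe", "Coldest countries in the world"),
      ("Africa", "Second continent in the WorLD"),
      ("Australia", "The only country & continent in the world")
    ).toDF("Continent", "Description")

    // Case-sensitive contains: only Europe and Australia match.
    println(continents.filter(col("Description").contains("world")).count()) // 2
    // Both case-insensitive variants match all four rows.
    println(containsIgnoreCase(continents, "Description", "world").count())  // 4
    println(rlikeIgnoreCase(continents, "Description", "world").count())     // 4

    spark.stop()
  }
}
```

The lower-based version is the closest to the original filter. The regex version may be handy when the match needs more than a fixed substring, in which case Pattern.quote would be dropped.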
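Alternatively, use withColumn to add a lowercase copy of the description, which also leaves room for more complex functions, and then filter on the new column: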
import org.apache.spark.sql.functions.{col, lower}
val withLower = continents.withColumn("lower_desc", lower(col("Description")))
withLower.filter($"lower_desc".contains("world"))
There is no need to add a new column: the lower(...) expression can be used directly inside filter.
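The two approaches can be compared side by side. This is a sketch under the same assumptions as the thread (a DataFrame with a Description column); the object name HelperColumnComparison and its two functions are hypothetical. If the helper column is used anyway, dropping it after the filter restores the original schema, so both functions should return the same rows.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lower}

// Hypothetical helper object, not part of Spark.
object HelperColumnComparison {
  // Adds a temporary lowercase column, filters on it, then drops it again.
  def viaHelperColumn(df: DataFrame): DataFrame =
    df.withColumn("lower_desc", lower(col("Description")))
      .filter(col("lower_desc").contains("world"))
      .drop("lower_desc")

  // Uses the lowercase expression directly inside filter; no extra column.
  def viaInlineExpression(df: DataFrame): DataFrame =
    df.filter(lower(col("Description")).contains("world"))
}
```

The inline form is shorter and leaves the schema untouched, while the helper column can be worth keeping when the lowercase value is reused in several later steps.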