Scala: How do I extract the last element from a comma-separated string?


Using this query:

sql("SELECT _location, count(1) FROM tablaTemporal group by _location order by 2 desc" )

I get the following output:

+--------------------------------+--------+
|_location                       |count(1)|
+--------------------------------+--------+
|London, United Kingdom          |15      |
|United States                   |12      |
|Bangalore, India                |8       |
|Hyderabad, India                |7       |
|Paris, France                   |6       |
|San Francisco, CA, United States|6       |
|Mountain View, CA, United States|4       |
|Pune, India                     |4       |
|Bengaluru, Karnataka, India     |3       |
+--------------------------------+--------+

But the result I need is:

+--------------------------------+--------+
|_location                       |count(1)|
+--------------------------------+--------+
|United States                   |22      |
|India                           |22      | 
|United Kingdom                  |15      |
|France                          |6       |
+--------------------------------+--------+

So I need something like the following statement:

sql("SELECT SubstringOfLocationFromCharComma(_location), count(1) FROM tablaTemporal group by _location order by 2 desc" )

How can I extract the last element from a comma-separated string?

You can use regexp_extract:

import org.apache.spark.sql.functions._

val df = Seq(
  "London, United Kingdom", "Bengaluru, Karnataka, India"
).toDF("_location")

df.select(regexp_extract($"_location", ".*,([^,]*)$", 1).alias("country")).show

// +---------------+
// |        country|
// +---------------+
// | United Kingdom|
// |          India|
// +---------------+
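
Note that the pattern above requires at least one comma, so a value such as "United States" yields an empty string, and the extracted values keep their leading space. The following is a minimal sketch of how the whole aggregation from the question might look in SQL, assuming a local SparkSession and a handful of sample rows copied from the question. The pattern '([^,]*)$', the trim call, and the names country, cnt and t are illustrative choices, not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("last-element-sql").getOrCreate()
import spark.implicits._

// Sample rows copied from the question, one row per record.
val locations = Seq(
  "London, United Kingdom",
  "United States",
  "Bengaluru, Karnataka, India",
  "Pune, India"
).toDF("_location")
locations.createOrReplaceTempView("tablaTemporal")

// '([^,]*)$' captures everything after the last comma, or the whole string
// when there is no comma; trim removes the space that follows the comma.
val byCountry = spark.sql(
  """SELECT country, count(1) AS cnt
    |FROM (
    |  SELECT trim(regexp_extract(_location, '([^,]*)$', 1)) AS country
    |  FROM tablaTemporal
    |) t
    |GROUP BY country
    |ORDER BY cnt DESC""".stripMargin)

byCountry.show(false)
```

Grouping happens on the extracted country in the outer query, so rows for different cities in the same country fall into one group.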

Since the country name is the last element after the final comma, you can also do the following:

df.show(false)
+--------------------------------+
|a                               |
+--------------------------------+
|Mountain View, CA, United States|
|Pune, India                     |
|Bengaluru, Karnataka, India     |
+--------------------------------+


df.withColumn("a" , split($"a", ",") ).withColumn("a" , expr("a[ size(a) -1 ] ") ).show
+--------------+
|a             |
+--------------+
| United States|
| India        |
| India        |
+--------------+

Then follow it with groupBy($"a").agg(sum($"count(1)").as("count")) to get the desired result.
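
The split-then-aggregate steps described above can be sketched end to end as follows. This is only an illustration under stated assumptions: it uses element_at with index -1, which needs Spark 2.4 or later, in place of a[size(a) - 1]; it adds trim to drop the space after the comma; and the column names cnt and country plus the sample counts layout are hypothetical stand-ins for the question's count(1) output.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{element_at, split, sum, trim}

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("last-element-split").getOrCreate()
import spark.implicits._

// A few pre-aggregated rows from the question; `cnt` stands in for `count(1)`.
val counts = Seq(
  ("London, United Kingdom", 15L),
  ("United States", 12L),
  ("Bangalore, India", 8L),
  ("San Francisco, CA, United States", 6L)
).toDF("_location", "cnt")

val byCountry = counts
  // split on commas, take the last array element, strip surrounding spaces
  .withColumn("country", trim(element_at(split($"_location", ","), -1)))
  .groupBy($"country")
  .agg(sum($"cnt").as("count"))
  .orderBy($"count".desc)

byCountry.show(false)
```

Values without a comma, such as "United States", split into a one-element array, so the last element is the whole string.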

Comments:

There's a saying that when you have a problem and decide to solve it with a regular expression, you now have two problems :) What's wrong with the good ol' split function?

@JacekLaskowski Isn't that what poor developers say?

It is not working. StackOverflow.scala:106: value $ is not a member of StringContext [error] val df2 = df.select(regexp_extract($"_location", ".*,([^,]*)$", 1).alias("locationSplitted")) Maybe it's because I'm using SBT.

Because there are no implicit conversions in scope.
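
The "value $ is not a member of StringContext" error from the comments typically appears in compiled projects, because the $"..." syntax and toDF come from the session's implicits, which spark-shell imports automatically. A minimal sketch of a standalone application, assuming Spark 2.x or later with a SparkSession: the object name LastElementApp and the helper countries are hypothetical, and the pattern "([^,]*)$" with trim is a variation on the answer's regex, chosen so that values without a comma are kept.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{regexp_extract, trim}

object LastElementApp {
  // Hypothetical helper: returns the last comma-separated element of each input.
  def countries(spark: SparkSession, locations: Seq[String]): Seq[String] = {
    // Brings $"..." , toDF and the String encoder into scope.
    import spark.implicits._
    locations.toDF("_location")
      .select(trim(regexp_extract($"_location", "([^,]*)$", 1)).alias("locationSplitted"))
      .as[String]
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode for illustration only.
    val spark = SparkSession.builder().master("local[*]").appName("last-element-app").getOrCreate()
    countries(spark, Seq("London, United Kingdom", "United States")).foreach(println)
    spark.stop()
  }
}
```

The import has to come after the session value is defined, since the implicits are members of that particular session instance.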