WeekOfYear column becomes null in Scala Spark SQL


I am writing a SQL statement for spark.sql, but WEEKOFYEAR does not convert InvoiceDate into the week of the year; the column comes out as null in the output. Below is the expression I used.

Input data

InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,01-12-2010 8.26,2.55,17850,United Kingdom
536365,71053,WHITE METAL LANTERN,6,01-12-2010 8.26,3.39,17850,United Kingdom
536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,01-12-2010 8.26,2.75,17850,United Kingdom
536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,01-12-2010 8.26,3.39,17850,United Kingdom
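
For reference, a minimal sketch of how this CSV might be loaded and registered as the sales view used in the query below; the file path and the header/inferSchema options are assumptions, not part of the original post:

// Assumption: the CSV shown above is stored at "invoices.csv" (hypothetical path)
val sales = spark.read
  .option("header", "true")       // first line holds the column names
  .option("inferSchema", "true")  // Quantity/UnitPrice become numeric; InvoiceDate stays a string
  .csv("invoices.csv")

sales.createOrReplaceTempView("sales")  // makes the data queryable as `sales` in spark.sql

Note that with this input, InvoiceDate is kept as a plain string column, which is what the question turns on.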
SQL code

val summarySQlTest = spark.sql(
  """
    |select Country,WEEKOFYEAR(InvoiceDate)as WeekNumber,
    |count(distinct(InvoiceNo)) as NumInvoices,
    |sum(Quantity) as TotalQuantity,
    |round(sum(Quantity*UnitPrice),2) as InvoiceValue
    |from sales
    |group by Country,WeekNumber
    |""".stripMargin
).show()
Expected output

     +--------------+----------+-----------+-------------+------------+
     |       Country|WeekNumber|NumInvoices|TotalQuantity|InvoiceValue|
     +--------------+----------+-----------+-------------+------------+
     |         Spain|        49|          1|           67|      174.72|
     |       Germany|        48|         11|         1795|     3309.75|
Output I actually get

    +--------------+----------+-----------+-------------+------------+
    |       Country|WeekNumber|NumInvoices|TotalQuantity|InvoiceValue|
    +--------------+----------+-----------+-------------+------------+
    |         Spain|      null|          1|           67|      174.72|
    |       Germany|      null|         11|         1795|     3309.75|
To get the desired output I used to_date(col("InvoiceDate"), "dd-MM-yyyy H.mm") in the DataFrame API, but I would like to solve the same problem inside spark.sql. It would also be great if someone could explain what exactly is going on here.
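
A sketch of the DataFrame-API approach referred to above; the sales DataFrame is assumed to be the one registered as the sales view, and the column names follow the input data:

import org.apache.spark.sql.functions.{col, to_date, weekofyear}

// Parse the InvoiceDate string with an explicit pattern,
// then weekofyear can be applied to the resulting date column
val withWeek = sales.withColumn(
  "WeekNumber",
  weekofyear(to_date(col("InvoiceDate"), "dd-MM-yyyy H.mm"))
)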


You need to convert the InvoiceDate column to a date type first (using to_date) before you can call weekofyear. I think this also answers your last question.

val summarySQlTest = spark.sql(
  """
    |select Country,WEEKOFYEAR(to_date(InvoiceDate,'dd-MM-yyyy H.mm')) as WeekNumber,
    |count(distinct(InvoiceNo)) as NumInvoices,
    |sum(Quantity) as TotalQuantity,
    |round(sum(Quantity*UnitPrice),2) as InvoiceValue
    |from sales
    |group by Country,WeekNumber
    |""".stripMargin
).show()

So Spark just silently returns null here instead of throwing an exception, because it tries to apply weekofyear to a string?

@Andrew I guess it is working fine now.

@mck I want to ask you: we pass this format => 'dd-MM-yyyy H.mm' because we need to convert a string that already looks like a date into an actual date type. If my InvoiceDate did not contain the H.mm part, should I write 'dd-MM-yyyy' instead? Help me understand this.

Yes, the date format should correspond to the format of the string column. Spark cannot interpret a string as a date unless you explicitly convert it with to_date and supply the matching date format.
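
A small illustration of that point, using a literal value that mirrors the InvoiceDate strings above; it assumes the default non-ANSI behaviour, where a failed cast yields null rather than an error:

// weekofyear on the raw string: the implicit cast to date fails, so the result is null
spark.sql("select weekofyear('01-12-2010 8.26') as WeekNumber").show()

// with to_date and a pattern that matches the string, a real date is produced and weekofyear works
spark.sql("select weekofyear(to_date('01-12-2010 8.26', 'dd-MM-yyyy H.mm')) as WeekNumber").show()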