Extracting specific data from a table using Scala?

scala dataframe

Here is a sample DataFrame:

Date            Party name                  Symbol  Buy/Sell indicator   # of shares   trade price
2011-01-03      American Funds EuPc;A       AAPL    BUY                     2400          332.87
2011-02-14      American Funds CWGI;A       SLB     BUY                     6700          94.08
2011-01-06      Tudor Investment Corp       ALL     BUY                     11800         31.92
2011-01-20      American Funds Inc;A        AMZN    SELL                    3600          180.14

Here is what I want to achieve:

Date            Party name                 Symbol  Buy/Sell     # of shares   trade price  trading volume
2011-04-21      Federated Prime Obl;Inst    MMM     BUY          2600         96.17        250042
2011-01-05      Fortress Investment Group   CMCSA   SELL         29700        21.96        644193
2011-02-28      Dodge & Cox Intl Stock      DELL    SELL         57400        15.67        899458
2011-05-02      American Funds Inc;A        S       BUY          137300       5.19         712587

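The new trading volume column is the # of shares column multiplied by the trade price column. Does anyone know how to do this automatically, since the real data has many more rows? After that, I want to take the trading volume values and output them in descending order. The exact instruction is:

Top twenty counterparties by dollar trading volume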

So far I have:

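// header = true assumes the first row of the file holds the column names
val dataframe = spark.read.option("header", "true").csv("c:\\data")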

val newdf = dataframe.select("# of shares","trade price")

Any help would be greatly appreciated. Thanks.

Here you go:

import org.apache.spark.sql.functions._
val newdf = dataframe.withColumn("trading volume",col("# of shares")*col("trade price"))
                     .select("# of shares","trade price","trading volume")
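For the second half of the question (the top twenty counterparties by dollar trading volume), the same computed column can be aggregated and sorted. The following is only a sketch under a few assumptions: "counterparty" is taken to mean the "Party name" column, the ranking is by each party's summed volume across all of its trades, and the names TopCounterparties, topByVolume and "total trading volume" are made up for illustration. The sample rows are copied from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, desc, sum}

object TopCounterparties {
  // Computes the dollar volume of each trade, totals it per party,
  // sorts the totals in descending order and keeps the n largest.
  // Assumption: "counterparty" corresponds to the "Party name" column.
  def topByVolume(trades: DataFrame, n: Int = 20): DataFrame =
    trades
      .withColumn("trading volume", col("# of shares") * col("trade price"))
      .groupBy("Party name")
      .agg(sum("trading volume").as("total trading volume"))
      .orderBy(desc("total trading volume"))
      .limit(n)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("top-counterparties")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample rows from the question; a real job would read the CSV instead.
    val trades = Seq(
      ("2011-01-03", "American Funds EuPc;A", "AAPL", "BUY", 2400, 332.87),
      ("2011-02-14", "American Funds CWGI;A", "SLB", "BUY", 6700, 94.08),
      ("2011-01-06", "Tudor Investment Corp", "ALL", "BUY", 11800, 31.92),
      ("2011-01-20", "American Funds Inc;A", "AMZN", "SELL", 3600, 180.14)
    ).toDF("Date", "Party name", "Symbol", "Buy/Sell indicator", "# of shares", "trade price")

    topByVolume(trades).show(truncate = false)
    spark.stop()
  }
}
```

If the ranking is meant per individual trade rather than per party, dropping the groupBy and agg steps and ordering directly by "trading volume" gives the twenty largest single trades instead.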
