Extracting specific data from a table using Scala?


Here is a sample DataFrame:

Date            Party name                  Symbol  Buy/Sell indicator   # of shares   trade price
2011-01-03      American Funds EuPc;A       AAPL    BUY                     2400          332.87
2011-02-14      American Funds CWGI;A       SLB     BUY                     6700          94.08
2011-01-06      Tudor Investment Corp       ALL     BUY                     11800         31.92
2011-01-20      American Funds Inc;A        AMZN    SELL                    3600          180.14
Here is what I want to achieve:

Date            Party name                 Symbol  Buy/Sell     # of shares   trade price  trading volume 

2011-04-21      Federated Prime Obl;Inst    MMM     BUY          2600         96.17        250042
2011-01-05      Fortress Investment Group   CMCSA   SELL         29700        21.96        644193
2011-02-28      Dodge & Cox Intl Stock      DELL    SELL         57400        15.67        899458
2011-05-02      American Funds Inc;A        S       BUY          137300       5.19         712587
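The new trading volume column is the # of shares column multiplied by the trade price column. Does anyone know how to do this automatically, since the real data has many more rows? After that, I want to take the trading volume values and output them in descending order. The exact instruction is: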

Top twenty counterparties with the highest dollar trading volume

So far, I have:

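// header = true so the columns can be referenced by name; forward slash avoids the invalid "\d" escape
val dataframe = spark.read.option("header", "true").csv("c:/data")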

val newdf = dataframe.select("# of shares","trade price")
Any help would be appreciated. Thanks.

Here you go:

import org.apache.spark.sql.functions._
val newdf = dataframe.withColumn("trading volume",col("# of shares")*col("trade price"))
                     .select("# of shares","trade price","trading volume")
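
For the second part of the question (sorting by trading volume and finding the top twenty counterparties), one possible approach is sketched below. It assumes that "counterparty" means the Party name column and that the ranking should use the total volume per party. Neither is confirmed in the question. The object name TopCounterparties, the helper names withTradingVolume and topParties, and the output column "total trading volume" are made up for illustration. The sample rows are built in memory so the sketch runs without the CSV file.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, desc, sum}

// Hypothetical helper object; the names are illustrative, not from the question.
object TopCounterparties {

  // Adds "trading volume" = "# of shares" * "trade price", keeping every other column.
  def withTradingVolume(trades: DataFrame): DataFrame =
    trades.withColumn("trading volume", col("# of shares") * col("trade price"))

  // Assumption: rank parties by their summed trading volume, largest first.
  def topParties(trades: DataFrame, n: Int = 20): DataFrame =
    withTradingVolume(trades)
      .groupBy("Party name")
      .agg(sum("trading volume").as("total trading volume"))
      .orderBy(desc("total trading volume"))
      .limit(n)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("top-counterparties")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // The four sample rows from the question, built in memory instead of read from CSV.
    val trades = Seq(
      ("2011-01-03", "American Funds EuPc;A", "AAPL", "BUY", 2400L, 332.87),
      ("2011-02-14", "American Funds CWGI;A", "SLB", "BUY", 6700L, 94.08),
      ("2011-01-06", "Tudor Investment Corp", "ALL", "BUY", 11800L, 31.92),
      ("2011-01-20", "American Funds Inc;A", "AMZN", "SELL", 3600L, 180.14)
    ).toDF("Date", "Party name", "Symbol", "Buy/Sell indicator", "# of shares", "trade price")

    // Every trade with its trading volume, largest first.
    withTradingVolume(trades).orderBy(desc("trading volume")).show(false)

    // Top twenty parties by total trading volume.
    topParties(trades).show(false)

    spark.stop()
  }
}
```

If the ranking should be per individual trade rather than per party, withTradingVolume(dataframe).orderBy(desc("trading volume")).limit(20) would be enough. When reading from CSV with only the header option, the numeric columns arrive as strings; Spark casts them for the multiplication, but setting the inferSchema option to true or casting explicitly is safer.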