Extracting specific data from a table using Scala?
Here is a sample DF:
Date        Party name             Symbol  Buy/Sell indicator  # of shares  trade price
2011-01-03  American Funds EuPc;A  AAPL    BUY                 2400         332.87
2011-02-14  American Funds CWGI;A  SLB     BUY                 6700         94.08
2011-01-06  Tudor Investment Corp  ALL     BUY                 11800        31.92
2011-01-20  American Funds Inc;A   AMZN    SELL                3600         180.14
Here is what I want to achieve:
Date        Party name                 Symbol  Buy/Sell  # of shares  trade price  trading volume
2011-04-21  Federated Prime Obl;Inst   MMM     BUY       2600         96.17        250042
2011-01-05  Fortress Investment Group  CMCSA   SELL      29700        21.96        644193
2011-02-28  Dodge & Cox Intl Stock     DELL    SELL      57400        15.67        899458
2011-05-02  American Funds Inc;A       S       BUY       137300       5.19         712587
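The new trading volume column is the # of shares column multiplied by the trade price column. Does anyone know how to do this automatically, since the real data has many more rows? After that, I want to take the trading volume values and display them in descending order. The exact instruction is:
Top twenty counterparties by largest dollar trading volume
So far I have: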
// Read the CSV with its header row so columns can be selected by name
val dataframe = spark.read.option("header", "true").csv("c:/data")
val newdf = dataframe.select("# of shares", "trade price")
Any help would be greatly appreciated. Thanks.
Here you go:
import org.apache.spark.sql.functions._
val newdf = dataframe
  .withColumn("trading volume", col("# of shares") * col("trade price"))
  .select("# of shares", "trade price", "trading volume")
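The question also asks for the top twenty counterparties by dollar trading volume, in descending order. Below is a minimal sketch of one way to get there. It assumes that "counterparty" means the Party name column and that the ranking should use each party's total volume across all of its trades; neither is confirmed by the question. The object name TopCounterparties, the helper topByVolume, and the output column "total trading volume" are hypothetical names chosen for illustration, and the local SparkSession is only there to make the example self-contained.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, desc, sum}

object TopCounterparties {
  // Hypothetical helper: adds the per-trade dollar volume, totals it per party
  // (assumption: "counterparty" = "Party name"), and keeps the n largest totals.
  def topByVolume(trades: DataFrame, n: Int = 20): DataFrame =
    trades
      .withColumn("trading volume", col("# of shares") * col("trade price"))
      .groupBy("Party name")
      .agg(sum("trading volume").as("total trading volume"))
      .orderBy(desc("total trading volume"))
      .limit(n)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .appName("TopCounterparties")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // The four sample rows from the question's first table
    val trades = Seq(
      ("2011-01-03", "American Funds EuPc;A", "AAPL", "BUY", 2400L, 332.87),
      ("2011-02-14", "American Funds CWGI;A", "SLB", "BUY", 6700L, 94.08),
      ("2011-01-06", "Tudor Investment Corp", "ALL", "BUY", 11800L, 31.92),
      ("2011-01-20", "American Funds Inc;A", "AMZN", "SELL", 3600L, 180.14)
    ).toDF("Date", "Party name", "Symbol", "Buy/Sell indicator", "# of shares", "trade price")

    topByVolume(trades).show(truncate = false)
    spark.stop()
  }
}
```

If the ranking is meant to be per trade rather than per party, dropping the groupBy and agg steps and ordering directly by "trading volume" should give the twenty largest individual trades instead.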