Scala Spark: use a DataFrame multiple times without unloading it multiple times

I have a question: how can I reuse a DataFrame without unloading it from Redshift again?

val companiesData = spark.read.format("com.databricks.spark.redshift")
  .option("url", "jdbc:redshift://xxxx:5439/cf?user=" + user + "&password=" + password)
  .option("query", "select * from cf_core.company")
  //.option("dbtable", schema + "." + table)
  .option("aws_iam_role", "arn:aws:iam::xxxxxx:role/somerole")
  .option("tempdir", "s3a://xxxxx/Spark")
  .load()

import org.apache.spark.sql.DataFrame
// "import class.companiesData" is not valid Scala; companiesData is assumed
// to be imported from wherever it is defined.

class test {
  val secondDF = filteredDF(companiesData)

  def filteredDF(df: DataFrame): DataFrame = {
    val result = df.select("companynumber")
    result
  }
}
In this case the data is unloaded twice: first for the select * from the table, and then again when only companynumber is selected. How can I unload the data once and operate on it multiple times? This is a serious problem for me. Thanks for any help.

By "unload" do you mean reading the data? If so, why are you sure it is read twice? In fact, there are no actions in your code, so I'm not even sure the data is being read at all. If you access secondDF somewhere else in your code, Spark should only read the column you select in the class "test". I'm not 100% sure about this, since I've never loaded data into Spark from Redshift before.
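To illustrate the commenter's point, here is a minimal sketch reusing the companiesData DataFrame defined above (the show call is just an illustrative action): transformations such as select are lazy, so nothing is fetched from Redshift until an action runs.

// select() is a transformation: it only extends the logical plan,
// nothing is unloaded from Redshift at this point.
val secondDF = companiesData.select("companynumber")

// Only an action triggers the actual unload to S3 and read:
secondDF.show(10)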

In general, if you want to reuse a DataFrame, you should cache it:

companiesData.cache()
The first action you call on the DataFrame then materializes it in memory, and subsequent actions reuse the cached data instead of unloading from Redshift again.
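A minimal sketch of the caching approach, again reusing companiesData from the question (the count and show calls are illustrative, not part of the original code):

// cache() is lazy: it only marks the DataFrame for caching.
companiesData.cache()

// The first action unloads the data from Redshift once and
// materializes the result in memory.
val total = companiesData.count()

// Later actions reuse the cached data instead of hitting Redshift again.
companiesData.select("companynumber").show(10)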