Apache Spark: Deleting specific columns in Cassandra from Spark

Tags: apache-spark, cassandra, datastax

I was able to delete a specific column with the RDD API:

sc.cassandraTable("books_ks", "books")
  .deleteFromCassandra("books_ks", "books", SomeColumns("book_price"))

I'm having trouble doing the same with the DataFrame API.

Can someone share an example?

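You can't delete through the DataFrame API, and deleting through the RDD API is unnatural. RDDs and DataFrames are immutable, which means they cannot be modified in place. You can filter them to cut them down, but that produces a new RDD/DataFrame.

That said, what you can do is filter for the rows you want to delete and then build a C* client to perform the deletion: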

// imports for Spark and the C* connection
import org.apache.spark.sql.cassandra._
import com.datastax.spark.connector.cql.CassandraConnectorConf

spark.setCassandraConf("Test Cluster", CassandraConnectorConf.ConnectionHostParam.option("localhost"))
val df = spark.read.format("org.apache.spark.sql.cassandra").options(Map("keyspace" -> "books_ks", "table" -> "books")).load()
val dfToDelete = df.filter($"price" < 3).select($"price");
dfToDelete.show();


// import for C* client
import com.datastax.driver.core._

// build a C* client (part of the dependency of the scala driver)
val clusterBuilder = Cluster.builder().addContactPoints("127.0.0.1");
val cluster  = clusterBuilder.build();
val session = cluster.connect();

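// Loop over everything you filtered in the DF and delete the matching rows.
// Note: a CQL DELETE must restrict the table's primary key in its WHERE clause,
// so this statement works as written only if price is the table's primary key.
for (price <- dfToDelete.collect())
    session.execute("DELETE FROM books_ks.books WHERE price=" + price.get(0).toString);

// Release the driver's connections once the deletes are done.
cluster.close();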
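
The loop above removes whole rows, while the question asks about clearing a single column. In CQL that is written by naming the column right after DELETE. Below is a minimal sketch of a helper that builds such a statement. The CqlDelete object and the book_id key column are hypothetical names introduced for illustration only; the real key columns depend on your table schema.

```scala
// Hypothetical helper (not part of the connector or the driver): builds a CQL
// statement that deletes selected columns from the row identified by its key.
object CqlDelete {
  def columnDelete(
      keyspace: String,
      table: String,
      columns: Seq[String],     // columns to clear; empty means delete the whole row
      keyColumns: Seq[String]   // primary key columns, assumed to be known by the caller
  ): String = {
    require(keyColumns.nonEmpty, "a CQL DELETE needs at least one key column in WHERE")
    // "DELETE col1, col2" clears only those columns; a bare "DELETE" removes the row.
    val prefix = if (columns.isEmpty) "DELETE" else "DELETE " + columns.mkString(", ")
    // One bind marker per key column, so values can be bound safely later.
    val where = keyColumns.map(k => s"$k = ?").mkString(" AND ")
    s"$prefix FROM $keyspace.$table WHERE $where"
  }
}

// book_id is an assumed primary key column; replace it with your table's key.
val stmt = CqlDelete.columnDelete("books_ks", "books", Seq("book_price"), Seq("book_id"))
println(stmt)
```

The resulting string uses bind markers instead of concatenated values, so it should be usable with session.prepare and a bound key value for each collected row, assuming the DataFrame selects the key column rather than price.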