Apache Spark: what does select("device") do in a Spark query?
Tags: apache-spark, pyspark

Here is an example:
df = ... # streaming DataFrame with IOT device data with schema { device: string, deviceType: string, signal: double, time: DateType }
# Select the devices which have signal more than 10
df.select("device").where("signal > 10")
What does the select("device") part do? If the rows are selected by the value of the signal field, why mention the device field at all? Why not just write

df.where("signal > 10")

or

df.select("time").where("signal > 10")

?

Answer:

select("device") selects only the "device" column: it keeps that single column and drops all the others, whereas df.where("signal > 10") keeps every column but only the rows where signal > 10.
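The difference between projecting columns and filtering rows can be sketched outside Spark with plain Python dictionaries. This is only an analogy, not the Spark API: the select and where functions below are hypothetical stand-ins for the DataFrame methods, and real Spark builds a lazy logical plan rather than eagerly transforming lists.

```python
# Illustrative stand-ins for DataFrame.select and DataFrame.where,
# operating eagerly on a list of row dicts.
def select(rows, *cols):
    # keep only the named columns in every row (projection)
    return [{c: row[c] for c in cols} for row in rows]

def where(rows, predicate):
    # keep only the rows for which the predicate holds (filtering)
    return [row for row in rows if predicate(row)]

rows = [
    {"device": "d1", "deviceType": "t1", "signal": 5.0},
    {"device": "d2", "deviceType": "t2", "signal": 15.0},
]

# select("device") analogue: one column survives, every row survives
only_device = select(rows, "device")
# where("signal > 10") analogue: every column survives, rows are filtered
high_signal = where(rows, lambda r: r["signal"] > 10)
# filtering first and then projecting yields just the matching device names
matching_devices = select(where(rows, lambda r: r["signal"] > 10), "device")

print(only_device)       # [{'device': 'd1'}, {'device': 'd2'}]
print(high_signal)       # [{'device': 'd2', 'deviceType': 't2', 'signal': 15.0}]
print(matching_devices)  # [{'device': 'd2'}]
```

So select controls which columns come back, and where controls which rows come back; the query in the question combines the two.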
df.show
+------+---+---+---+---+---+
|signal|  B|  C|  D|  E|  F|
+------+---+---+---+---+---+
|    10|  4|  1|  0|  3|  1|
|    15|  6|  4|  3|  2|  0|
+------+---+---+---+---+---+
df.select("signal").show
+------+
|signal|
+------+
|    10|
|    15|
+------+