pyspark: Spark Streaming over a socket


I'm trying to read a stream of data from a socket (ps.pndsn.com) and write it to temp_table for further processing, but the issue I'm facing is that the temp_table created as part of the writeStream is empty, even though the stream is arriving in real time. Looking for help with this.

Below is the code snippet:

# Create DataFrame representing the stream of input streamingDF from connection to ps.pndsn.com:9999
streamingDF = spark \
    .readStream \
    .format("socket") \
    .option("host", "ps.pndsn.com") \
    .option("port", 9999) \
    .load()

# Is this DF actually a streaming DF?
streamingDF.isStreaming


spark.conf.set("spark.sql.shuffle.partitions", "2")  # keep the size of shuffles small

query = (
  streamingDF
    .writeStream
    .format("memory")            # memory sink: table is kept on the driver
    .queryName("temp_table")     # temp_table = name of the in-memory table
    .outputMode("append")        # append = only new rows in the streaming DataFrame/Dataset are written to the sink
    .start()
)
Streaming output:

{'channel': 'pubnub-sensor-network',
 'message': {'ambient_temperature': '1.361',
             'humidity': '81.1392',
             'photosensor': '758.82',
             'radiation_level': '200',
             'sensor_uuid': 'probe-84d85b75',
             'timestamp': 1581332619},
 'publisher': None,
 'subscription': None,
 'timetoken': 15813326199534409,
 'user_metadata': None}
The output of temp_table is empty.