pyspark: Spark Structured Streaming over a socket
I'm trying to read a stream of data from a socket (ps.pndsn.com) and write it to temp_table for further processing, but the problem I'm currently facing is that temp_table, created as part of the writeStream, is empty even though the stream is arriving in real time. Any help would be appreciated. Below is the code snippet:
# Create a DataFrame representing the stream of input lines
# from the connection to ps.pndsn.com:9999
streamingDF = spark \
    .readStream \
    .format("socket") \
    .option("host", "ps.pndsn.com") \
    .option("port", 9999) \
    .load()
# Is this DF actually a streaming DF?
streamingDF.isStreaming
spark.conf.set("spark.sql.shuffle.partitions", "2")  # keep shuffles small
query = (
    streamingDF
    .writeStream
    .format("memory")
    .queryName("temp_table")  # temp_table = name of the in-memory table
    .outputMode("append")     # append = only new rows in the streaming DataFrame are written to the sink
    .start()
)
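One way to rule out connectivity or format problems is to point the stream at a local stand-in server first: the socket source expects newline-terminated UTF-8 text (the same shape `nc -lk 9999` produces). Below is a minimal sketch of such a stand-in, assuming you would temporarily change the stream's host to localhost; the payload string is a placeholder, not the real PubNub feed:

```python
import socket
import threading

def serve_lines(lines, host="127.0.0.1", port=9999):
    """Accept one connection and send each payload as a newline-terminated line."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    conn, _ = server.accept()
    for line in lines:
        # The Spark socket source splits rows on newlines, so terminate each record.
        conn.sendall((line + "\n").encode("utf-8"))
    conn.close()
    server.close()

# Run the server in a background thread so the streaming query can connect to it.
payloads = ['{"sensor_uuid": "probe-84d85b75", "radiation_level": 200}']
threading.Thread(target=serve_lines, args=(payloads,), daemon=True).start()
```

If temp_table fills up against this local server but stays empty against ps.pndsn.com, the problem is on the connection/protocol side rather than in the writeStream configuration.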
Streaming output:
{'channel': 'pubnub-sensor-network',
'message': {'ambient_temperature': '1.361',
'humidity': '81.1392',
'photosensor': '758.82',
'radiation_level': '200',
'sensor_uuid': 'probe-84d85b75',
'timestamp': 1581332619},
'publisher': None,
'subscription': None,
'timetoken': 15813326199534409,
'user_metadata': None}
The output of temp_table is empty.
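Note that the payload printed above is a Python dict repr (single quotes, `None`), not strict JSON, so `json.loads` would reject it as-is. A quick sketch of pulling fields out of one such line for inspection, assuming each socket line carries one record in this repr form (inside a streaming query you would instead parse the `value` column, e.g. with `from_json`, once the feed is confirmed to be JSON):

```python
import ast

# One line in the repr form shown above (illustrative sample, not live data).
line = ("{'channel': 'pubnub-sensor-network', "
        "'message': {'ambient_temperature': '1.361', "
        "'humidity': '81.1392', 'photosensor': '758.82', "
        "'radiation_level': '200', 'sensor_uuid': 'probe-84d85b75', "
        "'timestamp': 1581332619}, "
        "'publisher': None, 'subscription': None, "
        "'timetoken': 15813326199534409, 'user_metadata': None}")

record = ast.literal_eval(line)   # safely evaluates Python literals (dicts, None, ints, strings)
message = record["message"]
print(message["sensor_uuid"])     # -> probe-84d85b75
print(float(message["humidity"])) # -> 81.1392
```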