
Apache Spark: records not visible in MySQL after writing a Spark Structured Streaming DataFrame to MySQL

Tags: apache-spark, pyspark, spark-streaming, spark-structured-streaming, spark-streaming-kafka

I am using the code below to write a Spark Structured Streaming DataFrame to a MySQL DB. Below are the Kafka topic JSON data format and the MySQL table schema. The column names and types match, but I cannot see any records written to the MySQL table; the table stays empty, with no records. Please advise.

Kafka topic data:

ssingh@RENLTP2N073:/mnt/d/confluent-6.0.0/bin$ ./kafka-console-consumer --topic sarvtopic --from-beginning --bootstrap-server localhost:9092 
{"id":1,"firstname":"James ","middlename":"","lastname":"Smith","dob_year":2018,"dob_month":1,"gender":"M","salary":3000} 
{"id":2,"firstname":"Michael ","middlename":"Rose","lastname":"","dob_year":2010,"dob_month":3,"gender":"M","salary":4000}
Query:

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("SSKafka") \
    .getOrCreate()

# Read the Kafka topic as a streaming source, starting from the earliest offset
dsraw = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "localhost:9092") \
    .option("subscribe", "sarvtopic") \
    .option("startingOffsets", "earliest") \
    .load()

ds = dsraw.selectExpr("CAST(value AS STRING)")
dsraw.printSchema()

from pyspark.sql.types import StructField, StructType, StringType,LongType
from pyspark.sql.functions import *

custom_schema = StructType([
    StructField("id", LongType(), True),
    StructField("firstname", StringType(), True),
    StructField("middlename", StringType(), True),
    StructField("lastname", StringType(), True),
    StructField("dob_year", StringType(), True),
    StructField("dob_month", LongType(), True),
    StructField("gender", StringType(), True),
    StructField("salary", LongType(), True),
])
      
Person_details_df2 = ds\
        .select(from_json(col("value"), custom_schema).alias("Person_details"))
        
Person_details_df3 = Person_details_df2.select("Person_details.*")


from pyspark.sql import DataFrameWriter

def foreach_batch_function(df, epoch_id):
    Person_details_df3.write.jdbc(url='jdbc:mysql://172.16.23.27:30038/securedb', driver='com.mysql.jdbc.Driver', dbtable="sparkkafka",  user='root',password='root$1234')
    pass

query = Person_details_df3.writeStream.trigger(processingTime='20 seconds').outputMode("append").foreachBatch(foreach_batch_function).start()
Out[14]: <pyspark.sql.streaming.StreamingQuery at 0x1fb25503b08>
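A likely cause, for what it's worth: the function passed to foreachBatch receives each micro-batch as an ordinary (non-streaming) DataFrame in its df argument, but the body above ignores df and calls .write.jdbc on Person_details_df3, which is still a streaming DataFrame; Spark rejects batch writes on streaming DataFrames, so every batch fails. The jdbc() call above also passes driver, dbtable, user, and password as keyword arguments, while DataFrameWriter.jdbc only accepts url, table, mode, and properties. A minimal corrected sketch, reusing the connection details from the question (mode="append" and the checkpoint path are my assumptions, not from the original):

def foreach_batch_function(df, epoch_id):
    # Write this micro-batch (the df argument), not the streaming DataFrame.
    df.write.jdbc(
        url="jdbc:mysql://172.16.23.27:30038/securedb",
        table="sparkkafka",
        mode="append",  # append each micro-batch to the existing table
        properties={
            "driver": "com.mysql.jdbc.Driver",
            "user": "root",
            "password": "root$1234",
        },
    )

query = (
    Person_details_df3.writeStream
    .trigger(processingTime="20 seconds")
    .outputMode("append")
    .option("checkpointLocation", "/tmp/sparkkafka_ckpt")  # hypothetical path
    .foreachBatch(foreach_batch_function)
    .start()
)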

MySQL table schema:

create table sparkkafka(
   id int,
   firstname VARCHAR(40) NOT NULL,
   middlename VARCHAR(40) NOT NULL,
   lastname VARCHAR(40) NOT NULL,
   dob_year int(40) NOT NULL,
   dob_month int(40) NOT NULL,
   gender VARCHAR(40) NOT NULL,
   salary int(40) NOT NULL,   
   PRIMARY KEY (id)
);
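Two further things worth checking against this schema. First, custom_schema above declares dob_year as StringType while the table stores it as int(40), so the column types are not strictly identical as the question assumes. Second, foreachBatch failures can go unnoticed when the query is only started and never awaited; these standard StreamingQuery accessors show whether the stream is progressing or has died with an exception:

print(query.status)        # whether a trigger is active or the stream is waiting
print(query.lastProgress)  # metrics from the most recent micro-batch, if any
print(query.exception())   # None while healthy; the terminating error otherwise

# Blocking on the query surfaces errors immediately instead of hiding them:
# query.awaitTermination()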