Python: Wireshark JSON capture into Spark

I have a JSON file from Wireshark that I need to load into Spark. I am using PySpark.

I need to extract data from these JSON files and then output the data as a JSON file.

The problem is that I cannot seem to load the JSON file in a way that lets me look up each piece of data. I tried json.loads, and also SQLContext in Spark. SQLContext will not help much here, because I want to adapt this to the Spark Streaming module. The JSON file looks like this:

[
  {
    "_index": "packets-2017-07-27",
    "_type": "pcap_file",
    "_score": null,
    "_source": {
      "layers": {
        "frame": {
          "frame.encap_type": "1",
          "frame.time": "May 13, 2004 11:17:09.864896000 Afr. centrale Ouest",
          "frame.offset_shift": "0.000000000",
          "frame.time_epoch": "1084443429.864896000",
          "frame.time_delta": "0.000000000",
          "frame.time_delta_displayed": "0.000000000",
          "frame.time_relative": "2.553672000",
          "frame.number": "13",
          "frame.len": "89",
          "frame.cap_len": "89",
          "frame.marked": "0",
          "frame.ignored": "0",
          "frame.protocols": "eth:ethertype:ip:udp:dns",
          "frame.coloring_rule.name": "UDP",
          "frame.coloring_rule.string": "udp"
        },....]
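
For orientation outside Spark, the same structure can be walked with the standard json module. A minimal sketch, assuming the export is saved as capture.json (the filename is illustrative):

import json

# The top level of a Wireshark JSON export is a list of packet objects
with open("capture.json") as f:
    packets = json.load(f)

for pkt in packets:
    frame = pkt["_source"]["layers"]["frame"]
    print(frame["frame.number"], frame["frame.protocols"])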

Comments on the answer below:

"Hey, I really appreciate your answer. Could you tell me what the printRdd(json_rdd) line stands for? It doesn't work for me."
"def printRdd(rdd): for x in rdd.collect(): print x. It is just used to print the contents of the RDD."
"Thanks man, it's the SQL part: printing frame.offset_shift with df.select('frame.offset_shift').show(1) returns: cannot resolve 'frame.offset_shift' given input columns: [_index, _score, _source, _type]."
"Try df.select('_source.layers.frame.frame.offset_shift').show()."
"The dot in frame.offset_shift makes it think it is still inside another layer; once I removed the dot, it worked. Should I remove all the dots from the file?"
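
A note on those dotted names: Spark can address a column whose own name contains dots without editing the file, by quoting that segment with backticks in the column reference. A minimal sketch against the schema of the sample above, assuming df was built as in the answer below:

df.select("_source.layers.frame.`frame.offset_shift`").show(1)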
import re

def printRdd(rdd):
    # Helper mentioned in the comments: print every element of the RDD
    for x in rdd.collect():
        print(x)

# Read the file as a single (path, content) pair, keep only the content,
# and collapse all whitespace so the JSON array sits on one line.
# Note: this also strips spaces inside string values such as frame.time.
rdd = sc.wholeTextFiles("abc.json")
json_rdd = rdd.map(lambda x: x[1]) \
              .map(lambda x: re.sub(r"\s+", "", x, flags=re.UNICODE))

printRdd(json_rdd)

# Let Spark infer the schema from the cleaned JSON string
df = spark.read.json(json_rdd)
df.printSchema()
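
To write the extracted data back out as JSON, as the question asks, one option is to select the fields of interest and use the DataFrame writer. A minimal sketch; the chosen columns and the output path frames_out are illustrative:

# Backticks escape the dots inside the field names themselves
out = df.select(
    "_source.layers.frame.`frame.number`",
    "_source.layers.frame.`frame.len`",
    "_source.layers.frame.`frame.protocols`")
out.write.json("frames_out")  # one JSON object per row

For the Spark Streaming adaptation the question mentions, a hedged sketch is to watch a directory for new capture files with textFileStream and reuse the same parsing per micro-batch. The directory name and batch interval are assumptions, and this relies on each capture being collapsed to a single line of JSON, as the whitespace-stripping step above produces:

from pyspark.streaming import StreamingContext

ssc = StreamingContext(sc, 10)           # 10-second micro-batches (illustrative)
lines = ssc.textFileStream("captures/")  # new files are read line by line

def handle(rdd):
    # Skip empty batches, then parse the JSON lines exactly as above
    if not rdd.isEmpty():
        spark.read.json(rdd).printSchema()

lines.foreachRDD(handle)
ssc.start()
ssc.awaitTermination()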