Java JsonParseException: Unrecognized token 'http': was expecting ('true', 'false' or 'null')


We have the following string, which is valid JSON, written to a file on HDFS:

{  
  "id":"tag:search.twitter.com,2005:564407444843950080",
  "objectType":"activity",
  "actor":{  
    "objectType":"person",
    "id":"id:twitter.com:2302910022",
    "link":"http%3A%2F%2Fwww.twitter.com%2Fme7me4610012",
    "displayName":"",
    "postedTime":"2014-01-21T11:06:06.000Z",
    "image":"https%3A%2F%2Fpbs.twimg.com%2Fprofile_images%2F563125491159162881%2FfypkHK3M_normal.jpeg",
    "summary":"‏‏‏‏‏‏‏‏ضًـأّيِّعٌهّ أّروٌأّحًنِأّ تٌـشُـتٌـهّـيِّ مًنِ يِّفُـهّـمًهّـأّ فُـقُط  حسابي بالإنستقرام lloooo_20",
    "links":[  
      {  
        "href":null,
        "rel":"me"
      }
    ],
    "friendsCount":10503,
    "followersCount":10325,
    "listedCount":12,
    "statusesCount":84957,
    "twitterTimeZone":null,
    "verified":false,
    "utcOffset":null,
    "preferredUsername":"me7me4610012",
    "languages":[  
      "ar"
    ],
    "favoritesCount":17
  },
  "verb":"share",
  "postedTime":"2015-02-08T12:56:35.000Z",
  "generator":{  
    "displayName":"Twitter for Android",
    "link":"http%3A%2F%2Ftwitter.com%2Fdownload%2Fandroid"
  },
  "provider":{  
    "objectType":"service",
    "displayName":"Twitter",
    "link":"http%3A%2F%2Fwww.twitter.com"
  },
  "link":"http%3A%2F%2Ftwitter.com%2Fme7me4610012%2Fstatuses%2F564407444843950080",
  "body":"RT @sckud1: فيديو: إمام يرفض بغضب الصلاة على أحد قتلى حزب الله في سوريا بسبب إطلاق النار: ماعاد  http%3A%2F%2Ft.co%2FC55SaQKmUV http%3A%2F%2Ft.co%2Ft5TjIln…",
  "object":{  
    "id":"tag:search.twitter.com,2005:564407126526013440",
    "objectType":"activity",
    "actor":{  
      "objectType":"person",
      "id":"id:twitter.com:462268717",
      "link":"http%3A%2F%2Fwww.twitter.com/sckud1",
      "displayName":"صفق الهوى",
      "postedTime":"2012-01-12T19:24:17.000Z",
      "image":"https%3A%2F%2Fpbs.twimg.com%2Fprofile_images%2F508424482885615616%2FmPBGZBPx_normal.jpeg",
      "summary":"اعلانك في سوق الخليج يحقق لك الوصول الى اكثر من مليون متابع خليجي  http%3A%2F%2Fmarketgulf.com",
      "links":[  
        {  
          "href":"http%3A%2F%2Fmarketgulf.com",
          "rel":"me"
        }
      ],
      "friendsCount":435237,
      "followersCount":464951,
      "listedCount":708,
      "statusesCount":1071685,
      "twitterTimeZone":"Riyadh",
      "verified":false,
      "utcOffset":"10800",
      "preferredUsername":"sckud1",
      "languages":[  
        "ar"
      ],
      "location":{  
        "objectType":"place",
        "displayName":"Made in K S A"
      },
      "favoritesCount":77
    },
    "verb":"post",
    "postedTime":"2015-02-08T12:55:19.000Z",
    "generator":{  
      "displayName":"Tweet Old Post",
      "link":"http%3A%2F%2Fwww.ajaymatharu.com%2F"
    },
    "provider":{  
      "objectType":"service",
      "displayName":"Twitter",
      "link":"http%3A%2F%2Fwww.twitter.com"
    },
    "link":"http%3A%2F%2Ftwitter.com%2Fsckud1%2Fstatuses%2F564407126526013440",
    "body":"فيديو: إمام يرفض بغضب الصلاة على أحد قتلى حزب الله في سوريا بسبب إطلاق النار: ماعاد  http%3A%2F%2Ft.co%2FC55SaQKmUV http%3A%2F%2Ft.co%2Ft5TjIlnZgN",
    "object":{  
      "objectType":"note",
      "id":"object:search.twitter.com,2005:564407126526013440",
      "summary":"فيديو: إمام يرفض بغضب الصلاة على أحد قتلى حزب الله في سوريا بسبب إطلاق النار: ماعاد  http%3A%2F%2Ft.co%2FC55SaQKmUV http%3A%2F%2Ft.co%2Ft5TjIlnZgN",
      "link":"http%3A%2F%2Ftwitter.com%2Fsckud1%2Fstatuses%2F564407126526013440",
      "postedTime":"2015-02-08T12:55:19.000Z"
    },
    "favoritesCount":0,
    "twitter_entities":{  
      "hashtags":[  

      ],
      "trends":[  

      ],
      "urls":[  
        {  
          "url":"http%3A%2F%2Ft.co%2FC55SaQKmUV",
          "expanded_url":"http%3A%2F%2Fwww.hasterya.com%2Farchives%2F34688utm_source%3DReviveOldPost%26utm_medium%3Dsocial%26utm_campaign%3DReviveOldPost",
          "display_url":"hasterya.com/archives/34688…",
          "indices":[  
            85,
            107
          ]
        }
      ],
      "user_mentions":[  

      ],
      "symbols":[  

      ],
      "media":[  
        {  
          "id":564407126341468160,
          "id_str":"564407126341468160",
          "indices":[  
            108,
            130
          ],
          "media_url":"http%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
          "media_url_https":"https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
          "url":"http%3A%2F%2Ft.co%2Ft5TjIlnZgN",
          "display_url":"pic.twitter.com/t5TjIlnZgN",
          "expanded_url":"http%3A%2F%2Ftwitter.com%2Fsckud1%2Fstatus%2F564407126526013440%2Fphoto%2F1",
          "type":"photo",
          "sizes":{  
            "large":{  
              "w":320,
              "h":180,
              "resize":"fit"
            },
            "thumb":{  
              "w":150,
              "h":150,
              "resize":"crop"
            },
            "small":{  
              "w":320,
              "h":180,
              "resize":"fit"
            },
            "medium":{  
              "w":320,
              "h":180,
              "resize":"fit"
            }
          }
        }
      ]
    },
    "twitter_extended_entities":{  
      "media":[  
        {  
          "id":564407126341468160,
          "id_str":"564407126341468160",
          "indices":[  
            108,
            130
          ],
          "media_url":"http%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
          "media_url_https":"https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
          "url":"http%3A%2F%2Ft.co%2Ft5TjIlnZgN",
          "display_url":"pic.twitter.com/t5TjIlnZgN",
          "expanded_url":"http%3A%2F%2Ftwitter.com%2Fsckud1%2Fstatus%2F564407126526013440%2Fphoto%2F1",
          "type":"photo",
          "sizes":{  
            "large":{  
              "w":320,
              "h":180,
              "resize":"fit"
            },
            "thumb":{  
              "w":150,
              "h":150,
              "resize":"crop"
            },
            "small":{  
              "w":320,
              "h":180,
              "resize":"fit"
            },
            "medium":{  
              "w":320,
              "h":180,
              "resize":"fit"
            }
          }
        }
      ]
    },
    "twitter_filter_level":"low",
    "twitter_lang":"ar"
  },
  "favoritesCount":0,
  "twitter_entities":{  
    "hashtags":[  

    ],
    "trends":[  

    ],
    "urls":[  
      {  
        "url":"http%3A%2F%2Ft.co%2FC55SaQKmUV",
        "expanded_url":"http%3A%2F%2Fwww.hasterya.com%2Farchives%2F34688utm_source%3DReviveOldPost%26utm_medium%3Dsocial%26utm_campaign%3DReviveOldPost",
        "display_url":"hasterya.com/archives/34688…",
        "indices":[  
          97,
          119
        ]
      }
    ],
    "user_mentions":[  
      {  
        "screen_name":"sckud1",
        "name":"صفق الهوى",
        "id":462268717,
        "id_str":"462268717",
        "indices":[  
          3,
          10
        ]
      }
    ],
    "symbols":[  

    ],
    "media":[  
      {  
        "id":564407126341468160,
        "id_str":"564407126341468160",
        "indices":[  
          139,
          140
        ],
        "media_url":"http%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
        "media_url_https":"https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
        "url":"http%3A%2F%2Ft.co%2Ft5TjIlnZgN",
        "display_url":"pic.twitter.com/t5TjIlnZgN",
        "expanded_url":"http%3A%2F%2Ftwitter.com%2Fsckud1%2Fstatus%2F564407126526013440%2Fphoto%2F1",
        "type":"photo",
        "sizes":{  
          "large":{  
            "w":320,
            "h":180,
            "resize":"fit"
          },
          "thumb":{  
            "w":150,
            "h":150,
            "resize":"crop"
          },
          "small":{  
            "w":320,
            "h":180,
            "resize":"fit"
          },
          "medium":{  
            "w":320,
            "h":180,
            "resize":"fit"
          }
        },
        "source_status_id":564407126526013440,
        "source_status_id_str":"564407126526013440"
      }
    ]
  },
  "twitter_extended_entities":{  
    "media":[  
      {  
        "id":564407126341468160,
        "id_str":"564407126341468160",
        "indices":[  
          139,
          140
        ],
        "media_url":"http%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
        "media_url_https":"https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FB9UtSoJIQAA07-r.jpg",
        "url":"http%3A%2F%2Ft.co%2Ft5TjIlnZgN",
        "display_url":"pic.twitter.com/t5TjIlnZgN",
        "expanded_url":"http%3A%2F%2Ftwitter.com%2Fsckud1%2Fstatus%2F564407126526013440%2Fphoto%2F1",
        "type":"photo",
        "sizes":{  
          "large":{  
            "w":320,
            "h":180,
            "resize":"fit"
          },
          "thumb":{  
            "w":150,
            "h":150,
            "resize":"crop"
          },
          "small":{  
            "w":320,
            "h":180,
            "resize":"fit"
          },
          "medium":{  
            "w":320,
            "h":180,
            "resize":"fit"
          }
        },
        "source_status_id":564407126526013440,
        "source_status_id_str":"564407126526013440"
      }
    ]
  },
  "twitter_filter_level":"low",
  "twitter_lang":"ar",
  "retweetCount":1,
  "gnip":{  
    "matching_rules":[  
      {  
        "tag":"ISIS66"
      }
    ],
    "urls":[  
      {  
        "url":"http%3A%2F%2Ft.co%2Ft5TjIlnZgN",
        "expanded_url":"http%3A%2F%2Ftwitter.com%2Fsckud1%2Fstatus%2F564407126526013440%2Fphoto%2F1",
        "expanded_status":200
      },
      {  
        "url":"http%3A%2F%2Ft.co%2FC55SaQKmUV",
        "expanded_url":"http%3A%2F%2Fwww.hasterya.com%2Farchives%2F34688utm_source%3DReviveOldPost%26utm_medium%3Dsocial%26utm_campaign%3DReviveOldPost",
        "expanded_status":200
      }
    ],
    "klout_score":50,
    "language":{  
      "value":"ar"
    }
  }
}
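EDIT

We have configured a Flume agent that reads data from this file and passes it to a Solr sink, but unfortunately it throws the exception in the title.

Here is the stack trace: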

org.kitesdk.morphline.api.MorphlineRuntimeException: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'http': was expecting ('true', 'false' or 'null')
 at [Source: java.io.ByteArrayInputStream@20d7aa52; line: 1, column: 9]
    at org.kitesdk.morphline.stdio.AbstractParser.doProcess(AbstractParser.java:98)
    at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
    at org.kitesdk.morphline.stdlib.TryRulesBuilder$TryRules.doProcess(TryRulesBuilder.java:120)
    at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
    at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
    at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
    at org.apache.flume.sink.solr.morphline.MorphlineHandlerImpl.process(MorphlineHandlerImpl.java:128)
    at org.apache.flume.sink.solr.morphline.MorphlineSink.process(MorphlineSink.java:141)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
    at java.lang.Thread.run(Thread.java:744)
Caused by: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'http': was expecting ('true', 'false' or 'null')
 at [Source: java.io.ByteArrayInputStream@20d7aa52; line: 1, column: 9]
    at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1524)
    at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:557)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidToken(UTF8StreamJsonParser.java:3095)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2340)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:818)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:698)
    at com.fasterxml.jackson.databind.MappingIterator.hasNextValue(MappingIterator.java:159)
    at org.kitesdk.morphline.json.ReadJsonBuilder$ReadJson.doProcess(ReadJsonBuilder.java:109)
    at org.kitesdk.morphline.stdio.AbstractParser.doProcess(AbstractParser.java:96)
    ... 10 more
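"We have the following string, which is valid JSON"

Evidently the JSON parser disagrees.

However, the exception says the error is at "line: 1, column: 9", and there is no "http" token near the beginning of that JSON. So I suspect that, when the error occurs, the parser is trying to parse something other than this string.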


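You need to find the JSON that is actually being parsed. Run the application in a debugger, set a breakpoint on the relevant constructor of JsonParseException... then find out what is in the ByteArrayInputStream that it is trying to parse.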
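Once you have the bytes that were actually handed to the parser (for example copied out of the ByteArrayInputStream in the debugger), it helps to look at exactly what sits at the reported position. The following is a minimal sketch of a hypothetical helper, not part of Jackson or the Kite SDK. It assumes the input is UTF-8 and that the line and column are 1-based, as in the "line: 1, column: 9" of the message; the column may not match a character index exactly for non-ASCII input, so the helper shows a window of characters on either side.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical debugging helper (not part of Jackson or the Kite SDK):
// return the text around the line/column reported by a JsonParseException.
final class ErrorContext {

    // input:  the raw bytes the parser was given, assumed to be UTF-8
    // line:   1-based line number from the exception message
    // column: 1-based column number from the exception message
    // radius: how many characters to show on each side of the position
    static String around(byte[] input, int line, int column, int radius) {
        String text = new String(input, StandardCharsets.UTF_8);
        // Limit -1 keeps trailing empty lines so line numbers stay aligned.
        String[] lines = text.split("\\r?\\n", -1);
        if (line < 1 || line > lines.length) {
            return "";
        }
        String target = lines[line - 1];
        // Clamp the 0-based index so a slightly-off column cannot throw.
        int index = Math.min(Math.max(column - 1, 0), target.length());
        int from = Math.max(0, index - radius);
        int to = Math.min(target.length(), index + radius);
        return target.substring(from, to);
    }
}
```

With a made-up input such as {"link":http%3A%2F%2Fx} (a URL value whose quotes are missing), around(bytes, 1, 9, 4) returns nk":http, which puts the bare http token right at column 9.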

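I had this exception for a long time and could not find the cause. The exception points to line 1, column 9. My mistake was looking at the first line of the file that Flume was processing.

Apache Flume processes the contents of the file in batches. So when Flume throws this exception and reports line 1, it means the first line of the current batch.

If the Flume agent is configured with batch size = 100 and (for example) the file contains 400 lines, then "line 1" in the exception can refer to any of lines 1, 101, 201 or 301 of the file.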
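Under this answer's reading (Flume restarts its line count for every batch, and each line of the file becomes one event), the file lines that a reported "line N" can point to follow from simple arithmetic. BatchLines and its methods are hypothetical names used only for illustration, and batches are assumed to be filled strictly in file order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: map "line N of the current batch" back to file lines,
// assuming one event per file line and batches filled in file order.
final class BatchLines {

    // 1-based file line of line `lineInBatch` within batch `batchIndex` (0-based).
    static long fileLine(int batchIndex, int lineInBatch, int batchSize) {
        return (long) batchIndex * batchSize + lineInBatch;
    }

    // Every file line that an exception reporting `lineInBatch` could refer to.
    static List<Long> candidates(long totalLines, int batchSize, int lineInBatch) {
        List<Long> result = new ArrayList<>();
        for (int batch = 0; ; batch++) {
            long line = fileLine(batch, lineInBatch, batchSize);
            if (line > totalLines) {
                break;
            }
            result.add(line);
        }
        return result;
    }
}
```

For the 400-line file with batch size 100, candidates(400, 100, 1) yields the four lines listed above: 1, 101, 201 and 301.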

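How do you find the line that causes the problem?

There are three ways to do it:

1- Pull the source code and run the agent in debug mode. If, like me, you are an average developer who doesn't know how to do that, try the other two options.

2- Split the file according to the batch size and run the Flume agent again. If you split the file into 4 files and the invalid JSON is somewhere between lines 301 and 400, the agent will process the first 3 files and stop at the 4th. Split the 4th file again into smaller files, and repeat until you reach a single-line file on which Flume fails.

3- Reduce the batch size of the Flume agent to 1, then check how many events have arrived in the output of the sink you are using. For example, I use the Solr sink, my file contains 400 lines, and the agent was originally configured with batch size = 100. After reducing the batch size to 1, run the agent; at some point it fails and throws the exception. At that point, check how many documents Solr has received. If the invalid JSON is on line 346, Solr will have indexed 345 documents, so the next line is the one causing the problem.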
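Option 2 above is essentially a search that keeps splitting the failing chunk. This sketch automates it by halving the failing range (a plain bisection, swapped in for the answer's batch-sized pieces), assuming a chunk fails if and only if it contains at least one bad line. The chunkOk predicate is a hypothetical stand-in for "run the Flume agent on a file containing just these lines and see whether it succeeds".

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical bisection over the lines of a file.
// Assumption: chunkOk fails for a chunk exactly when it contains a bad line.
final class Bisect {

    // Returns the 1-based number of the first bad line, or -1 if all lines pass.
    static int firstBadLine(List<String> lines, Predicate<List<String>> chunkOk) {
        if (lines.isEmpty() || chunkOk.test(lines)) {
            return -1;
        }
        int lo = 0;
        int hi = lines.size(); // invariant: [lo, hi) contains the first bad line
        while (hi - lo > 1) {
            int mid = (lo + hi) >>> 1;
            if (!chunkOk.test(lines.subList(lo, mid))) {
                hi = mid; // the first half fails, so the first bad line is there
            } else {
                lo = mid; // the first half is clean, so it is in the second half
            }
        }
        return lo + 1;
    }
}
```

Each call to chunkOk corresponds to one run of the agent, so the search needs roughly log2(n) runs: at most 10 for a 400-line file, counting the initial full run.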

In my case I used the third option and, fortunately, it pinpointed the line that caused the problem.
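The arithmetic behind the third option can be written down explicitly. This sketch assumes the sink indexes whole batches only (a failed batch contributes no documents), which is an assumption rather than something the answer states; under it, the document count narrows the bad line to the next batch, and with a batch size of 1 that range shrinks to a single line. SuspectLines and range are hypothetical names.

```java
// Hypothetical helper for option 3. Assumption: the sink indexes whole batches
// only, so every document before the failing batch is counted and none after it.
final class SuspectLines {

    // Returns {first, last}: the 1-based file lines that may hold the bad record,
    // given how many documents the sink indexed before the agent failed.
    static long[] range(long indexedDocs, int batchSize, long totalLines) {
        long first = indexedDocs + 1;
        long last = Math.min(indexedDocs + batchSize, totalLines);
        return new long[] {first, last};
    }
}
```

With the bad record on line 346 and batch size 100, Solr would show 300 documents and the range would be lines 301 to 400; with batch size 1 it shows 345 and the range is just line 346, as in the example.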

This is a long answer, but it does not actually get rid of the exception. So how did I get past it?

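I don't know why the Jackson library complains when parsing a JSON string that contains the escape characters \n, \r and \t. I think (but I am not sure) that by default the Jackson parser escapes these characters, which splits the JSON string into two lines (in the case of \n), and then it treats each line as a separate JSON string.

In my case, we used a custom interceptor to remove these characters before the Flume agent processed the events. That is how we solved the problem.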
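Here is a sketch of the kind of cleaning such an interceptor might apply to each event body. It is not the answerer's interceptor, which isn't shown. It assumes the offending characters are raw carriage-return, line-feed and tab characters (not two-character sequences like a backslash followed by n), and it replaces each with a space instead of deleting it so that neighbouring tokens are not glued together. In a real agent this method would be called from your implementation of Flume's org.apache.flume.interceptor.Interceptor, roughly as event.setBody(clean(event.getBody())).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical body-cleaning step for a custom Flume interceptor.
// Assumption: the problem characters are raw CR, LF and TAB characters,
// not escape sequences written as a backslash followed by a letter.
final class ControlCharCleaner {

    static byte[] clean(byte[] body) {
        String text = new String(body, StandardCharsets.UTF_8);
        // Replace each raw CR, LF or TAB with one space. Between JSON tokens a
        // space is insignificant whitespace; inside a string value a raw control
        // character is not valid JSON in the first place.
        String cleaned = text.replaceAll("[\\r\\n\\t]", " ");
        return cleaned.getBytes(StandardCharsets.UTF_8);
    }
}
```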

This may be obvious, but make sure you pass the parser a URL object, not a String containing a www address. This will not work:

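    // Does NOT work: readValue(String, ...) parses the String itself as JSON text,
    // so the bare token "www" triggers JsonParseException ("Unrecognized token").
    ObjectMapper mapper = new ObjectMapper();   // com.fasterxml.jackson.databind.ObjectMapper
    String www = "www.sample.pl";
    Weather weather = mapper.readValue(www, Weather.class);   // Weather = your own POJO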
But this will:

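    // Works: readValue(URL, ...) fetches the URL and parses the response body,
    // which must itself be JSON. Needs java.net.URL; may throw IOException.
    ObjectMapper mapper = new ObjectMapper();
    URL www = new URL("http://www.oracle.com/");
    Weather weather = mapper.readValue(www, Weather.class);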

Add produces = "application/json" to your @RequestMapping.
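Please try to format your question so that someone can help you accurately. If you are using a Morphline configuration, please provide it.

You should reformat the question and leave out the unnecessary code so that people can help you.

I edited the question a bit. We have a Flume agent configured with an HDFS source and a Solr Morphline sink; it is only configuration, because we are using Cloudera CDH. How can I debug this? Sorry if my question seems naive; I haven't done anything like this before.

The last paragraph of my answer says how I think you should proceed. It requires Java development skills; if you don't have them, you need to find someone local to help you. Also, provide the morphline configuration as requested above. Maybe someone who is an expert in this area can spot the problem. (I'm not. I'm just approaching this at the Java level...)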