Java: org.xml.sax.SAXParseException while parsing XML


java xml spring-boot


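I wrote a scheduler in my Java Spring Boot application that runs every hour, and it has been running fine since last month. Today it started throwing an exception during parsing. My guess is that the XML I fetch the data from is either corrupted or has changed slightly in a way I can't figure out.

Note: I cannot change the source data.

Here is my code: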

    @Scheduled(fixedRate = 1*60*60*1000 , initialDelay = 10*1000)
    public String updateNewsFeed() {

        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            String URL = "https://nation.com.pk/rss/coronavirus";
            Document doc = db.parse(URL);
            List<NewsFeed> newsFeedList = parseNewsItemsToList(doc);
           
            return "Works fine";

        } catch (Exception ex) {
            return ex.getMessage();
        }
    }

    public List<NewsFeed> parseNewsItemsToList(Document doc) throws Exception {
        doc.getDocumentElement().normalize();
        NodeList nodes = doc.getElementsByTagName("item");
        List<NewsFeed> newsFeedList = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element element = (Element) nodes.item(i);

            NodeList title = element.getElementsByTagName("title");
            NodeList link = element.getElementsByTagName("link");
            NodeList description = element.getElementsByTagName("description");
            NodeList pubDate = element.getElementsByTagName("pubDate");
            NodeList guid = element.getElementsByTagName("guid");

            org.jsoup.nodes.Document htmlDoc = Jsoup.connect(link.item(0).getTextContent().trim()).get();
                /*Elements pngs = htmlDoc.select("picture");
                System.out.println("\nimg link:"+pngs.toString());*/

            String image = htmlDoc.select("picture").select("img[src~=(?i)\\.(png|jpe?g)]").attr("src").trim();
            newsFeedList.add(new NewsFeed(
                    title.item(0).getTextContent().trim(),
                    description.item(0).getTextContent().trim(),
                    pubDate.item(0).getTextContent().trim(),
                    guid.item(0).getTextContent().trim(),
                    image,
                    link.item(0).getTextContent().trim()
            ));
        }
        return newsFeedList;
    }



Here is the error message:

    [Fatal Error] coronavirus:195:32: The entity name must immediately follow the '&' in the entity reference.
    org.xml.sax.SAXParseException; systemId: https://nation.com.pk/rss/coronavirus; lineNumber: 195; columnNumber: 32; The entity name must immediately follow the '&' in the entity reference.
        at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:258)
        at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:177)
        at com.i2p.covid19.service.NewsFeedService.updateNewsFeed(NewsFeedService.java:87)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:84)
        at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

The problem is the ampersand character in the XML:

    Lifestyle & Entertainment

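A bare `&` is illegal in an XML document outside of a CDATA section. It must be written as `&amp;`, but the producer of the XML document has not escaped the `&` character.

If you replace `&` with `&amp;`, the document will parse correctly.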
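
Since the source feed cannot be changed, one possible workaround is to repair the text before handing it to the parser. The following is only a minimal sketch under stated assumptions: the feed body has already been read into a `String`, the class name `AmpersandEscapeDemo`, its helper methods, and the `<category>` wrapper in the sample are hypothetical, and the regular expression is a heuristic that treats only `&name;`, `&#123;` and `&#x1F;` as valid references. It would also rewrite a bare `&` inside a CDATA section, where the character is legal, so it is not a general-purpose fix.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class AmpersandEscapeDemo {

    // Matches an '&' that does NOT start a named (&amp;), decimal (&#38;)
    // or hexadecimal (&#x26;) reference. Simplified name syntax (assumption).
    private static final Pattern BARE_AMPERSAND =
            Pattern.compile("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Replaces every bare '&' with the escaped form '&amp;'.
    static String escapeBareAmpersands(String xml) {
        return BARE_AMPERSAND.matcher(xml).replaceAll("&amp;");
    }

    // Parses XML held in memory with the standard JAXP DOM parser.
    static Document parse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample modelled on the offending feed content.
        String broken = "<rss><category>Lifestyle & Entertainment</category></rss>";

        try {
            parse(broken); // the strict parser rejects the bare '&'
        } catch (SAXParseException e) {
            System.out.println("Strict parse failed: " + e.getMessage());
        }

        // After escaping, the same text parses and the '&' is preserved.
        Document doc = parse(escapeBareAmpersands(broken));
        System.out.println(doc.getElementsByTagName("category").item(0).getTextContent());
    }
}
```

The second `println` prints `Lifestyle & Entertainment`, because the parser turns `&amp;` back into `&` in the text content. References that are already escaped are left untouched, so running the repair on a well-formed feed should not double-escape anything.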
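Using the ROME (rometools) library

If your goal is to process RSS feeds, I suggest using the ROME library, which can handle special characters such as `&`. It is simple and straightforward.

The code snippet below prints the feed title, International News: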
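    import java.net.URL;
    import com.rometools.rome.feed.synd.SyndFeed;
    import com.rometools.rome.io.SyndFeedInput;
    import com.rometools.rome.io.XmlReader;

    URL feedSource = new URL("https://nation.com.pk/rss/coronavirus");
    SyndFeedInput input = new SyndFeedInput();
    SyndFeed feed = input.build(new XmlReader(feedSource));
    System.out.println(feed.getTitle());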
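
To try the snippet above, ROME has to be on the classpath. A minimal Maven sketch, assuming the `com.rometools:rome` artifact; the version number shown here is an assumption, so check Maven Central for the current release:

```xml
<dependency>
    <groupId>com.rometools</groupId>
    <artifactId>rome</artifactId>
    <!-- assumed version; replace with the latest available release -->
    <version>1.18.0</version>
</dependency>
```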

Thanks! You saved my day. I was writing a regular expression to work around this problem.