Reading RDF/XML from a URL in Jena

I'm trying to read an XML file using Jena. Normally it works:

    final String url = "http://www.bbc.co.uk/nature/life/Human";
    Model model = ModelFactory.createDefaultModel();       
    model.read(url, "RDF/XML");
But when I try another URL, one where a paragraph contains a `<br>` or a link, it gives me this error:

    Exception in thread "main" org.apache.jena.riot.RiotException: [line: 25, col: 6 ] {E202} Cannot have both string data "Great white sharks are at the very top of the marine food chain. Feared as man-eaters, they are only responsible for about 5-10 attacks a year, which are rarely fatal. Great whites are ultimate predators. Powerful streamlined bodies and a mouth full of terrifyingly sharp, serrated teeth, combine with super senses that can detect a single drop of blood from over a mile away. Hiding from a great white isn't an option as they can detect and home in on small electrical discharges from hearts and gills. Unlike most other sharks, live young are born that immediately swim away.
    " and XML data <br> inside a property element. Maybe you want rdf:parseType='Literal'.
Here is the link to the second case, where Jena throws this error:


What should I do to make it ignore this?

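The problem is in the data from the BBC site: the `<br>` needs to be escaped as `&lt;br/&gt;` in order to put HTML markup into a string value. In RDF/XML, a simple string value cannot contain raw markup.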
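To illustrate why the escaping matters, here is a small Java sketch that uses only the JDK's DOM parser (not Jena) on a hypothetical `<p>` element. With a raw `<br/>`, the parser sees a child element next to the text, which is the mixed "string data and XML data" situation that E202 rejects inside an RDF property element. With `&lt;br/&gt;`, there is only character data, and `<br/>` survives as literal text in the value. The class and method names are illustrative, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class MarkupVsText {
    // Parses an XML string with the JDK's built-in DOM parser.
    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML: " + xml, e);
        }
    }

    // Counts element children of the root: a raw <br/> is counted, an escaped one is not.
    static int childElementCount(String xml) {
        NodeList children = parse(xml).getDocumentElement().getChildNodes();
        int count = 0;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    // Returns the root's character data; &lt; and &gt; are decoded back to < and >.
    static String textOf(String xml) {
        return parse(xml).getDocumentElement().getTextContent();
    }
}
```

For example, `childElementCount("<p>sharks<br/>predators</p>")` is 1. For the escaped `"<p>sharks&lt;br/&gt;predators</p>"`, it is 0, and `textOf` returns `sharks<br/>predators`.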

Unfortunately, the BBC site doesn't fully handle content negotiation: asking for Turtle or N-Triples gets you an XHTML page.

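You need to download the file with a regular HTTP request carrying the header `Accept: application/rdf+xml`, patch the content, and parse the fixed version. One way is to read it into a Java string, replace `<br>` with `&lt;br/&gt;`, and then parse from the string.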
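Here is a minimal sketch of that workaround. It assumes Java 11+ (`java.net.http.HttpClient`) and assumes that `<br>`, `<br/>` and `<br />` are the only forms of the tag in the feed. The class `FetchAndPatchRdfXml` and its methods are hypothetical names. The sketch also checks the response's `Content-Type`, because the server may not honour the `Accept` header.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Locale;
import java.util.regex.Pattern;

class FetchAndPatchRdfXml {
    // Matches <br>, <br/> and <br /> in any letter case (assumption: the only forms in the feed).
    private static final Pattern BR = Pattern.compile("(?i)<br\\s*/?>");

    // Turns raw <br> markup into character data: <br/> becomes &lt;br/&gt;.
    static String escapeBr(String rdfXml) {
        return BR.matcher(rdfXml).replaceAll("&lt;br/&gt;");
    }

    // Downloads url asking for RDF/XML, rejects other content types, and returns the patched text.
    static String fetchPatched(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/rdf+xml")
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        String contentType = response.headers().firstValue("Content-Type").orElse("")
                .toLowerCase(Locale.ROOT);
        if (!contentType.startsWith("application/rdf+xml")) {
            throw new IOException("Expected RDF/XML but got: " + contentType);
        }
        return escapeBr(response.body());
    }
}
```

You can then pass the patched string to Jena through a `Reader`, using the original URL as the base URI, for example `model.read(new StringReader(FetchAndPatchRdfXml.fetchPatched(url)), url, "RDF/XML")`. The regex rewrites every `<br>` in the document, not only those inside literals. It also does nothing for other inline markup, such as the links mentioned in the question, which would need similar escaping.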