How to read multiple XML contents in a file using PHP

I am working with this kind of XML sequence file. Could you suggest how to parse the following content:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE us-patent-grant SYSTEM "us-patent-grant-v42-2006-08-23.dtd" [ ]>
<name>ccccc</name>
<document-id>
<country>US</country>
<doc-number>D0629997</doc-number>
<kind>S1</kind>
<date>20110104</date>
</document-id>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE us-patent-grant SYSTEM "us-patent-grant-v42-2006-08-23.dtd" [ ]>
<name>dddd</name>
<document-id>
<country>US</country>
<doc-number>D0629998</doc-number>
<kind>S2</kind>
<date>20110104</date>
</document-id>

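This is not a valid XML file. It looks like two files in one, but even then it is not valid. Assuming these are two separate files, you can try to "tidy" them first. Assuming $xml is a string containing the XML content: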

$xml = tidy_repair_string($xml, array(
    'output-xml' => true,
    'input-xml' => true
));

Then you can use SimpleXML on it:

$xml = new SimpleXMLElement($xml);
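
Because the sample is several documents concatenated together, and each shortened fragment has no single root element, another option is to split the string at every XML declaration and wrap each piece before parsing. The following is only a minimal sketch, not the answerer's method: the function name `splitXmlSequence` and the wrapper element `<record>` are made up for illustration, and it assumes every document starts with an XML declaration.

```php
<?php
/**
 * Split a string of concatenated XML documents and parse each one.
 * Hypothetical helper; assumes each document begins with "<?xml".
 */
function splitXmlSequence(string $xml): array
{
    // Split in front of every XML declaration, keeping the declaration.
    $parts = preg_split('/(?=<\?xml\s)/', $xml, -1, PREG_SPLIT_NO_EMPTY);

    $documents = [];
    foreach ($parts as $part) {
        // Drop the declaration and the DOCTYPE so the fragment can be wrapped.
        $body = preg_replace('/<\?xml[^?]*\?>/', '', $part);
        $body = preg_replace('/<!DOCTYPE[^\[>]*(\[[^\]]*\])?\s*>/', '', $body);
        if (trim($body) === '') {
            continue;
        }
        // The sample fragments have no single root element, so add one.
        $documents[] = new SimpleXMLElement('<record>' . $body . '</record>');
    }
    return $documents;
}
```

Each returned element can then be read in the usual SimpleXML way, for example `(string) $doc->{'document-id'}->{'doc-number'}`.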

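I know where this XML file comes from, and I find it odd that Google would serve invalid XML (unless they are just hosting a file they got from somewhere else). This suggestion for parsing it worked well for me: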

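The file contains a sequence of XML documents concatenated one after another. You need to register a PHP stream wrapper that transparently splits the file; then you can process each document individually, even in a streaming fashion. For example: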

stream_wrapper_register('xmlseq', 'XMLSequenceStream');

$path = "xmlseq://zip://ipg140107.zip#ipg140107.xml";

while (XMLSequenceStream::notAtEndOfSequence($path)) {
    $reader = new XMLReader();
    $reader->open($path);
    // just consume the whole document
    while ($reader->next()) {
        XMLReaderNode::dump($reader);
    }
}

XMLSequenceStream::clean();

That stream wrapper is not built into PHP: it is part of the XMLReaderIterator library, and it also works with SimpleXMLElement or DOMDocument, although XMLReader is better suited to larger files.
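
If pulling in a library is not an option, the same splitting idea can be approximated with a plain generator that reads the file line by line and hands out one document at a time. This is a rough sketch under assumptions, not the stream wrapper itself: the name `readXmlSequence` is hypothetical, and it assumes every document starts on a new line with an XML declaration.

```php
<?php
/**
 * Yield each XML document from a concatenated file, one at a time.
 * Hypothetical helper; assumes every document starts on a new line
 * with an XML declaration.
 */
function readXmlSequence(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $buffer = '';
    try {
        while (($line = fgets($handle)) !== false) {
            // A new declaration marks the start of the next document.
            if (strncmp($line, '<?xml', 5) === 0 && trim($buffer) !== '') {
                yield $buffer;
                $buffer = '';
            }
            $buffer .= $line;
        }
        if (trim($buffer) !== '') {
            yield $buffer; // last document in the file
        }
    } finally {
        fclose($handle);
    }
}
```

Only one document is held in memory at a time, so each yielded string can be passed to `simplexml_load_string()` or an XMLReader, provided the document is well-formed on its own.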

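For the file I used in the example, the overall element structure, with the number of occurrences of each distinct tree element across the sequence, is: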

\-us-patent-grant (473)
  |-us-bibliographic-data-grant (473)
  | |-publication-reference (473)
  | | \-document-id (473)
  | |   |-country (473)
  | |   |-doc-number (473)
  | |   |-kind (473)
  | |   \-date (473)
  | |-application-reference (473)
  | | \-document-id (473)
  | |   |-country (473)
  | |   |-doc-number (473)
  | |   \-date (473)
  | |-us-application-series-code (473)
  | |-us-term-of-grant (470)
  | | |-length-of-grant (450)
  | | |-disclaimer (18)
  | | | \-text (18)
  | | \-us-term-extension (20)
  | |-classification-locarno (450)
  | | |-edition (450)
  | | \-main-classification (450)
  | |-classification-national (473)
  | | |-country (473)
  | | |-main-classification (473)
  | | \-further-classification (143)
  | |-invention-title (473)
  | | \-i (12)
  | |-us-references-cited (458)
  | | \-us-citation (11000)
  | |   |-patcit (10265)
  | |   | \-document-id (10265)
  | |   |   |-country (10265)
  | |   |   |-doc-number (10265)
  | |   |   |-kind (9884)
  | |   |   |-name (9811)
  | |   |   \-date (10264)
  | |   |-category (10999)
  | |   |-classification-national (6309)
  | |   | |-country (6309)
  | |   | \-main-classification (6309)
  | |   |-nplcit (735)
  | |   | \-othercit (735)
  | |   |   |-sub (281)
  | |   |   |-i (7)
  | |   |   \-sup (1)
  | |   \-classification-cpc-text (1)
  | |-number-of-claims (472)
  | |-us-exemplary-claim (472)
  | |-us-field-of-classification-search (472)
  | | \-classification-national (8991)
  | |   |-country (8991)
  | |   |-main-classification (8991)
  | |   \-additional-info (1205)
  | |-figures (472)
  | | |-number-of-drawing-sheets (472)
  | | \-number-of-figures (472)
  | |-us-parties (472)
  | | |-us-applicants (472)
  | | | \-us-applicant (765)
  | | |   |-addressbook (765)
  | | |   | |-last-name (573)
  | | |   | |-first-name (573)
  | | |   | |-address (765)
  | | |   | | |-city (765)
  | | |   | | |-country (765)
  | | |   | | \-state (423)
  | | |   | \-orgname (192)
  | | |   \-residence (765)
  | | |     \-country (765)
  | | |-inventors (472)
  | | | \-inventor (969)
  | | |   \-addressbook (969)
  | | |     |-last-name (969)
  | | |     |-first-name (969)
  | | |     \-address (969)
  | | |       |-city (969)
  | | |       |-country (969)
  | | |       \-state (519)
  | | \-agents (429)
  | |   \-agent (500)
  | |     \-addressbook (500)
  | |       |-orgname (361)
  | |       |-address (500)
  | |       | \-country (500)
  | |       |-last-name (139)
  | |       \-first-name (139)
  | |-assignees (385)
  | | \-assignee (391)
  | |   |-addressbook (390)
  | |   | |-orgname (386)
  | |   | |-role (390)
  | |   | |-address (390)
  | |   | | |-city (355)
  | |   | | |-country (390)
  | |   | | \-state (192)
  | |   | |-last-name (4)
  | |   | \-first-name (4)
  | |   |-orgname (1)
  | |   \-role (1)
  | |-examiners (472)
  | | |-primary-examiner (472)
  | | | |-last-name (472)
  | | | |-first-name (472)
  | | | \-department (472)
  | | \-assistant-examiner (65)
  | |   |-last-name (65)
  | |   \-first-name (65)
  | |-us-related-documents (65)
  | | |-continuation-in-part (16)
  | | | \-relation (16)
  | | |   |-parent-doc (16)
  | | |   | |-document-id (16)
  | | |   | | |-country (16)
  | | |   | | |-doc-number (16)
  | | |   | | \-date (16)
  | | |   | |-parent-status (11)
  | | |   | \-parent-grant-document (5)
  | | |   |   \-document-id (5)
  | | |   |     |-country (5)
  | | |   |     |-doc-number (5)
  | | |   |     \-date (2)
  | | |   \-child-doc (16)
  | | |     \-document-id (16)
  | | |       |-country (16)
  | | |       \-doc-number (16)
  | | |-continuation (21)
  | | | \-relation (21)
  | | |   |-parent-doc (21)
  | | |   | |-document-id (21)
  | | |   | | |-country (21)
  | | |   | | |-doc-number (21)
  | | |   | | \-date (21)
  | | |   | |-parent-status (16)
  | | |   | \-parent-grant-document (5)
  | | |   |   \-document-id (5)
  | | |   |     |-country (5)
  | | |   |     |-doc-number (5)
  | | |   |     \-date (2)
  | | |   \-child-doc (21)
  | | |     \-document-id (21)
  | | |       |-country (21)
  | | |       \-doc-number (21)
  | | |-division (32)
  | | | \-relation (32)
  | | |   |-parent-doc (32)
  | | |   | |-document-id (32)
  | | |   | | |-country (32)
  | | |   | | |-doc-number (32)
  | | |   | | \-date (32)
  | | |   | |-parent-grant-document (24)
  | | |   | | \-document-id (24)
  | | |   | |   |-country (24)
  | | |   | |   |-doc-number (24)
  | | |   | |   \-date (1)
  | | |   | \-parent-status (8)
  | | |   \-child-doc (32)
  | | |     \-document-id (32)
  | | |       |-country (32)
  | | |       \-doc-number (32)
  | | \-related-publication (9)
  | |   \-document-id (9)
  | |     |-country (9)
  | |     |-doc-number (9)
  | |     |-kind (9)
  | |     \-date (9)
  | |-priority-claims (140)
  | | \-priority-claim (182)
  | |   |-country (182)
  | |   |-doc-number (182)
  | |   \-date (182)
  | |-us-sir-flag (1)
  | |-classifications-ipcr (23)
  | | \-classification-ipcr (24)
  | |   |-ipc-version-indicator (24)
  | |   | \-date (24)
  | |   |-classification-level (24)
  | |   |-section (24)
  | |   |-class (24)
  | |   |-subclass (24)
  | |   |-main-group (24)
  | |   |-subgroup (24)
  | |   |-symbol-position (24)
  | |   |-classification-value (24)
  | |   |-action-date (24)
  | |   | \-date (24)
  | |   |-generating-office (24)
  | |   | \-country (24)
  | |   |-classification-status (24)
  | |   \-classification-data-source (24)
  | |-us-botanic (21)
  | | |-latin-name (21)
  | | \-variety (21)
  | \-classifications-cpc (1)
  |   \-main-cpc (1)
  |     \-classification-cpc (1)
  |       |-cpc-version-indicator (1)
  |       | \-date (1)
  |       |-section (1)
  |       |-class (1)
  |       |-subclass (1)
  |       |-main-group (1)
  |       |-subgroup (1)
  |       |-symbol-position (1)
  |       |-classification-value (1)
  |       |-action-date (1)
  |       | \-date (1)
  |       |-generating-office (1)
  |       | \-country (1)
  |       |-classification-status (1)
  |       |-classification-data-source (1)
  |       \-scheme-origination-code (1)
  |-drawings (472)
  | \-figure (3033)
  |   \-img (3033)
  |-description (472)
  | |-description-of-drawings (472)
  | | |-p (3955)
  | | | |-figref (4478)
  | | | |-b (86)
  | | | \-i (6)
  | | \-heading (22)
  | |-heading (162)
  | \-p (340)
  |   |-figref (15)
  |   |-b (250)
  |   |-i (146)
  |   |-ul (96)
  |   | \-li (444)
  |   |   |-ul (215)
  |   |   | \-li (273)
  |   |   |   |-ul (199)
  |   |   |   | \-li (1192)
  |   |   |   |   |-i (1219)
  |   |   |   |   |-b (1)
  |   |   |   |   |-sup (10)
  |   |   |   |   \-sub (2)
  |   |   |   \-i (11)
  |   |   |-sup (2)
  |   |   \-i (26)
  |   |-tables (15)
  |   | \-table (15)
  |   |   \-tgroup (49)
  |   |     |-colspec (175)
  |   |     |-thead (15)
  |   |     | \-row (27)
  |   |     |   \-entry (51)
  |   |     \-tbody (49)
  |   |       \-row (291)
  |   |         \-entry (997)
  |   |           \-sup (28)
  |   \-sup (2)
  |-us-claim-statement (472)
  |-claims (472)
  | \-claim (476)
  |   \-claim-text (476)
  |     |-figref (1)
  |     |-claim-text (5)
  |     |-claim-ref (4)
  |     \-i (15)
  \-abstract (22)
    \-p (22)
      |-i (27)
      \-ul (2)
        \-li (2)
          \-ul (2)
            \-li (11)
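
Counts like the ones in the tree above could be gathered with XMLReader by tracking the path of each element as it is read. The sketch below is one possible way to do that for a single document, not necessarily how the listing was produced; `countElementPaths` is a hypothetical name, and totals for a whole sequence would be obtained by summing the results per document.

```php
<?php
/**
 * Count how often each element path occurs in one XML document.
 * Hypothetical helper for illustration.
 */
function countElementPaths(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $counts = [];
    $stack  = [];
    while ($reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT) {
            continue;
        }
        // depth is 0 for the root; cut the stack back to the current level.
        $stack = array_slice($stack, 0, $reader->depth);
        $stack[] = $reader->name;
        $path = implode('/', $stack);
        $counts[$path] = ($counts[$path] ?? 0) + 1;
    }
    $reader->close();
    return $counts;
}
```

The result is a flat array keyed by paths such as `us-patent-grant/claims/claim`, which can be printed as an indented tree if needed.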

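Why do you have an XML file like this in the first place?

According to the XML specification, your document looks invalid... multiple identical processing instructions, no unique root node, and AFAIK !DOCTYPE is not a valid node name and it is not closed... I doubt any parser would accept it without complaint...

X-Ref:

Hi, thanks for the reply. I have a lot of files like this to parse, so could you explain in more detail? What is tidy? How do I read the contents of the file? fread does not handle this kind of file correctly!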