Efficiently parsing a large XML file in PHP to generate SQL

Tags: php, performance, simplexml, xmlreader

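I am trying to parse a large XML file and load it into MySQL. I was using simplexml to parse it, and it works well, but it becomes slow for this large XML file. Now I am trying to use XMLReader.

Here is a sample of the XML: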

<?xml version="1.0" encoding="UTF-8"?>

<drug type="biotech" created="2005-06-13" updated="2015-02-23">
<drugbank-id primary="true">DB00001</drugbank-id>
<drugbank-id>BIOD00024</drugbank-id>
<drugbank-id>BTD00024</drugbank-id>
<name>Lepirudin</name>
<description>Lepirudin is identical </description>
<cas-number>120993-53-5</cas-number>
<groups>
  <group>approved</group>
</groups>
<pathways>
<pathway>
  <smpdb-id>SMP00278</smpdb-id>
  <name>Lepirudin Action Pathway</name>
  <drugs>
    <drug>
      <drugbank-id>DB00001</drugbank-id>
      <name>Lepirudin</name>
    </drug>
    <drug>
      <drugbank-id>DB01373</drugbank-id>
      <name>Calcium</name>
    </drug>
  </drugs>
...
</drug>

<drug type="biotech" created="2005-06-15" updated="2015-02-25">
...
</drug>
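
With simplexml this works correctly and gives me 7789 rows. However, I want to parse the file with XMLReader, and the problem is that with XMLReader I get more than 35,000 rows.

If you look at the XML, you can see that inside a <drug> node there are other <drug> child nodes. How can I get around this?

Here is how I am using XMLReader: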

<?php

$servername = "localhost"; // Example : localhost
$username   = "root";
$password   = "pass";
$dbname     = "dbname";

// Create connection
$conn = new mysqli($servername, $username, $password, $dbname);
// Check connection
if ($conn->connect_error) {
    die("Connection failed: " . $conn->connect_error);
} 

$reader = new XMLReader();
$reader->open('drugbank.xml');
while ($reader->read())
{
    if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'drug')
    {
        $doc = new DOMDocument('1.0', 'UTF-8');
        $xml = simplexml_import_dom($doc->importNode($reader->expand(),true));

        $name = $xml->name;
        $description  = $xml->description;
        $casnumber = $xml->{'cas-number'};

        // ...

        $sql = "INSERT INTO `drug` (name, description,cas_number,created,updated,type) 
VALUES ('$name', '$description','$casnumber','$created','$updated','$type')";

        if ($conn->query($sql) === TRUE) {
            $last_id = $conn->insert_id;
        } else {
            echo "outer else Error: " . $sql . "<br>" . $conn->error. "<br>" ;
        }
    }
}

$conn->close();

With this example I get more than 35,000 rows.
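
The discrepancy can be reproduced on a tiny document. The following is a minimal sketch, assuming a small inline document with a hypothetical <drugbank> root element (the sample above does not show the root). The function names countAllDrugElements and countTopLevelDrugs are illustrative only. The first function matches every <drug> element that read() visits, including the nested ones. The second uses next('drug') to skip over each element's subtree.

```php
<?php
// Hypothetical miniature of the layout above: two top-level <drug>
// elements, the first of which contains two nested <drug> elements.
$sample = <<<XML
<drugbank>
  <drug><name>A</name>
    <pathways><pathway><drugs>
      <drug><name>A</name></drug>
      <drug><name>B</name></drug>
    </drugs></pathway></pathways>
  </drug>
  <drug><name>C</name></drug>
</drugbank>
XML;

// Counts every <drug> start tag, at any depth, because read() walks
// into each subtree node by node.
function countAllDrugElements(string $xml): int
{
    $reader = new XMLReader();
    $reader->XML($xml);
    $count = 0;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'drug') {
            $count++;
        }
    }
    $reader->close();
    return $count;
}

// Counts only the outer <drug> elements: next('drug') skips the
// current element's subtree before looking for the next match.
function countTopLevelDrugs(string $xml): int
{
    $reader = new XMLReader();
    $reader->XML($xml);
    // Advance to the first <drug> element.
    while ($reader->read() && $reader->name !== 'drug');
    $count = 0;
    while ($reader->name === 'drug') {
        $count++;
        $reader->next('drug');
    }
    $reader->close();
    return $count;
}

echo countAllDrugElements($sample), " vs ", countTopLevelDrugs($sample), "\n";
```

On this sample the first count includes the nested elements and the second does not, which mirrors the 35,000 versus 7789 difference described above.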

OK, I have a working example with big improvements in execution speed, memory usage, and database load:

<?php
define('INSERT_BATCH_SIZE', 500);
define('DRUG_XML_FILE', 'drugbank.xml');

$servername = "localhost"; // Example : localhost
$username   = "root";
$password   = "pass";
$dbname     = "dbname";

function parseXml($mysql)
{
    $drugs = array();

    $xmlReader = new XMLReader();
    $xmlReader->open(DRUG_XML_FILE);

    // Move our pointer to the first <drug /> element.
    while ($xmlReader->read() && $xmlReader->name !== 'drug') ;

    $drugCount = 0;
    $totalDrugs = 0;

    // Iterate over the outer <drug /> elements.
    while ($xmlReader->name == 'drug')
    {
        // Convert the node into a SimpleXMLElement for ease of use.
        $item = new SimpleXMLElement($xmlReader->readOuterXML());

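        // Cast each value to a string and escape it, so that quotes in the
        // data (e.g. in a description) cannot break the generated SQL.
        $name = $mysql->real_escape_string((string) $item->name);
        $description = $mysql->real_escape_string((string) $item->description);
        $casNumber = $mysql->real_escape_string((string) $item->{'cas-number'});
        $created = $mysql->real_escape_string((string) $item['created']);
        $updated = $mysql->real_escape_string((string) $item['updated']);
        $type = $mysql->real_escape_string((string) $item['type']);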

        $drugs[] = "('$name', '$description','$casNumber','$created','$updated','$type')";
        $drugCount++;
        $totalDrugs++;

        // Once we've reached the desired batch size, insert the batch and reset the counter.
        if ($drugCount >= INSERT_BATCH_SIZE)
        {
            batchInsertDrugs($mysql, $drugs);
            $drugCount = 0;
        }

        // Go to next <drug />.
        $xmlReader->next('drug');
    }

    $xmlReader->close();

    // Insert the leftovers from the last batch.
    batchInsertDrugs($mysql, $drugs);

    echo "Inserted $totalDrugs total drugs.";
}

function batchInsertDrugs($mysql, &$drugs)
{
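    // Nothing buffered (e.g. the total was an exact multiple of the batch
    // size), so skip the query: "INSERT ... VALUES" with no rows is invalid SQL.
    if (count($drugs) === 0)
    {
        return;
    }

    // Generate a batched INSERT statement.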
    $statement = "INSERT INTO `drug` (name, description, cas_number, created, updated, type) VALUES";
    $statement = $statement . ' ' . implode(",\n", $drugs);

    echo $statement, "\n";

    // Run the batch INSERT.
    if ($mysql->query($statement))
    {
        echo "Inserted " . count($drugs) . " drugs.";
    }
    else
    {
        echo "INSERT Error: " . $statement . "<br>" . $mysql->error. "<br>" ;
    }

    // Clear the buffer.
    $drugs = array();
}

// Create MySQL connection.
$mysql = new mysqli($servername, $username, $password, $dbname);
if ($mysql->connect_error)
{
    die("Connection failed: " . $mysql->connect_error);
}

parseXml($mysql);
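I use XMLReader. Using SimpleXML on the whole file causes the entire document to be parsed into memory, which is both slow and memory-hungry. This approach uses XMLReader, which is a fast pull parser, and only converts one <drug> element at a time into a SimpleXMLElement. You could probably still make this faster, but that pattern is a bit more complex, and the example above should already be noticeably better than what you started with.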
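
To illustrate the pull-parsing pattern, here is a hedged sketch that wraps the same read() / next('drug') loop in a generator, so the caller only ever holds one <drug> record at a time. The function name iterateDrugs and the inline document (including its <drugbank> root) are assumptions made for illustration and are not part of the example above.

```php
<?php
// Yields each top-level <drug> element as a SimpleXMLElement.
// Only the current element is materialised; the rest of the file
// stays unread until the caller asks for the next item.
function iterateDrugs(XMLReader $reader): Generator
{
    // Advance to the first <drug> element.
    while ($reader->read() && $reader->name !== 'drug');

    while ($reader->name === 'drug') {
        // Re-parse just this element's markup for convenient access.
        yield new SimpleXMLElement($reader->readOuterXml());
        // Skip this element's subtree and move to the next <drug>.
        $reader->next('drug');
    }
}

// Usage with a small inline document (hypothetical data).
$reader = new XMLReader();
$reader->XML(
    '<drugbank>'
    . '<drug type="biotech"><name>A</name>'
    . '<drugs><drug><name>B</name></drug></drugs></drug>'
    . '<drug type="small molecule"><name>C</name></drug>'
    . '</drugbank>'
);

$names = [];
foreach (iterateDrugs($reader) as $drug) {
    // $drug->name only sees direct children, so the nested <drug> is ignored.
    $names[] = (string) $drug->name;
}
$reader->close();

echo implode(', ', $names), "\n";
```

For a real file you would call $reader->open('drugbank.xml') instead of $reader->XML(...). The rest of the loop stays the same.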

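Another important change in my example is the use of MySQL batch inserts, so we only hit the database once for every 500 items processed (configurable via INSERT_BATCH_SIZE). You can tune this number for better performance. Past a certain point the query becomes too large for MySQL to handle, but you can do far more than 500 at a time.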
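
As a rough sketch of the batching idea, the helper below builds one multi-row INSERT from a list of rows. The name buildDrugInsert and the $escape parameter are hypothetical. In the tests, addslashes stands in for a real escaper only so the snippet runs without a database; with a live connection you would pass something like [$mysql, 'real_escape_string'] instead. The upper bound on statement size is typically governed by MySQL's max_allowed_packet setting.

```php
<?php
// Builds a single multi-row INSERT for the `drug` table, or returns
// null when there are no rows (an INSERT with no VALUES is invalid SQL).
function buildDrugInsert(array $rows, callable $escape): ?string
{
    if (count($rows) === 0) {
        return null;
    }

    $tuples = [];
    foreach ($rows as $row) {
        $values = [];
        foreach ($row as $value) {
            // Quote and escape every value individually.
            $values[] = "'" . $escape((string) $value) . "'";
        }
        $tuples[] = '(' . implode(', ', $values) . ')';
    }

    return 'INSERT INTO `drug` (name, description, cas_number, created, updated, type) VALUES '
        . implode(",\n", $tuples);
}

// Usage: split already-parsed rows into batches and build one
// statement per batch (a batch size of 2 keeps the demo small).
$rows = [
    ['Lepirudin', 'Lepirudin is identical', '120993-53-5', '2005-06-13', '2015-02-23', 'biotech'],
    ['Drug B', "It's a test", '', '2005-06-15', '2015-02-25', 'biotech'],
    ['Drug C', 'Third row', '', '2005-06-15', '2015-02-25', 'biotech'],
];

$statements = [];
foreach (array_chunk($rows, 2) as $batch) {
    $statements[] = buildDrugInsert($batch, 'addslashes');
}

echo count($statements), " statements built\n";
```

Each statement would then be passed to $mysql->query() in place of the string concatenation used in batchInsertDrugs.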


If you would like me to explain any part of this further, or if you have any questions about it, let me know in the comments! :)

Parsing large XML files with PHP is a bad idea.

Thanks for the prompt answer... Your solution works very well, much better than mine.