How do I delete an XML node if the node value contains a URL?

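I want to use this XML to prepare an XSD, then process the rows further to insert the data into a database. To prepare the XSD, I use XSLT to transform the structure into the required format: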

<linked-hash-map>
  <entry>
    <string>_type</string>
    <string>News</string>
  </entry>
  <entry>
    <string>value</string>
    <list>
      <linked-hash-map>
        <entry>
          <string>name</string>
          <string>
            Virat Kohli 
          </string>
        </entry>
        <entry>
          <string>url</string>
          <string>
            http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&CID=09E4F1057ADB64720330FB2E7BC96547&rd=1&h=nw8K4uNRgs-nvsuz2GyXpqMxdRmzWK8Xbm3W_1IlO24&v=1&r=http%3a%2f%2fmovies.ndtv.com%2fbollywood%2fvirat-kohli-hearts-anushka-sharma-a-timeline-of-their-romance-1659877&p=DevEx,5026.1
          </string>
        </entry>
        <entry>
          <string>image</string>
          <linked-hash-map>
            <entry>
              <string>thumbnail</string>
              <linked-hash-map>
                <entry>
                  <string>contentUrl</string>
                  <string>
                    https://www.bing.com/th?id=ON.EE674002EC235BD5795D34695EABF504&pid=News
                  </string>
                </entry>
                <entry>
                  <string>width</string>
                  <int>640</int>
                </entry>
              </linked-hash-map>
            </entry>
          </linked-hash-map>
        </entry>
        <entry>
          <string>description</string>
          <string>
            On Wednesday, cricketer Virat Kohli
          </string>
        </entry>
        <entry>
          <string>datePublished</string>
          <string>2017-02-16T05:39:00</string>
        </entry>
        <entry>
          <string>category</string>
          <string>Entertainment</string>
        </entry>
      </linked-hash-map>
      <linked-hash-map>
        <entry>
          <string>name</string>
          <string>
            Shah Rukh Khan’s TV show
          </string>
        </entry>
        <entry>
          <string>url</string>
          <string>
            http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&CID=09E4F1057ADB64720330FB2E7BC96547&rd=1&h=4CnQhOg9Nm7pmIu9OvDl6x9WtYtSuXblCSR_WQz1VoA&v=1&r=http%3a%2f%2fwww.hindustantimes.com%2ftv%2fshah-rukh-khan-s-tv-show-circus-is-back-on-small-screen%2fstory-OjQUQIWi6ogxj5eF1hivTI.html&p=DevEx,5040.1
          </string>
        </entry>
        <entry>
          <string>image</string>
          <linked-hash-map>
            <entry>
              <string>thumbnail</string>
              <linked-hash-map>
                <entry>
                  <string>contentUrl</string>
                  <string>
                    https://www.bing.com/th?id=ON.2974262BB8317FA4D4BCE4A61CA9488E&pid=News
                  </string>
                </entry>
                <entry>
                  <string>width</string>
                  <int>700</int>
                </entry>
              </linked-hash-map>
            </entry>
          </linked-hash-map>
        </entry>
        <entry>
          <string>description</string>
          <string>
            Here’s some wonderful news 
          </string>
        </entry>
        <entry>
          <string>datePublished</string>
          <string>2017-02-16T05:36:00</string>
        </entry>
        <entry>
          <string>category</string>
          <string>Entertainment</string>
        </entry>
      </linked-hash-map>
    </list>
  </entry>
</linked-hash-map>

Here the URLs contain query strings. How can I remove the URL nodes, or how can I encode a URL that has a query string?
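The literal request in the title, dropping every entry whose value is a URL, can be sketched with Python's standard xml.etree.ElementTree. This is a hedged illustration, not the asker's or answerer's method: it assumes the ampersands have already been escaped (with bare & characters the parser rejects the input before any node can be removed), and the helper name drop_url_entries and the scheme-prefix test are hypothetical choices. On the full data it would also remove the thumbnail contentUrl entries, since those values are URLs too.

```python
import xml.etree.ElementTree as ET

# Assumes "&" is already escaped as "&amp;"; a bare "&" makes parsing fail.
SAMPLE = """<linked-hash-map>
  <entry><string>name</string><string>Virat Kohli</string></entry>
  <entry><string>url</string><string>http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&amp;rd=1</string></entry>
  <entry><string>category</string><string>Entertainment</string></entry>
</linked-hash-map>"""


def drop_url_entries(root, schemes=("http://", "https://")):
    """Hypothetical helper: remove every <entry> whose second <string> starts with a URL scheme."""
    # Materialize all elements first so removing children does not disturb iteration.
    for parent in list(root.iter()):
        for entry in parent.findall("entry"):
            value = entry.find("string[2]")  # the value node of the key/value pair
            if value is not None and (value.text or "").strip().startswith(schemes):
                parent.remove(entry)
    return root


root = drop_url_entries(ET.fromstring(SAMPLE))
print([e.find("string").text for e in root.findall("entry")])  # ['name', 'category']
```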

Expected output:

<?xml version="1.0" encoding="utf-8"?>
<linked-hash-map>
  <entry>
    <linked-hash-map>
      <_type>News</_type>
      <datarow>
        <name> Virat Kohli</name>
        <url>http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&CID=09E4F1057ADB64720330FB2E7BC96547&rd=1&h=nw8K4uNRgs-nvsuz2GyXpqMxdRmzWK8Xbm3W_1IlO24&v=1&r=http%3a%2f%2fmovies.ndtv.com%2fbollywood%2fvirat-kohli-hearts-anushka-sharma-a-timeline-of-their-romance-1659877&p=DevEx,5026.1</url>
        <contentUrl>  https://www.bing.com/th?id=ON.EE674002EC235BD5795D34695EABF504&pid=News </contentUrl>
        <width>640</width>
        <description> On Wednesday, cricketer Virat Kohli</description>
        <readLink> https://api.cognitive.microsoft.com/api/v5/entities/b8ef6b82-02be-1e24-584c-f8283b7bdaeb </readLink>
        <datePublished>2017-02-16T05:39:00</datePublished>
        <category>Entertainment</category>     
      </datarow>
      <datarow>
        <name> Shah Rukh Khan’s TV show</name>
        <url> http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&CID=09E4F1057ADB64720330FB2E7BC96547&rd=1&h=4CnQhOg9Nm7pmIu9OvDl6x9WtYtSuXblCSR_WQz1VoA&v=1&r=http%3a%2f%2fwww.hindustantimes.com%2ftv%2fshah-rukh-khan-s-tv-show-circus-is-back-on-small-screen%2fstory-OjQUQIWi6ogxj5eF1hivTI.html&p=DevEx,5040.1 </url>
        <contentUrl>  https://www.bing.com/th?id=ON.2974262BB8317FA4D4BCE4A61CA9488E&pid=News </contentUrl>
        <width>700</width>
        <description> Here’s some wonderful news </description>
        <readLink> https://api.cognitive.microsoft.com/api/v5/entities/b8ef6b82-02be-1e24-584c-f8283b7bdaeb </readLink>
        <datePublished>2017-02-16T05:36:00</datePublished>
        <category>Entertainment</category>
      </datarow>
    </linked-hash-map>
  </entry>
</linked-hash-map>

Below is the script I use to transform this structure:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity template: copies any node not matched by a more specific template -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Root map: one datarow per list item; other entries become key/value elements -->
  <xsl:template match="/linked-hash-map">
    <xsl:element name="{local-name()}">
      <xsl:for-each select="entry">
        <xsl:choose>
          <xsl:when test="list/linked-hash-map">
            <xsl:for-each select="list/linked-hash-map">
              <datarow>
                <xsl:for-each select="entry">
                  <!-- Skip nested maps and the listed keys (this also drops url, description and name) -->
                  <xsl:if test="not(node()[1]='image' or node()[1]='about' or node()[1]='clusteredArticles'  or node()[1]='mentions' or node()[1]='provider' or node()[1]='url' or node()[1]='description' or node()[1]='name')">
                    <xsl:text disable-output-escaping="yes">&lt;</xsl:text>
                    <xsl:value-of select="*[1]"/>
                    <xsl:text disable-output-escaping="yes">&gt;</xsl:text>
                    <xsl:value-of select="*[2]"/>
                    <xsl:text disable-output-escaping="yes">&lt;/</xsl:text>
                    <xsl:value-of select="*[1]"/>
                    <xsl:text disable-output-escaping="yes">&gt;</xsl:text>
                  </xsl:if>
                </xsl:for-each>
              </datarow>
            </xsl:for-each>
          </xsl:when>
          <xsl:otherwise>
            <xsl:text disable-output-escaping="yes">&lt;</xsl:text>
            <xsl:value-of select="*[1]"/>
            <xsl:text disable-output-escaping="yes">&gt;</xsl:text>
            <xsl:value-of select="*[2]"/>
            <xsl:text disable-output-escaping="yes">&lt;/</xsl:text>
            <xsl:value-of select="*[1]"/>
            <xsl:text disable-output-escaping="yes">&gt;</xsl:text>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each>
    </xsl:element>
  </xsl:template>
  <!-- Document node: wrap the transformed map in linked-hash-map/entry -->
  <xsl:template match="/">
    <xsl:copy>
      <linked-hash-map>
        <entry>
          <xsl:apply-templates/>
        </entry>
      </linked-hash-map>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

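Currently, your original XML is not well-formed, because the ampersands (&) used in the URLs must be replaced with their entity counterpart, namely &amp;.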
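To see the failure concretely, here is a small Python sketch using only the standard library. It assumes a string element holding the first few parameters of the question's URL: the parser rejects it while the & characters are bare, and accepts it once they are escaped with xml.sax.saxutils.escape, with the parsed text equal to the original URL.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# First few query parameters of the question's URL, with raw ampersands.
url = "http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&CID=09E4F1057ADB64720330FB2E7BC96547&rd=1"

raw = "<string>" + url + "</string>"
try:
    ET.fromstring(raw)
    raw_ok = True
except ET.ParseError:
    # A bare "&" starts an entity reference, so "&CID=..." is rejected.
    raw_ok = False

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped = "<string>" + escape(url) + "</string>"
parsed = ET.fromstring(escaped)

print(raw_ok)              # False
print(escaped)
print(parsed.text == url)  # True
```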

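Double-check how the original XML is produced: it should not be built as a text file of concatenated strings, which, unfortunately, is a common practice in general-purpose programming. XML documents should be built with W3C-compliant DOM libraries (e.g. Java's javax.xml, Python's xml.etree, PHP's DOMDocument, .NET's XmlDocument) and their createElement, appendChild, setAttribute or equivalent methods, which escape special characters such as & for you.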
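A minimal sketch of that DOM-style approach with Python's xml.etree.ElementTree, one of the libraries named above. The record dictionary is a hypothetical stand-in for the scraped data, and SubElement plays the role of createElement plus appendChild. Because each value is assigned to a node rather than concatenated into markup, the serializer writes & as &amp; on its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; keys mirror the question's <entry> key/value layout.
record = {
    "name": "Virat Kohli",
    "url": "http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&CID=09E4F1057ADB64720330FB2E7BC96547&rd=1",
}

root = ET.Element("linked-hash-map")
for key, value in record.items():
    entry = ET.SubElement(root, "entry")         # create <entry> and append it to the root
    ET.SubElement(entry, "string").text = key    # key node
    ET.SubElement(entry, "string").text = value  # value node; escaping happens at serialization

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```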

Once valid XML is produced, consider the more generic XSLT below.

Input (adjusted for character entities)


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<linked-hash-map>
  <entry>
    <string>_type</string>
    <string>News</string>
  </entry>
  <entry>
    <string>value</string>
    <list>
      <linked-hash-map>
        <entry>
          <string>name</string>
          <string>
            Virat Kohli 
          </string>
        </entry>
        <entry>
          <string>url</string>
          <string>
            http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&amp;CID=09E4F1057ADB64720330FB2E7BC96547&amp;rd=1&amp;h=nw8K4uNRgs-nvsuz2GyXpqMxdRmzWK8Xbm3W_1IlO24&amp;v=1&amp;r=http%3a%2f%2fmovies.ndtv.com%2fbollywood%2fvirat-kohli-hearts-anushka-sharma-a-timeline-of-their-romance-1659877&amp;p=DevEx,5026.1
          </string>
        </entry>
        <entry>
          <string>image</string>
          <linked-hash-map>
            <entry>
              <string>thumbnail</string>
              <linked-hash-map>
                <entry>
                  <string>contentUrl</string>
                  <string>
                    https://www.bing.com/th?id=ON.EE674002EC235BD5795D34695EABF504&amp;pid=News
                  </string>
                </entry>
                <entry>
                  <string>width</string>
                  <int>640</int>
                </entry>
              </linked-hash-map>
            </entry>
          </linked-hash-map>
        </entry>
        <entry>
          <string>description</string>
          <string>
            On Wednesday, cricketer Virat Kohli
          </string>
        </entry>
        <entry>
          <string>datePublished</string>
          <string>2017-02-16T05:39:00</string>
        </entry>
        <entry>
          <string>category</string>
          <string>Entertainment</string>
        </entry>
      </linked-hash-map>
      <linked-hash-map>
        <entry>
          <string>name</string>
          <string>
            Shah Rukh Khan's TV show
          </string>
        </entry>
        <entry>
          <string>url</string>
          <string>
            http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&amp;CID=09E4F1057ADB64720330FB2E7BC96547&amp;rd=1&amp;h=4CnQhOg9Nm7pmIu9OvDl6x9WtYtSuXblCSR_WQz1VoA&amp;v=1&amp;r=http%3a%2f%2fwww.hindustantimes.com%2ftv%2fshah-rukh-khan-s-tv-show-circus-is-back-on-small-screen%2fstory-OjQUQIWi6ogxj5eF1hivTI.html&amp;p=DevEx,5040.1
          </string>
        </entry>
        <entry>
          <string>image</string>
          <linked-hash-map>
            <entry>
              <string>thumbnail</string>
              <linked-hash-map>
                <entry>
                  <string>contentUrl</string>
                  <string>
                    https://www.bing.com/th?id=ON.2974262BB8317FA4D4BCE4A61CA9488E&amp;pid=News
                  </string>
                </entry>
                <entry>
                  <string>width</string>
                  <int>700</int>
                </entry>
              </linked-hash-map>
            </entry>
          </linked-hash-map>
        </entry>
        <entry>
          <string>description</string>
          <string>
            Here's some wonderful news 
          </string>
        </entry>
        <entry>
          <string>datePublished</string>
          <string>2017-02-16T05:36:00</string>
        </entry>
        <entry>
          <string>category</string>
          <string>Entertainment</string>
        </entry>
      </linked-hash-map>
    </list>
  </entry>
</linked-hash-map>

XSLT

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- APPLY ONLY SECOND ENTRY OFF ROOT -->  
  <xsl:template match="/linked-hash-map">
    <xsl:copy>      
      <xsl:apply-templates select="entry[2]"/>      
    </xsl:copy>
  </xsl:template>

  <xsl:template match="entry[2]">
    <xsl:copy>
      <!-- RETRIEVE FIRST ENTRY CONTENT -->  
      <xsl:element name="{preceding-sibling::entry/string[1]}">
        <xsl:value-of select="preceding-sibling::entry/string[2]"/>
      </xsl:element>
      <!-- APPLY GRANDCHILD LINKED HASH MAP -->
      <linked-hash-map><xsl:apply-templates select="list/linked-hash-map"/></linked-hash-map>
    </xsl:copy>
  </xsl:template>

  <!-- GENERALIZE FOR ALL DESCENDANT ENTRY NODES (W/O LINKED HASH MAP CHILD) -->  
  <xsl:template match="linked-hash-map">    
    <datarow>
      <xsl:for-each select="descendant::entry[local-name(*[2])!='linked-hash-map']">        
          <xsl:element name="{string[1]}">
            <xsl:value-of select="normalize-space(string[2]|int)"/>
          </xsl:element>
      </xsl:for-each>
      <!-- ADDED NODE (NOT PART OF ORIGINAL) -->
      <readLink>https://api.cognitive.microsoft.com/api/v5/entities/b8ef6b82-02be-1e24-584c-f8283b7bdaeb</readLink>
    </datarow>    
  </xsl:template>

</xsl:stylesheet>

Output

<?xml version="1.0" encoding="UTF-8"?>
<linked-hash-map>
   <entry>
      <_type>News</_type>
      <linked-hash-map>
         <datarow>
            <name>Virat Kohli</name>
            <url>http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&amp;CID=09E4F1057ADB64720330FB2E7BC96547&amp;rd=1&amp;h=nw8K4uNRgs-nvsuz2GyXpqMxdRmzWK8Xbm3W_1IlO24&amp;v=1&amp;r=http%3a%2f%2fmovies.ndtv.com%2fbollywood%2fvirat-kohli-hearts-anushka-sharma-a-timeline-of-their-romance-1659877&amp;p=DevEx,5026.1</url>
            <contentUrl>https://www.bing.com/th?id=ON.EE674002EC235BD5795D34695EABF504&amp;pid=News</contentUrl>
            <width>640</width>
            <description>On Wednesday, cricketer Virat Kohli</description>
            <datePublished>2017-02-16T05:39:00</datePublished>
            <category>Entertainment</category>
            <readLink>https://api.cognitive.microsoft.com/api/v5/entities/b8ef6b82-02be-1e24-584c-f8283b7bdaeb</readLink>
         </datarow>
         <datarow>
            <name>Shah Rukh Khan's TV show</name>
            <url>http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&amp;CID=09E4F1057ADB64720330FB2E7BC96547&amp;rd=1&amp;h=4CnQhOg9Nm7pmIu9OvDl6x9WtYtSuXblCSR_WQz1VoA&amp;v=1&amp;r=http%3a%2f%2fwww.hindustantimes.com%2ftv%2fshah-rukh-khan-s-tv-show-circus-is-back-on-small-screen%2fstory-OjQUQIWi6ogxj5eF1hivTI.html&amp;p=DevEx,5040.1</url>
            <contentUrl>https://www.bing.com/th?id=ON.2974262BB8317FA4D4BCE4A61CA9488E&amp;pid=News</contentUrl>
            <width>700</width>
            <description>Here's some wonderful news</description>
            <datePublished>2017-02-16T05:36:00</datePublished>
            <category>Entertainment</category>
            <readLink>https://api.cognitive.microsoft.com/api/v5/entities/b8ef6b82-02be-1e24-584c-f8283b7bdaeb</readLink>
         </datarow>
      </linked-hash-map>
   </entry>
</linked-hash-map>
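For readers who want to check the flattening rule outside an XSLT processor, the following Python sketch mirrors the generic stylesheet's linked-hash-map template: for each list item it keeps every descendant entry whose second child is not another linked-hash-map, and whitespace-normalizes the value. It is an illustration under assumptions, not part of the original answer: the function name flatten is hypothetical, the sample is trimmed from the adjusted input, and the _type element and the hard-coded readLink are left out.

```python
import xml.etree.ElementTree as ET


def flatten(root):
    """Hypothetical mirror of the generic XSLT: one dict per entry/list/linked-hash-map."""
    rows = []
    for lhm in root.findall("./entry/list/linked-hash-map"):
        row = {}
        for entry in lhm.iter("entry"):  # like descendant::entry, in document order
            children = list(entry)
            if len(children) < 2 or children[1].tag == "linked-hash-map":
                continue  # skip nested maps such as image and thumbnail
            # Approximates XPath normalize-space(): collapse whitespace runs and trim.
            row[children[0].text] = " ".join((children[1].text or "").split())
        rows.append(row)
    return rows


SAMPLE = """<linked-hash-map>
  <entry><string>_type</string><string>News</string></entry>
  <entry>
    <string>value</string>
    <list>
      <linked-hash-map>
        <entry><string>name</string><string>
          Virat Kohli
        </string></entry>
        <entry>
          <string>image</string>
          <linked-hash-map>
            <entry>
              <string>thumbnail</string>
              <linked-hash-map>
                <entry><string>contentUrl</string><string>https://www.bing.com/th?id=ON.EE674002EC235BD5795D34695EABF504&amp;pid=News</string></entry>
                <entry><string>width</string><int>640</int></entry>
              </linked-hash-map>
            </entry>
          </linked-hash-map>
        </entry>
        <entry><string>category</string><string>Entertainment</string></entry>
      </linked-hash-map>
    </list>
  </entry>
</linked-hash-map>"""

rows = flatten(ET.fromstring(SAMPLE))
print(rows)
```

Each dictionary corresponds to one datarow in the output above, which makes it straightforward to hand the rows to a database insert step.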