SQL Server: querying structured data that is not valid XML

Tags: sql-server, xml, xml-parsing, xquery, sgml

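As a data analyst, I regularly come across files that contain structured data in some proprietary format and that cannot be parsed as normal XML.

For example, I have an archive of about 100 documents, all of which begin with:

<!DOCTYPE DOCUMENT PUBLIC "-//Gale Research//DTD Document V2.0//EN">

I have included an abridged sample of one document below; please do not read it if cloning makes you uncomfortable.

Is there any way to query this without a DTD, namespace, URI, or whatever else I might need? I can use SQL Server 2012+ or XQuery, and I can also use PHP or VBA.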

<!DOCTYPE DOCUMENT PUBLIC "-//Gale Research//DTD Document V2.0//EN">
<document synfileid="MCIESS0044">
<galedata><project>
<projectname>
<title>Opposing Viewpoints Resource Center</title>
</projectname>
</project></galedata>
<doc.head>
<title>Cloning</title>
</doc.head>
<doc.body>

<para>A clone is an identical copy of a plant or animal, produced from the genetic material of a single organism. In 1996 scientists in Britain created a sheep named Dolly, the first successful clone of an adult mammal. Since then, scientists have successfully cloned other animals, such as goats, mice, pigs, and rabbits. People began wondering if human beings would be next. The question of whether human cloning should be allowed, and under what conditions, raises a number of challenging scientific, legal, and ethical issues&#x2014;including what it means to be human.</para>

<head n="1">Scientific Background</head>
<para>People have been cloning plants for thousands of years. Some plants produce offspring without any genetic material from another organism. In these cases, cloning simply requires cutting pieces of the stems, roots, or leaves of the plants and then planting the cuttings. The cuttings will grow into identical copies of the originals. Many common fruits, vegetables, and ornamental plants are produced in this way from parent plants with especially desirable characteristics.</para>

<para>&#91;lots of excluded text&#93; Perhaps the most perplexing question of all: How would clones feel about their status? As a copy, would they lack the sense of uniqueness that is part of the human condition? As yet, such questions have no answers&#x2014;perhaps they never will. The debate about cloning, both animal and human, however, will certainly continue. The technology exists to create clones. How will society use this technology?</para>

</doc.body>
</document>


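If you are looking for a helper function that will parse almost any XML, consider the following.

The results may be more than you need, but they are easy to pare down. I use it often during my discovery phase.

Example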

Declare @XML xml ='
<document synfileid="MCIESS0044">
  <galedata>
    <project>
      <projectname>
        <title>Opposing Viewpoints Resource Center</title>
      </projectname>
    </project>
  </galedata>
  <doc.head>
    <title>Cloning</title>
  </doc.head>
  <doc.body>
    <para>A clone is an identical copy of a plant or animal, produced from the genetic material of a single organism. In 1996 scientists in Britain created a sheep named Dolly, the first successful clone of an adult mammal. Since then, scientists have successfully cloned other animals, such as goats, mice, pigs, and rabbits. People began wondering if human beings would be next. The question of whether human cloning should be allowed, and under what conditions, raises a number of challenging scientific, legal, and ethical issues—including what it means to be human.</para>
    <head n="1">Scientific Background</head>
    <para>People have been cloning plants for thousands of years. Some plants produce offspring without any genetic material from another organism. In these cases, cloning simply requires cutting pieces of the stems, roots, or leaves of the plants and then planting the cuttings. The cuttings will grow into identical copies of the originals. Many common fruits, vegetables, and ornamental plants are produced in this way from parent plants with especially desirable characteristics.</para>
    <para>[lots of excluded text] Perhaps the most perplexing question of all: How would clones feel about their status? As a copy, would they lack the sense of uniqueness that is part of the human condition? As yet, such questions have no answers—perhaps they never will. The debate about cloning, both animal and human, however, will certainly continue. The technology exists to create clones. How will society use this technology?</para>
  </doc.body>
</document>
'

Select * 
 From  [dbo].[tvf-XML-Hier](@XML) 
 Order By R1
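
The example above pastes the document without its DOCTYPE line. For an archive of about 100 files, that step could be automated before the content is loaded anywhere. The following is a minimal Python sketch, not part of the original answer: the helper name `parse_without_doctype` and the shortened sample are illustrative assumptions, and it only works when the files are otherwise well-formed XML (every tag closed, every attribute quoted), as the abridged sample appears to be.

```python
import re
import xml.etree.ElementTree as ET

# Matches a DOCTYPE declaration that has no internal subset, e.g.
# <!DOCTYPE DOCUMENT PUBLIC "-//Gale Research//DTD Document V2.0//EN">
DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>\[]*>", re.IGNORECASE)


def parse_without_doctype(text):
    """Remove the first DOCTYPE declaration and parse the rest as XML.

    Hypothetical helper for illustration only.
    """
    return ET.fromstring(DOCTYPE_RE.sub("", text, count=1).strip())


# Shortened version of the sample document from the question.
sample = """<!DOCTYPE DOCUMENT PUBLIC "-//Gale Research//DTD Document V2.0//EN">
<document synfileid="MCIESS0044">
<doc.head><title>Cloning</title></doc.head>
<doc.body>
<head n="1">Scientific Background</head>
<para>People have been cloning plants for thousands of years.</para>
</doc.body>
</document>"""

root = parse_without_doctype(sample)
doc_id = root.get("synfileid")                      # attribute on <document>
title = root.find("doc.head").find("title").text    # title inside <doc.head>
paragraphs = [p.text for p in root.iter("para")]    # every <para> in the tree
print(doc_id, title, len(paragraphs))
```

If the files rely on SGML-only features such as omitted end tags, this parse will fail and an SGML-aware tool is needed instead.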
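Returns one row for each element and each attribute of the document.

The TVF, if interested

Full disclosure: the original source is ... I just made a few tweaks.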

CREATE FUNCTION [dbo].[tvf-XML-Hier](@XML xml)

Returns Table 
As Return

with  cte0 as ( 
                  Select Lvl       = 1
                        ,ID        = Cast(1 as int) 
                        ,Pt        = Cast(NULL as int)
                        ,Element   = x.value('local-name(.)','varchar(150)')
                        ,Attribute = cast('' as varchar(150))
                        ,Value     = x.value('text()[1]','varchar(max)')
                        ,XPath     = cast(concat(x.value('local-name(.)','varchar(max)'),'[' ,cast(Row_Number() Over(Order By (Select 1)) as int),']') as varchar(max))
                        ,Seq       = cast(1000000+Row_Number() over(Order By (Select 1)) as varchar(max))
                        ,AttData   = x.query('.') 
                        ,XMLData   = x.query('*') 
                  From   @XML.nodes('/*') a(x) 
                  Union  All
                  Select Lvl       = p.Lvl + 1 
                        ,ID        = Cast( (Lvl + 1) * 1024 + (Row_Number() Over(Order By (Select 1)) * 2) as int ) * 10
                        ,Pt        = p.ID
                        ,Element   = c.value('local-name(.)','varchar(150)')
                        ,Attribute = cast('' as varchar(150))
                        ,Value     = cast( c.value('text()[1]','varchar(max)') as varchar(max) ) 
                        ,XPath     = cast(concat(p.XPath,'/',c.value('local-name(.)','varchar(max)'),'[',cast(Row_Number() Over(PARTITION BY c.value('local-name(.)','varchar(max)') Order By (Select 1)) as int),']') as varchar(max) )
                        ,Seq       = cast(concat(p.Seq,' ',10000000+Cast( (Lvl + 1) * 1024 + (Row_Number() Over(Order By (Select 1)) * 2) as int ) * 10) as varchar(max))
                        ,AttData   = c.query('.') 
                        ,XMLData   = c.query('*') 
                  From   cte0 p 
                  Cross  Apply p.XMLData.nodes('*') b(c) 
              )
    , cte1 as (   
                  Select R1 = Row_Number() over (Order By Seq),A.*
                  From  (
                          Select  Lvl,ID,Pt,Element,Attribute,Value,XPath,Seq From cte0
                          Union All
                          Select Lvl       = p.Lvl+1
                                ,ID        = p.ID + Row_Number() over (Order By (Select NULL)) 
                                ,Pt        = p.ID
                                ,Element   = p.Element
                                ,Attribute = x.value('local-name(.)','varchar(150)')
                                ,Value     = x.value('.','varchar(max)')
                                ,XPath     = p.XPath + '/@' + x.value('local-name(.)','varchar(max)')
                                ,Seq       = cast(concat(p.Seq,' ',10000000+p.ID + Row_Number() over (Order By (Select NULL)) ) as varchar(max))
                          From   cte0 p 
                          Cross  Apply AttData.nodes('/*/@*') a(x) 
                        ) A 
               )

Select A.R1
      ,R2  = IsNull((Select max(R1) From cte1 Where Seq Like A.Seq+'%'),A.R1)
      ,A.Lvl
      ,A.ID
      ,A.Pt
      ,A.Element
      ,A.Attribute
      ,A.XPath
      ,Title = Replicate('|---',Lvl-1)+Element+IIF(Attribute='','','@'+Attribute)
      ,A.Value
 From  cte1 A

/*
Source: http://beyondrelational.com/modules/2/blogs/28/posts/10495/xquery-lab-58-select-from-xml.aspx

Declare @XML xml='<person><firstname preferred="Annie" nickname="BeBe">Annabelle</firstname><lastname>Smith</lastname></person>'
Select * from [dbo].[tvf-XML-Hier](@XML) Order by R1
*/
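
To show the shape of the output, here is a rough Python analogue of the flattening idea. It is a sketch, not a port of the function: the name `flatten` and the four-column row layout are assumptions made for illustration, and the R1/R2, ID, and Pt columns are not reproduced. It emits one row per element followed by one row per attribute, with positional XPath-style paths built the same way as the XPath column above.

```python
import xml.etree.ElementTree as ET


def flatten(elem, path="", level=1, index=1):
    """Yield (level, xpath, attribute, value) rows for elem and its subtree.

    Each element yields one row, then one row per attribute, then its
    children in document order. Illustrative helper, not the T-SQL TVF.
    """
    step = f"{elem.tag}[{index}]"
    xpath = f"{path}/{step}" if path else step
    yield (level, xpath, "", (elem.text or "").strip())
    for name, value in elem.attrib.items():
        yield (level + 1, f"{xpath}/@{name}", name, value)
    seen = {}  # per-tag counter so siblings get [1], [2], ...
    for child in elem:
        seen[child.tag] = seen.get(child.tag, 0) + 1
        yield from flatten(child, xpath, level + 1, seen[child.tag])


# Same test document as in the comment block of the function above.
xml_text = (
    '<person><firstname preferred="Annie" nickname="BeBe">Annabelle</firstname>'
    "<lastname>Smith</lastname></person>"
)
rows = list(flatten(ET.fromstring(xml_text)))
for row in rows:
    print(row)
```

Filtering the rows on the path or attribute column is then the equivalent of adding a WHERE clause to the Select against the TVF.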

Modified DOCTYPE declaration, with a system identifier added:

<!DOCTYPE DOCUMENT
  PUBLIC "-//Gale Research//DTD Document V2.0//EN"
         "test.dtd">

test.dtd:

<!ELEMENT document ANY>
<!ELEMENT galedata ANY>
<!ELEMENT project ANY>
<!ELEMENT projectname ANY>
<!ELEMENT title ANY>
<!ELEMENT doc.head ANY>
<!ELEMENT doc.body ANY>
<!ELEMENT para ANY>
<!ELEMENT head ANY>
<!ATTLIST document synfileid CDATA #IMPLIED>
<!ATTLIST head n NUMBER #IMPLIED>

Catalog entries:

PUBLIC "-//Gale Research//DTD Document V2.0//EN" "test.dtd"
SGMLDECL "xml10-sgmldecl.dcl"

Conversion command:

osx <your-file>

xml10-sgmldecl.dcl:

<!SGML "ISO 8879:1986 (WWW)"

 -- SGML Declaration for XML 1.0 --

 -- from: 
    Final text of revised Web SGML Adaptations Annex (TC2) to ISO 8879:1986
    ISO/IEC JTC1/SC34 N0029: 1998-12-06
    Annex L.2 (informative): SGML Declaration for XML

    changes made to accommodate validation are noted with 'VALID:'
 --

 CHARSET
     BASESET "ISO Registration Number 177//CHARSET
             ISO/IEC 10646-1:1993 UCS-4 with implementation
             level 3//ESC 2/5 2/15 4/6"
     DESCSET
             0        9  UNUSED
             9        2       9
            11        2  UNUSED
            13        1      13
            14       18  UNUSED
            32       95      32
           127        1  UNUSED
           128       32  UNUSED
           160    55136     160
         55296     2048  UNUSED  -- surrogates --
         57344     8190   57344
         65534        2  UNUSED  -- FFFE and FFFF --
         65536  1048576   65536

 CAPACITY NONE  -- Capacities are not restricted in XML --

 SCOPE DOCUMENT

 SYNTAX
     SHUNCHAR NONE
     BASESET "ISO Registration Number 177//CHARSET
             ISO/IEC 10646-1:1993 UCS-4 with implementation
             level 3//ESC 2/5 2/15 4/6"
     DESCSET
         0 1114112 0
     FUNCTION
         RE    13
         RS    10
         SPACE 32
         TAB   SEPCHAR 9
     NAMING
         LCNMSTRT ""
         UCNMSTRT ""
         NAMESTRT
             58 95 192-214 216-246 248-305 308-318 321-328
             330-382 384-451 461-496 500-501 506-535 592-680
             699-705 902 904-906 908 910-929 931-974 976-982
             986 988 990 992 994-1011 1025-1036 1038-1103
             1105-1116 1118-1153 1168-1220 1223-1224
             1227-1228 1232-1259 1262-1269 1272-1273
             1329-1366 1369 1377-1414 1488-1514 1520-1522
             1569-1594 1601-1610 1649-1719 1722-1726
             1728-1742 1744-1747 1749 1765-1766 2309-2361
             2365 2392-2401 2437-2444 2447-2448 2451-2472
             2474-2480 2482 2486-2489 2524-2525 2527-2529
             2544-2545 2565-2570 2575-2576 2579-2600
             2602-2608 2610-2611 2613-2614 2616-2617
             2649-2652 2654 2674-2676 2693-2699 2701
             2703-2705 2707-2728 2730-2736 2738-2739
             2741-2745 2749 2784 2821-2828 2831-2832
             2835-2856 2858-2864 2866-2867 2870-2873 2877
             2908-2909 2911-2913 2949-2954 2958-2960
             2962-2965 2969-2970 2972 2974-2975 2979-2980
             2984-2986 2990-2997 2999-3001 3077-3084
             3086-3088 3090-3112 3114-3123 3125-3129
             3168-3169 3205-3212 3214-3216 3218-3240
             3242-3251 3253-3257 3294 3296-3297 3333-3340
             3342-3344 3346-3368 3370-3385 3424-3425
             3585-3630 3632 3634-3635 3648-3653 3713-3714
             3716 3719-3720 3722 3725 3732-3735 3737-3743
             3745-3747 3749 3751 3754-3755 3757-3758 3760
             3762-3763 3773 3776-3780 3904-3911 3913-3945
             4256-4293 4304-4342 4352 4354-4355 4357-4359
             4361 4363-4364 4366-4370 4412 4414 4416 4428
             4430 4432 4436-4437 4441 4447-4449 4451 4453
             4455 4457 4461-4462 4466-4467 4469 4510 4520
             4523 4526-4527 4535-4536 4538 4540-4546 4587
             4592 4601 7680-7835 7840-7929 7936-7957
             7960-7965 7968-8005 8008-8013 8016-8023 8025
             8027 8029 8031-8061 8064-8116 8118-8124 8126
             8130-8132 8134-8140 8144-8147 8150-8155
             8160-8172 8178-8180 8182-8188 8486 8490-8491
             8494 8576-8578 12295 12321-12329 12353-12436
             12449-12538 12549-12588 19968-40869 44032-55203

         LCNMCHAR ""
         UCNMCHAR ""
         NAMECHAR
             45-46 183 720-721 768-837 864-865 903 1155-1158
             1425-1441 1443-1465 1467-1469 1471 1473-1474
             1476 1600 1611-1618 1632-1641 1648 1750-1764
             1767-1768 1770-1773 1776-1785 2305-2307 2364
             2366-2381 2385-2388 2402-2403 2406-2415
             2433-2435 2492 2494-2500 2503-2504 2507-2509
             2519 2530-2531 2534-2543 2562 2620 2622-2626
             2631-2632 2635-2637 2662-2673 2689-2691 2748
             2750-2757 2759-2761 2763-2765 2790-2799
             2817-2819 2876 2878-2883 2887-2888 2891-2893
             2902-2903 2918-2927 2946-2947 3006-3010
             3014-3016 3018-3021 3031 3047-3055 3073-3075
             3134-3140 3142-3144 3146-3149 3157-3158
             3174-3183 3202-3203 3262-3268 3270-3272
             3274-3277 3285-3286 3302-3311 3330-3331
             3390-3395 3398-3400 3402-3405 3415 3430-3439
             3633 3636-3642 3654-3662 3664-3673 3761
             3764-3769 3771-3772 3782 3784-3789 3792-3801
             3864-3865 3872-3881 3893 3895 3897 3902-3903
             3953-3972 3974-3979 3984-3989 3991 3993-4013
             4017-4023 4025 8400-8412 8417 12293 12330-12335
             12337-12341 12441-12442 12445-12446 12540-12542

         NAMECASE
             GENERAL NO
             ENTITY  NO
     DELIM
         GENERAL  SGMLREF
         HCRO     "&#38;#x"
                  -- Ampersand followed by "#x" (without quotes) --
         NESTC    "/"
         NET      ">"
         PIC      "?>"
         SHORTREF NONE

     NAMES
         SGMLREF

     QUANTITY
         NONE -- Quantities are not restricted in XML --

     ENTITIES
         "amp"  38
         "lt"   60
         "gt"   62
         "quot" 34
         "apos" 39

 FEATURES
     MINIMIZE
         DATATAG NO
         OMITTAG NO
         RANK    NO
         SHORTTAG
             STARTTAG
                 EMPTY    NO
                 UNCLOSED NO
                 NETENABL IMMEDNET
             ENDTAG
                 EMPTY    NO
                 UNCLOSED NO
             ATTRIB
                 DEFAULT  YES
                 OMITNAME NO
                 VALUE    NO
         EMPTYNRM  YES
         IMPLYDEF
             ATTLIST  YES
             DOCTYPE  NO
             ELEMENT  YES
             ENTITY   NO
             NOTATION YES
     LINK
         SIMPLE   NO
         IMPLICIT NO
         EXPLICIT NO
     OTHER
         CONCUR   NO
         SUBDOC   NO
         FORMAL   NO
         URN      NO
         KEEPRSRE YES
         VALIDITY NOASSERT
         ENTITIES
             REF      ANY
             INTEGRAL YES

 APPINFO NONE

 SEEALSO "ISO 8879//NOTATION Extensible Markup Language (XML)
 1.0//EN">