XML parsing returns a string with newlines

xml go

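I am trying to parse the XML of a sitemap and then loop over the addresses to fetch the details of each post in Go. But I am getting a strange error:

: first path segment in URL cannot contain colon

This is the code snippet: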

type SitemapIndex struct {
    Locations []Location `xml:"sitemap"`
}

type Location struct {
    Loc string `xml:"loc"`
}

func (l Location) String() string {
    return fmt.Sprintf(l.Loc)
}

func main() {
    resp, _ := http.Get("https://www.washingtonpost.com/news-sitemaps/index.xml")
    bytes, _ := ioutil.ReadAll(resp.Body)
    var s SitemapIndex
    xml.Unmarshal(bytes, &s)
    for _, Location := range s.Locations {
        fmt.Printf("Location: %s", Location.Loc)
        resp, err := http.Get(Location.Loc)
        fmt.Println("resp", resp)
        fmt.Println("err", err)
    }
}

And the output:

Location: 
https://www.washingtonpost.com/news-sitemaps/politics.xml
resp <nil>
err parse 
https://www.washingtonpost.com/news-sitemaps/politics.xml
: first path segment in URL cannot contain colon
Location: 
https://www.washingtonpost.com/news-sitemaps/opinions.xml
resp <nil>
err parse 
https://www.washingtonpost.com/news-sitemaps/opinions.xml
: first path segment in URL cannot contain colon
...
...

Passing a hard-coded URL to http.Get instead works. In that output, as you can see, err is nil:

Location: 
https://www.washingtonpost.com/news-sitemaps/politics.xml
resp &{200 OK 200 HTTP/2.0 2 0 map[Server:[nginx] Arc-Service:[api] Arc-Org-Name:[washpost] Expires:[Sat, 02 Feb 2019 05:32:38 GMT] Content-Security-Policy:[upgrade-insecure-requests] Arc-Deployment:[washpost] Arc-Organization:[washpost] Cache-Control:[private, max-age=60] Arc-Context:[index] Arc-Application:[Feeds] Vary:[Accept-Encoding] Content-Type:[text/xml; charset=utf-8] Arc-Servername:[api.washpost.arcpublishing.com] Arc-Environment:[index] Arc-Org-Env:[washpost] Arc-Route:[/feeds] Date:[Sat, 02 Feb 2019 05:31:38 GMT]] 0xc000112870 -1 [] false true map[] 0xc00017c200 0xc0000ca370}
err <nil>
Location: 
...
...

But I am new to Go, so I don't know what is wrong. Could you tell me where I went wrong?

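You are right. The problem does come from the newlines. As you can see, you are using Printf without adding any \n, yet the output shows one newline before the URL and one after it.

You can use strings.Trim to remove these newlines. Below is an example that works with the sitemap you are trying to parse. Once the string is trimmed, you will be able to call http.Get on it without any error.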

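func main() {
    // Fetch and decode the sitemap index, as in the question.
    resp, _ := http.Get("https://www.washingtonpost.com/news-sitemaps/index.xml")
    bytes, _ := ioutil.ReadAll(resp.Body)
    var s SitemapIndex
    xml.Unmarshal(bytes, &s)
    for _, Location := range s.Locations {
        // Strip the newlines that surround the URL inside the loc element.
        loc := strings.Trim(Location.Loc, "\n")
        fmt.Printf("Location: %s\n", loc)
    }
}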

This code prints the locations without any surrounding newlines, as expected:

Location: https://www.washingtonpost.com/news-sitemaps/politics.xml
Location: https://www.washingtonpost.com/news-sitemaps/opinions.xml
Location: https://www.washingtonpost.com/news-sitemaps/local.xml
Location: https://www.washingtonpost.com/news-sitemaps/sports.xml
Location: https://www.washingtonpost.com/news-sitemaps/national.xml
Location: https://www.washingtonpost.com/news-sitemaps/world.xml
Location: https://www.washingtonpost.com/news-sitemaps/business.xml
Location: https://www.washingtonpost.com/news-sitemaps/technology.xml
Location: https://www.washingtonpost.com/news-sitemaps/lifestyle.xml
Location: https://www.washingtonpost.com/news-sitemaps/entertainment.xml
Location: https://www.washingtonpost.com/news-sitemaps/goingoutguide.xml

For comparison, passing a hard-coded URL to http.Get inside the same loop succeeds, because the string literal has no newlines around it:

for _, Location := range s.Locations {
    fmt.Printf("Location: %s", Location.Loc)
    test := "https://www.washingtonpost.com/news-sitemaps/politics.xml"
    resp, err := http.Get(test)
    fmt.Println("resp", resp)
    fmt.Println("err", err)
}
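
The failure can be reproduced without any network access, because http.Get parses its argument as a URL before sending anything. The sketch below is only an illustration, assuming a raw value shaped like the one the sitemap yields. checkURL is a hypothetical helper, not part of the original code. The exact error text depends on the Go version (newer releases reject the control character outright), so the sketch only looks at whether an error occurs.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// checkURL is a hypothetical helper: it returns the error, if any,
// that net/url reports when parsing raw.
func checkURL(raw string) error {
	_, err := url.Parse(raw)
	return err
}

func main() {
	// Assumed shape of the decoded value: newline, URL, newline.
	raw := "\nhttps://www.washingtonpost.com/news-sitemaps/politics.xml\n"

	// The untrimmed value is rejected by the parser.
	fmt.Println("untrimmed:", checkURL(raw))

	// After trimming, the same URL parses cleanly and the error is nil.
	fmt.Println("trimmed:", checkURL(strings.TrimSpace(raw)))
}
```

Validating the trimmed string with url.Parse before calling http.Get is one way to separate malformed input from genuine network errors.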

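The reason you have these newlines in the Location.Loc field is the XML returned by this URL. Its entries have the following form:

<sitemap>
<loc>
https://www.washingtonpost.com/news-sitemaps/goingoutguide.xml
</loc>
</sitemap>

As you can see, there is a newline before and after the content of each loc element.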
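
To see how the decoder treats that whitespace, here is a small sketch that unmarshals an inline document instead of fetching the real sitemap. The sample constant is a hand-written stand-in (its sitemapindex root element is an assumption), and decode is a hypothetical helper. It suggests that encoding/xml copies the character data of loc into the string field as-is, newlines included.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

type SitemapIndex struct {
	Locations []Location `xml:"sitemap"`
}

type Location struct {
	Loc string `xml:"loc"`
}

// sample is a hand-written stand-in for the real sitemap index.
const sample = `<sitemapindex>
<sitemap>
<loc>
https://www.washingtonpost.com/news-sitemaps/politics.xml
</loc>
</sitemap>
</sitemapindex>`

// decode is a hypothetical helper that unmarshals doc into a SitemapIndex.
func decode(doc string) (SitemapIndex, error) {
	var s SitemapIndex
	err := xml.Unmarshal([]byte(doc), &s)
	return s, err
}

func main() {
	s, err := decode(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, l := range s.Locations {
		// %q escapes the value, so the leading and trailing "\n" become visible.
		fmt.Printf("%q\n", l.Loc)
	}
}
```

Printing with %q rather than %s is a handy habit when a string misbehaves, since invisible characters show up as escape sequences.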

See the comments embedded in the modified code below, which describe and fix the problem:

func main() {
    resp, _ := http.Get("https://www.washingtonpost.com/news-sitemaps/index.xml")
    bytes, _ := ioutil.ReadAll(resp.Body)
    var s SitemapIndex
    xml.Unmarshal(bytes, &s)
    for _, Location := range s.Locations {
            // Note that %v shows that there are indeed newlines at beginning and end of Location.Loc
            fmt.Printf("Location: (%v)", Location.Loc)
            // solution: use strings.TrimSpace to remove newlines from Location.Loc
            resp, err := http.Get(strings.TrimSpace(Location.Loc))
            fmt.Println("resp", resp)
            fmt.Println("err", err)
    }

}
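
The two answers use different functions: strings.Trim with a "\n" cutset and strings.TrimSpace. The sketch below compares them on a hypothetical value with Windows line endings and indentation, which is not what this particular sitemap returns but could appear in other feeds. trimNewlines and cleanLoc are illustrative helper names, not part of the original code.

```go
package main

import (
	"fmt"
	"strings"
)

// trimNewlines removes only leading and trailing "\n" characters.
func trimNewlines(s string) string {
	return strings.Trim(s, "\n")
}

// cleanLoc removes all leading and trailing whitespace,
// including "\r", tabs, and spaces.
func cleanLoc(s string) string {
	return strings.TrimSpace(s)
}

func main() {
	// Hypothetical input: CRLF line endings plus two spaces of indentation.
	raw := "\r\n  https://example.com/sitemap.xml\r\n"

	// The leading "\r" stops the cutset match, so most whitespace survives.
	fmt.Printf("%q\n", trimNewlines(raw))

	// TrimSpace leaves just the URL.
	fmt.Printf("%q\n", cleanLoc(raw))
}
```

For this sitemap both approaches give the same result. TrimSpace is simply the more forgiving choice when the exact whitespace is not known in advance.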

That's it!