What to do when a Golang beginner gets a URL parsing error from http.Get()

Hello,

I'm new to Golang and have been stuck on one problem since yesterday. First, here is the part of the code that doesn't work:

func main() {
    var s SitemapIndex

    resp, _ := http.Get("https://www.washingtonpost.com/news-sitemaps/index.xml")
    bytes, _ := ioutil.ReadAll(resp.Body)
    resp.Body.Close()

    xml.Unmarshal(bytes, &s)

    for _, Location := range s.Locations {
        resp, err := http.Get(Location)
        ioutil.ReadAll(resp.Body)

    }

}

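The problem is on this line: resp, err := http.Get(Location). err tells me there is something wrong with Location, and resp is <nil>.

When I print err, the full error message is:

parse https://www.washingtonpost.com/news-sitemaps/politics.xml : first path segment in URL cannot contain colon

So the error comes from this URL. That's strange, because the URL I passed in explicitly earlier produced no error, and both URLs have the same format. Even so, I tried removing https:// and www. from the URL, but that didn't solve the problem.

I searched online for this error message but found no solution, not even a proper explanation...

I really don't know what to do, and I have to admit I need some help... 😓

Thanks!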


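zlyuanteng #1

Yes, it does take a long time, though I haven't measured it precisely. But since fetching the initial sitemap with curl also takes a long time, I'd guess the server is slow.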


sinazl #2

Thank you very much! That works! I used strings.TrimSpace() on my side. Is there any difference between the two approaches (they seem to do the same thing)?


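yibo5220 #3

OK, I see, that makes sense, thanks.

Also, if you tried running that code, could you tell me how it went for you? I ran it but had to wait about a minute for the result. I don't think it's my network connection, which is working fine.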

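wuwangju #4

In my case I only stripped newlines and nothing else. According to the documentation, TrimSpace removes all leading and trailing whitespace, but exactly what "all whitespace" covers isn't spelled out.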


songsunli #5

I'll do that, thanks. I'll look for a way to implement it.

By the way, here are all my structs:

type SitemapIndex struct {
    Locations []string `xml:"sitemap>loc"`
}

type News struct {
    Titles    []string `xml:"url>news>title"`
    Keywords  []string `xml:"url>news>keywords"`
    Locations []string `xml:"url>loc"`
}

type NewsMap struct {
    Keyword  string
    Location string
}

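eggper #6

Could you share the definition of SitemapIndex?

Or, better still, a complete but minimal package that reproduces your problem.

At first glance, though, my guess is that the parsed "location" has newlines, spaces, or other whitespace around it, because the XML contains whitespace that may be normalized to a single whitespace token during parsing but is not removed.

Trim Location before trying to load it.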

// Example: trim the Location value
func processLocation(loc string) string {
    return strings.TrimSpace(loc)
}

eggper #7

At first glance, this seems to work:

package main

import (
        "encoding/xml"
        "fmt"
        "io/ioutil"
        "net/http"
        "strings"
)

type SitemapIndex struct {
        Locations []string `xml:"sitemap>loc"`
}

func main() {
        var s SitemapIndex

        resp, _ := http.Get("https://www.washingtonpost.com/news-sitemaps/index.xml")
        bytes, _ := ioutil.ReadAll(resp.Body)
        resp.Body.Close()

        xml.Unmarshal(bytes, &s)

        fmt.Printf("%#v", s)

        for _, Location := range s.Locations {
                resp, _ := http.Get(strings.Trim(Location, "\n"))
                t, _ := ioutil.ReadAll(resp.Body)
                fmt.Println(string(t))
        }
}

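htzhanglong #8

The problem is that the Location values parsed from the XML are not valid URLs as they stand. The error message "first path segment in URL cannot contain colon" indicates that the string handed to the URL parser is malformed, most likely because it contains extra characters.

In your code, s.Locations may look like it holds strings such as "https://www.washingtonpost.com/news-sitemaps/politics.xml", but the parsed values can actually carry extra spaces, newlines, or other invisible characters, which makes URL parsing fail.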

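Here are the steps to fix it, with sample code:

Check and clean the URL: before calling http.Get(), use strings.TrimSpace() to strip surrounding spaces or newlines from the URL string.
Validate the URL format: use url.Parse() to check that the URL is valid, and skip it or handle the error if it is not.

Revised code example: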

package main

import (
    "encoding/xml"
    "fmt"
    "io/ioutil"
    "net/http"
    "net/url"
    "strings"
)

type SitemapIndex struct {
    Locations []string `xml:"sitemap>loc"`
}

func main() {
    var s SitemapIndex

    resp, err := http.Get("https://www.washingtonpost.com/news-sitemaps/index.xml")
    if err != nil {
        fmt.Printf("Error fetching sitemap index: %v\n", err)
        return
    }
    defer resp.Body.Close()

    bytes, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        fmt.Printf("Error reading response body: %v\n", err)
        return
    }

    err = xml.Unmarshal(bytes, &s)
    if err != nil {
        fmt.Printf("Error unmarshaling XML: %v\n", err)
        return
    }

    for _, rawLocation := range s.Locations {
        // Clean the URL: strip surrounding spaces and newlines
        cleanedLocation := strings.TrimSpace(rawLocation)

        // Validate the URL format
        parsedURL, err := url.Parse(cleanedLocation)
        if err != nil {
            fmt.Printf("Invalid URL %s: %v\n", cleanedLocation, err)
            continue
        }

        resp, err := http.Get(parsedURL.String())
        if err != nil {
            fmt.Printf("Error fetching %s: %v\n", parsedURL.String(), err)
            continue
        }
        bodyBytes, err := ioutil.ReadAll(resp.Body)
        // Close right away: a defer inside the loop would keep every body open until main returns.
        resp.Body.Close()
        if err != nil {
            fmt.Printf("Error reading body from %s: %v\n", parsedURL.String(), err)
            continue
        }

        fmt.Printf("Successfully fetched %s, body length: %d\n", parsedURL.String(), len(bodyBytes))
    }
}

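Key improvements:

Use strings.TrimSpace() to clean the URL strings parsed from the XML.
Use url.Parse() to validate the URL format.
Add error handling so that a single bad URL does not stop the program.
Use defer to make sure response bodies are closed properly.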
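One caveat on the validation step: url.Parse also accepts relative references, so it lets through many strings that http.Get cannot fetch. A stricter check might look like the sketch below; validateHTTPURL is a hypothetical helper, and requiring an http or https scheme plus a non-empty host is my assumption about what "valid" should mean for a sitemap entry.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// validateHTTPURL (hypothetical helper) trims the input, parses it, and
// additionally requires an http(s) scheme and a host.
func validateHTTPURL(raw string) (*url.URL, error) {
	u, err := url.Parse(strings.TrimSpace(raw))
	if err != nil {
		return nil, err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return nil, errors.New("URL must use http or https")
	}
	if u.Host == "" {
		return nil, errors.New("URL has no host")
	}
	return u, nil
}

func main() {
	// Made-up inputs for illustration.
	for _, raw := range []string{
		"\nhttps://example.com/politics.xml\n",
		"not a url",
		"ftp://example.com/file",
	} {
		u, err := validateHTTPURL(raw)
		if err != nil {
			fmt.Printf("%q rejected: %v\n", raw, err)
			continue
		}
		fmt.Printf("%q accepted as %s\n", raw, u)
	}
}
```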

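If the problem persists, check that the XML structure matches the SitemapIndex struct definition and confirm that the Locations field really contains valid URL strings. You can print each cleanedLocation after cleaning to verify its contents.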