Downloading images with urllib in Python 3 is very slow. What could be the cause?

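I'm a beginner trying to write a web crawler, learning as I go.

I want to download an image from the Y site (yande.re), and the code is: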

urllib.request.urlopen('http://xxx.jpg').read()

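The URL itself is reachable. The image is not large, and the browser opens it in a few seconds (caching ruled out). In Python, however, downloading it takes 30+ seconds. The downloaded data, once written to a file, displays correctly.

So what exactly makes downloading a single image this slow?

Is there something else I need to configure?

Full code attached: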

import datetime
import os
import time
import urllib.request

# Create a directory to hold the images crawled today
dir_name = datetime.datetime.now().strftime('%Y%m%d')
if not os.path.exists(dir_name):
    os.mkdir(dir_name)
# info is defined elsewhere in the script. The value of info[1] is:
# https://files.yande.re/sample/6718a8caa71a4547a417f41bc9f063bb/yande.re 385001 sample byakuya_reki seifuku.jpg
print('Download started...')
print(info[1])
i = time.time()
img = urllib.request.urlopen(info[1]).read()
print('Download finished. Elapsed: ' + str(int(time.time() - i)) + 's')
# Get the file name and replace %20 with spaces
file_name = info[1].split('/')[-1].replace('%20', ' ')
file = open(dir_name + '/' + file_name, 'wb')
file.write(img)
file.close()
exit(200)


h691938207, reply #1

It takes me a long time to open this image too.

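When downloading images with urllib is slow, there are a few common causes:

No User-Agent set: urllib identifies itself as Python-urllib/3.x by default, and many sites block or throttle requests that lack a browser-like identifier.
No connection reuse: urllib.request opens a new connection for every request and sends Connection: close, so it cannot reuse connections the way an HTTP/1.1 keep-alive client can. This matters when fetching many files, less so for a single image.
Network latency and timeouts: urlopen has no timeout by default, so a stalled connection can hang for a long time.

Try this improved version of the code: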

import socket
import urllib.error
import urllib.request

def download_image(url, save_path, timeout=10):
    # Send browser-like request headers.
    # urllib always sends Connection: close, so a keep-alive header would have no effect.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
        'Accept': 'image/webp,image/apng,image/*,*/*;q=0.8',
        'Accept-Language': 'zh-CN,zh;q=0.9',
    }

    req = urllib.request.Request(url, headers=headers)

    try:
        # Pass the timeout per request instead of changing the global socket default
        with urllib.request.urlopen(req, timeout=timeout) as response:
            data = response.read()

        with open(save_path, 'wb') as f:
            f.write(data)

        print(f"Image downloaded: {save_path}")
        return True

    except socket.timeout:
        # A timeout while reading the response body is raised directly
        print("Connection timed out")
        return False
    except urllib.error.URLError as e:
        # Covers HTTP errors and connection failures, including connect timeouts
        print(f"Download failed: {e.reason}")
        return False

# Usage example
download_image('https://example.com/image.jpg', 'downloaded_image.jpg')

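If it is still slow, consider the requests library: a requests.Session pools and reuses connections, and retries can be configured on its HTTPAdapter.

Summary: try adding a suitable request header and a timeout.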
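
Before tuning headers or timeouts, it can help to see where the time actually goes. The following is only a minimal sketch: the helper name timed_download and the chunk size are assumptions made for illustration. It separates the wait for the response headers from the time spent reading the body.

```python
import time
import urllib.request


def timed_download(url, timeout=10, chunk_size=64 * 1024):
    """Download url and report how long each phase took.

    Hypothetical helper for illustration. Returns a tuple of
    (data, seconds_until_headers, seconds_total).
    """
    start = time.perf_counter()
    # urlopen returns once the response headers have arrived
    with urllib.request.urlopen(url, timeout=timeout) as response:
        headers_at = time.perf_counter()
        chunks = []
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            chunks.append(chunk)
    done_at = time.perf_counter()
    return b"".join(chunks), headers_at - start, done_at - start


if __name__ == "__main__":
    # A data: URL keeps the demo offline; substitute the real image URL
    data, to_headers, total = timed_download("data:text/plain;base64,aGVsbG8=")
    print(f"{len(data)} bytes, headers after {to_headers:.3f}s, total {total:.3f}s")
```

If most of the time passes before the headers arrive, the delay is in DNS, connecting, or the server (or a proxy) responding. If it passes while reading the body, throughput is the bottleneck.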

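Could it be a regional issue? But it opens quickly in my browser, and switching browsers gives about the same speed. My code runs locally, so why is the speed so different?

Maybe the server rate-limits crawlers.

Check whether you have a system proxy turned on.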
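
One way to check this from Python is to ask urllib which proxies it detected. This is a sketch under assumptions: the function names are made up for illustration. As far as I know, urllib reads the proxy environment variables and some OS proxy settings but does not evaluate PAC scripts, so a browser using a PAC file may take a different route than the script does.

```python
import urllib.request


def detected_proxies():
    """Return the proxy mapping urllib would use by default."""
    # getproxies() reads the *_proxy environment variables and, on
    # Windows and macOS, the system proxy settings
    return urllib.request.getproxies()


def direct_opener():
    """Build an opener that ignores every detected proxy."""
    # An empty mapping tells ProxyHandler to use no proxies at all
    return urllib.request.build_opener(urllib.request.ProxyHandler({}))


if __name__ == "__main__":
    print("Proxies urllib would use:", detected_proxies() or "none")
    opener = direct_opener()
    # A data: URL keeps the demo offline; substitute the real image URL
    with opener.open("data:text/plain,ok", timeout=10) as response:
        print(response.read())
```

Timing the same download once through urllib.request.urlopen and once through direct_opener().open should show whether a proxy is involved in the slowdown.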

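The browser probably has it cached.

That really could be it... CloudnuY, I'm using ss in PAC mode. That shouldn't get in the way, right? gulu, I switched browsers and it still opens in a few seconds.

That's because your browser goes through the proxy. The larger images on yande.re are 20 to 30 MB, so slow is normal.

I ran it: 3 s.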

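zlyuanteng, reply #10

#4 I'm using ss in PAC mode. That shouldn't get in the way, right?
#5 I switched browsers and it still opens in a few seconds.
#7 I'm using ss in PAC mode. That shouldn't get in the way, right?
#8 You ran the code above? 3 s?

bupafengyu, reply #11 (original poster)

If you are running ss, I recommend the PySocks library.
For usage, see http://stackoverflow.com/questions/31777692/python3-requests-with-sock5-proxy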
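
As a rough sketch of what the PySocks route can look like with plain urllib, the helper below sends all new sockets through a SOCKS5 proxy. The address 127.0.0.1:1080 is an assumption (check what your local ss client actually listens on), the helper name enable_socks5 is hypothetical, and PySocks has to be installed separately (pip install PySocks).

```python
import socket

try:
    import socks  # provided by the third-party PySocks package
except ImportError:
    socks = None


def enable_socks5(host="127.0.0.1", port=1080):
    """Route all new sockets in this process through a SOCKS5 proxy.

    host and port are assumptions; use whatever your local client listens on.
    """
    if socks is None:
        raise RuntimeError("PySocks is not installed (pip install PySocks)")
    socks.set_default_proxy(socks.SOCKS5, host, port)
    # Monkeypatch: urllib's connections are created through the socket
    # module, so they pick up the patched class
    socket.socket = socks.socksocket
```

After calling enable_socks5() once at startup, later urllib.request.urlopen calls should go through the proxy. The patch is process-wide, so it affects every library that opens sockets.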

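phonegap100, reply #12

#10 Looks good. But I'm just starting out and want to do things the native way as far as possible, so I've bookmarked it for now.

ionicwang, reply #13

I usually just call wget to download.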
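
Shelling out to wget from Python might look like the sketch below. The helper names and the timeout and retry values are assumptions for illustration, and wget itself must be installed and on the PATH.

```python
import subprocess


def build_wget_command(url, out_path, timeout=10, tries=3):
    """Build the wget argument list (hypothetical helper)."""
    return [
        "wget",
        f"--timeout={timeout}",  # network timeout in seconds
        f"--tries={tries}",      # number of attempts before giving up
        "-O", out_path,          # write the download to this file
        url,
    ]


def download_with_wget(url, out_path):
    """Run wget; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_wget_command(url, out_path), check=True)
```

Passing the arguments as a list avoids shell quoting problems, which matters here because the file names on this site contain spaces.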

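htzhanglong, reply #14

Better to go through a proxy. Moe Imouto's (yande.re) servers act up from time to time, so it's hard to tell.
I recommend requests instead of urllib.

From your description, it has to be the system proxy.

I usually use urlretrieve.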
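
For reference, a minimal urlretrieve sketch. The helper name fetch_to_file and the progress output are assumptions for illustration. urlretrieve streams the response into a file and calls an optional hook after each block; the Python documentation lists it under the legacy interface.

```python
import urllib.request


def fetch_to_file(url, save_path):
    """Save url to save_path with urlretrieve, printing simple progress."""

    def report(block_num, block_size, total_size):
        # total_size is -1 when the server sends no Content-Length
        done = block_num * block_size
        if total_size > 0:
            print(f"{min(done, total_size)}/{total_size} bytes")

    path, headers = urllib.request.urlretrieve(url, save_path, reporthook=report)
    return path
```

Because the data goes straight to disk in blocks, the whole image never has to sit in memory, unlike urlopen(...).read().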

Yeah, I ran the code above and it took 3 s. Not sure whether that's because I have a proxy on.

htzhanglong, reply #18

1 loop, best of 3: 2.98 s per loop
