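How to read the response length with urllib2 in Python and diagnose the cause of errors

The logic is: read a very long text file (about 10,000 lines) in a loop, a small portion at a time. The data is JSON; I submit it with urllib2, receive the reply with read(), and write it to a file. On every run, once it reaches a certain fixed line, the program neither aborts nor writes anything. After interrupting with Ctrl+C, the traceback ends at data = recv(1).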
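The workflow described above might look roughly like the sketch below. It is only an illustration under assumptions: the names iter_batches, process and fake_send, the batch size, and the {"lines": [...]} payload shape are all hypothetical, and fake_send stands in for the real urllib2 POST so the sketch runs without a network.

```python
import io
import json

def iter_batches(lines, batch_size):
    """Yield successive lists of at most batch_size lines."""
    batch = []
    for line in lines:
        batch.append(line.rstrip("\n"))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

def process(lines, send, out, batch_size=100):
    """Send each batch as JSON via send() and append each reply to out."""
    count = 0
    for batch in iter_batches(lines, batch_size):
        reply = send(json.dumps({"lines": batch}))  # payload shape is an assumption
        out.write(reply + u"\n")
        out.flush()  # flush per batch so progress survives an interruption
        count += 1
    return count

def fake_send(payload):
    # Hypothetical stand-in for the urllib2 POST; reports how many lines arrived.
    return u"received %d" % len(json.loads(payload)["lines"])

source = io.StringIO(u"".join(u"row %d\n" % i for i in range(250)))
sink = io.StringIO()
print(process(source, fake_send, sink, batch_size=100))
```

Flushing after every batch means that, if the run stalls at some line, the output file at least shows which batch was the last one to complete.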

songsunli #1

try/except had no effect… the program did not abort…

songsunli #2

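from __future__ import print_function

import urllib2


def fetch_url_with_debug(url, timeout=10):
    """
    Fetch a URL with urllib2 and diagnose common errors.
    """
    req = urllib2.Request(url)

    try:
        # 1. Send the request. Without a timeout, urlopen can block
        #    indefinitely when the server never answers.
        response = urllib2.urlopen(req, timeout=timeout)

        # 2. Read the response body and measure its length
        content = response.read()
        content_length = len(content)

        # 3. Get Content-Length from the response headers (the length the server declares)
        headers = response.info()
        declared_length = headers.getheader('Content-Length')

        print("=" * 50)
        print("Response diagnostics:")
        print("URL: {0}".format(url))
        print("HTTP status code: {0}".format(response.code))
        print("Length actually read: {0} bytes".format(content_length))

        if declared_length:
            declared_length = int(declared_length)
            print("Content-Length declared by the server: {0} bytes".format(declared_length))

            # Check whether the two lengths match
            if content_length != declared_length:
                print("Warning: actual length ({0}) does not match declared length ({1})".format(
                    content_length, declared_length))
                print("Possible causes: a truncated response, or a wrong Content-Length from the server")
        else:
            print("The server did not send a Content-Length header")

        # 4. Show the response headers (for debugging)
        print("\nResponse headers:")
        for header in headers.headers:
            print("  {0}".format(header.strip()))

        return content

    except urllib2.HTTPError as e:
        # e.msg is used instead of e.reason, which HTTPError lacks on older Python 2 releases
        print("HTTP error {0}: {1}".format(e.code, e.msg))
        print("Failing URL: {0}".format(e.geturl()))
        if hasattr(e, 'headers'):
            print("Error response headers:")
            for header in e.headers.headers:
                print("  {0}".format(header.strip()))
        return None

    except urllib2.URLError as e:
        print("URL error: {0}".format(e.reason))
        # More detailed diagnosis
        if hasattr(e.reason, 'errno'):
            print("System error code: {0}".format(e.reason.errno))
        return None

    except Exception as e:
        # Also reached by socket.timeout or a connection reset raised during read()
        print("Other error: {0}: {1}".format(type(e).__name__, str(e)))
        return None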

# Usage example
if __name__ == "__main__":
    # Test a normal URL
    print("Test 1: normal URL")
    fetch_url_with_debug("http://httpbin.org/get")

    print("\n" + "=" * 50 + "\n")

    # Test a 404 error
    print("Test 2: URL that does not exist")
    fetch_url_with_debug("http://httpbin.org/status/404")

    print("\n" + "=" * 50 + "\n")

    # Test a network error
    print("Test 3: URL that cannot be reached")
    fetch_url_with_debug("http://nonexistent-domain-xyz.com/")

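Notes on the code:

Getting the response length:
- len(response.read()) gives the number of bytes actually read
- headers.getheader('Content-Length') gives the length declared by the server

Error diagnosis:
- HTTPError: handles HTTP status code errors (404, 500, etc.)
- URLError: handles network-level errors (DNS resolution failure, connection refused, etc.)
- Generic exception handler: handles anything else, including socket errors raised while reading the body

Key checks:
- Compare the actual length with the Content-Length header
- Show the full response headers
- Report the specific error code and reason

Diagnosing common problems:

- Length mismatch: the response was probably truncated, or the server declared a wrong length. With chunked transfer encoding the server normally sends no Content-Length at all, and urllib2 does not decompress the body, so neither of those usually produces a mismatch.
- HTTPError: check that the URL is correct and that the server is running normally
- URLError: check the network connection, DNS resolution, and firewall settings

One-sentence advice: the length actually read is more reliable than the Content-Length header, and every exception that can occur should be handled.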
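As a small illustration of the length check above, here is a minimal sketch that reads a response in fixed-size chunks and counts the bytes instead of loading the whole body with a single read(). The names read_in_chunks, length_matches and chunk_size are hypothetical, and io.BytesIO stands in for the object returned by urllib2.urlopen, which offers the same read(n) method.

```python
import io

def read_in_chunks(response, chunk_size=8192):
    """Read a file-like response chunk by chunk; return (body, total_bytes)."""
    parts = []
    total = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # an empty result means end of stream
            break
        parts.append(chunk)
        total += len(chunk)
    return b"".join(parts), total

def length_matches(total_bytes, declared):
    """Compare bytes read with a Content-Length value (a string, or None if absent)."""
    if declared is None:
        return None  # nothing to compare against
    return total_bytes == int(declared)

fake_response = io.BytesIO(b"x" * 20000)  # stand-in for a real HTTP response
body, total = read_in_chunks(fake_response, chunk_size=4096)
print(total, length_matches(total, "20000"))
```

Returning None when the header is missing keeps "no declared length" distinct from "lengths differ".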

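phonegap100 #3

I found the cause: the connection was reset by the remote end… Would setting urlopen(timeout) help? … Also, why did the program not respond when the connection was reset…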
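On the timeout question: sockets block with no time limit by default, so a request whose reply never arrives simply waits instead of raising, which would be consistent with try/except never firing. Below is a minimal sketch of passing a timeout and catching the resulting errors. The names fetch_with_timeout and never_answers are hypothetical, the opener parameter exists only so the sketch can run without a network, and the ImportError fallback is there so it also runs on Python 3.

```python
import socket

try:
    import urllib2                      # Python 2, as used in this thread
except ImportError:
    import urllib.request as urllib2    # Python 3 fallback so the sketch also runs there

def fetch_with_timeout(url, data=None, timeout=10, opener=urllib2.urlopen):
    """Return (body, error); exactly one of the two is None."""
    try:
        response = opener(url, data, timeout)
        return response.read(), None
    except (urllib2.URLError, socket.error) as e:
        # Timeouts and resets surface either wrapped in URLError or as plain
        # socket errors, depending on where they happen, so both are caught.
        return None, e

def never_answers(url, data, timeout):
    # Hypothetical stand-in for a server that never sends a reply.
    raise socket.timeout("timed out")

body, error = fetch_with_timeout("http://example.invalid/", opener=never_answers)
print(body, type(error).__name__)
```

Calling read() inside the same try block matters, because a reset can also happen while the body is being received.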

caililin #4

Post your code…

yibo5220 #5

  File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
    return self.do_open( httplib.HTTPConnection, req)
  File "/usr/lib64/python2.6/urllib2.py", line 1163, in do_open
    r = h.getresponse()
  File "/usr/lib64/python2.6/httplib.py", line 990, in getresponse
    response.begin()
  File "/usr/lib64/python2.6/httplib.py", line 391, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python2.6/httplib.py", line 349, in _read_status
    line = self.fp.readline()
  File "/usr/lib64/python2.6/socket.py", line 433, in readline
    data = recv(1)
KeyboardInterrupt
I can't post the code…
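The traceback ends in recv(1) under _read_status, meaning the client had sent its request and was still waiting for the first byte of the status line. The sketch below reproduces that situation with plain sockets, under assumptions: a local listener on 127.0.0.1 that completes the TCP handshake but never replies, and a hypothetical helper named first_byte_or_timeout. With a timeout set, the same recv(1) gives up instead of blocking.

```python
import socket

def first_byte_or_timeout(host, port, timeout):
    """Connect, send a request, then wait for one byte; return None on timeout."""
    client = socket.create_connection((host, port), timeout)
    try:
        client.sendall(b"GET / HTTP/1.0\r\n\r\n")
        return client.recv(1)  # same call as the last frame of the traceback
    except socket.timeout:
        return None
    finally:
        client.close()

# A local listener that accepts the TCP handshake but never sends a reply.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

result = first_byte_or_timeout("127.0.0.1", port, timeout=0.5)
server.close()
print("timed out" if result is None else "got data")
```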

vueper #6

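import socket
import time
import urllib2

request = urllib2.Request(url)
while condition:
    retries = 3  # number of reconnect attempts (example value)
    m = None
    for t in range(retries):
        try:
            time.sleep(0.5)
            res = urllib2.urlopen(request, json_data, timeout=10)
            m = res.read()  # read inside the try so a reset while reading is caught too
            break
        except (urllib2.URLError, socket.error) as e:
            if t < retries - 1:
                record_log()
Pseudocode…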

caililin #7

I don't understand what you mean, and the code is incomplete.