How to read the response length with urllib2 in Python and diagnose what is going wrong
The logic: read a very long text file (about 10,000 lines) in small chunks inside a loop, format each chunk as JSON, submit the JSON with urllib2, receive the reply with read(), and write it to a file. Every run stalls at the same fixed line: nothing is written and the program does not abort either. After a Ctrl+C the traceback ends at data = recv(1).
try/except did not help… the program never broke out of the read…
# -*- coding: utf-8 -*-
# Diagnostic helper (Python 2 / urllib2): fetch a URL, compare the number of
# bytes actually read against the Content-Length declared by the server, and
# classify the common failure modes.
import urllib2


def fetch_url_with_debug(url):
    """Fetch a URL with urllib2 and diagnose common errors."""
    req = urllib2.Request(url)
    try:
        # 1. Send the request
        response = urllib2.urlopen(req)
        # 2. Read the body and measure how much actually arrived
        content = response.read()
        content_length = len(content)
        # 3. Read the Content-Length header (the length the server declared)
        headers = response.info()
        declared_length = headers.getheader('Content-Length')
        print("=" * 50)
        print("Response diagnostics:")
        print("URL: %s" % url)
        print("HTTP status code: %s" % response.code)
        print("Bytes actually read: %d" % content_length)
        if declared_length:
            declared_length = int(declared_length)
            print("Content-Length declared by server: %d bytes" % declared_length)
            # Check whether the two lengths agree
            if content_length != declared_length:
                print("Warning: actual length (%d) does not match declared length (%d)"
                      % (content_length, declared_length))
                print("Possible causes: 1) server error 2) compressed transfer 3) chunked encoding")
        else:
            print("The server did not send a Content-Length header")
        # 4. Dump the response headers (useful for debugging)
        print("\nResponse headers:")
        for header in headers.headers:
            print("  " + header.strip())
        return content
    except urllib2.HTTPError as e:
        print("HTTP error %d: %s" % (e.code, e.msg))   # e.msg carries the reason phrase
        print("Failing URL: %s" % e.url)
        if hasattr(e, 'headers'):
            print("Error response headers:")
            for header in e.headers.headers:
                print("  " + header.strip())
        return None
    except urllib2.URLError as e:
        print("URL error: %s" % e.reason)
        # More detailed diagnosis
        if hasattr(e.reason, 'errno'):
            print("System error code: %s" % e.reason.errno)
        return None
    except Exception as e:
        print("Other error: %s: %s" % (type(e).__name__, str(e)))
        return None


# Usage examples
if __name__ == "__main__":
    # Test 1: a normal URL
    print("Test 1: normal URL")
    fetch_url_with_debug("http://httpbin.org/get")
    print("\n" + "=" * 50 + "\n")
    # Test 2: a 404 error
    print("Test 2: URL that does not exist")
    fetch_url_with_debug("http://httpbin.org/status/404")
    print("\n" + "=" * 50 + "\n")
    # Test 3: a network error
    print("Test 3: unreachable host")
    fetch_url_with_debug("http://nonexistent-domain-xyz.com/")
What the code does:
- Getting the response length:
  len(response.read()) gives the number of bytes actually read; headers.getheader('Content-Length') gives the length the server declared.
- Error diagnosis:
  HTTPError handles HTTP status errors (404, 500, ...); URLError handles network errors (DNS failure, connection timeout, ...); the generic except catches anything else.
- Key checks:
  - Compare the actual length against the Content-Length header
  - Dump the full set of response headers
  - Report the concrete error code and reason
Diagnosing common problems:
- Length mismatch: the server may be using chunked transfer or compression (see the sketch below)
- HTTPError: check that the URL is correct and the server is running
- URLError: check network connectivity, DNS resolution and firewall settings
One-line advice: the number of bytes actually read is more reliable than the Content-Length header, and make sure every exception that can occur is handled.
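On the compression point: urllib2 does not decompress gzip bodies for you, so what read() returns (and what Content-Length describes) is the compressed data, not the payload you actually want to measure. Below is a minimal sketch of detecting and undoing gzip before looking at the body length; httpbin.org/gzip is used purely as an example endpoint, and the Accept-Encoding header is set explicitly to make the case reproducible:

# -*- coding: utf-8 -*-
import gzip
import urllib2
from StringIO import StringIO

# Example endpoint that answers with Content-Encoding: gzip.
req = urllib2.Request("http://httpbin.org/gzip",
                      headers={'Accept-Encoding': 'gzip'})
resp = urllib2.urlopen(req)
raw = resp.read()

if resp.info().getheader('Content-Encoding') == 'gzip':
    # Content-Length (when present) refers to these compressed bytes.
    body = gzip.GzipFile(fileobj=StringIO(raw)).read()
else:
    body = raw

print("bytes on the wire: %d, decompressed body: %d" % (len(raw), len(body)))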
What I found is that the connection was being reset by the remote end… Would setting urlopen(timeout) help? And why did the program stop responding when the connection was reset…
Pasting it below…
  File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.6/urllib2.py", line 1163, in do_open
    r = h.getresponse()
  File "/usr/lib64/python2.6/httplib.py", line 990, in getresponse
    response.begin()
  File "/usr/lib64/python2.6/httplib.py", line 391, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python2.6/httplib.py", line 349, in _read_status
    line = self.fp.readline()
  File "/usr/lib64/python2.6/socket.py", line 433, in readline
    data = recv(1)
KeyboardInterrupt
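What these frames show: the process is blocked in recv(1) inside httplib, waiting for the status line of the response. A recv on a connection that has gone quiet can block forever, so none of the except clauses ever run until Ctrl+C delivers the KeyboardInterrupt. If the remote end had actually sent a TCP RST, recv would fail immediately with ECONNRESET and urllib2 would raise URLError; an open-ended hang usually means the connection was dropped without a RST reaching the client, which is exactly the case a timeout guards against. Passing timeout to urlopen (available since Python 2.6) puts a ceiling on each blocking socket call. A minimal sketch, where url and json_payload stand in for the real values:

# -*- coding: utf-8 -*-
import socket
import urllib2

url = "http://example.com/api"        # placeholder
json_payload = '{"key": "value"}'     # placeholder

try:
    # The timeout is in seconds and applies to each blocking socket operation
    # (connect, and every recv while waiting for or reading the response).
    res = urllib2.urlopen(urllib2.Request(url), json_payload, timeout=10)
    data = res.read()
except urllib2.URLError as e:
    # Failures during connect or while waiting for the status line land here;
    # e.reason is then typically a socket.error such as ECONNRESET or 'timed out'.
    print("request failed: %s" % e.reason)
except socket.error as e:
    # Covers socket.timeout and resets raised while reading the body after the
    # headers have already arrived.
    print("reading the response failed: %s" % e)

With a timeout in place the stall turns into an exception that retry logic can catch, instead of an endless data = recv(1).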
I can't post the real code…
import time
import urllib2

request = urllib2.Request(url)
while condition:
    retries = reconnect_count        # renamed: the original "time" shadowed the time module
    for t in range(retries):
        try:
            time.sleep(0.5)
            res = urllib2.urlopen(request, json)
        except urllib2.URLError as e:
            if t < (retries - 1):
                record_log()
    m = res.read()
It's only pseudocode…
I don't follow what you mean, and the code you posted is incomplete.
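For what it's worth, here is a hedged sketch of what the retry loop above seems to be aiming at, with the pieces that matter for the hang filled in: a timeout on urlopen, read() moved inside the try so that a reset while reading the body is retried as well, and a return as soon as a body has been read. The URL, the payload and record_log are placeholders carried over from the pseudocode:

# -*- coding: utf-8 -*-
import socket
import time
import urllib2


def record_log(exc):
    # Stand-in for the record_log() used in the pseudocode above.
    print("attempt failed: %s" % exc)


def post_with_retries(url, payload, retries=3, timeout=10):
    """POST one JSON payload, retrying on network errors; return the body or None."""
    request = urllib2.Request(url, headers={'Content-Type': 'application/json'})
    for attempt in range(retries):
        try:
            res = urllib2.urlopen(request, payload, timeout=timeout)
            return res.read()      # read() inside the try, so a reset here is retried too
        except (urllib2.URLError, socket.error) as e:
            record_log(e)
            time.sleep(0.5)        # short pause before the next attempt
    return None


if __name__ == "__main__":
    body = post_with_retries("http://example.com/api", '{"key": "value"}')  # example values
    print(body)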

