Python爬虫运行4~5小时后报错，如何排查原因？

报错很长，但看上去大概是这个原因：socket.gaierror: [Errno -3] Temporary failure in name resolution

阿里云上运行的

Traceback (most recent call last): File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 137, in _new_conn (self.host, self.port), self.timeout, **extra_kw) File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 67, in create_connection for res in socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM): File "/usr/lib/python3.5/socket.py", line 732, in getaddrinfo for res in _socket.getaddrinfo(host, port, family, type, proto, flags): socket.gaierror: [Errno -3] Temporary failure in name resolution During handling of the above exception, another exception occurred: Traceback (most recent call last): File “/usr/lib/python3/dist-packages/urllib3/connectionpool.py”, line 560, in urlopen body=body, headers=headers) File “/usr/lib/python3/dist-packages/urllib3/connectionpool.py”, line 354, in _make_request conn.request(method, url, **httplib_request_kw) File “/usr/lib/python3.5/http/client.py”, line 1106, in request self._send_request(method, url, body, headers) File “/usr/lib/python3.5/http/client.py”, line 1151, in _send_request self.endheaders(body) File “/usr/lib/python3.5/http/client.py”, line 1102, in endheaders self._send_output(message_body) File “/usr/lib/python3.5/http/client.py”, line 934, in _send_output self.send(msg) File “/usr/lib/python3.5/http/client.py”, line 877, in send self.connect() File “/usr/lib/python3/dist-packages/urllib3/connection.py”, line 162, in connect conn = self._new_conn() File “/usr/lib/python3/dist-packages/urllib3/connection.py”, line 146, in _new_conn self, “Failed to establish a new connection: %s” % e) requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x7feaccda2668>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution During handling of the above exception, another exception occurred: Traceback (most recent call last): File “/usr/lib/python3/dist-packages/requests/adapters.py”, line 376, in send timeout=timeout File “/usr/lib/python3/dist-packages/urllib3/connectionpool.py”, line 610, in urlopen _stacktrace=sys.exc_info()[2]) File “/usr/lib/python3/dist-packages/urllib3/util/retry.py”, line 273, in increment raise MaxRetryError(_pool, url, error or ResponseError(cause)) requests.packages.urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host=‘http://www.xiangshu.com/’, port=80): Max retries exceeded with url: http://www.xiangshu.com/3603751.html (Caused by NewConnectionError(’<requests.packages.urllib3.connection.HTTPConnection object at 0x7feaccda2668>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution’,)) During handling of the above exception, another exception occurred:

Traceback (most recent call last): File “getimg.py”, line 102, in <module> GetImg().getdata() File “getimg.py”, line 76, in getdata base_url + j[‘href’], headers=self.headers) File “/usr/lib/python3/dist-packages/requests/sessions.py”, line 480, in get return self.request(‘GET’, url, **kwargs) File “/usr/lib/python3/dist-packages/requests/sessions.py”, line 468, in request resp = self.send(prep, **send_kwargs) File “/usr/lib/python3/dist-packages/requests/sessions.py”, line 576, in send r = adapter.send(request, **kwargs) File “/usr/lib/python3/dist-packages/requests/adapters.py”, line 437, in send raise ConnectionError(e, request=request) requests.exceptions.ConnectionError: HTTPConnectionPool(host=‘http://www.xiangshu.com/’, port=80): Max retries exceeded with url: http://www.xiangshu.com/3603751.html (Caused by NewConnectionError(’<requests.packages.urllib3.connection.HTTPConnection object at 0x7feaccda2668>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution’,))

vueper 1楼

是所有的都报错还是偶尔有报错？像是触发反爬

bupafengyu 2楼

先看报错信息，这是最直接的线索。如果没保存，下次运行时重定向输出到文件：python your_script.py 2>&1 | tee log.txt。

大概率是这几个方向的问题：

内存泄漏：长时间运行，数据（比如列表、字典）只增不减，内存耗尽。用 tracemalloc 模块在关键点打快照对比，或者用 psutil 监控进程内存。
连接未释放：网络请求（如 requests、aiohttp）的会话或响应对象没正确关闭，导致连接池耗尽或资源泄漏。确保使用 with 语句或手动 .close()。
资源限制：目标网站的反爬机制，如IP被封、请求频率过高导致临时封禁。检查错误是否包含 429 Too Many Requests、403 Forbidden 或连接超时。
程序逻辑缺陷：比如递归调用、死循环、异常处理不当导致状态累积。检查循环和递归的退出条件。

快速排查步骤：

运行到3小时左右，用 top 或任务管理器看内存/CPU占用是否持续增长。
在代码里加简单日志，记录已处理的数量和内存使用情况，看增长是否异常。
如果是网络错误，检查响应状态码和内容，可能是触发了反爬。

关键代码检查点（以requests为例）：

import requests
import time
import sys

# 使用会话并确保关闭，或为每个请求单独处理
with requests.Session() as session:
    session.headers.update({'User-Agent': 'Your Bot'})
    for i, url in enumerate(urls):
        try:
            # 增加延迟和超时设置
            time.sleep(1)
            resp = session.get(url, timeout=10)
            resp.raise_for_status()
            # 处理数据...
            # 定期打印状态，监控进展
            if i % 100 == 0:
                print(f"Processed {i} items")
                sys.stdout.flush()
        except requests.exceptions.RequestException as e:
            print(f"Error on {url}: {e}")
            # 根据错误类型决定是跳过、重试还是退出
            continue

如果是异步爬虫（如aiohttp），要确保所有task都被正确await和关闭。

总结：优先监控内存和网络错误，定位是资源问题还是反爬。