In a Python crawler, why does the first machine get no data back after being blocked by anti-scraping, while the second machine does?

First machine:
curl -v 'https://www.qichacha.com/gongsi_getList' -H 'cookie: acw_tc=AQAAADKhg2r1fgQAzB38csKGgfa3ll5A; PHPSESSID=mr3rtla2pree2kma06in109lp7; UM_distinctid=16409b0ebb924e-01a16023a76872-19336953-13c680-16409b0ebba682; zg_did=%7B%22did%22%3A%20%2216409b0ebcf400-0a47e9ae97aea3-19336953-13c680-16409b0ebd0661%22%7D; _uab_collina=152917094889257593834626; _umdata=535523100CBE37C3B9E8426803FAE682F695DD5C372880100D01308BEF2CB953FEF5024D24D0BA85CD43AD3E795C914C6E418FBD7FCF11CFC02159EA6BDBD805; hasShow=1; zg_de1d1a35bfa24ce29bbf2c7eb17e6c4f=%7B%22sid%22%3A%201529381569060%2C%22updated%22%3A%201529381569062%2C%22info%22%3A%201529170947031%2C%22superProperty%22%3A%20%22%7B%7D%22%2C%22platform%22%3A%20%22%7B%7D%22%2C%22utm%22%3A%20%22%7B%7D%22%2C%22referrerDomain%22%3A%20%22www.baidu.com%22%2C%22cuid%22%3A%20%22b77823811d3a8fd207eef49092fcf4d6%22%7D; CNZZDATA1254842228=222723142-1529170913-https%253A%252F%252Fwww.qichacha.com%252F%7C1529376747; Hm_lvt_3456bee468c83cc63fb5147f119f1075=1529170947,1529201010,1529202769,1529381570; Hm_lpvt_3456bee468c83cc63fb5147f119f1075=1529381570' -H 'origin: https://www.qichacha.com' -H 'accept-encoding: gzip, deflate, br' -H 'accept-language: en-US,en;q=0.9' -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36' -H 'content-type: application/x-www-form-urlencoded; charset=UTF-8' -H 'accept: */*' -H 'referer: https://www.qichacha.com/' -H 'authority: www.qichacha.com' -H 'x-requested-with: XMLHttpRequest' --data $'key=\u767e\u5ea6&type=0' --compressed
* About to connect() to www.qichacha.com port 443 (#0)
* Trying 42.81.4.218...
* Connected to www.qichacha.com (42.81.4.218) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
* subject: CN=*.qichacha.com,OU=IT,O=苏州企查查网络科技有限公司,L=苏州市,ST=江苏省,C=CN
* start date: Jun 16 00:00:00 2017 GMT
* expire date: Jun 15 23:59:59 2020 GMT
* common name: *.qichacha.com
* issuer: CN=GeoTrust SSL CA - G3,O=GeoTrust Inc.,C=US
> POST /gongsi_getList HTTP/1.1
> Host: www.qichacha.com
> cookie: acw_tc=AQAAADKhg2r1fgQAzB38csKGgfa3ll5A; PHPSESSID=mr3rtla2pree2kma06in109lp7; UM_distinctid=16409b0ebb924e-01a16023a76872-19336953-13c680-16409b0ebba682; zg_did=%7B%22did%22%3A%20%2216409b0ebcf400-0a47e9ae97aea3-19336953-13c680-16409b0ebd0661%22%7D; _uab_collina=152917094889257593834626; _umdata=535523100CBE37C3B9E8426803FAE682F695DD5C372880100D01308BEF2CB953FEF5024D24D0BA85CD43AD3E795C914C6E418FBD7FCF11CFC02159EA6BDBD805; hasShow=1; zg_de1d1a35bfa24ce29bbf2c7eb17e6c4f=%7B%22sid%22%3A%201529381569060%2C%22updated%22%3A%201529381569062%2C%22info%22%3A%201529170947031%2C%22superProperty%22%3A%20%22%7B%7D%22%2C%22platform%22%3A%20%22%7B%7D%22%2C%22utm%22%3A%20%22%7B%7D%22%2C%22referrerDomain%22%3A%20%22www.baidu.com%22%2C%22cuid%22%3A%20%22b77823811d3a8fd207eef49092fcf4d6%22%7D; CNZZDATA1254842228=222723142-1529170913-https%253A%252F%252Fwww.qichacha.com%252F%7C1529376747; Hm_lvt_3456bee468c83cc63fb5147f119f1075=1529170947,1529201010,1529202769,1529381570; Hm_lpvt_3456bee468c83cc63fb5147f119f1075=1529381570
> origin: https://www.qichacha.com
> accept-encoding: gzip, deflate, br
> accept-language: en-US,en;q=0.9
> user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36
> content-type: application/x-www-form-urlencoded; charset=UTF-8
> accept: */*
> referer: https://www.qichacha.com/
> authority: www.qichacha.com
> x-requested-with: XMLHttpRequest
> Content-Length: 17
>
* upload completely sent off: 17 out of 17 bytes
< HTTP/1.1 200 OK
< Server: Tengine
< Content-Type: text/html; charset=UTF-8
< Transfer-Encoding: chunked
< Connection: keep-alive
< Date: Tue, 19 Jun 2018 05:53:51 GMT
< Vary: Accept-Encoding
< Expires: Thu, 19 Nov 1981 08:52:00 GMT
< Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
< Pragma: no-cache
< Content-Encoding: gzip
< Via: cache19.l2nu20-3[116,200-0,M], cache40.l2nu20-3[117,0], cache8.cn247[132,200-0,M], cache8.cn247[133,0]
< X-Cache: MISS TCP_MISS dirn:-2:-2 mlen:-1
< X-Swift-SaveTime: Tue, 19 Jun 2018 05:53:52 GMT
< X-Swift-CacheTime: 0
< Timing-Allow-Origin: *
< EagleId: 2a51048815293876318761329e


Second machine:

curl -v 'https://www.qichacha.com/gongsi_getList' -H 'cookie: acw_tc=AQAAADKhg2r1fgQAzB38csKGgfa3ll5A; PHPSESSID=mr3rtla2pree2kma06in109lp7; UM_distinctid=16409b0ebb924e-01a16023a76872-19336953-13c680-16409b0ebba682; zg_did=%7B%22did%22%3A%20%2216409b0ebcf400-0a47e9ae97aea3-19336953-13c680-16409b0ebd0661%22%7D; _uab_collina=152917094889257593834626; _umdata=535523100CBE37C3B9E8426803FAE682F695DD5C372880100D01308BEF2CB953FEF5024D24D0BA85CD43AD3E795C914C6E418FBD7FCF11CFC02159EA6BDBD805; hasShow=1; Hm_lvt_3456bee468c83cc63fb5147f119f1075=1529170947,1529201010,1529202769,1529381570; CNZZDATA1254842228=222723142-1529170913-https%253A%252F%252Fwww.qichacha.com%252F%7C1529382147; zg_de1d1a35bfa24ce29bbf2c7eb17e6c4f=%7B%22sid%22%3A%201529386744674%2C%22updated%22%3A%201529386772675%2C%22info%22%3A%201529170947031%2C%22superProperty%22%3A%20%22%7B%7D%22%2C%22platform%22%3A%20%22%7B%7D%22%2C%22utm%22%3A%20%22%7B%7D%22%2C%22referrerDomain%22%3A%20%22%22%2C%22cuid%22%3A%20%22b77823811d3a8fd207eef49092fcf4d6%22%7D; Hm_lpvt_3456bee468c83cc63fb5147f119f1075=1529386773' -H 'origin: https://www.qichacha.com' -H 'accept-encoding: gzip, deflate, br' -H 'accept-language: en-US,en;q=0.9' -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36' -H 'content-type: application/x-www-form-urlencoded; charset=UTF-8' -H 'accept: */*' -H 'referer: https://www.qichacha.com/' -H 'authority: www.qichacha.com' -H 'x-requested-with: XMLHttpRequest' --data $'key=\u767e\u5ea6&type=0' --compressed
* Trying 42.81.4.217...
* TCP_NODELAY set
* Connected to www.qichacha.com (42.81.4.217) port 443 (#0)
* TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate: *.qichacha.com
* Server certificate: GeoTrust SSL CA - G3
* Server certificate: GeoTrust Global CA
> POST /gongsi_getList HTTP/1.1
> Host: www.qichacha.com
> cookie: acw_tc=AQAAADKhg2r1fgQAzB38csKGgfa3ll5A; PHPSESSID=mr3rtla2pree2kma06in109lp7; UM_distinctid=16409b0ebb924e-01a16023a76872-19336953-13c680-16409b0ebba682; zg_did=%7B%22did%22%3A%20%2216409b0ebcf400-0a47e9ae97aea3-19336953-13c680-16409b0ebd0661%22%7D; _uab_collina=152917094889257593834626; _umdata=535523100CBE37C3B9E8426803FAE682F695DD5C372880100D01308BEF2CB953FEF5024D24D0BA85CD43AD3E795C914C6E418FBD7FCF11CFC02159EA6BDBD805; hasShow=1; Hm_lvt_3456bee468c83cc63fb5147f119f1075=1529170947,1529201010,1529202769,1529381570; CNZZDATA1254842228=222723142-1529170913-https%253A%252F%252Fwww.qichacha.com%252F%7C1529382147; zg_de1d1a35bfa24ce29bbf2c7eb17e6c4f=%7B%22sid%22%3A%201529386744674%2C%22updated%22%3A%201529386772675%2C%22info%22%3A%201529170947031%2C%22superProperty%22%3A%20%22%7B%7D%22%2C%22platform%22%3A%20%22%7B%7D%22%2C%22utm%22%3A%20%22%7B%7D%22%2C%22referrerDomain%22%3A%20%22%22%2C%22cuid%22%3A%20%22b77823811d3a8fd207eef49092fcf4d6%22%7D; Hm_lpvt_3456bee468c83cc63fb5147f119f1075=1529386773
> origin: https://www.qichacha.com
> accept-encoding: gzip, deflate, br
> accept-language: en-US,en;q=0.9
> user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36
> content-type: application/x-www-form-urlencoded; charset=UTF-8
> accept: */*
> referer: https://www.qichacha.com/
> authority: www.qichacha.com
> x-requested-with: XMLHttpRequest
> Content-Length: 17
>
* upload completely sent off: 17 out of 17 bytes
< HTTP/1.1 200 OK
< Server: Tengine
< Content-Type: text/html; charset=UTF-8
< Transfer-Encoding: chunked
< Connection: keep-alive
< Date: Tue, 19 Jun 2018 05:51:11 GMT
< Vary: Accept-Encoding
< Expires: Thu, 19 Nov 1981 08:52:00 GMT
< Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
< Pragma: no-cache
< Content-Encoding: gzip
< Via: cache19.l2em21-1[138,200-0,M], cache2.l2em21-1[139,0], cache8.cn247[169,200-0,M], cache5.cn247[170,0]
< X-Cache: MISS TCP_MISS dirn:-2:-2 mlen:-1
< X-Swift-SaveTime: Tue, 19 Jun 2018 05:51:11 GMT
< X-Swift-CacheTime: 0
< Timing-Allow-Origin: *
< EagleId: 2a51048515293874710697396e
<
* Curl_http_done: called premature == 0
* Connection #0 to host www.qichacha.com left intact
[{"KeyNo":"3f603703d59a04cbe427e5825099a565","Name":"<em>\u767e\u5ea6</em>\u5728\u7ebf\u7f51\u7edc\u6280\u672f(\u5317\u4eac)\u6709\u9650\u516c\u53f8","Reason":"\u80a1\u7968\u7b80\u79f0","Value":"<em>\u767e\u5ea6</em>","OperName":null,"ImageUrl":null},{"KeyNo":"576c21e3468a6b178bbf291e4820e896","Name":"\u5317\u4eac<em>\u767e\u5ea6</em>\u7f51\u8baf\u79d1\u6280\u6709\u9650\u516c\u53f8","Reason":"\u516c\u53f8\u540d\u79f0","Value":"\u5317\u4eac<em>\u767e\u5ea6</em>\u7f51\u8baf\u79d1\u6280\u6709\u9650\u516c\u53f8","OperName":null,"ImageUrl":null},{"KeyNo":"040087950737026999780939d6a623e9","Name":"<em>\u767e\u5ea6</em>\u56fd\u9645\u79d1\u6280(\u6df1\u5733)\u6709\u9650\u516c\u53f8","Reason":"\u516c\u53f8\u540d\u79f0","Value":"<em>\u767e\u5ea6</em>\u56fd\u9645\u79d1\u6280(\u6df1\u5733)\u6709\u9650\u516c\u53f8","OperName":null,"ImageUrl":null},{"KeyNo":"9459ee4a7789af50354b26dfc971c28a","Name":"<em>\u767e\u5ea6</em>\u79fb\u4fe1\u7f51\u7edc\u6280\u672f(\u5317\u4eac)\u6709\u9650\u516c\u53f8","Reason":"\u516c\u53f8\u540d\u79f0","Value":"<em>\u767e\u5ea6</em>\u79fb\u4fe1\u7f51\u7edc\u6280\u672f(\u5317\u4eac)\u6709\u9650\u516c\u53f8","OperName":null,"ImageUrl":null},{"KeyNo":"587d870f88a25bc849102850fcef9c0e","Name":"<em>\u767e\u5ea6</em>\u65f6\u4ee3\u7f51\u7edc\u6280\u672f(\u5317\u4eac)\u6709\u9650\u516c\u53f8","Reason":"\u516c\u53f8\u540d\u79f0","Value":"<em>\u767e\u5ea6</em>\u65f6\u4ee3\u7f51\u7edc\u6280\u672f(\u5317\u4eac)\u6709\u9650\u516c\u53f8","OperName":null,"ImageUrl":null}]


4 replies

Your IP has been restricted.


This is a typical problem, and it usually comes down to the IP address, the browser fingerprint, or locally stored state.
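One quick way to test the "locally stored state" possibility is to diff the Cookie headers the two machines actually sent. The sketch below is only an illustration: the helper names `parse_cookie_header` and `diff_cookies` are made up for this example, and the two sample strings are shortened to two cookies whose values are copied from the curl commands in the question.

```python
def parse_cookie_header(header: str) -> dict:
    """Split a raw Cookie header value into a name -> value mapping."""
    cookies = {}
    for part in header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep:  # skip empty or malformed fragments
            cookies[name] = value
    return cookies


def diff_cookies(first: str, second: str) -> dict:
    """Return cookies whose values differ (None marks a missing cookie)."""
    a, b = parse_cookie_header(first), parse_cookie_header(second)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }


# Shortened cookie headers; the values are copied from the two curl commands.
FIRST = "hasShow=1; Hm_lpvt_3456bee468c83cc63fb5147f119f1075=1529381570"
SECOND = "hasShow=1; Hm_lpvt_3456bee468c83cc63fb5147f119f1075=1529386773"

if __name__ == "__main__":
    for name, (old, new) in diff_cookies(FIRST, SECOND).items():
        print(f"{name}: {old} -> {new}")
```

Running the same diff on the full headers would list every cookie that differs between the two requests, which narrows down what to replay on the failing machine.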

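The most common reason the first machine gets blocked is that its public IP address has already been flagged or banned by the target site. Many sites' anti-crawler policies track the IPs that send abnormal requests; once a rule is triggered (for example, a request rate that is too high or unusual request headers), the site blocks that IP temporarily or permanently. From then on, no matter how you switch browsers or clear cookies, you will not get normal data back as long as the IP stays the same.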
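To make the "rule is triggered" idea concrete, here is a minimal sketch of how a server could flag an IP by request rate using a sliding window. It is purely illustrative: the class name, the limit of 3 requests per 10 seconds, and the sample IP (a documentation address) are assumptions, and nothing here is known about the target site's actual policy.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Flags a client IP once it exceeds max_requests within window seconds."""

    def __init__(self, max_requests: int, window: float):
        self.max_requests = max_requests
        self.window = window
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: this request would be rejected
        hits.append(now)
        return True


# Hypothetical policy: at most 3 requests per 10 seconds per IP.
limiter = SlidingWindowLimiter(max_requests=3, window=10.0)
results = [limiter.allow("203.0.113.7", t) for t in (0.0, 1.0, 2.0, 3.0, 12.5)]
print(results)  # [True, True, True, False, True]
```

The fourth request is rejected because three requests already fall inside the window; by 12.5 seconds the earlier ones have expired and the IP is allowed again. Other IPs are counted separately, which is why a second machine on a different address would be unaffected under a scheme like this.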

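The second machine can get through most likely because it has a fresh IP address that has not been banned. That suggests the problem lies at the IP level rather than in the logic of your crawler code.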
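Note that both transcripts in the question show `HTTP/1.1 200 OK`, so the status code alone does not distinguish the blocked machine from the working one; only the body does. A minimal sketch of a check a crawler could run on each response, assuming a successful reply is a non-empty JSON list as in the second transcript (the function name and the labels are made up for this example):

```python
import json


def classify_body(status_code: int, body: str) -> str:
    """Label a response as 'ok', 'empty', 'not-json', or 'http-error'."""
    if status_code != 200:
        return "http-error"
    if not body.strip():
        return "empty"  # 200 with no payload at all
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return "not-json"  # e.g. an HTML challenge or login page
    # Assumption: a real result is a non-empty JSON list.
    return "ok" if isinstance(data, list) and data else "empty"


# Sample body shortened from the second machine's output in the question.
SAMPLE = '[{"KeyNo":"3f603703d59a04cbe427e5825099a565","OperName":null}]'

if __name__ == "__main__":
    print(classify_body(200, SAMPLE))
    print(classify_body(200, ""))
    print(classify_body(200, "<html>verify</html>"))
```

Logging this label alongside the outbound IP and the cookies sent makes it easier to see which variable actually changes when the data stops coming back.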

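To fix this, the key is to make your requests look like they come from different, ordinary users. Here is a straightforward approach that uses the requests library with rotating User-Agent strings and proxy IPs: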

import random
from time import sleep

import requests

# 1. Pool of User-Agent strings that mimic different browsers
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
]

# 2. Proxy pool. The address below is only a placeholder and must be replaced
#    with a working proxy; real projects should use a reliable paid service.
PROXIES = [
    {'http': 'http://123.45.67.89:8080', 'https': 'http://123.45.67.89:8080'},
    # ... add more proxies
]

def robust_request(url):
    """Send a GET request with basic anti-anti-crawler measures."""
    headers = {
        'User-Agent': random.choice(USER_AGENTS),
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.5',
        'Referer': 'https://www.google.com/'
    }

    # Pick a random proxy, or connect directly if the pool is empty
    proxy = random.choice(PROXIES) if PROXIES else None

    try:
        resp = requests.get(url, headers=headers, proxies=proxy, timeout=10)
        resp.raise_for_status()  # raise on HTTP 4xx/5xx
        return resp
    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}")
        return None

# Usage example
if __name__ == '__main__':
    urls = ['https://httpbin.org/headers']  # test URL that echoes your request headers

    for url in urls:
        response = robust_request(url)

        if response is not None:
            print("Data fetched successfully!")
            print(response.json())  # inspect the headers that were sent
        else:
            print("Failed to fetch data.")

        # Important: wait a random interval between requests to mimic a human
        sleep(random.uniform(1, 3))

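Key points of the code:

  1. User-Agent rotation: each request picks one at random from the pool, which avoids sending the default User-Agent of the requests library.
  2. Proxy IPs: this is the key to getting around an IP ban. The code uses a proxy pool so that requests go out through different IPs. Free proxies are unreliable; for serious projects, a paid proxy service is strongly recommended.
  3. Fuller request headers: common headers such as Accept, Accept-Language, and Referer are added so the request looks more like one from a browser.
  4. Random delay: call sleep between consecutive requests so that a high request rate does not trip the site's risk controls.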
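The random-delay point can be packaged as a small reusable generator so the pause always sits between requests rather than after the last one. This is a sketch under stated assumptions: `paced` is a hypothetical helper name, the 1 to 3 second range mirrors the reply's `random.uniform(1, 3)`, and the `sleep` parameter exists only so the pause can be swapped out when testing.

```python
import random
import time


def paced(items, min_delay=1.0, max_delay=3.0, sleep=time.sleep):
    """Yield items one by one, pausing a random interval between them."""
    for index, item in enumerate(items):
        if index:  # no pause before the first item
            sleep(random.uniform(min_delay, max_delay))
        yield item


if __name__ == "__main__":
    # Short delays here just to keep the demo quick.
    for url in paced(["https://httpbin.org/headers"] * 3, 0.1, 0.3):
        print("would request", url)
```

In a crawler loop, the body of the `for` would call the request function for each URL, and the generator takes care of spacing the calls.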

Summary: the root cause is that the IP has been identified; the fix is to change IPs and flesh out the request headers.

Sites like this that were built on crawling in the first place all have anti-crawler measures of their own.

But I'm using the same IP and still getting different results.
