Methods and Tips for Scraping Dianping (大众点评) Data with Python

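I'm using Scrapy to scrape Dianping shop pages (for example, the URL http://www.dianping.com/shop/19484059). Inside def parse(self, response): the response status is 200, yet the body is empty (the debugger shows body={bytes} b''). Why does this happen?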
zlyuanteng, reply #1

You triggered their security mechanism, so the server simply returns an empty page.
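
A minimal sketch of how a spider could detect this case instead of silently parsing nothing. The helper name looks_blocked and the 512-byte threshold are assumptions made for illustration; they are not part of Scrapy or of the site's behavior, and the threshold would need tuning against real pages.

```python
def looks_blocked(status: int, body: bytes, min_length: int = 512) -> bool:
    """Heuristic check for a blocked request.

    Hypothetical helper: a non-200 status, or a 200 whose body is
    empty or shorter than min_length bytes, is treated as blocked.
    min_length is an assumed threshold, not a documented value.
    """
    if status != 200:
        return True
    return len(body.strip()) < min_length


if __name__ == '__main__':
    # The case from the question: status 200 with an empty body.
    print(looks_blocked(200, b''))
```

In a Scrapy callback this could be called as looks_blocked(response.status, response.body), with the spider logging a warning or rescheduling the request when it returns True.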

sinazl, reply #2

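To scrape Dianping, the core task is dealing with its anti-scraping measures. Here is the code: it uses requests to imitate a browser request and BeautifulSoup to parse the result. The example searches for "火锅" (hot pot) and collects the shop names and ratings from the first two result pages: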

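import random
import time
from urllib.parse import quote

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Cookie': 'your Cookie'  # copy it from the browser after logging in
}

def fetch_page(keyword, page):
    """Return the HTML of one search-result page, or None on failure."""
    # Percent-encode the keyword so non-ASCII text is safe in the URL path.
    url = f'https://www.dianping.com/search/keyword/1/0_{quote(keyword)}/p{page}'
    try:
        resp = requests.get(url, headers=headers, timeout=10)
    except requests.RequestException as exc:
        print(f'Request failed: {exc}')
        return None
    if resp.status_code != 200:
        print(f'Request failed: {resp.status_code}')
        return None
    return resp.text

def parse_html(html):
    """Print the name and rating of every shop found in the page."""
    soup = BeautifulSoup(html, 'html.parser')
    # A blocked request can come back without the list container, so
    # check for it rather than indexing find_all(...)[0] directly.
    shop_list = soup.find('div', class_='shop-list')
    if shop_list is None:
        print('No shop list found; the request may have been blocked.')
        return
    for shop in shop_list.find_all('li'):
        name_tag = shop.find('h4')
        score_tag = shop.find('span', class_='comment-score')
        name = name_tag.text.strip() if name_tag else 'N/A'
        score = score_tag.text.strip() if score_tag else 'N/A'
        print(f'Shop: {name}, rating: {score}')

def main():
    keyword = '火锅'  # hot pot
    for page in range(1, 3):
        print(f'Scraping page {page}...')
        html = fetch_page(keyword, page)
        if html:
            parse_html(html)
        time.sleep(random.uniform(1, 3))  # random delay to avoid getting banned

if __name__ == '__main__':
    main()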

Key points: