How can I deal effectively with anti-scraping measures in Python when scraping shop information from Dianping?

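While scraping shop information from Dianping, I get redirected to a captcha page whenever I scrape through rotating IPs (I'm using the Abuyun IP proxy service), but without the proxy the response comes back normally. Is the site detecting the rotating proxy IPs? How does it detect them, and is there a way around it? Also, some of the fields I'm scraping are present in the page, while others are just a tag with nothing inside it. How do I get past that?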

wuwangju #1

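When you run into Dianping's anti-scraping measures, the core idea is to imitate real user behavior and to handle dynamically loaded content. Straight to the code: this uses Selenium to handle dynamic loading and BeautifulSoup for parsing; the headers dict is there for any plain requests calls you add: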

import time
import random
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

# Configure Selenium; the two options below hide the most obvious automation hints
options = webdriver.ChromeOptions()
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_experimental_option("excludeSwitches", ["enable-automation"])
driver = webdriver.Chrome(options=options)

# Request headers for plain requests calls (not used by the Selenium flow below)
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Referer': 'https://www.dianping.com/'
}

def crawl_shop_info(url):
    try:
        driver.get(url)
        # Wait for the key element to load
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, "shop-name"))
        )
        
        # Random delay to mimic a human visitor
        time.sleep(random.uniform(1, 3))
        
        # Get the page source
        page_source = driver.page_source
        soup = BeautifulSoup(page_source, 'html.parser')
        
        # Parse the shop info
        shop_name = soup.find('h1', class_='shop-name').text.strip()
        address = soup.find('span', class_='address').text.strip()
        
        return {
            'shop_name': shop_name,
            'address': address
        }
        
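    except Exception as e:
        print(f"Crawl failed: {e}")
        return None

# Usage example
if __name__ == "__main__":
    shop_url = "https://www.dianping.com/shop/xxx"  # replace with the target shop URL
    try:
        info = crawl_shop_info(shop_url)
        print(info)
    finally:
        # Quit the browser once here, so crawl_shop_info can be called for several shops
        driver.quit()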
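
Two things from the question are worth checking before you trust the parsed result: whether you actually landed on the captcha page, and which tags came back empty. Below is a minimal sketch using only the standard library. The names `VERIFY_MARKERS`, `is_verification_page`, `EmptyTagFinder` and `find_empty_tags` are my own, and the marker strings are an assumption about what the verification URL looks like, so adjust them to whatever you observe. Empty tags often mean the visible text is drawn through CSS or a custom font keyed on the class name; I'm assuming that here rather than confirming it for this page.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumption: fragments that mark a verification URL. Illustrative only.
VERIFY_MARKERS = ("verify", "captcha")


def is_verification_page(current_url: str) -> bool:
    """Return True if the final URL looks like a captcha/verification page."""
    parsed = urlparse(current_url)
    target = (parsed.netloc + parsed.path).lower()
    return any(marker in target for marker in VERIFY_MARKERS)


class EmptyTagFinder(HTMLParser):
    """Collect elements that open and close without any visible text inside."""

    VOID_TAGS = {"br", "img", "input", "meta", "link", "hr"}

    def __init__(self):
        super().__init__()
        self._stack = []  # one [tag, class, has_text] entry per open element
        self.empty = []   # (tag, class) pairs that contained no text

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return
        self._stack.append([tag, dict(attrs).get("class") or "", False])

    def handle_data(self, data):
        if data.strip():
            # Visible text counts for every element that encloses it.
            for entry in self._stack:
                entry[2] = True

    def handle_endtag(self, tag):
        if not any(entry[0] == tag for entry in self._stack):
            return  # stray closing tag, nothing to match
        while True:
            name, cls, has_text = self._stack.pop()
            if name == tag:
                if not has_text:
                    self.empty.append((name, cls))
                return


def find_empty_tags(html: str):
    """Return (tag, class) pairs for elements with no text content."""
    finder = EmptyTagFinder()
    finder.feed(html)
    finder.close()
    return finder.empty
```

In the crawler above you could call `is_verification_page(driver.current_url)` right after `driver.get(url)` and back off instead of parsing, then run `find_empty_tags(driver.page_source)` to see which class names you would need to map back to characters.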

Key points: