How do I get past the slider CAPTCHA when scraping Lagou with Python?

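I've recently been scraping Lagou, starting from
https://www.lagou.com/gongsi/0-2-0-0
From this list page I follow the link to each company's page and scrape its address information. Sometimes the request comes back with a CAPTCHA instead. Any advice would be appreciated.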
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (hp-tablet; Linux; hpwOS/3.0.2; U; de-DE) AppleWebKit/534.6 (KHTML, like Gecko) wOSBrowser/234.40.1 Safari/534.6 TouchPad/1.0",
    "X-Requested-With": "XMLHttpRequest",
    "Referer": "https://www.lagou.com/gongsi/0-1-0-0",
}
cookies_str = "user_trace_token=20180909010719-4eb82332-59f2-4979-b7ba-4a96de35eb40; _ga=GA1.2.1153938840.1536426437; LGUID=20180909010720-a5755fe0-b389-11e8-8ccd-525400f775ce; index_location_city=%E5%8C%97%E4%…"

# companyLink and get_cookies() are defined elsewhere (not shown in this post)
res = requests.get(companyLink, headers=headers, cookies=get_cookies(cookies_str))
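The post does not show get_cookies(). A minimal sketch, assuming it only has to split the Cookie header string into the name/value dict that requests accepts for its cookies argument (the body below is an assumption, not the original helper):

```python
def get_cookies(cookies_str):
    """Turn 'k1=v1; k2=v2' into {'k1': 'v1', 'k2': 'v2'} (assumed behaviour)."""
    cookies = {}
    for pair in cookies_str.split(";"):
        # Split on the first '=' only, since cookie values may contain '='
        name, sep, value = pair.strip().partition("=")
        if sep and name:  # skip empty or malformed fragments
            cookies[name] = value
    return cookies
```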

Occasionally the request is redirected to this URL:

https://passport.lagou.com/login/login.html?msg=validation&uStatus=2&clientIp=47.105.103.225


1 reply

Core idea: use Selenium to simulate a human-like drag trajectory so that the slider CAPTCHA is passed.

Full code example:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver import ActionChains
import time
import random

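def get_track(distance):
    """Generate a drag track: accelerate over the first 3/4, then decelerate."""
    track = []
    current = 0   # exact distance covered so far
    emitted = 0   # sum of the integer offsets already in track
    mid = distance * 3 / 4
    t = 0.2
    v = 0
    while current < distance:
        a = 2 if current < mid else -3
        v0 = v
        v = v0 + a * t
        # Clamp so the final step cannot overshoot the target
        current = min(current + v0 * t + 0.5 * a * t * t, distance)
        # Round the cumulative position, not each step, so rounding
        # errors do not add up and the offsets sum to the distance
        step = round(current) - emitted
        if step:
            track.append(step)
            emitted += step
    return track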

def drag_slider(driver, slider, distance):
    """Press the slider, move it along the generated track, then release."""
    ActionChains(driver).click_and_hold(slider).perform()
    track = get_track(distance)
    
    # Move in small steps with a short random pause between them
    for x in track:
        ActionChains(driver).move_by_offset(xoffset=x, yoffset=0).perform()
        time.sleep(random.uniform(0.001, 0.003))
    
    # Pause briefly before letting go
    time.sleep(0.2)
    ActionChains(driver).release().perform()

def main():
    driver = webdriver.Chrome()
    driver.get("https://www.lagou.com/")
    
    try:
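        # Wait for the CAPTCHA slider to appear
        wait = WebDriverWait(driver, 10)
        slider = wait.until(
            EC.element_to_be_clickable((By.CLASS_NAME, "geetest_slider_button"))
        )
        
        # Work out how far to drag (adjust for the actual page);
        # this usually involves the width of the slider background image
        distance = 260  # placeholder value, tune it for the real CAPTCHA
        
        # Perform the drag
        drag_slider(driver, slider, distance)
        
        time.sleep(3)  # wait for the verification result
        
    except Exception as e:
        print(f"Verification failed: {e}")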
    finally:
        driver.quit()

if __name__ == "__main__":
    main()

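Key points:

  1. Track simulation: get_track() generates a physically plausible track that accelerates first and then decelerates, imitating a human drag.
  2. Drag execution: drag_slider() breaks the drag into small steps with ActionChains and adds a random delay between them.
  3. Distance calculation: the drag distance has to be derived from where the gap sits in the actual CAPTCHA image, which may require analysing a screenshot.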
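For point 3, one common approach is to compare the complete background image with the version that has the gap cut out and take the first column where they differ. The sketch below only illustrates that comparison. It assumes both images have already been captured at the same size and converted to rows of (R, G, B) tuples (for example by reading pixels with an imaging library such as Pillow); how to obtain the two images from the page is site-specific and not covered here. The names find_gap_offset, threshold and start_x are hypothetical.

```python
def find_gap_offset(full_img, gap_img, threshold=60, start_x=0):
    """Return the x of the first column where the images differ, or None.

    full_img / gap_img: equally sized lists of rows of (R, G, B) tuples
    (an assumed representation, chosen to keep the sketch dependency-free).
    """
    height = len(full_img)
    width = len(full_img[0])
    for x in range(start_x, width):       # scan columns left to right
        for y in range(height):
            r1, g1, b1 = full_img[y][x]
            r2, g2, b2 = gap_img[y][x]
            # A pixel counts as different if any channel moves past the threshold
            if (abs(r1 - r2) > threshold
                    or abs(g1 - g2) > threshold
                    or abs(b1 - b2) > threshold):
                return x
    return None
```

The returned x is the left edge of the gap in image pixels. The drag distance is usually this value minus the slider piece's starting offset, scaled if the image is displayed at a different size than it was captured; both corrections depend on the page. start_x can be used to skip the columns occupied by the slider piece itself.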

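Suggested adjustments:

  • The real distance will probably have to come from image recognition that locates the gap.
  • Add a retry mechanism for failed attempts.
  • Consider using undetected-chromedriver to reduce the chance of being detected as automation.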
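The retry suggestion could be sketched as below. Both helpers are hypothetical: is_validation_redirect() assumes that every verification redirect looks like the passport.lagou.com URL quoted in the question (msg=validation in the query string), and retry() assumes the caller supplies an attempt function that returns True on success.

```python
import random
import time
from urllib.parse import urlparse, parse_qs


def is_validation_redirect(url):
    """Guess whether a final URL is the verification page from the question."""
    parts = urlparse(url)
    return (parts.netloc == "passport.lagou.com"
            and parse_qs(parts.query).get("msg") == ["validation"])


def retry(attempt, max_tries=3, base_delay=1.0, sleep=time.sleep):
    """Call attempt() until it returns True or max_tries is used up."""
    for i in range(max_tries):
        if attempt():
            return True
        if i < max_tries - 1:
            # Wait a little longer after each failure, with some jitter
            sleep(base_delay * (i + 1) + random.uniform(0, 0.5))
    return False
```

With requests, the URL to check would be res.url, which holds the final address after redirects. An attempt function could then solve the slider in Selenium and return whether the page has left the verification URL.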

In one sentence: simulate a human drag with a physics-based track; what matters is that the track looks "human" enough.
