Python爬取拉勾网时如何解决滑动验证码问题?
最近我在爬拉钩的
https://www.lagou.com/gongsi/0-2-0-0
这个页面点进去进入公司的页面 爬虫地址信息 有时候报验证码,求指教
headers = {
“User-Agent”: “Mozilla/5.0 (hp-tablet; Linux; hpwOS/3.0.2; U; de-DE) AppleWebKit/534.6 (KHTML, like Gecko) wOSBrowser/234.40.1 Safari/534.6 TouchPad/1.0”,
“X-Requested-With”: “XMLHttpRequest”, “Referer”: “https://www.lagou.com/gongsi/0-1-0-0”
}
cookies_str = "user_trace_token=20180909010719-4eb82332-59f2-4979-b7ba-4a96de35eb40; _ga=GA1.2.1153938840.1536426437; LGUID=20180909010720-a5755fe0-b389-11e8-8ccd-525400f775ce; index_location_city=%E5%8C%97%E4%…“
res = requests.get(companyLink, headers=header, cookies=get_cookies(cookies_str )
偶尔会弹出这个链接
https://passport.lagou.com/login/login.html?msg=validation&uStatus=2&clientIp=47.105.103.225
Python爬取拉勾网时如何解决滑动验证码问题?
核心思路: 用Selenium模拟真人滑动轨迹,绕过验证码。
完整代码示例:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver import ActionChains
import time
import random
def get_track(distance):
"""生成滑动轨迹"""
track = []
current = 0
mid = distance * 3/4
t = 0.2
v = 0
while current < distance:
if current < mid:
a = 2
else:
a = -3
v0 = v
v = v0 + a * t
move = v0 * t + 0.5 * a * t * t
current += move
track.append(round(move))
return track
def drag_slider(driver, slider, distance):
"""执行滑动操作"""
ActionChains(driver).click_and_hold(slider).perform()
track = get_track(distance)
for x in track:
ActionChains(driver).move_by_offset(xoffset=x, yoffset=0).perform()
time.sleep(random.uniform(0.001, 0.003))
time.sleep(0.2)
ActionChains(driver).release().perform()
def main():
driver = webdriver.Chrome()
driver.get("https://www.lagou.com/")
try:
# 等待验证码出现
wait = WebDriverWait(driver, 10)
slider = wait.until(
EC.element_to_be_clickable((By.CLASS_NAME, "geetest_slider_button"))
)
# 计算需要滑动的距离(这里需要根据实际页面调整)
# 通常需要获取滑块背景图的宽度
distance = 260 # 这个值需要根据实际情况调整
# 执行滑动
drag_slider(driver, slider, distance)
time.sleep(3) # 等待验证结果
except Exception as e:
print(f"验证失败: {e}")
finally:
driver.quit()
if __name__ == "__main__":
main()
关键点说明:
- 轨迹模拟:
get_track()函数生成先加速后减速的物理运动轨迹,模拟真人操作 - 滑动执行:
drag_slider()通过ActionChains分解滑动动作,加入随机延迟 - 距离计算:滑动距离
distance需要根据实际验证码图片缺口位置确定,可能需要截图分析
调整建议:
- 实际距离可能需要通过图像识别计算缺口位置
- 可以加入失败重试机制
- 考虑使用
undetected-chromedriver避免被检测
一句话总结: 用物理轨迹模拟真人滑动,关键在轨迹要够“人性化”。

