How can a Python crawler get past form-submission protection?

I'll be upfront: I want to write a Python script to help a friend apply for a visa slot.

Problems encountered

Every GET returns only the noscript notice; the page requires JS
Non-local IPs are shown a CAPTCHA

What I tried

mechanize

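import mechanize

url = 'xxx'  # target URL elided in the original post
br = mechanize.Browser()  # the original snippet used br without creating it
response2 = br.open(url)
request = br.request
print(response2.info())  # response headers
print(response2.read())  # response body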

output:

Cache-Control: no-store, must-revalidate, no-cache, max-age=0
Content-Type: text/html
Connection: close
Vary: Accept-Encoding
Pragma: no-cache
Expires: -1
CacheControl: no-cache
X-UA-Compatible: IE=edge
Content-Type: text/html; charset=utf-8
… more content …
<noscript>Please enable JavaScript to view the page content.</noscript>
</head><body>
</body></html>
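To tell programmatically that a fetch returned only this placeholder rather than the real page, one option is to check for a noscript notice combined with an empty body. The following is a minimal sketch using only the standard library; the function name `looks_like_js_placeholder` and the heuristic itself are assumptions for illustration, not anything the target site documents.

```python
import re

NOSCRIPT_RE = re.compile(r"<noscript\b[^>]*>(.*?)</noscript>", re.IGNORECASE | re.DOTALL)
BODY_RE = re.compile(r"<body\b[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)


def looks_like_js_placeholder(html):
    """Heuristic: True if the page has a <noscript> notice and an empty <body>."""
    has_notice = NOSCRIPT_RE.search(html) is not None
    body = BODY_RE.search(html)
    body_is_empty = body is None or body.group(1).strip() == ""
    return has_notice and body_is_empty


# Sample modelled on the output shown above.
sample = (
    "<html><head><noscript>Please enable JavaScript to view the page "
    "content.</noscript>\n</head><body>\n</body></html>"
)
print(looks_like_js_placeholder(sample))
```

A check like this could run on `response2.read()` before any parsing, so the script fails early instead of scraping an empty page.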

selenium

from selenium import webdriver

driver = webdriver.Firefox()
url = 'xxx'  # target URL elided in the original post
driver.get(url)
print(driver.current_url)  # the original printed driver.context, which is not page data
print(driver.title)
print(driver.page_source)
driver.quit()  # end the browser session

output:

Same noscript notice as above.
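One possible reason selenium shows the same placeholder is that `page_source` was read before the page's script had finished running. Below is a minimal sketch of polling until the body is non-empty. `wait_for_rendered_source` and its parameters are hypothetical names, and `get_source` stands in for something like `lambda: driver.page_source`. Whether waiting is enough for this particular site is untested.

```python
import re
import time

BODY_RE = re.compile(r"<body\b[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)


def wait_for_rendered_source(get_source, timeout=10.0, interval=0.5):
    """Poll get_source() until <body> has content; return the HTML, or None on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        html = get_source()
        body = BODY_RE.search(html)
        if body is not None and body.group(1).strip():
            return html  # the body now has content
        if time.monotonic() >= deadline:
            return None  # gave up: body still empty
        time.sleep(interval)
```

With a live driver this would be called as `wait_for_rendered_source(lambda: driver.page_source)`. Selenium's own `WebDriverWait` serves the same purpose when the condition can be expressed against the driver.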

References

For the target site and the full question, see SO: https://stackoverflow.com/questions/44562212/fetching-web-page-but-need-javascript-to-view-page-content
The target URL is in the comments there.


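sinazl #1 (thread author)

How do I log in? It asks for a username and password. I can take a look for you.

gougou168 #2

For form protection, here is the code. The usual cause is a CSRF token or other dynamic parameters, and requests together with BeautifulSoup is enough to handle that.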

import requests
from bs4 import BeautifulSoup

# Fetch the page first and extract the token
session = requests.Session()
login_page = session.get('https://example.com/login')
soup = BeautifulSoup(login_page.text, 'html.parser')

# Find the CSRF token; the field name depends on the page structure
token_input = soup.find('input', {'name': 'csrf_token'})
if token_input is None:
    raise RuntimeError('csrf_token field not found')
csrf_token = token_input['value']

# Build the form data
payload = {
    'username': 'your_username',
    'password': 'your_password',
    'csrf_token': csrf_token
}

# Submit
response = session.post('https://example.com/login', data=payload)

# Check whether it worked ('登录成功' means "login successful")
if '登录成功' in response.text:
    print('Done')

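If some parameters are also encrypted by JS, you will need selenium or you will have to work out the JS logic. In short: analyze the page structure first, then reproduce the normal request flow.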
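Since the token field's name varies from site to site, one way to "analyze the page structure" without guessing is to collect every hidden input in the form. This is a minimal sketch using only the standard library; `HiddenInputCollector`, `hidden_fields`, and the sample form are made up for illustration.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    """Return a dict of all hidden form fields found in html."""
    parser = HiddenInputCollector()
    parser.feed(html)
    return parser.fields


# Hypothetical login form, not taken from any real site.
sample_form = """
<form action="/login" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="username">
</form>
"""
fields = hidden_fields(sample_form)
print(fields)  # {'csrf_token': 'abc123'}
```

The resulting dict could be merged into the form payload before adding the visible fields, so any hidden parameter the server expects is sent back unchanged.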

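h691938207 #3

Just run selenium on a real machine and keep hitting the page; that is how I used to do it on forums.

songsunli #4

1. If the page says it needs JS, either your request headers are wrong or the site has set cookies that you need to handle.
2. Hook up a CAPTCHA-solving service.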
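Point 1 can be sketched with the standard library: an opener that sends browser-like headers and keeps cookies between requests. The header values below are assumed examples and `make_opener` is a hypothetical helper. This only covers headers and cookies, and a page that genuinely runs a script challenge would still need a real browser.

```python
import urllib.request
from http.cookiejar import CookieJar

# Assumed example headers; copy the real values from your own browser's dev tools.
BROWSER_HEADERS = [
    ("User-Agent", "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"),
    ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"),
    ("Accept-Language", "en-US,en;q=0.5"),
]


def make_opener():
    """Return (opener, jar): an opener that sends BROWSER_HEADERS and keeps cookies in jar."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = list(BROWSER_HEADERS)
    return opener, jar


opener, jar = make_opener()
# opener.open(url) would now send the headers above and store any Set-Cookie
# values in jar, so they are replayed on later requests through the same opener.
```

Reusing one opener for the whole flow is the point: cookies set by the first response are sent automatically with the next request.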

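itying888 #5

OP, this is the New Zealand WHV visa, right? Don't bother scraping, it won't work. These days it only comes down to hand speed and automatic form filling. Applications open on the 23rd; isn't it too late to start preparing now? The site was redesigned this year. Do you have the latest DOM details for the form? For the payment page? I have all of these, but I lost interest in competing for a slot long ago. Then again, the immigration policy has changed, so I expect fewer people will be competing this year. Haven't you noticed that the price for slot-grabbing services on Taobao has dropped from over 7000 to just over 4000?

Finally, good luck to your friend!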