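Python outsourcing project: how do I use Scrapy to build a crawler that sends web POST requests?

What's the deal with sending POST requests? Haven't worked out the site's anti-scraping strategy yet?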

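There are two main ways to send a POST request in Scrapy, depending on whether the data you submit is fixed or dynamic.

1. Fixed form data (the most common case). If your POST data is a fixed set of key-value pairs, just hard-code them in a FormRequest. For example, simulating a login: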

import scrapy
from scrapy.http import FormRequest

class MySpider(scrapy.Spider):
    name = 'post_spider'
    start_urls = ['http://example.com/login']

    def parse(self, response):
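        # Build a FormRequest from the login form on the page and send it as a POST;
        # from_response pre-fills the form's existing fields (hidden inputs included)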
        return FormRequest.from_response(
            response,
            formdata={'username': 'your_user', 'password': 'your_pass'},
            callback=self.after_login
        )

    def after_login(self, response):
        # Handle the response that comes back after logging in
        if "Welcome" in response.text:
            self.logger.info("Login succeeded!")
        # ... continue crawling
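
To make it clearer what a form POST actually puts on the wire, here is a minimal standard-library sketch. It is not Scrapy code: `build_form_body` is a hypothetical helper, and the idea is only that FormRequest's formdata ends up as a URL-encoded body of this shape, sent with the application/x-www-form-urlencoded content type.

```python
from urllib.parse import urlencode, parse_qs


def build_form_body(formdata: dict) -> bytes:
    """Encode key-value pairs the way an HTML form POST body is encoded.

    Hypothetical helper for illustration only; in a real spider
    FormRequest does this encoding for you.
    """
    return urlencode(formdata).encode("utf-8")


# Same fixed credentials as in the login example above
body = build_form_body({"username": "your_user", "password": "your_pass"})

# Header that normally accompanies a URL-encoded form body
headers = {"Content-Type": "application/x-www-form-urlencoded"}

# Special characters are percent-encoded and spaces become "+"
escaped = build_form_body({"q": "a b&c"})

# Decoding the body recovers the original pairs (each value comes back as a list)
decoded = parse_qs(body.decode("utf-8"))
```

If the server rejects your POST, comparing a body like this against what the browser sends in its network panel is usually a quick first check.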

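2. Dynamic or JSON data. If the data has to be extracted from a page first, or the API expects a JSON body, fetch the page first and then build the request.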

import scrapy
import json

class ApiSpider(scrapy.Spider):
    name = 'api_spider'
    
    def start_requests(self):
        # The initial request can be a GET that fetches dynamic parameters such as a token
        yield scrapy.Request('http://api.example.com/init', callback=self.parse_init)
    
    def parse_init(self, response):
        # Assume a token can be extracted from the response
        token = response.css('input#token::attr(value)').get()
        
        # Build the JSON body for the POST request
        payload = {
            'query': 'scrapy',
            'token': token,
            'page': 1
        }
        
        yield scrapy.Request(
            url='http://api.example.com/search',
            method='POST',
            body=json.dumps(payload),  # Important: serialize the JSON payload to a string
            headers={'Content-Type': 'application/json'},
            callback=self.parse_results
        )
    
    def parse_results(self, response):
        data = json.loads(response.text)
        # Process the returned JSON data...
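
The "extract a token, then build the JSON body" step can also be tried offline. Below is a rough standard-library sketch of it, assuming the token sits in an `<input id="token">` element as in the spider above. `TokenParser`, `build_search_body` and `SAMPLE_HTML` are made-up names and sample data, not part of Scrapy or of any real site.

```python
import json
from html.parser import HTMLParser


class TokenParser(HTMLParser):
    """Collect the value of <input id="token"> (hypothetical page layout)."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "input" and attr_map.get("id") == "token":
            self.token = attr_map.get("value")


def build_search_body(html: str, query: str, page: int = 1) -> str:
    """Extract the token from the page and serialize the POST payload as JSON."""
    parser = TokenParser()
    parser.feed(html)
    if parser.token is None:
        # Fail loudly instead of posting a payload with a null token
        raise ValueError("token input not found in page")
    return json.dumps({"query": query, "token": parser.token, "page": page})


# Made-up page content standing in for the response of the init request
SAMPLE_HTML = '<form><input type="hidden" id="token" value="abc123"></form>'

body = build_search_body(SAMPLE_HTML, "scrapy")
payload = json.loads(body)  # round-trip to confirm the body is valid JSON
```

Raising when the token is missing is a deliberate choice here: a request sent with an empty token tends to fail in ways that look like anti-scraping blocks, which is harder to debug.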

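Key points:

Use FormRequest for standard form submissions.
Use scrapy.Request with method='POST' for custom requests; for JSON data, remember to call json.dumps() and set the correct Content-Type header.
If you need dynamic parameters such as a CSRF token from the initial page, you usually have to send a GET request first.

Summary: pick the request type that matches the data format, and fetch any dynamic parameters first.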

nodeper, reply #3

It's not a crawler; it's simulating user clicks.

sinazl, reply #4

Find the JS and just POST directly.