How to customize the default settings and arguments on ScrapydWeb's Run Spider page in Python
1. Install the update:
pip install -U git+https://github.com/my8100/scrapydweb.git
2. If you are already using scrapydweb v1.2.0, add the following options to your existing config file:
############################## Run Spider #####################################
# The default is False, set it to True to automatically
# expand the 'settings & arguments' section in the Run Spider page.
SCHEDULE_EXPAND_SETTINGS_ARGUMENTS = False

# The default is 'Mozilla/5.0', set it to a non-empty string to customize the default value of 'custom'
# in the drop-down list of USER_AGENT.
SCHEDULE_CUSTOM_USER_AGENT = 'Mozilla/5.0'

# The default is None, set it to any value of ['custom', 'Chrome', 'iPhone', 'iPad', 'Android']
# to customize the default value of USER_AGENT.
SCHEDULE_USER_AGENT = None

# The default is None, set it to True or False to customize the default value of ROBOTSTXT_OBEY.
SCHEDULE_ROBOTSTXT_OBEY = None

# The default is None, set it to True or False to customize the default value of COOKIES_ENABLED.
SCHEDULE_COOKIES_ENABLED = None

# The default is None, set it to a non-negative integer to customize the default value of CONCURRENT_REQUESTS.
SCHEDULE_CONCURRENT_REQUESTS = None

# The default is None, set it to a non-negative number to customize the default value of DOWNLOAD_DELAY.
SCHEDULE_DOWNLOAD_DELAY = None

# The default is "-d setting=CLOSESPIDER_TIMEOUT=60\r\n-d setting=CLOSESPIDER_PAGECOUNT=10\r\n-d arg1=val1",
# set it to '' or any non-empty string to customize the default value of 'additional'.
# Use '\r\n' as the line separator.
SCHEDULE_ADDITIONAL = "-d setting=CLOSESPIDER_TIMEOUT=60\r\n-d setting=CLOSESPIDER_PAGECOUNT=10\r\n-d arg1=val1"
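The '-d key=value' lines in SCHEDULE_ADDITIONAL follow the curl -d form fields used in Scrapyd's schedule.json examples. A 'setting=NAME=VALUE' entry passes a Scrapy setting, and other keys such as arg1 are passed to the spider as arguments. The sketch below shows how such a string breaks down. It uses a hypothetical parse_additional helper, which is not part of ScrapydWeb, and it assumes one entry per '\r\n'-separated line, with only the first '=' separating the key from the value.

```python
# Hypothetical helper, not part of ScrapydWeb: split a SCHEDULE_ADDITIONAL-style
# string into the (key, value) pairs it would send to Scrapyd's schedule.json.
def parse_additional(text):
    pairs = []
    # Entries are separated by CRLF, as the config comment requires.
    for line in text.split("\r\n"):
        line = line.strip()
        if not line:
            continue  # tolerate empty input such as SCHEDULE_ADDITIONAL = ''
        if not line.startswith("-d "):
            raise ValueError(f"expected '-d key=value', got {line!r}")
        # Split on the first '=' only, so 'setting=CLOSESPIDER_TIMEOUT=60'
        # becomes ('setting', 'CLOSESPIDER_TIMEOUT=60').
        key, sep, value = line[3:].strip().partition("=")
        if not sep or not key:
            raise ValueError(f"missing key or '=' in {line!r}")
        pairs.append((key, value))
    return pairs


if __name__ == "__main__":
    default = ("-d setting=CLOSESPIDER_TIMEOUT=60\r\n"
               "-d setting=CLOSESPIDER_PAGECOUNT=10\r\n"
               "-d arg1=val1")
    for key, value in parse_additional(default):
        print(key, value)
```

Run on the default value, it prints three pairs: setting CLOSESPIDER_TIMEOUT=60, setting CLOSESPIDER_PAGECOUNT=10, and arg1 val1.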
3. GitHub: https://github.com/my8100/scrapydweb
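This is a common need, and it is handled through ScrapydWeb's own config file rather than the Scrapy project. The project's settings.py still applies when the spider runs. The default values pre-filled on the Run Spider page, however, are customized in scrapydweb_settings_v10.py (the version suffix in this file name differs between ScrapydWeb releases).
The steps are as follows: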
- Locate the config file: in the directory where ScrapydWeb runs, find scrapydweb_settings_v10.py.
- Change the defaults: add or modify the following options in the config file:
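# Pre-fill the settings on the Run Spider page
SCHEDULE_DOWNLOAD_DELAY = 2
SCHEDULE_CONCURRENT_REQUESTS = 16
SCHEDULE_ROBOTSTXT_OBEY = False
# Set the other SCHEDULE_* options listed above as needed

# Pre-fill the 'additional' box with spider arguments, one "-d key=value" per line, separated by '\r\n'
SCHEDULE_ADDITIONAL = "-d arg1=default_value1\r\n-d arg2=default_value2"

# Expand the 'settings & arguments' section automatically so the defaults are visible
SCHEDULE_EXPAND_SETTINGS_ARGUMENTS = True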
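- Restart the service: restart ScrapydWeb after editing the file so the new configuration takes effect.
- Per-spider values: if different spiders need different values, put them in each spider's custom_settings and leave the corresponding SCHEDULE_* option at None. A value that is actually submitted from the Run Spider page is passed to Scrapyd as a 'setting' parameter. Scrapy applies that parameter at command-line priority, so it overrides custom_settings, not the other way round.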
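How a spider's custom_settings interact with values submitted from the Run Spider page comes down to Scrapy's settings priorities, which a toy model can illustrate. ToySettings below is a hypothetical stand-in, not scrapy.settings.Settings. It reuses the numeric priorities Scrapy assigns to the 'default' (0), 'project' (20), 'spider' (30) and 'cmdline' (40) sources. It also assumes, as Scrapyd's launcher does, that schedule.json setting values reach Scrapy as command-line -s options.

```python
# Toy model of Scrapy's priority-based settings (not scrapy.settings.Settings).
# The numbers match the priorities Scrapy uses for these sources.
PRIORITIES = {"default": 0, "project": 20, "spider": 30, "cmdline": 40}


class ToySettings:
    def __init__(self):
        self._values = {}  # name -> (value, priority)

    def set(self, name, value, source):
        priority = PRIORITIES[source]
        current = self._values.get(name)
        # As in Scrapy, a stored value is replaced only by one of equal or higher priority.
        if current is None or priority >= current[1]:
            self._values[name] = (value, priority)

    def get(self, name):
        return self._values[name][0]


settings = ToySettings()
settings.set("DOWNLOAD_DELAY", 0, "default")   # Scrapy's built-in default
settings.set("DOWNLOAD_DELAY", 1, "project")   # project settings.py
settings.set("DOWNLOAD_DELAY", 2, "cmdline")   # -d setting=DOWNLOAD_DELAY=2 sent via Scrapyd
settings.set("DOWNLOAD_DELAY", 5, "spider")    # custom_settings: set later, but lower priority
print(settings.get("DOWNLOAD_DELAY"))  # 2
```

The spider-level value is set last here, yet the command-line value survives. Priority, not order, decides the outcome.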
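With this in place, the Run Spider page pre-fills these settings and arguments with the configured defaults every time it is opened, and users can still change them as needed.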
Summary: set the defaults by editing the scrapydweb_settings_v10.py config file and restarting ScrapydWeb.
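Because the config file is ordinary Python, one way to sanity-check the Run Spider options before restarting is to load the file with runpy.run_path. Each SCHEDULE_* value can then be compared against the constraint stated in its comment. check_run_spider_defaults is a hypothetical helper, not something ScrapydWeb ships, and ScrapydWeb's own validation may be stricter or looser. Note that run_path executes the file, so point it only at your own config.

```python
import runpy
import sys

# Allowed values, transcribed from the comments in the Run Spider section of the config file.
USER_AGENT_CHOICES = ["custom", "Chrome", "iPhone", "iPad", "Android"]


def check_run_spider_defaults(cfg):
    """Return a list of problems found in the SCHEDULE_* options of a config dict."""
    problems = []

    if not isinstance(cfg.get("SCHEDULE_EXPAND_SETTINGS_ARGUMENTS", False), bool):
        problems.append("SCHEDULE_EXPAND_SETTINGS_ARGUMENTS must be True or False")

    custom_ua = cfg.get("SCHEDULE_CUSTOM_USER_AGENT", "Mozilla/5.0")
    if not (isinstance(custom_ua, str) and custom_ua):
        problems.append("SCHEDULE_CUSTOM_USER_AGENT must be a non-empty string")

    ua = cfg.get("SCHEDULE_USER_AGENT")
    if ua is not None and ua not in USER_AGENT_CHOICES:
        problems.append(f"SCHEDULE_USER_AGENT must be None or one of {USER_AGENT_CHOICES}")

    for name in ("SCHEDULE_ROBOTSTXT_OBEY", "SCHEDULE_COOKIES_ENABLED"):
        value = cfg.get(name)
        if value is not None and not isinstance(value, bool):
            problems.append(f"{name} must be None, True or False")

    requests = cfg.get("SCHEDULE_CONCURRENT_REQUESTS")
    # bool is a subclass of int, so reject it explicitly.
    if requests is not None and (isinstance(requests, bool)
                                 or not isinstance(requests, int) or requests < 0):
        problems.append("SCHEDULE_CONCURRENT_REQUESTS must be None or a non-negative integer")

    delay = cfg.get("SCHEDULE_DOWNLOAD_DELAY")
    if delay is not None and (isinstance(delay, bool)
                              or not isinstance(delay, (int, float)) or delay < 0):
        problems.append("SCHEDULE_DOWNLOAD_DELAY must be None or a non-negative number")

    if not isinstance(cfg.get("SCHEDULE_ADDITIONAL", ""), str):
        problems.append("SCHEDULE_ADDITIONAL must be a string")

    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_config.py scrapydweb_settings_v10.py
    config = runpy.run_path(sys.argv[1])  # executes the file and returns its globals
    for problem in check_run_spider_defaults(config) or ["no problems found"]:
        print(problem)
```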

