How to customize the default settings and arguments on ScrapydWeb's Run Spider page in Python

1. Install or upgrade:

pip install -U git+https://github.com/my8100/scrapydweb.git

2. If you were already using scrapydweb v1.2.0, add the following options to your existing config file:


############################## Run Spider #####################################
# The default is False, set it to True to automatically
# expand the 'settings & arguments' section in the Run Spider page.
SCHEDULE_EXPAND_SETTINGS_ARGUMENTS = False

# The default is 'Mozilla/5.0', set it to a non-empty string to customize the default value of custom
# in the drop-down list of USER_AGENT.
SCHEDULE_CUSTOM_USER_AGENT = 'Mozilla/5.0'

# The default is None, set it to any value of ['custom', 'Chrome', 'iPhone', 'iPad', 'Android']
# to customize the default value of USER_AGENT.
SCHEDULE_USER_AGENT = None

# The default is None, set it to True or False to customize the default value of ROBOTSTXT_OBEY.
SCHEDULE_ROBOTSTXT_OBEY = None

# The default is None, set it to True or False to customize the default value of COOKIES_ENABLED.
SCHEDULE_COOKIES_ENABLED = None

# The default is None, set it to a non-negative integer to customize the default value of CONCURRENT_REQUESTS.
SCHEDULE_CONCURRENT_REQUESTS = None

# The default is None, set it to a non-negative number to customize the default value of DOWNLOAD_DELAY.
SCHEDULE_DOWNLOAD_DELAY = None

# The default is "-d setting=CLOSESPIDER_TIMEOUT=60\r\n-d setting=CLOSESPIDER_PAGECOUNT=10\r\n-d arg1=val1",
# set it to '' or any non-empty string to customize the default value of additional.
# Use '\r\n' as the line separator.
SCHEDULE_ADDITIONAL = "-d setting=CLOSESPIDER_TIMEOUT=60\r\n-d setting=CLOSESPIDER_PAGECOUNT=10\r\n-d arg1=val1"
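
Writing the SCHEDULE_ADDITIONAL string by hand gets error-prone once it holds more than a few entries. A minimal sketch of building it from two dictionaries is shown here; build_schedule_additional is a hypothetical helper for illustration, not part of ScrapydWeb, and it only assumes the "-d setting=NAME=VALUE" / "-d NAME=VALUE" line format and the '\r\n' separator described in the comments above.

```python
def build_schedule_additional(settings, arguments):
    """Join Scrapy settings and spider arguments into '-d' lines separated by CRLF.

    Hypothetical helper, not part of ScrapydWeb.
    """
    # Scrapy settings use the form: -d setting=NAME=VALUE
    lines = ["-d setting=%s=%s" % (name, value) for name, value in settings.items()]
    # Spider arguments use the form: -d NAME=VALUE
    lines += ["-d %s=%s" % (name, value) for name, value in arguments.items()]
    return "\r\n".join(lines)


# Reproduces the default value listed in the config options.
SCHEDULE_ADDITIONAL = build_schedule_additional(
    settings={"CLOSESPIDER_TIMEOUT": 60, "CLOSESPIDER_PAGECOUNT": 10},
    arguments={"arg1": "val1"},
)
```

Since the config file is an ordinary Python module, a helper like this could in principle live at the top of it, though a plain string literal works just as well for short values.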

3. GitHub

https://github.com/my8100/scrapydweb



1 reply

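This is a common need, and it is handled in ScrapydWeb's own config file rather than in the Scrapy project's settings.py: the defaults shown on the Run Spider page are set in scrapydweb_settings_v10.py (the version suffix in the file name depends on the installed ScrapydWeb release).

The steps are:

  1. Find the config file: look for scrapydweb_settings_v10.py in the directory from which ScrapydWeb is started.

  2. Change the defaults: add or edit the following options in the config file: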

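# Default values for the settings fields
SCHEDULE_DOWNLOAD_DELAY = 2
SCHEDULE_CONCURRENT_REQUESTS = 16
SCHEDULE_ROBOTSTXT_OBEY = False

# Default value of the 'additional' field, which carries extra settings
# and spider arguments; use '\r\n' as the line separator
SCHEDULE_ADDITIONAL = "-d arg1=default_value1\r\n-d arg2=default_value2"

# Expand the 'settings & arguments' section automatically
SCHEDULE_EXPAND_SETTINGS_ARGUMENTS = True

  3. Restart the service: restart ScrapydWeb after editing so that the new configuration takes effect.

  4. Per-project values: these defaults only pre-fill the form and can be edited before each run. A setting that is not submitted from the Run Spider page falls back to the spider's custom_settings or the project's settings.py.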

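With this configuration, the settings and arguments on the Run Spider page are pre-filled with these defaults every time, and users can still change them as needed.

Summary: set the defaults by editing the scrapydweb_settings_v10.py config file.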
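
Before restarting ScrapydWeb, it can help to check the new values against the constraints stated in the option comments (allowed USER_AGENT choices, True or False, non-negative numbers). The following is a minimal sketch under that assumption; check_schedule_options and USER_AGENT_CHOICES are hypothetical names for illustration and are not part of ScrapydWeb.

```python
# Allowed values taken from the comment on SCHEDULE_USER_AGENT.
USER_AGENT_CHOICES = ["custom", "Chrome", "iPhone", "iPad", "Android"]


def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_schedule_options(options):
    """Return a list of problems found in a dict of SCHEDULE_* options.

    Hypothetical helper; None always passes because it means "no default".
    """
    problems = []

    def check(name, is_valid, expected):
        value = options.get(name)
        if value is not None and not is_valid(value):
            problems.append("%s should be %s, got %r" % (name, expected, value))

    check("SCHEDULE_USER_AGENT", lambda v: v in USER_AGENT_CHOICES,
          "one of %s" % USER_AGENT_CHOICES)
    check("SCHEDULE_ROBOTSTXT_OBEY", lambda v: isinstance(v, bool), "True or False")
    check("SCHEDULE_COOKIES_ENABLED", lambda v: isinstance(v, bool), "True or False")
    check("SCHEDULE_CONCURRENT_REQUESTS",
          lambda v: isinstance(v, int) and not isinstance(v, bool) and v >= 0,
          "a non-negative integer")
    check("SCHEDULE_DOWNLOAD_DELAY", lambda v: is_number(v) and v >= 0,
          "a non-negative number")
    return problems


# An empty list means no problems were found.
problems = check_schedule_options({
    "SCHEDULE_DOWNLOAD_DELAY": 2,
    "SCHEDULE_CONCURRENT_REQUESTS": 16,
    "SCHEDULE_ROBOTSTXT_OBEY": False,
})
```

The check only mirrors what the comments say; ScrapydWeb may apply its own validation at startup, so this is a convenience rather than a replacement for reading its log output.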
