What do I do after installing dirbot from GitHub in Python?

While learning scrapy I downloaded the dirbot files from https://github.com/scrapy/dirbot. After installing them with python setup.py install, what am I supposed to do next?
sinazl #1

Shut down the computer and go to bed.

caililin #2

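After installing dirbot, first look at its configuration. The project settings live in settings.py (scrapy.cfg only records which settings module the project uses), and SPIDER_MODULES has to point at the package that holds your spiders. Then write your own spider that subclasses scrapy.Spider (called BaseSpider in old Scrapy releases) or CrawlSpider, and define start_urls and the parsing rules. For example: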

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class MyDirbotSpider(CrawlSpider):
    name = 'my_spider'
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com/']

    rules = (
        Rule(LinkExtractor(), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        # Put your parsing logic here
        yield {
            'url': response.url,
            'title': response.css('title::text').get()
        }

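Once the spider is written, run scrapy crawl my_spider -o output.json from the project directory and check whether the data looks right. dirbot is only a skeleton; what to scrape and how to store it is up to you.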

Summary: get the settings right, then write your own spider logic.
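
For reference, a minimal sketch of the settings.py entries mentioned above. The package name dirbot.spiders is an assumption based on the repository layout, not something confirmed in this thread, so adjust it if your spiders live elsewhere.

```python
# settings.py (sketch; the module paths below are assumed)

# Packages that Scrapy searches for spider classes.
SPIDER_MODULES = ["dirbot.spiders"]

# Package where `scrapy genspider` places newly generated spiders.
NEWSPIDER_MODULE = "dirbot.spiders"
```

With this in place, scrapy list (run from the directory containing scrapy.cfg) should print the names of the spiders Scrapy can find.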

yibo5220 #3

🤔🤔🤔

yibo5220 #4

Going by the readme, shouldn't I go on and install this one instead? https://github.com/scrapy/quotesbot

vueper #5

It seems to be saying that the dirbot project has been abandoned, so you should work with the quotesbot project instead.