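dature: a Node.js command-line program for scraping Sina Blog

dature is a Node.js-based crawler that can fetch all posts by a given Sina Blog author, including title, body, date, category, and images, and generate HTML files from them.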

Installation

npm install -g dature

Usage

dature sina_blog_uid

For details, see: https://www.npmjs.com/package/dature

songsunli (reply #1)

Nice! Bumping this in support.

nodeper (reply #2)

Today I updated the template used to generate the HTML.

vueper (reply #3, author)

I tried it out, and the generated template has a problem: Cannot read property 'title' of undefined.
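For context, this kind of error usually means the template was handed an undefined post object and then tried to read `.title` from it. The sketch below is a generic illustration of that failure mode and one way to guard against it; it is not dature's actual code, and `renderTitle` and the shape of the post object are hypothetical.

```javascript
// Hypothetical helper, not part of dature: render a post title safely.
function renderTitle(post) {
  // Optional chaining yields undefined instead of throwing when post is missing;
  // the ?? operator then substitutes a placeholder title.
  const title = post?.title ?? '(untitled)';
  return `<h1>${title}</h1>`;
}

const posts = [{ title: 'Hello' }];

console.log(renderTitle(posts[0])); // the post exists
console.log(renderTitle(posts[1])); // posts[1] is undefined, but no TypeError is thrown
```

Optional chaining needs Node.js 14 or later; on older versions an explicit `post && post.title` check does the same job.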

gougou168 (reply #4)

Sina Blog? I thought it was Weibo...

yuanlaile (reply #5)

That's right.

eggper (reply #6)

Scraping CSDN blogs is now supported.

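caililin (reply #7)

To build a Node.js command-line program that scrapes Sina Blog, you can use the axios library to send HTTP requests and the cheerio library to parse the HTML. Below is a basic example that fetches the Sina Blog homepage and extracts the article titles.

First, make sure Node.js and npm are installed. Then run the following commands in your project directory to install the required libraries: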

npm init -y
npm install axios cheerio

Next, create a file named index.js and add the following code:

const axios = require('axios');
const cheerio = require('cheerio');

const url = 'https://blog.sina.com.cn/';  // Sina Blog homepage URL

axios.get(url)
  .then(response => {
    const $ = cheerio.load(response.data);
    const titles = [];

    $('.BlogList .title a').each((index, element) => {
      titles.push($(element).text().trim());
    });

    console.log('Article Titles:', titles);
  })
  .catch(error => {
    console.error('Error fetching the page:', error);
  });

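This script fetches the article titles found on the Sina Blog homepage. Note that Sina Blog's page structure may change, so the selector (such as .BlogList .title a) may need to be adjusted to match the actual markup.

Run the script: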

node index.js

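This example is only a starting point; you can extend it as needed, for example to handle pagination, fetch article bodies, or save the data to a file.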
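As one possible extension, the scraped titles could be written out as a simple HTML file. The sketch below uses only Node.js built-in modules; `escapeHtml`, `renderPage`, the sample titles, and the output file name `titles.html` are all made up for illustration and are not taken from dature.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Escape characters that have special meaning in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build a minimal HTML page that lists the given titles.
function renderPage(titles) {
  const items = titles.map(t => `  <li>${escapeHtml(t)}</li>`).join('\n');
  return `<!DOCTYPE html>\n<meta charset="utf-8">\n<ul>\n${items}\n</ul>\n`;
}

// Sample data standing in for the `titles` array collected by the scraper.
const titles = ['First post', 'Tips & <tricks>'];

// Write to the system temp directory here; any writable path would do.
const outFile = path.join(os.tmpdir(), 'titles.html');
fs.writeFileSync(outFile, renderPage(titles), 'utf8');
console.log('Saved', titles.length, 'titles to', outFile);
```

Escaping the titles before writing them matters because scraped text can contain characters such as < or & that would otherwise break the generated page.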