[Help] When scraping a Weibo page with Python, the page source is over 1,500 lines, so why can I extract only about 10% of the data?

The Weibo page source is over 1,500 lines long (as seen in the response), yet I can only extract about 10% of it. I have tried many times. My code is below:

import requests
from pyquery import PyQuery as pq
from urllib.parse import urlencode
import re

def dizhi():
    # Build the search URL and return the raw HTML of the response.
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3573.0 Safari/537.36'}
    data = {
        'q': '微信群',
        'typeall': '1',
        'suball': '1',
        'timescope': 'custom:2018 - 12 - 20 - 0: 2018 - 12 - 22 - 0',
        'Refer': 'g'
    }
    url = 'https://s.weibo.com/weibo/%25E5%25AE%259D%25E5%25A6%2588%25E7%25BE%25A4?' + urlencode(data)
    wangzhi = requests.get(url, headers=headers)
    return wangzhi.text

def jiexi(html):
    # Print the src of every <img> found under the '.m4 li' selector.
    doc = pq(html)
    item = doc('.m4 li')
    for i in item.items():
        print(i('img').attr('src'))

def main():
    html = dizhi()
    print(html)
    jiexi(html)

if __name__ == '__main__':
    main()
sinazl, reply #1

Is the data loaded asynchronously?
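
One rough way to check this from the response text itself is sketched below. It uses only the standard library to count the `<img>` tags in the static markup and to measure how much of the page sits inside `<script>` elements. The names `StaticContentProbe` and `probe` are made up for this illustration, and a high script share is only a hint that the content is built by JavaScript, not proof.

```python
from html.parser import HTMLParser


class StaticContentProbe(HTMLParser):
    """Counts <img> tags in the static markup and the text held inside <script>."""

    def __init__(self):
        super().__init__()
        self.img_count = 0
        self.script_chars = 0
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.img_count += 1
        elif tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Markup embedded in a script arrives here as plain text, not as tags.
        if self._in_script:
            self.script_chars += len(data)


def probe(html):
    """Return the static <img> count and the fraction of the page inside <script>."""
    parser = StaticContentProbe()
    parser.feed(html)
    parser.close()
    share = parser.script_chars / len(html) if html else 0.0
    return {"static_imgs": parser.img_count, "script_share": share}
```

Calling `probe(html)` on the text returned by `dizhi()` gives both numbers. If `static_imgs` is small while `script_share` is large, most of the images are probably produced by scripts, and a selector run against the raw response cannot reach them.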

yibo5220, reply #2 (OP)

I don't understand your question.

wuwangju, reply #3

caililin, reply #4

Why scrape the Weibo pages at all? Isn't there an API?

vueper, reply #5

Try the Weibo mobile API. It has fewer restrictions and is fine for learning.

itying888, reply #6

https://html.python-requests.org

Try the requests-html library: execute the JS first, then read the text.
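
A minimal sketch of that suggestion follows, assuming requests-html is installed. The function names `rendered_image_urls` and `normalize_srcs` are invented for this example. `render()` downloads a headless Chromium the first time it runs, and Weibo may still require a login or cookies, so this is not guaranteed to return the full result list.

```python
from urllib.parse import urljoin


def normalize_srcs(page_url, srcs):
    """Resolve relative and protocol-relative src values against the page URL."""
    return [urljoin(page_url, src) for src in srcs if src]


def rendered_image_urls(url, headers=None, selector="img"):
    """Fetch a page, run its JavaScript, and return the image URLs found afterwards."""
    # Imported here so normalize_srcs stays usable without requests-html installed.
    from requests_html import HTMLSession

    session = HTMLSession()
    try:
        response = session.get(url, headers=headers)
        # render() executes the page's scripts in headless Chromium and
        # replaces the parsed HTML with the rendered result.
        response.html.render()
        srcs = [element.attrs.get("src") for element in response.html.find(selector)]
    finally:
        session.close()
    return normalize_srcs(url, srcs)
```

To mirror the original post, `rendered_image_urls(url, headers=headers, selector='.m4 li img')` would take the place of `dizhi()` plus `jiexi()`.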

vueper, reply #7

What you see in the browser has already been rendered by JS. To see exactly the same response that your own program receives, use a tool such as Postman.
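
The same check can also be done without leaving Python, by writing the response body to a file and opening it in an editor. This is only a sketch: `dump_response` and the default file name are made up for illustration.

```python
from pathlib import Path


def dump_response(text, path="weibo_response.html"):
    """Write the response body to disk and return (line_count, char_count)."""
    Path(path).write_text(text, encoding="utf-8")
    return len(text.splitlines()), len(text)
```

Calling `dump_response(dizhi())` saves exactly what `requests` received. Searching that file for a value that is visible in the browser shows whether it was ever present in the raw response.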