How can I scrape Weidian (微店) data with Python?
It seems the shop pages can only be accessed through the WeChat client. Does anyone have experience with this? Thanks!
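One note on the "WeChat only" part: the WeChat in-app browser identifies itself with the token MicroMessenger in its User-Agent. Whether Weidian actually gates pages on that token is an assumption you should verify in dev tools, but a minimal sketch of sending a WeChat-style UA looks like this (the item URL below is the placeholder pattern used later in this answer, not a real product):

import requests

# WeChat's built-in browser includes 'MicroMessenger' in its User-Agent.
# Assumption: Weidian may serve the mobile page when it sees this token.
ua = ('Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) '
      'AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148 '
      'MicroMessenger/8.0.0')
resp = requests.get('https://weidian.com/item.html?itemID=123456789',
                    headers={'User-Agent': ua}, timeout=10)
print(resp.status_code, len(resp.text))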
import requests
from bs4 import BeautifulSoup
import json
import time


class WeidianScraper:
    def __init__(self):
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
            'Accept': 'application/json, text/plain, */*',
            'Accept-Language': 'zh-CN,zh;q=0.9',
        }
        self.session = requests.Session()
    def get_product_list(self, shop_id, page=1):
        """Fetch the product list for a shop."""
        # NOTE: 'item.html?itemID=...' is an item-page pattern; the real
        # shop/list URL (or API endpoint) must be found via browser dev tools.
        url = f'https://weidian.com/item.html?itemID={shop_id}'
        try:
            response = self.session.get(url, headers=self.headers, timeout=10)
            response.raise_for_status()
            # Parse the page for product data
            soup = BeautifulSoup(response.text, 'html.parser')
            # Weidian usually loads its data via JavaScript, so in practice
            # you must locate the real data API with the browser's dev tools
            # (see the JSON sketch after this script).
            products = []
            # Example: extract product info (adjust selectors to the actual page)
            product_items = soup.find_all('div', class_='product-item')
            for item in product_items:
                product = {
                    'title': item.find('h3').text.strip() if item.find('h3') else '',
                    'price': item.find('span', class_='price').text.strip() if item.find('span', class_='price') else '',
                    'sales': item.find('span', class_='sales').text.strip() if item.find('span', class_='sales') else '',
                    'link': item.find('a')['href'] if item.find('a') else ''
                }
                products.append(product)
            return products
        except Exception as e:
            print(f"Failed to fetch product list: {e}")
            return []
    def get_product_detail(self, product_url):
        """Fetch the detail page of a single product."""
        try:
            response = self.session.get(product_url, headers=self.headers, timeout=10)
            response.raise_for_status()
            soup = BeautifulSoup(response.text, 'html.parser')
            # Extract product detail fields (selectors are placeholders)
            detail = {
                'title': soup.find('h1', class_='product-title').text.strip() if soup.find('h1', class_='product-title') else '',
                'description': soup.find('div', class_='product-desc').text.strip() if soup.find('div', class_='product-desc') else '',
                'images': [img.get('src', '') for img in soup.find_all('img', class_='product-image')],
                'sku_info': self._extract_sku_info(soup)
            }
            return detail
        except Exception as e:
            print(f"Failed to fetch product detail: {e}")
            return {}
    def _extract_sku_info(self, soup):
        """Extract SKU information (name, price, stock) from the page."""
        sku_data = []
        sku_elements = soup.find_all('div', class_='sku-item')
        for sku in sku_elements:
            sku_info = {
                'name': sku.find('span', class_='sku-name').text.strip() if sku.find('span', class_='sku-name') else '',
                'price': sku.find('span', class_='sku-price').text.strip() if sku.find('span', class_='sku-price') else '',
                'stock': sku.find('span', class_='sku-stock').text.strip() if sku.find('span', class_='sku-stock') else ''
            }
            sku_data.append(sku_info)
        return sku_data
    def save_to_json(self, data, filename):
        """Save scraped data to a JSON file."""
        with open(filename, 'w', encoding='utf-8') as f:
            json.dump(data, f, ensure_ascii=False, indent=2)
        print(f"Data saved to {filename}")
# Usage example
if __name__ == "__main__":
    scraper = WeidianScraper()
    # Fetch the product list (replace with a real shop ID)
    shop_id = "123456789"  # example shop ID
    products = scraper.get_product_list(shop_id)
    if products:
        print(f"Fetched {len(products)} products")
        # Fetch the detail of the first product
        if products[0]['link']:
            time.sleep(1)  # be polite: pause between consecutive requests
            detail = scraper.get_product_detail(products[0]['link'])
            print(f"Product detail: {detail.get('title', '')}")
        # Save the data
        scraper.save_to_json(products, 'weidian_products.json')
    else:
        print("No product data retrieved")
Key points and notes for real-world use: Weidian renders most product data client-side via JavaScript, so the CSS selectors above are placeholders you must adapt to the actual page structure. Quick advice: analyze the page first and locate the real data endpoint before writing any code.
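If you crawl more than a handful of pages, throttle and retry politely (the unused `time` import in the script above hints at this). A generic sketch; the retry count and delays are arbitrary choices, not Weidian-specific requirements:

import time
import requests

def get_with_retry(session, url, headers, retries=3, delay=2.0):
    """GET with simple retries and a growing pause between attempts."""
    for attempt in range(1, retries + 1):
        try:
            resp = session.get(url, headers=headers, timeout=10)
            resp.raise_for_status()
            return resp
        except requests.RequestException as e:
            print(f"Attempt {attempt} failed: {e}")
            if attempt < retries:
                time.sleep(delay * attempt)  # linear backoff between retries
    return None

# e.g. get_with_retry(requests.Session(), 'https://weidian.com/', {'User-Agent': 'Mozilla/5.0'})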