How to Write a Crawler in Python to Scrape Taobao MM Photos

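I usually write backend code, but everyone here seemed to be having fun with crawlers, so I thought I'd join in. It's rough: there is no error recovery or anything like that yet, and I'll add it when I have time. The code is at https://github.com/carlonelong/TaobaoMMCrawler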

yuanlaile #1 (OP)

What does this scrape???

gougou168 #2

I can't understand your question.

gougou168 #3

MM photo albums

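htzhanglong #4

So it scrapes Taobao models (淘女郎)…
By the way, I once scraped buyer-show photos (买家秀) for a particular keyword and found plenty of surprises… OP could give it a try… Just remember to exclude the underwear category (photo uploads aren't allowed there).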

htzhanglong #5

Now that's exciting.

htzhanglong #6

Can it scrape the young ladies from Tokyo?

sinazl #7

Scrape the cosplay shops.

sinazl #8

Come on, share one~~

wuwangju #9

Makes sense.

yuanlaile #10 (OP)

It throws an error:

start downloading 田媛媛
current page 1
start downloading album 10000702574 45ÕÅ 张
Traceback (most recent call last):
  File "/Users/hunter/Downloads/TaobaoMMCrawler-master/crawler.py", line 83, in <module>
    c.getAlbums()
  File "/Users/hunter/Downloads/TaobaoMMCrawler-master/crawler.py", line 58, in getAlbums
    self.getImages(model_id, album_id, album_img_count.strip(u'张'))
  File "/Users/hunter/Downloads/TaobaoMMCrawler-master/crawler.py", line 65, in getImages
    for page in xrange(1, (int(image_count)-1)/16+2):
ValueError: invalid literal for int() with base 10: '45\xd5\xc5'

yibo5220 #11

It's an encoding problem... What environment are you running in?

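yibo5220 #12

These pictures are over-retouched with Meitu XiuXiu (美图秀秀); you'd be better off looking at those domestic risqué photo sets.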

phonegap100 #13

Looks like there's a bug:

$ python crawler.py
start downloading 田媛媛
current page 1
start downloading album 10000702574 45ÕÅ 张
Traceback (most recent call last):
  File "crawler.py", line 83, in <module>
    c.getAlbums()
  File "crawler.py", line 58, in getAlbums
    self.getImages(model_id, album_id, album_img_count.strip(u'张'))
  File "crawler.py", line 65, in getImages
    for page in xrange(1, (int(image_count)-1)/16+2):
ValueError: invalid literal for int() with base 10: '45\xd5\xc5'

ionicwang #14

Scraping Taobao MM
How naughty

zlyuanteng #15

Changing line 41 to soup = bs(self.readHtml(model_url).decode('gbk'), 'html.parser') fixed it; the error is gone.
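
A minimal sketch of why decoding as GBK helps, written for Python 3 rather than the Python 2.7 used in this thread. The bytes \xd5\xc5 in the ValueError are the GBK encoding of 张, and the ÕÅ in the log is what those same two bytes look like when read as Latin-1, so the page text was apparently never decoded as GBK. The helper names parse_image_count and album_pages are hypothetical and do not come from crawler.py; the 16-per-page arithmetic is copied from line 65 of the traceback.

```python
def parse_image_count(raw_count, encoding="gbk"):
    # Decode raw bytes first (assumption: the album page is GBK-encoded),
    # then drop the trailing counter word 张 before converting to int.
    if isinstance(raw_count, bytes):
        text = raw_count.decode(encoding)
    else:
        text = raw_count
    return int(text.strip().strip(u"张"))


def album_pages(image_count, per_page=16):
    # Same arithmetic as line 65 in the traceback, with explicit floor division.
    return list(range(1, (image_count - 1) // per_page + 2))


raw = b"45\xd5\xc5"  # the exact bytes reported in the ValueError
count = parse_image_count(raw)
pages = album_pages(count)
print(count, pages)
```

With the bytes from the traceback, count is 45 and pages is [1, 2, 3].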

gougou168 #16

OK, thx, I'll fix it.

wuwangju #17

Which Python version does it need?
I'm on 2.7 and get the same error on both Mac and Windows:
Traceback (most recent call last):
  File "TaobaoMMCrawler.py", line 5, in <module>
    from bs4 import BeautifulSoup as bs
ImportError: No module named bs4
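
This ImportError usually means the bs4 module is not installed for the interpreter running the script; bs4 is distributed on PyPI as the beautifulsoup4 package. Below is a small sketch, assuming Python 3.4 or later (importlib.util.find_spec does not exist on the 2.7 mentioned above), that reports missing modules before the crawler starts. The function name missing_modules is hypothetical.

```python
import importlib.util


def missing_modules(names):
    # Return the top-level module names that this interpreter cannot find.
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(["bs4"])
if missing:
    # bs4 is installed with: pip install beautifulsoup4
    print("Missing modules: " + ", ".join(missing))
else:
    print("All required modules are available.")
```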