count operations with pymongo in Python are very slow; what is the right way to optimize them?

30,000 documents, each containing only a single random number: {"digit": <random number>}
Goal: find the number that occurs most often.
Collection: table

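import time

from pymongo import MongoClient

# Placeholder connection details; the post only says the collection is called `table`.
table = MongoClient('mongodb://localhost:27017/')['your_database']['table']


def main():
    # Collect every value, then reduce to the distinct set.
    digits = []
    for d in table.find():
        n = d['digit']
        digits.append(n)
    dig = set(digits)

    news = []
    i = 0
    for d in dig:
        # One count query per distinct value; this is the slow part.
        # Cursor.count() is deprecated since PyMongo 3.7 and removed in 4.0.
        c = table.find({"digit": d}).count()
        zz = (d, c)
        news.append(zz)
        print(i)
        i += 1


if __name__ == '__main__':
    start = time.time()
    main()
    print('Cost: {}'.format(time.time() - start))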

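One run takes five or six minutes. Running it with 100 threads is barely any faster, and the fan gets really loud...
What is the right way to do this?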

itying888 #1 (author)

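When pymongo's count() is slow, it is usually because it is scanning the collection. Stop using count(); use count_documents() with a query filter, or estimated_document_count() if an approximate total is enough.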

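1. Exact count (with a query filter): use count_documents(). It is the supported replacement for the old count() and can be served from an index when the filter field is indexed.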

from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')
db = client['your_database']
collection = db['your_collection']

# Exact count with a query filter
query = {"status": "active"}
count = collection.count_documents(query)
print(f"Active documents: {count}")

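2. Approximate total (no query filter): use estimated_document_count(). It reads the count from collection metadata instead of scanning documents, so it is very fast.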

# Estimated total document count; very fast
estimated_count = collection.estimated_document_count()
print(f"Estimated total documents: {estimated_count}")

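Key points:

The old count() method is deprecated (since PyMongo 3.7, and removed in 4.0); don't use it anymore.
Make sure the field you filter on is indexed, otherwise count_documents() still has to scan the collection.
If an approximate number is enough, estimated_document_count() is the best choice.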
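
Applied to the question's data, the index advice could look roughly like the sketch below. The helper names and the `connect()` function are hypothetical, and the database and collection names are the same placeholders used earlier in this reply; only `create_index` and `count_documents` are actual PyMongo calls.

```python
def ensure_digit_index(collection):
    """Create an ascending index on "digit"; a no-op if it already exists."""
    # Passing a single field name creates a single-key ascending index.
    return collection.create_index("digit")


def count_digit(collection, value):
    """Count documents whose "digit" field equals `value`."""
    return collection.count_documents({"digit": value})


def connect(uri="mongodb://localhost:27017/"):
    """Hypothetical helper: return the collection, using placeholder names."""
    # Imported here so the two helpers above can be loaded without pymongo.
    from pymongo import MongoClient
    return MongoClient(uri)["your_database"]["your_collection"]
```

Typical use would be `table = connect()`, then `ensure_digit_index(table)` once, then `count_digit(table, 7)` for each value of interest.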

Summary: switch to count_documents() or estimated_document_count().
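
For the specific task in the question (the most frequent value), another option is to skip the per-value count queries altogether and tally everything in one pass on the client. The following is a minimal sketch, assuming the documents fit comfortably in memory (30,000 small documents should); the function name is made up for illustration.

```python
from collections import Counter


def most_common_digit(docs):
    """Return (digit, count) for the most frequent "digit", or None if empty."""
    # A single pass over the documents builds the full frequency table.
    counts = Counter(doc["digit"] for doc in docs)
    if not counts:
        return None
    return counts.most_common(1)[0]
```

With a real collection it could be called as `most_common_digit(table.find({}, {"digit": 1, "_id": 0}))`, where the projection keeps only the `digit` field so less data crosses the wire.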

zlyuanteng #2

aggregate
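
The one-word reply presumably refers to MongoDB's aggregation pipeline, which can do the grouping and counting on the server in a single request. Here is a rough sketch of what that might look like for the question's data; the function names are hypothetical, while `$group`, `$sort`, `$limit` and `Collection.aggregate` are standard MongoDB/PyMongo features.

```python
def most_common_digit_pipeline():
    """Build a pipeline that finds the most frequent "digit" value."""
    return [
        # Group by the digit value and count the documents in each group.
        {"$group": {"_id": "$digit", "count": {"$sum": 1}}},
        # Put the largest count first.
        {"$sort": {"count": -1}},
        # Keep only the top group.
        {"$limit": 1},
    ]


def most_common_digit_server_side(collection):
    """Return (digit, count) for the most frequent value, or None if empty."""
    result = list(collection.aggregate(most_common_digit_pipeline()))
    if not result:
        return None
    return result[0]["_id"], result[0]["count"]
```

With the question's collection this would be called as `most_common_digit_server_side(table)`. If several values tie for the top count, which one comes back is not specified by this pipeline.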

h691938207 #3

Yes, I forgot to mention that... thx~