The cleanroom Library in Python: Elegant Multiprocessing (Multiprocessing for Humans)

https://github.com/huntzhan/cleanroom

Just wrote a toy: a thin wrapper over multiprocessing that automatically forwards class-style method calls to a worker process.
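The forwarding idea described above can be sketched roughly as follows. This is a hypothetical illustration of the concept, not cleanroom's actual API: an object lives in a child process, and a proxy in the parent turns every attribute access into a remote method call over a Pipe.

```python
# Hypothetical sketch of "auto-forwarding method calls to a subprocess";
# names like ProcessProxy and _serve are made up for illustration.
import multiprocessing

def _serve(conn, cls, args, kwargs):
    # Child process: construct the object, then answer (method, args, kwargs)
    # requests until a None shutdown message arrives.
    obj = cls(*args, **kwargs)
    while True:
        msg = conn.recv()
        if msg is None:
            break
        name, a, kw = msg
        conn.send(getattr(obj, name)(*a, **kw))

class ProcessProxy:
    def __init__(self, cls, *args, **kwargs):
        self._conn, child = multiprocessing.Pipe()
        self._proc = multiprocessing.Process(
            target=_serve, args=(child, cls, args, kwargs))
        self._proc.start()

    def __getattr__(self, name):
        # Any unknown attribute becomes a remote method call.
        def call(*a, **kw):
            self._conn.send((name, a, kw))
            return self._conn.recv()
        return call

    def close(self):
        self._conn.send(None)
        self._proc.join()

class Counter:
    def __init__(self):
        self.n = 0
    def add(self, x):
        self.n += x
        return self.n

if __name__ == '__main__':
    proxy = ProcessProxy(Counter)
    print(proxy.add(3))  # 3
    print(proxy.add(4))  # 7 -- the state lives in the child process
    proxy.close()
```

Real libraries in this space also have to handle exceptions crossing the process boundary and non-picklable arguments, which this sketch ignores.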

1 reply

I haven't used the cleanroom library, but the standard library's multiprocessing module is itself "multiprocessing for humans". If you want something more elegant, I recommend concurrent.futures.ProcessPoolExecutor, which is currently the most Pythonic approach.

Straight to the code. Here is the traditional style next to the more elegant one:

# Traditional multiprocessing style (verbose)
import multiprocessing

def worker(x):
    return x * x

if __name__ == '__main__':
    with multiprocessing.Pool(4) as pool:
        results = pool.map(worker, range(10))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# The more elegant ProcessPoolExecutor style
from concurrent.futures import ProcessPoolExecutor

def worker(x):
    return x * x

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=4) as executor:
        # map-style usage
        results = list(executor.map(worker, range(10)))
        print(results)
        
        # Or use submit to fetch a single result
        future = executor.submit(worker, 5)
        print(future.result())  # 25
        
        # Batch submit
        futures = [executor.submit(worker, i) for i in range(5, 10)]
        print([f.result() for f in futures])

A few key points:

  1. Automatic resource management: the with statement guarantees the pool is shut down properly
  2. Future pattern: submit() returns a Future object, so results can be fetched asynchronously
  3. Friendly exception handling: Future.exception() captures exceptions raised in child processes
  4. High compatibility: the interface matches ThreadPoolExecutor, so switching between threads and processes is a one-line change
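Points 3 and 4 are worth seeing in action. In this sketch a worker raises inside the child process; the exception travels back in the Future instead of crashing anything silently (risky is just a made-up example function):

```python
# Exceptions in a worker are captured by the Future and can be inspected
# (Future.exception()) or re-raised (Future.result()) in the parent.
from concurrent.futures import ProcessPoolExecutor

def risky(x):
    if x == 0:
        raise ValueError("x must be nonzero")
    return 10 // x

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=2) as executor:
        ok = executor.submit(risky, 5)
        bad = executor.submit(risky, 0)
        print(ok.result())       # 2
        print(bad.exception())   # the ValueError raised in the child process
        # bad.result() would re-raise that same ValueError here
```

And for point 4: replace ProcessPoolExecutor with ThreadPoolExecutor in the import and nothing else needs to change.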

If the data volume is large, avoid building the full result list in memory: executor.map returns an iterator, so results can be consumed one at a time as they become available (with multiprocessing.Pool, the equivalent lazy method is imap; ProcessPoolExecutor has no imap):

with ProcessPoolExecutor() as executor:
    for result in executor.map(worker, range(1000000)):
        # Stream results instead of materializing them all at once
        process(result)  # process() stands in for your own handling logic
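Relatedly, note that executor.map yields results in input order, so one slow task can hold up everything behind it. When order doesn't matter, concurrent.futures.as_completed hands you each Future as soon as it finishes (slow_square is a made-up example with uneven task durations):

```python
# as_completed yields Futures in completion order, not submission order,
# so fast tasks aren't stuck waiting behind slow ones.
from concurrent.futures import ProcessPoolExecutor, as_completed
import time

def slow_square(x):
    time.sleep(0.1 * x)  # simulate tasks of different durations
    return x * x

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=4) as executor:
        futures = {executor.submit(slow_square, x): x for x in (3, 1, 2)}
        for fut in as_completed(futures):
            print(futures[fut], '->', fut.result())
```

For very long input iterables, executor.map also accepts a chunksize argument that batches tasks to cut inter-process overhead.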

In short: use ProcessPoolExecutor instead of driving multiprocessing by hand.
