How do you implement OCR in Python?

tesseract-ocr only reaches high accuracy if you train it on your own samples.

For OCR in Python, Tesseract is a common first choice; together with pytesseract and PIL/Pillow you can get started quickly. Here is a complete code example:

# Install dependencies: pip install pytesseract Pillow
# The Tesseract engine must also be installed separately: https://github.com/UB-Mannheim/tesseract/wiki

import pytesseract
from PIL import Image, ImageEnhance, ImageFilter

# Set the Tesseract path (needed on Windows; usually not needed on Linux/Mac)
# pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

def basic_ocr(image_path):
    """Basic OCR."""
    try:
        img = Image.open(image_path)
        text = pytesseract.image_to_string(img, lang='chi_sim+eng')  # mixed Chinese/English recognition
        return text.strip()
    except Exception as e:
        return f"OCR failed: {e}"

def ocr_with_preprocess(image_path):
    """OCR with preprocessing (to improve accuracy)."""
    img = Image.open(image_path)
    
    # Preprocessing steps
    img = img.convert('L')  # convert to grayscale
    img = ImageEnhance.Contrast(img).enhance(2.0)  # boost contrast
    img = ImageEnhance.Sharpness(img).enhance(2.0)  # sharpen
    img = img.filter(ImageFilter.SHARPEN)  # sharpen again
    
    # Custom configuration
    custom_config = r'--oem 3 --psm 6'
    text = pytesseract.image_to_string(img, lang='chi_sim', config=custom_config)
    return text.strip()

# Usage example
if __name__ == "__main__":
    # Method 1: basic recognition
    result1 = basic_ocr("test.png")
    print("Basic OCR result:\n", result1)
    
    # Method 2: recognition after preprocessing
    result2 = ocr_with_preprocess("test.png")
    print("\nOCR result after preprocessing:\n", result2)
    
    # Get bounding-box information (for locating text)
    img = Image.open("test.png")
    data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
    print("\nText position info:", data['text'])
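
The dictionary returned by image_to_data holds parallel lists (text, conf, block_num, par_num, line_num and so on), so it usually needs some post-processing before it is useful. Below is a minimal sketch of one way to do that: group the words by line and drop low-confidence entries. The helper names words_by_line and lines_from_image and the threshold of 60 are assumptions for illustration, not part of pytesseract.

```python
from collections import defaultdict


def words_by_line(data, min_conf=60.0):
    """Group words from an image_to_data dict into lines of text.

    Empty entries and words below min_conf are dropped. The default
    threshold of 60 is an arbitrary illustrative value.
    """
    lines = defaultdict(list)
    for i, word in enumerate(data["text"]):
        # conf can arrive as a string or a number depending on the version;
        # structural (non-word) rows carry -1.
        conf = float(data["conf"][i])
        if not word.strip() or conf < min_conf:
            continue
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        lines[key].append(word)
    # Sorting by (block, paragraph, line) keeps the reading order.
    return [" ".join(words) for _, words in sorted(lines.items())]


def lines_from_image(image_path, lang="chi_sim+eng"):
    """Run Tesseract on an image and return its text line by line."""
    import pytesseract  # imported lazily; needs the Tesseract engine installed
    from PIL import Image

    with Image.open(image_path) as img:
        data = pytesseract.image_to_data(
            img, lang=lang, output_type=pytesseract.Output.DICT
        )
    return words_by_line(data)
```

Words are joined with a space, which suits English; for pure Chinese text you may prefer to join with an empty string.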

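Key points:

Installing Tesseract: the OCR engine must be installed separately; on Windows, remember to set the tesseract_cmd path
Language packs: lang='chi_sim' uses the Simplified Chinese pack, 'eng' the English one, and 'chi_sim+eng' recognizes mixed text
Preprocessing: grayscale conversion, contrast enhancement, and sharpening can noticeably improve accuracy
Configuration: --oem 3 selects the default engine mode (whatever is available, which is LSTM on Tesseract 4 and later); --psm 6 assumes a single uniform block of text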

Alternatives:

For complex scenes, use easyocr: pip install easyocr, which handles Chinese better
For high accuracy, use a commercial API: Baidu/Tencent OCR

Summary: use Tesseract for simple cases and move to easyocr for more complex needs.

sinazl #3

The Baidu OCR API. I've been using it a lot lately; 50,000 free calls per day?

h691938207 #4

Can Baidu OCR be set to a single-line mode?

eggper #5

#2 Seriously, 50,000 calls a day?

Is there some other URL for it?

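nodeper #6

What do you mean by single-line mode? Do you only want the result for one particular line? If so, why not crop that line out first and then upload it for recognition?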
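
The crop-then-upload idea can be sketched with Pillow as follows. This is only an illustration: the helper names clamp_box and crop_to_png_bytes are made up here, and the box coordinates are assumed to come from somewhere else (for example a manual selection or a bounding box from an earlier detection pass).

```python
import io


def clamp_box(box, size):
    """Clip a (left, top, right, bottom) box to an image of (width, height)."""
    left, top, right, bottom = box
    width, height = size
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    if left >= right or top >= bottom:
        raise ValueError("box lies outside the image")
    return (left, top, right, bottom)


def crop_to_png_bytes(image_path, box):
    """Crop one region and return it as PNG bytes, ready to send to an OCR API."""
    from PIL import Image  # imported lazily; requires Pillow

    with Image.open(image_path) as img:
        region = img.crop(clamp_box(box, img.size))
        buf = io.BytesIO()
        region.save(buf, format="PNG")
    return buf.getvalue()
```

The returned bytes can be passed to whichever OCR service you use, so only the line you care about is uploaded.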

caililin #7

Whoa, how come yours looks like that? When I go in through the console, it shows 50,000.

wuwangju #8

My console shows the same as yours... but the promo page says 500...

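wuwangju #9

Yeah, I also ran a comparison earlier:

For Baidu: yellow means correct except for mixed-up Chinese/English punctuation (Chinese punctuation recognized as English); green means fully correct; an individual red mark means everything except the red part was recognized correctly

For Alibaba, it is just the raw recognition output

You can see that Baidu is quite strong at text recognition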

yuanlaile #10

Something like tesseract's psm 7
Page segmentation modes:
0 Orientation and script detection (OSD) only.
1 Automatic page segmentation with OSD.
2 Automatic page segmentation, but no OSD, or OCR.
3 Fully automatic page segmentation, but no OSD. (Default)
4 Assume a single column of text of variable sizes.
5 Assume a single uniform block of vertically aligned text.
6 Assume a single uniform block of text.
7 Treat the image as a single text line.
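
With pytesseract, a page segmentation mode is passed through the config string. Here is a minimal sketch of single-line recognition using psm 7; the helper names build_config and ocr_single_line are illustrative, and the default language is an assumption.

```python
PSM_SINGLE_LINE = 7  # "Treat the image as a single text line."


def build_config(psm, oem=3):
    """Build the extra command-line arguments passed on to tesseract."""
    return f"--oem {int(oem)} --psm {int(psm)}"


def ocr_single_line(image_path, lang="chi_sim+eng"):
    """Recognize an image that contains exactly one line of text."""
    import pytesseract  # imported lazily; needs the Tesseract engine installed
    from PIL import Image

    with Image.open(image_path) as img:
        text = pytesseract.image_to_string(
            img, lang=lang, config=build_config(PSM_SINGLE_LINE)
        )
    return text.strip()
```

This mode works best when the image has already been cropped to the line in question.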

wuwangju #11

I just downloaded the ready-made Chinese trained data; what I don't know is how to go about training it myself.

yuanlaile #12

Could you share a link? 0.0

h691938207 #13

Official site (you can upload an image under "Feature demo" (功能演示) to try it out)
https://cloud.baidu.com/product/ocr/general
Docs
https://cloud.baidu.com/doc/OCR/OCR-Python-SDK.html#.E5.BF.AB.E9.80.9F.E5.85.A5.E9.97.A8
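
For reference, a call through the Baidu Python SDK might look roughly like the sketch below. It assumes the baidu-aip package (pip install baidu-aip), its AipOcr client and basicGeneral method, and a response dict carrying a words_result list; check the documentation linked above for the current interface. The helper names extract_lines and baidu_basic_ocr are illustrative, and the credentials are placeholders you get from the console.

```python
def extract_lines(response):
    """Pull the recognized lines out of a general-OCR response dict.

    Assumes the documented shape: {"words_result": [{"words": "..."}, ...]},
    with "error_code" and "error_msg" present on failure.
    """
    if "error_code" in response:
        message = response.get("error_msg", response["error_code"])
        raise RuntimeError(f"OCR request failed: {message}")
    return [item["words"] for item in response.get("words_result", [])]


def baidu_basic_ocr(image_path, app_id, api_key, secret_key):
    """Send one image to the general OCR endpoint and return its lines."""
    from aip import AipOcr  # imported lazily; provided by baidu-aip

    client = AipOcr(app_id, api_key, secret_key)
    with open(image_path, "rb") as f:
        response = client.basicGeneral(f.read())
    return extract_lines(response)
```

Each call counts against the daily quota discussed in this thread, so it is worth caching results while testing.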

yibo5220 #14

Thanks!

yuanlaile #15

Damn, the results really are good, but it's only 500 calls

h691938207 #16

General recognition (not the high-accuracy one) should come with 50,000, which is plenty

itying888 #17

For OCR, Tencent's is the strongest

h691938207 #18

The general one is only 500 calls. If it were 50,000, I wouldn't need to keep looking

htzhanglong #19

Damn, 50,000 calls inside the console but 500 on the public page. Unbelievable

sinazl #20

I pay for the Vision API from Google Cloud

yuanlaile #21

How are the results and the cost?

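yuanlaile #22

If you are on the public internet, a free OCR API is enough for everyday use unless you have special requirements.
For intranet use, special character requirements, or a small dataset, just train [tesseract](https://github.com/tesseract-ocr/tesseract) 3.05.
If you have enough data, look into deep learning approaches: attention + LSTM (long short-term memory).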