How to build a convolutional neural network with Keras in Python to recognize captchas

Project: https://github.com/chxj1992/captcha_cracker

Demo: http://captcha.chxj.name

The captchas are generated with a PHP library, mews/Captcha.

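The implementation is fairly crude: each image is simply cut into four equal parts, which definitely hurts the recognition rate.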
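
A minimal sketch of what that four-way split could look like with NumPy. The function name, the 60x160 image size, and the choice to drop leftover columns are assumptions for illustration, not code taken from the project:

```python
import numpy as np

def split_into_equal_slices(image, num_chars=4):
    """Cut an image of shape (height, width[, channels]) into equal-width slices."""
    width = image.shape[1]
    slice_width = width // num_chars  # leftover columns on the right are dropped
    return [
        image[:, i * slice_width:(i + 1) * slice_width]
        for i in range(num_chars)
    ]

# Hypothetical 60x160 grayscale captcha (all zeros, just to show the shapes)
captcha = np.zeros((60, 160), dtype=np.uint8)
pieces = split_into_equal_slices(captcha)
print([piece.shape for piece in pieces])
```

A fixed split like this ignores where the characters actually sit, so a slice can cut through a glyph or include part of its neighbor, which is likely where the accuracy loss comes from.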

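Trained on 16,000 samples for 200 epochs, the per-character training accuracy is around 98%+, and accuracy on a new dataset is 90%+. So the recognition rate for a four-character captcha is roughly 60-70%, which is still good enough to get things done.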
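
The 60-70% figure follows from the per-character number if errors at the four positions are treated as independent, which is an assumption. A quick check of the arithmetic (the function name is made up for this sketch):

```python
def whole_captcha_accuracy(per_char_accuracy, num_chars=4):
    """Probability that all characters are right, assuming independent errors."""
    return per_char_accuracy ** num_chars

print(round(whole_captcha_accuracy(0.90), 3))  # 0.656
```

So 90% per character gives roughly 66% for the whole captcha, and every extra point of per-character accuracy is amplified four times over.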

eggper, reply #1

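When building a CNN in Keras to recognize captchas, the core of it is treating the task as a multi-output classification problem. A captcha usually has several characters, and each character position is its own independent classification task.

Straight to the code. The functional API is the more flexible choice here: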

import numpy as np
from tensorflow.keras import layers, models, Input
from tensorflow.keras.utils import to_categorical

def build_captcha_cnn(img_height, img_width, num_channels, num_chars, char_set_size):
    """
    Build a CNN model for captcha recognition.

    Args:
    img_height: image height
    img_width: image width
    num_channels: number of channels (1 for grayscale, 3 for RGB)
    num_chars: number of characters in the captcha
    char_set_size: size of the character set
    """

    # Input layer
    input_img = Input(shape=(img_height, img_width, num_channels))

    # Convolution block 1
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
    x = layers.MaxPooling2D((2, 2))(x)

    # Convolution block 2
    x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    x = layers.MaxPooling2D((2, 2))(x)

    # Convolution block 3
    x = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)
    x = layers.MaxPooling2D((2, 2))(x)

    # Flatten the feature maps
    x = layers.Flatten()(x)
    x = layers.Dropout(0.5)(x)

    # Fully connected layer
    x = layers.Dense(512, activation='relu')(x)
    x = layers.Dropout(0.5)(x)

    # Multiple output heads: one softmax output per character position
    outputs = []
    for i in range(num_chars):
        output = layers.Dense(char_set_size, activation='softmax', name=f'char_{i}')(x)
        outputs.append(output)

    # Assemble the model
    model = models.Model(inputs=input_img, outputs=outputs)

    return model

# Example: build a model for 4-digit numeric captchas
model = build_captcha_cnn(
    img_height=60,      # image height
    img_width=160,      # image width
    num_channels=1,     # grayscale
    num_chars=4,        # 4-character captcha
    char_set_size=10    # digits 0-9
)

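# Compile the model: categorical cross-entropy is applied to every output head.
# Metrics are keyed by output name, because Keras 3 expects one metrics entry
# per output on a multi-output model rather than a single flat list.
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics={f'char_{i}': ['accuracy'] for i in range(4)}
)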

# Inspect the model architecture
model.summary()

# Example of preparing the training data
def prepare_data(X_train, y_train_list, num_chars, char_set_size):
    """
    Prepare training data for the multi-output model.

    X_train: image data [num_samples, height, width, channels]
    y_train_list: integer labels per character position [num_chars][num_samples]
    """
    # One-hot encode the labels of each character position
    y_train_encoded = []
    for i in range(num_chars):
        y_encoded = to_categorical(y_train_list[i], num_classes=char_set_size)
        y_train_encoded.append(y_encoded)

    return X_train, y_train_encoded

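# Example of training the model
# X_train: training images
# y_chars: list of 4 one-hot encoded arrays (e.g. from prepare_data),
#          one array per character position
# model.fit(X_train, y_chars, epochs=10, batch_size=32)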

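# Example of running a prediction
def predict_captcha(model, image, char_set):
    """
    Predict the text of a captcha.

    model: trained model
    image: input image [1, height, width, channels]
    char_set: list mapping class index to character, e.g. ['0','1','2',...,'9']
    """
    predictions = model.predict(image)

    captcha = ''
    for i in range(len(predictions)):
        # predictions[i] has shape [1, char_set_size]; take the single sample
        char_idx = np.argmax(predictions[i][0])
        captcha += char_set[char_idx]

    return captcha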

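Key points:

Use the functional API to create a multi-output model, with one output head per character position
The output layers use softmax activation, so each character is classified independently
When preparing the data, split the labels into one array per position
When predicting, take the argmax of each output separately and then join the results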
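
The label splitting and the argmax decoding from the points above can be sketched with NumPy alone. The helper names and the digit-only `CHAR_SET` are assumptions for illustration:

```python
import numpy as np

CHAR_SET = "0123456789"  # assumed digit-only character set

def split_labels(captcha_texts, char_set=CHAR_SET):
    """Turn captcha strings into one integer label array per character position."""
    char_to_idx = {ch: i for i, ch in enumerate(char_set)}
    num_chars = len(captcha_texts[0])
    return [
        np.array([char_to_idx[text[pos]] for text in captcha_texts])
        for pos in range(num_chars)
    ]

def decode(per_position_probs, char_set=CHAR_SET):
    """Take the argmax of each output head (batch of one) and join the characters."""
    return "".join(char_set[int(np.argmax(p[0]))] for p in per_position_probs)

labels = split_labels(["1234", "5678", "9012"])
print([arr.tolist() for arr in labels])
```

Each array in `labels` holds the labels of one position across all samples, which is the `[num_chars][num_samples]` layout that `prepare_data` expects before one-hot encoding.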

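Preprocessing matters a lot: binarize the captcha first, remove noise, and segment the characters (if the character positions are fixed, you can skip segmentation and let the CNN learn directly from the whole image). A 0-9 character set means 10 classes; with the 26 letters included it becomes 36 classes.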
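
As a rough sketch of the binarization step, assuming dark characters on a light background and a fixed threshold of 128 (both are assumptions; real captchas may need an adaptive threshold and a separate denoising pass):

```python
import numpy as np

# 10 digits + 26 lowercase letters = 36 classes
ALPHANUMERIC = "0123456789abcdefghijklmnopqrstuvwxyz"

def binarize(gray, threshold=128):
    """Map a uint8 grayscale image to 0.0/1.0; pixels darker than threshold become 1.0."""
    return (gray < threshold).astype(np.float32)

def to_model_input(gray, threshold=128):
    """Binarize and add batch and channel axes: (H, W) -> (1, H, W, 1)."""
    return binarize(gray, threshold)[np.newaxis, :, :, np.newaxis]

# Tiny hypothetical image: two dark pixels and two light ones
gray = np.array([[0, 200], [127, 128]], dtype=np.uint8)
print(binarize(gray).tolist())
print(to_model_input(gray).shape)
```

The `(1, H, W, 1)` shape matches what `predict_captcha` expects for a single grayscale image.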

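If you don't have much data, use data augmentation: rotation, scaling, added noise. CTC loss can handle variable-length captchas, but for a fixed length these multiple output heads are simpler and more direct.

In short: a multi-output CNN plus careful data preprocessing.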
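
A NumPy-only sketch of augmentation, under stated assumptions: it uses a small random shift plus Gaussian noise, with the shift standing in for the rotation and scaling mentioned above (those would normally need an image library). The function name and default parameters are made up for illustration:

```python
import numpy as np

def augment(image, rng, max_shift=2, noise_std=0.05):
    """Return a randomly shifted, noisy copy of a 2-D float image with values in [0, 1]."""
    dx = int(rng.integers(-max_shift, max_shift + 1))
    dy = int(rng.integers(-max_shift, max_shift + 1))
    # np.roll wraps pixels around the edges, which is acceptable for small shifts
    shifted = np.roll(image, shift=(dy, dx), axis=(0, 1))
    noisy = shifted + rng.normal(0.0, noise_std, size=image.shape)
    return np.clip(noisy, 0.0, 1.0).astype(np.float32)

rng = np.random.default_rng(0)
sample = np.zeros((60, 160), dtype=np.float32)  # hypothetical blank captcha
augmented = augment(sample, rng)
print(augmented.shape, augmented.dtype)
```

The labels stay unchanged, so each augmented copy can be added to the training set alongside the original image.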

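wuwangju, reply #2

If you are training on a generator library, this is basically pointless. The library can produce unlimited data, so given enough time you could even reach 99.999999% accuracy. But is that useful? Different websites use different fonts, sizes, and colors, plus noise and so on. You'll find out once you try it on a real site.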

nodeper, reply #3

Of course it isn't general-purpose; it only works on captchas generated by https://github.com/mewebstudio/captcha/