HarmonyOS Next text recognition: can the output follow the text layout in the image?

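[Problem description]: After extracting text from an image, I want the output to follow the text layout in the image. How can this be done?

[Symptom]: For example, when the text in an image like the one below is recognized, can the image's layout be preserved?

[Version info]: Not applicable

[Reproduction code]: The text recognition code is in the attachment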

More hands-on tutorials on this topic are available at https://www.itying.com/category-93-b0.html

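itying888 #1

Hello,

[Solution]

What is your specific use case? Are you calling textRecognition.recognizeText? If so, the result includes blocks: Array<TextBlock>, which carries the coordinates and text of each recognized block, so you can rebuild the layout along the lines of the code below. Please let us know if this does not meet your needs. The reference code walks the returned blocks, reads each block's text, and uses the coordinates to decide how many spaces and line breaks to insert while assembling the final string: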

import { textRecognition } from '@kit.CoreVisionKit';
import { image } from '@kit.ImageKit';
import { hilog } from '@kit.PerformanceAnalysisKit';
import { BusinessError } from '@kit.BasicServicesKit';
import { fileIo } from '@kit.CoreFileKit';
import { photoAccessHelper } from '@kit.MediaLibraryKit';

@Entry
@Component
struct Index {
  private imageSource: image.ImageSource | undefined = undefined;
  @State chooseImage: PixelMap | undefined = undefined;
  @State dataValues: string = '';

  async aboutToAppear(): Promise<void> {
    const initResult = await textRecognition.init();
    hilog.info(0x0000, 'OCRDemo', `OCR service initialization result:${initResult}`);
  }

  async aboutToDisappear(): Promise<void> {
    await textRecognition.release();
    hilog.info(0x0000, 'OCRDemo', 'OCR service released successfully');
  }

  build() {
    Column() {
      Image(this.chooseImage)
        .objectFit(ImageFit.Fill)
        .height('60%');

      Scroll() {
        Text(this.dataValues)
          .copyOption(CopyOptions.LocalDevice)
          .margin(10);
      }
      .height(200);

      Button('Select image')
        .type(ButtonType.Capsule)
        .fontColor(Color.White)
        .alignSelf(ItemAlign.Center)
        .width('80%')
        .margin(10)
        .onClick(() => {
          // Open the gallery and pick an image
          void this.selectImage();
        });

      Button('Start recognition')
        .type(ButtonType.Capsule)
        .fontColor(Color.White)
        .alignSelf(ItemAlign.Center)
        .width('80%')
        .margin(10)
        .onClick(() => {
          this.textRecognitionTest();
        });
    }
    .width('100%')
    .height('100%')
    .justifyContent(FlexAlign.Center);
  }

  private textRecognitionTest() {
    if (!this.chooseImage) {
      return;
    }
    // Call the text recognition API
    let visionInfo: textRecognition.VisionInfo = {
      pixelMap: this.chooseImage
    };
    let textConfiguration: textRecognition.TextRecognitionConfiguration = {
      isDirectionDetectionSupported: false
    };
    textRecognition.recognizeText(visionInfo, textConfiguration)
      .then((data: textRecognition.TextRecognitionResult) => {
        // Recognition succeeded; read the result
        let recognitionString = JSON.stringify(data);
        hilog.info(0x0000, 'OCRDemo', `Succeeded in recognizing text: ${recognitionString}`);
        let finalString = '';

        data.blocks.forEach((block, index) => {
          // Only the first line of each block is used; blocks with several lines need a loop over block.lines
          const item = block.lines[0];
          const label = item.value;
          const points = item.cornerPoints;

          if (index === 0) {
            // Indent by the distance from the left edge, one space per 30 px
            const left = points[0].x;
            const spaceCount = Math.round(left / 30);
            finalString += ' '.repeat(spaceCount);
            finalString += label;
          } else {
            // Check whether this block is on the same row as the previous one
            const current = points;
            const prev = data.blocks[index - 1].lines[0].cornerPoints;
            if ((current[0].y - prev[3].y) > 30) {
              // Not on the same row: add a line break, then indent
              const left = points[0].x;
              const spaceCount = Math.round(left / 30);
              finalString += '\n';
              finalString += ' '.repeat(spaceCount);
              finalString += label;
            } else {
              // Same row: pad by the horizontal offset between the two left edges
              const left = Math.abs(current[0].x - prev[0].x);
              const spaceCount = Math.round(left / 30);
              finalString += ' '.repeat(spaceCount);
              finalString += label;
            }
          }
        });

        this.dataValues = finalString;
      })
      .catch((error: BusinessError) => {
        hilog.error(0x0000, 'OCRDemo', `Failed to recognize text. Code: ${error.code}, message: ${error.message}`);
        this.dataValues = `Error: ${error.message}`;
      });
  }

  private async selectImage() {
    let uri = await this.openPhoto();
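    // openPhoto resolves '' on failure and may resolve undefined if no photo was selected
    if (uri === undefined || uri === '') {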
      hilog.error(0x0000, 'OCRDemo', 'Failed to get uri.');
      return;
    }
    this.loadImage(uri);
  }

  private async openPhoto(): Promise<string> {
    return new Promise<string>((resolve) => {
      let photoPicker: photoAccessHelper.PhotoViewPicker = new photoAccessHelper.PhotoViewPicker();
      photoPicker.select({
        MIMEType: photoAccessHelper.PhotoViewMIMETypes.IMAGE_TYPE,
        maxSelectNumber: 1
      }).then((res: photoAccessHelper.PhotoSelectResult) => {
        resolve(res.photoUris[0]);
      }).catch((err: BusinessError) => {
        hilog.error(0x0000, 'OCRDemo', `Failed to get photo image uri. code: ${err.code}, message: ${err.message}`);
        resolve('');
      });
    });
  }

  private loadImage(name: string) {
    setTimeout(async () => {
      try {
        let fileSource = await fileIo.open(name, fileIo.OpenMode.READ_ONLY);
        this.imageSource = image.createImageSource(fileSource.fd);
        this.chooseImage = await this.imageSource.createPixelMap();
        await fileIo.close(fileSource);
      } catch (error) {
        hilog.error(0x0000, 'OCRDemo', `Failed to open file. Error: ${error}`);
      }
    }, 100);
  }
}
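
The spacing logic in textRecognitionTest can also be tried outside the UI. The sketch below is one possible variant, not the reply's exact method: it groups lines into rows by vertical position and pads each line to a character column derived from its left edge. The names OcrPoint, OcrLine, LineBox, and renderLayout are hypothetical, and the default of 30 px per space is simply borrowed from the code above. The output only lines up in a monospaced font.

```typescript
// Hypothetical minimal shapes; only the fields used by the reply's code are modelled.
interface OcrPoint { x: number; y: number; }
interface OcrLine { value: string; cornerPoints: OcrPoint[]; }
interface LineBox { text: string; left: number; top: number; bottom: number; }

// Reduce a line to axis-aligned bounds so the corner order does not matter.
function toBox(line: OcrLine): LineBox {
  const xs = line.cornerPoints.map((p) => p.x);
  const ys = line.cornerPoints.map((p) => p.y);
  return { text: line.value, left: Math.min(...xs), top: Math.min(...ys), bottom: Math.max(...ys) };
}

// Assumption: about 30 px of image width per output space, as in the code above.
function renderLayout(lines: OcrLine[], pxPerSpace: number = 30): string {
  const boxes = lines.map(toBox).sort((a, b) => a.top - b.top);

  // A box joins the current row if its vertical centre lies above the row's bottom edge.
  const rows: LineBox[][] = [];
  let rowBottom = 0;
  for (const box of boxes) {
    const centre = (box.top + box.bottom) / 2;
    const current = rows[rows.length - 1];
    if (current !== undefined && centre < rowBottom) {
      current.push(box);
      rowBottom = Math.max(rowBottom, box.bottom);
    } else {
      rows.push([box]);
      rowBottom = box.bottom;
    }
  }

  return rows.map((row) => {
    let out = '';
    for (const box of row.sort((a, b) => a.left - b.left)) {
      const column = Math.round(box.left / pxPerSpace);
      // Pad up to the target column; keep at least one space between neighbours.
      const pad = Math.max(column - out.length, out.length > 0 ? 1 : 0);
      out += ' '.repeat(pad) + box.text;
    }
    return out;
  }).join('\n');
}
```

In the page above, the lines collected from data.blocks could be passed to such a function and the returned string assigned to dataValues. Grouping by vertical centre instead of a fixed 30 px gap is a design choice of this sketch that keeps tightly spaced lines on separate rows.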

[Background]

TextRecognitionResult: textRecognition (text recognition) - ArkTS API - Core Vision Kit (basic vision service) - AI - Huawei HarmonyOS Developer.
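
As a rough illustration of how the result hierarchy used in the code above (blocks, then lines, each with value and cornerPoints) can be flattened for further processing, here is a minimal sketch. The interfaces are hypothetical local mirrors that model only the fields read in this thread, not the full Core Vision Kit types, and flattenLines and PositionedLine are names made up for this example.

```typescript
// Hypothetical local mirrors of the fields the reply's code reads.
// In an app these would be the textRecognition types from @kit.CoreVisionKit.
interface CornerPoint { x: number; y: number; }
interface RecognizedLine { value: string; cornerPoints: CornerPoint[]; }
interface RecognizedBlock { lines: RecognizedLine[]; }
interface RecognitionResultLike { blocks: RecognizedBlock[]; }

// Axis-aligned bounds of one line, in image pixels.
interface PositionedLine {
  text: string;
  left: number;
  top: number;
  right: number;
  bottom: number;
}

function flattenLines(result: RecognitionResultLike): PositionedLine[] {
  const out: PositionedLine[] = [];
  for (const block of result.blocks) {
    for (const line of block.lines) {
      if (line.cornerPoints.length === 0) {
        continue; // no geometry to place this line with
      }
      const xs = line.cornerPoints.map((p) => p.x);
      const ys = line.cornerPoints.map((p) => p.y);
      out.push({
        text: line.value,
        left: Math.min(...xs),
        top: Math.min(...ys),
        right: Math.max(...xs),
        bottom: Math.max(...ys),
      });
    }
  }
  // Reading order: top to bottom, then left to right.
  return out.sort((a, b) => (a.top - b.top) || (a.left - b.left));
}
```

Taking the minimum and maximum of the corner coordinates avoids depending on the order in which the four corners are listed.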


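h691938207 #2

Text recognition in HarmonyOS Next uses an AI model to extract text. The result is normally plain text and does not retain the formatting of the original image, such as fonts, colors, or layout. The system recognizes the text content but does not reconstruct the original layout.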

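wuwangju #3

In HarmonyOS Next, the textRecognition (OCR) capability outputs plain text by default. The returned TextRecognitionResult carries no font styles and no ready-made layout structure such as paragraphs, indentation, or line spacing. As reply #1 points out, it does include blocks with coordinates, but turning those coordinates into a layout is left to the application.

So the OCR interface alone cannot produce output "in the layout of the image". Taken as a plain string, the result loses the original paragraphs, line breaks, and indentation.

If you need to preserve the layout, consider the following approaches:

Get text position information: read the coordinates of each recognized block or line from the result. If you need finer detail than the result provides, image processing or a custom model can supply bounding boxes, at a much higher development cost.
Add layout analysis: after recognition, post-process the position information to infer the original structure (headings, paragraphs, lists, and so on) and convert it to formatted text such as HTML or rich text. This usually means writing the algorithm yourself or integrating a third-party layout analysis engine.
Combine system capabilities: for documents or printed material, you can evaluate combining @ohos.file.picker (file selection) with document parsing, but the system does not currently offer recognition output for mixed text and images.

Summary: the existing text recognition in HarmonyOS Next focuses on extracting text content. Restoring the layout is up to the application, using coordinate processing and layout reconstruction.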
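
The idea of converting positioned text into formatted output such as HTML can be sketched as follows. This is only an illustration under stated assumptions: PlacedText, escapeHtml, and toPositionedHtml are hypothetical names, the input is assumed to already hold each line's pixel position and height, and using the line height as the font size is a rough guess, not a measured value.

```typescript
// Hypothetical input: one recognized line with its position in image pixels.
interface PlacedText { text: string; left: number; top: number; height: number; }

// Escape characters that would otherwise be read as HTML markup.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Emit one absolutely positioned span per line inside a container
// that has the same pixel size as the source image.
function toPositionedHtml(items: PlacedText[], width: number, height: number): string {
  const spans = items.map((it) =>
    `<span style="position:absolute;left:${it.left}px;top:${it.top}px;` +
    `font-size:${it.height}px;white-space:pre">${escapeHtml(it.text)}</span>`
  );
  return `<div style="position:relative;width:${width}px;height:${height}px">${spans.join('')}</div>`;
}
```

Absolute positioning sidesteps row and column inference entirely, at the cost of producing output that is hard to edit as flowing text. Inferring headings, paragraphs, or lists would need an extra analysis step that this sketch does not attempt.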