Using file-format, a Rust crate for file format detection: efficient identification of many file types and their metadata

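A crate for determining the file format of a given file or stream.

It provides a variety of functions for identifying a wide range of file formats, including ZIP, Compound File Binary (CFB), Extensible Markup Language (XML) and more.

It checks the signature of the file to determine its format and, when available, uses a specific reader for accurate identification. If the signature is not recognized, the crate falls back to the default file format, which is Arbitrary Binary Data (BIN).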
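
To make the signature check and the BIN fallback concrete, here is a minimal standard-library sketch of the general idea. It is not the crate's implementation: the `SIGNATURES` table and the `detect` function are hypothetical names, and only four well-known magic numbers are included.

```rust
/// Hypothetical table of (magic bytes, short name) pairs; illustration only.
const SIGNATURES: &[(&[u8], &str)] = &[
    (b"%PDF-", "PDF"),
    (&[0xFF, 0xD8, 0xFF], "JPEG"),
    (&[0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A], "PNG"),
    (b"PK\x03\x04", "ZIP"),
];

/// Returns the short name of the first signature that prefixes `bytes`,
/// falling back to "BIN" when nothing matches.
pub fn detect(bytes: &[u8]) -> &'static str {
    SIGNATURES
        .iter()
        .find(|(magic, _)| bytes.starts_with(magic))
        .map(|&(_, name)| name)
        .unwrap_or("BIN")
}
```

A real detector also has to handle signatures at non-zero offsets and container formats that share a signature, which is where the crate's reader features come in.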

Examples

Determine from a file:

use file_format::{FileFormat, Kind};

let fmt = FileFormat::from_file("fixtures/document/sample.pdf")?;
assert_eq!(fmt, FileFormat::PortableDocumentFormat);
assert_eq!(fmt.name(), "Portable Document Format");
assert_eq!(fmt.short_name(), Some("PDF"));
assert_eq!(fmt.media_type(), "application/pdf");
assert_eq!(fmt.extension(), "pdf");
assert_eq!(fmt.kind(), Kind::Document);

Determine from bytes:

use file_format::{FileFormat, Kind};

let fmt = FileFormat::from_bytes(&[0xFF, 0xD8, 0xFF]);
assert_eq!(fmt, FileFormat::JointPhotographicExpertsGroup);
assert_eq!(fmt.name(), "Joint Photographic Experts Group");
assert_eq!(fmt.short_name(), Some("JPEG"));
assert_eq!(fmt.media_type(), "image/jpeg");
assert_eq!(fmt.extension(), "jpg");
assert_eq!(fmt.kind(), Kind::Image);

Complete example code

// Add the dependency to Cargo.toml
// [dependencies]
// file-format = "0.28"

use file_format::{FileFormat, Kind};
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    // Example 1: identify the format from a file
    let file_path = "sample.pdf";
    let fmt = FileFormat::from_file(file_path)?;

    println!("File: {}", file_path);
    println!("Format name: {}", fmt.name());
    println!("Short name: {:?}", fmt.short_name());
    println!("Media type: {}", fmt.media_type());
    println!("Extension: {}", fmt.extension());
    println!("Kind: {:?}", fmt.kind());

    // Example 2: identify the format from bytes (JPEG/JFIF header)
    let jpeg_bytes = [0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10, 0x4A, 0x46, 0x49, 0x46];
    let fmt_bytes = FileFormat::from_bytes(&jpeg_bytes);

    println!("\nIdentified from bytes:");
    println!("Format name: {}", fmt_bytes.name());
    println!("Short name: {:?}", fmt_bytes.short_name());
    println!("Media type: {}", fmt_bytes.media_type());
    println!("Extension: {}", fmt_bytes.extension());
    println!("Kind: {:?}", fmt_bytes.kind());

    // Example 3: check for a specific file format
    if fmt == FileFormat::PortableDocumentFormat {
        println!("\nThis is a PDF file!");
    }

    // Example 4: branch on the kind of the file
    match fmt.kind() {
        Kind::Document => println!("\nThis is a document file"),
        Kind::Image => println!("\nThis is an image file"),
        Kind::Audio => println!("\nThis is an audio file"),
        Kind::Video => println!("\nThis is a video file"),
        Kind::Archive => println!("\nThis is an archive file"),
        Kind::Executable => println!("\nThis is an executable file"),
        _ => println!("\nThis is some other kind of file"),
    }

    Ok(())
}

Usage

Add this to your Cargo.toml:

[dependencies]
file-format = "0.28"

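Crate features

All features below are disabled by default.

Reader features

These features enable the detection of file formats that require a specific reader for identification.

  • reader - Enables all reader features.
  • reader-asf - Enables detection of file formats based on Advanced Systems Format (ASF).
  • reader-cfb - Enables detection of file formats based on Compound File Binary (CFB).
  • reader-ebml - Enables detection of file formats based on Extensible Binary Meta Language (EBML).
  • reader-exe - Enables detection of file formats based on MS-DOS Executable (EXE).
  • reader-id3v2 - Enables detection of file formats based on ID3v2 (ID3).
  • reader-mp4 - Enables detection of file formats based on MPEG-4 Part 14 (MP4).
  • reader-pdf - Enables detection of file formats based on Portable Document Format (PDF).
  • reader-rm - Enables detection of file formats based on RealMedia (RM).
  • reader-sqlite3 - Enables detection of file formats based on SQLite 3.
  • reader-txt - Enables detection of Plain Text (TXT) file formats.
  • reader-xml - Enables detection of file formats based on Extensible Markup Language (XML).
  • reader-zip - Enables detection of file formats based on ZIP.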
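
Since every feature is off by default, reader-based detection has to be requested explicitly in Cargo.toml. A minimal sketch, assuming only ZIP-based and CFB-based formats matter for the project (the feature selection is an illustrative choice; the version matches the one used above):

```toml
[dependencies]
file-format = { version = "0.28", features = ["reader-zip", "reader-cfb"] }
```

Replacing the list with `["reader"]` turns on every reader feature at once.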

Supported file formats

Archive

  • 7-Zip (7Z)
  • ACE
  • ALZ
  • Archived by Robert Jung (ARJ)
  • Cabinet (CAB)
  • Extensible Archive (XAR)
  • LArc (LZS)
  • LHA
  • Mozilla Archive (MAR)
  • Multi Layer Archive (MLA)
  • PMarc (PMA)
  • Roshal Archive (RAR)
  • SeqBox (SBX)
  • Squashfs
  • StuffIt (SIT)
  • StuffIt X (SITX)
  • Tape Archive (TAR)
  • UNIX archiver (archiver)
  • Windows Imaging Format (WIM)
  • ZIP
  • ZPAQ
  • cpio
  • zoo

Audio

  • 8-Bit Sampled Voice (8SVX)
  • Adaptive Multi-Rate (AMR)
  • Advanced Audio Coding (AAC)
  • Apple iTunes Audio (M4A)
  • Apple iTunes Audiobook (M4B)
  • Apple iTunes Protected Audio (M4P)
  • Au
  • Audio Codec 3 (AC-3)
  • Audio Interchange File Format (AIFF)
  • Audio Visual Research (AVR)
  • Creative Voice (VOC)
  • FastTracker 2 Extended Module (XM)
  • Flash MP4 Audio (F4A)
  • Flash MP4 Audiobook (F4B)
  • Free Lossless Audio Codec (FLAC)
  • Impulse Tracker Module (IT)
  • MPEG-1/2 Audio Layer 2 (MP2)
  • MPEG-1/2 Audio Layer 3 (MP3)
  • MPEG-4 Part 14 Audio (MP4)
  • Matroska Audio (MKA)
  • Monkey’s Audio (APE)
  • Musepack (MPC)
  • Musical Instrument Digital Interface (MIDI)
  • Ogg FLAC (OGA)
  • Ogg Opus (Opus)
  • Ogg Speex (Speex)
  • Ogg Vorbis (Vorbis)
  • Qualcomm PureVoice (QCP)
  • Quite OK Audio (QOA)
  • RealAudio (RA)
  • Scream Tracker 3 Module (S3M)
  • Sony DSD Stream File (DSF)
  • SoundFont 2 (SF2)
  • Ultimate Soundtracker Module (MOD)
  • WavPack (WV)
  • Waveform Audio (WAV)
  • Windows Media Audio (WMA)

Compressed

  • BZip3 (BZ3)
  • LZ4
  • Lempel-Ziv Finite State Entropy (LZFSE)
  • Lempel-Ziv-Markov chain algorithm (LZMA)
  • Long Range ZIP (LRZIP)
  • Snappy
  • UNIX compress (compress)
  • XZ
  • Zstandard (zstd)
  • bzip (BZ)
  • bzip2 (BZ2)
  • gzip (GZ)
  • lzip (LZ)
  • lzop (LZO)
  • rzip (RZ)

Database

  • Microsoft Access 2007 Database (ACCDB)
  • Microsoft Access Database (MDB)
  • Microsoft Works Database (WDB)
  • OpenDocument Database (ODB)
  • SQLite 3

Diagram

  • Circuit Diagram Document (CDDX)
  • Microsoft Visio Drawing (VSD)
  • Office Open XML Drawing (VSDX)
  • StarChart (SDS)
  • draw.io (DRAWIO)

Disk

  • Amiga Disk File (ADF)
  • Apple Disk Image (DMG)
  • ISO 9660 (ISO)
  • Microsoft Virtual Hard Disk (VHD)
  • Microsoft Virtual Hard Disk 2 (VHDX)
  • QEMU Copy On Write (QCOW)
  • Virtual Machine Disk (VMDK)
  • VirtualBox Virtual Disk Image (VDI)

Document

  • AbiWord (ABW)
  • AbiWord Template (AWT)
  • Adobe InDesign Document (INDD)
  • DjVu
  • InDesign Markup Language (IDML)
  • LaTeX (TeX)
  • Microsoft Publisher Document (PUB)
  • Microsoft Word Document (DOC)
  • Microsoft Works Word Processor (WPS)
  • Microsoft Write (WRI)
  • Office Open XML Document (DOCX)
  • OpenDocument Text (ODT)
  • OpenDocument Text Master (ODM)
  • OpenDocument Text Master Template (OTM)
  • OpenDocument Text Template (OTT)
  • OpenXPS (OXPS)
  • Portable Document Format (PDF)
  • PostScript (PS)
  • Rich Text Format (RTF)
  • StarWriter (SDW)
  • Sun XML Writer (SXW)
  • Sun XML Writer Global (SGW)
  • Sun XML Writer Template (STW)
  • Uniform Office Format Text (UOT)
  • WordPerfect Document (WPD)

Ebook

  • Broad Band eBook (BBeB)
  • Electronic Publication (EPUB)
  • FictionBook (FB2)
  • FictionBook ZIP (FBZ)
  • Microsoft Reader (LIT)
  • Mobipocket (MOBI)

Executable

  • Commodore 64 Program (PRG)
  • Common Object File Format (COFF)
  • Dalvik Executable (DEX)
  • Dynamic Link Library (DLL)
  • Executable and Linkable Format (ELF)
  • Java Class
  • LLVM Bitcode (BC)
  • Linear Executable (LE)
  • Lua Bytecode
  • MS-DOS Executable (EXE)
  • Mach-O
  • New Executable (NE)
  • Nintendo Switch Executable (NSO)
  • Optimized Dalvik Executable (DEY)
  • Portable Executable (PE)
  • WebAssembly Binary (Wasm)
  • Xbox 360 Executable (XEX)
  • Xbox Executable (XBE)

Font

  • BMFont ASCII (FNT)
  • BMFont Binary (FNT)
  • Embedded OpenType (EOT)
  • FIGlet Font (FLF)
  • Glyphs
  • OpenType (OTF)
  • TrueType (TTF)
  • TrueType Collection (TTC)
  • Web Open Font Format (WOFF)
  • Web Open Font Format 2 (WOFF2)

Formula

  • Mathematical Markup Language (MathML)
  • OpenDocument Formula (ODF)
  • OpenDocument Formula Template (OTF)
  • StarMath (SMF)
  • Sun XML Math (SXM)

Geospatial

  • Flexible and Interoperable Data Transfer (FIT)
  • GPS Exchange Format (GPX)
  • Geography Markup Language (GML)
  • Keyhole Markup Language (KML)
  • Keyhole Markup Language ZIP (KMZ)
  • Shapefile (SHP)
  • Training Center XML (TCX)

Image

  • AV1 Image File Format (AVIF)
  • AV1 Image File Format Sequence (AVIFS)
  • Adaptable Scalable Texture Compression (ASTC)
  • Adobe Illustrator Artwork (AI)
  • Adobe Photoshop Document (PSD)
  • Animated Portable Network Graphics (APNG)
  • Apple Icon Image (ICNS)
  • Better Portable Graphics (BPG)
  • Canon Raw (CRW)
  • Canon Raw 2 (CR2)
  • Canon Raw 3 (CR3)
  • Cineon (CIN)
  • Digital Picture Exchange (DPX)
  • Encapsulated PostScript (EPS)
  • Enhanced Metafile (EMF)
  • Experimental Computing Facility (XCF)
  • Figma Design (FIG)
  • Free Lossless Image Format (FLIF)
  • Fujifilm Raw (RAF)
  • Graphics Interchange Format (GIF)
  • High Efficiency Image Coding (HEIC)
  • High Efficiency Image Coding Sequence (HEICS)
  • High Efficiency Image File Format (HEIF)
  • High Efficiency Image File Format Sequence (HEIFS)
  • JPEG 2000 Codestream (J2C)
  • JPEG 2000 Part 1 (JP2)
  • JPEG 2000 Part 2 (JPX)
  • JPEG 2000 Part 6 (JPM)
  • JPEG Extended Range (JXR)
  • JPEG Network Graphics (JNG)
  • JPEG XL (JXL)
  • JPEG-LS (JLS)
  • Joint Photographic Experts Group (JPEG)
  • Khronos Texture (KTX)
  • Khronos Texture 2 (KTX2)
  • Magick Image File Format (MIFF)
  • Microsoft DirectDraw Surface (DDS)
  • Multiple-image Network Graphics (MNG)
  • Nikon Electronic File (NEF)
  • Olympus Raw Format (ORF)
  • OpenDocument Graphics (ODG)
  • OpenDocument Graphics Template (OTG)
  • OpenEXR (EXR)
  • OpenRaster (ORA)
  • Panasonic Raw (RW2)
  • Picture Exchange (PCX)
  • Portable Arbitrary Map (PAM)
  • Portable BitMap (PBM)
  • Portable FloatMap (PFM)
  • Portable GrayMap (PGM)
  • Portable Network Graphics (PNG)
  • Portable PixMap (PPM)
  • Quite OK Image (QOI)
  • Radiance HDR (HDR)
  • Scalable Vector Graphics (SVG)
  • Silicon Graphics Image (SGI)
  • Sketch
  • Sketch 43
  • StarDraw (SDA)
  • Sun XML Draw (SXD)
  • Sun XML Draw Template (STD)
  • Tag Image File Format (TIFF)
  • WebP
  • Windows Animated Cursor (ANI)
  • Windows Bitmap (BMP)
  • Windows Cursor (CUR)
  • Windows Icon (ICO)
  • Windows Metafile (WMF)
  • WordPerfect Graphics (WPG)
  • X PixMap (XPM)
  • farbfeld (FF)

Metadata

  • Android Binary XML (AXML)
  • BitTorrent (Torrent)
  • CD Audio (CDA)
  • ID3v2 (ID3)
  • Meta Information Encapsulation (MIE)
  • TASTy
  • Windows Shortcut (LNK)
  • macOS Alias

Model

  • 3D Manufacturing Format (3MF)
  • 3D Studio (3DS)
  • 3D Studio Max (MAX)
  • Additive Manufacturing Format (AMF)
  • AutoCAD Drawing (DWG)
  • Autodesk 123D (123DX)
  • Autodesk Alias (WIRE)
  • Autodesk Inventor Assembly (IAM)
  • Autodesk Inventor Drawing (IDW)
  • Autodesk Inventor Part (IPT)
  • Autodesk Inventor Presentation (IPN)
  • Blender (BLEND)
  • Cinema 4D (C4D)
  • Collaborative Design Activity (COLLADA)
  • Design Web Format (DWF)
  • Design Web Format XPS (DWFX)
  • Drawing Exchange Format ASCII (DXF)
  • Drawing Exchange Format Binary (DXF)
  • Extensible 3D (X3D)
  • Filmbox (FBX)
  • Fusion 360 (F3D)
  • GL Transmission Format Binary (GLB)
  • Google Draco (Draco)
  • Initial Graphics Exchange Specification (IGES)
  • Inter-Quake Export (IQE)
  • Inter-Quake Model (IQM)
  • MagicaVoxel (VOX)
  • Maya ASCII (MA)
  • Maya Binary (MB)
  • Model 3D ASCII (A3D)
  • Model 3D Binary (M3D)
  • Polygon ASCII (PLY)
  • Polygon Binary (PLY)
  • SketchUp (SKP)
  • SolidWorks Assembly (SLDASM)
  • SolidWorks Drawing (SLDDRW)
  • SolidWorks Part (SLDPRT)
  • SpaceClaim Document (SCDOC)
  • Standard for the Exchange of Product model data (STEP)
  • Stereolithography ASCII (STL)
  • Universal 3D (U3D)
  • Universal Scene Description ASCII (USDA)
  • Universal Scene Description Binary (USDC)
  • Universal Scene Description ZIP (USDZ)
  • Virtual Reality Modeling Language (VRML)
  • openNURBS (3DM)

Other

  • ActiveMime (MSO)
  • Advanced Systems Format (ASF)
  • Android Resource Storage Container (ARSC)
  • Apache Arrow Columnar (Arrow)
  • Apache Avro (Avro)
  • Apache Parquet (Parquet)
  • Arbitrary Binary Data (BIN)
  • Atom
  • Clojure Script
  • Compound File Binary (CFB)
  • DER Certificate (DER)
  • Digital Imaging and Communications in Medicine (DICOM)
  • Empty
  • Extensible Binary Meta Language (EBML)
  • Extensible Markup Language (XML)
  • Extensible Stylesheet Language Transformations (XSLT)
  • Flash CS5 Project (FLA)
  • Flash Project (FLA)
  • Flexible Image Transport System (FITS)
  • HyperText Markup Language (HTML)
  • ICC Profile (ICC)
  • JSON Feed
  • Java KeyStore (JKS)
  • Lua Script
  • MPEG-4 Part 14 (MP4)
  • MS-DOS Batch (Batch)
  • Microsoft Compiled HTML Help (CHM)
  • Microsoft Project Plan (MPP)
  • Microsoft Visual Studio Solution (SLN)
  • MusicXML
  • MusicXML ZIP (MXL)
  • Ogg Multiplexed Media (OGX)
  • PCAP Dump (PCAP)
  • PCAP Next Generation Dump (PCAPNG)
  • PEM Certificate (PEM)
  • PEM Certificate Signing Request (PEM)
  • PEM Private Key

1 reply

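Guide to using file-format, a Rust crate for file format detection

Overview

file-format is an efficient Rust library for identifying file formats. It recognizes more than 300 common file types, including documents, images, audio, video and archives. It identifies a file by checking its signature and, for container formats with a reader feature enabled, by inspecting the internal structure.

Key features

  • Identifies 300+ file formats
  • Zero-dependency design with the default feature set
  • Efficient signature matching
  • Exposes format details: name, short name, media type, extension and kind
  • No built-in registry for custom formats, as far as the public API goes; custom signatures have to be checked in your own code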

Installation

Add the dependency to Cargo.toml:

[dependencies]
file-format = "0.28"

Basic usage

1. Identifying a file type

use file_format::FileFormat;

fn main() {
    let data = std::fs::read("example.pdf").unwrap();
    let format = FileFormat::from_bytes(&data);

    println!("File format: {}", format.name());
    println!("Media type: {}", format.media_type());
    println!("Extension: {}", format.extension());
}

2. Identifying files in bulk

use file_format::FileFormat;
use std::path::Path;

fn identify_files_in_directory(dir: &Path) {
    for entry in std::fs::read_dir(dir).unwrap() {
        let path = entry.unwrap().path();
        if path.is_file() {
            let data = std::fs::read(&path).unwrap();
            let format = FileFormat::from_bytes(&data);
            println!("{}: {}", path.display(), format.name());
        }
    }
}

3. Checking for a specific format

use file_format::FileFormat;

fn is_pdf_file(data: &[u8]) -> bool {
    let format = FileFormat::from_bytes(data);
    format == FileFormat::PortableDocumentFormat
}

fn is_image_file(data: &[u8]) -> bool {
    let format = FileFormat::from_bytes(data);
    format.media_type().starts_with("image/")
}

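4. Error handling example

use file_format::FileFormat;

fn safe_file_identification(path: &str) -> Result<String, Box<dyn std::error::Error>> {
    let data = std::fs::read(path)?;
    let format = FileFormat::from_bytes(&data);

    // There is no `Unknown` variant: unrecognized input falls back to
    // `FileFormat::ArbitraryBinaryData` (BIN).
    if format == FileFormat::ArbitraryBinaryData {
        Err("unrecognized file format".into())
    } else {
        Ok(format.name().to_string())
    }
}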

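Advanced usage

Custom format recognition

use file_format::FileFormat;

// Signature of a hypothetical custom format (illustrative bytes only).
const CUSTOM_SIGNATURE: &[u8] = &[0x89, 0x43, 0x55, 0x53, 0x54, 0x4F, 0x4D];

// The crate does not appear to provide a `CustomFormat` type or a
// registration function, so the custom signature is checked by hand
// before falling back to the built-in detection.
fn identify_with_custom(data: &[u8]) -> String {
    if data.starts_with(CUSTOM_SIGNATURE) {
        "My Custom Format".to_string()
    } else {
        FileFormat::from_bytes(data).name().to_string()
    }
}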

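Performance optimization example

use file_format::FileFormat;
use std::io::Read;

fn efficient_identification(path: &str) -> std::io::Result<FileFormat> {
    // Read only the beginning of the file for identification.
    // Caveat: a few formats keep their signature at a later offset
    // (ISO 9660, for example), so a 512-byte prefix can miss them.
    let file = std::fs::File::open(path)?;
    let mut buffer = Vec::with_capacity(512);
    // `take` caps the read at 512 bytes and, unlike `read_exact`,
    // does not fail on files shorter than that.
    file.take(512).read_to_end(&mut buffer)?;

    Ok(FileFormat::from_bytes(&buffer))
}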

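Practical use cases

Validating file uploads

use file_format::FileFormat;

fn validate_uploaded_file(data: &[u8]) -> Result<(), String> {
    let format = FileFormat::from_bytes(data);

    // Variants use the full format names, not abbreviations.
    match format {
        FileFormat::PortableDocumentFormat
        | FileFormat::JointPhotographicExpertsGroup
        | FileFormat::PortableNetworkGraphics => Ok(()),
        _ => Err("unsupported file format".to_string()),
    }
}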
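
Upload validation often also compares the detected format with the extension the client claimed. The sketch below is an illustration that uses only the standard library; `extension_matches` is a hypothetical helper, and `detected_ext` is assumed to be the value returned by `FileFormat::extension()` for the uploaded bytes.

```rust
use std::path::Path;

/// Returns true when the extension of `file_name` equals `detected_ext`,
/// ignoring ASCII case. A name without an extension never matches.
pub fn extension_matches(file_name: &str, detected_ext: &str) -> bool {
    Path::new(file_name)
        .extension()
        .and_then(|ext| ext.to_str())
        .map_or(false, |ext| ext.eq_ignore_ascii_case(detected_ext))
}
```

This strict comparison rejects aliases such as `photo.jpeg` when the detected extension is `jpg`; a real validator would probably need a small alias table on top of it.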

File categorizer

use file_format::FileFormat;

struct FileCategorizer;

impl FileCategorizer {
    // Maps the detected media type to a coarse category label.
    fn categorize(data: &[u8]) -> &'static str {
        let format = FileFormat::from_bytes(data);

        match format.media_type() {
            s if s.starts_with("image/") => "image file",
            s if s.starts_with("audio/") => "audio file",
            s if s.starts_with("video/") => "video file",
            s if s.starts_with("text/") => "text file",
            _ => "other file",
        }
    }
}

Complete demo

use file_format::FileFormat;
use std::io::Read;
use std::path::Path;

// Signature of a hypothetical custom format (illustrative bytes only).
const CUSTOM_SIGNATURE: &[u8] = &[0x89, 0x43, 0x55, 0x53, 0x54, 0x4F, 0x4D];

fn main() {
    // Example 1: basic file identification
    println!("=== Basic file identification ===");
    match std::fs::read("example.pdf") {
        Ok(data) => {
            let format = FileFormat::from_bytes(&data);
            println!("File format: {}", format.name());
            println!("Media type: {}", format.media_type());
            println!("Extension: {}", format.extension());
        }
        Err(e) => println!("Failed to read file: {}", e),
    }

    // Example 2: bulk file identification
    println!("\n=== Bulk file identification ===");
    identify_files_in_directory(Path::new("."));

    // Example 3: checking for a specific format
    println!("\n=== Specific format check ===");
    if let Ok(data) = std::fs::read("example.jpg") {
        println!("Is a PDF file: {}", is_pdf_file(&data));
        println!("Is an image file: {}", is_image_file(&data));
    }

    // Example 4: custom format (checked by hand, see identify_with_custom)
    println!("\n=== Custom format ===");
    println!("Identified as: {}", identify_with_custom(CUSTOM_SIGNATURE));

    // Example 5: performance optimization
    println!("\n=== Performance optimization ===");
    if let Ok(format) = efficient_identification("example.txt") {
        println!("Result from the file prefix: {}", format.name());
    }

    // Example 6: upload validation
    println!("\n=== Upload validation ===");
    if let Ok(data) = std::fs::read("example.png") {
        match validate_uploaded_file(&data) {
            Ok(()) => println!("File accepted"),
            Err(e) => println!("File rejected: {}", e),
        }
    }

    // Example 7: file categorization
    println!("\n=== File categorization ===");
    if let Ok(data) = std::fs::read("example.mp3") {
        let category = FileCategorizer::categorize(&data);
        println!("Category: {}", category);
    }
}

// Identifies every regular file in a directory, skipping unreadable entries.
fn identify_files_in_directory(dir: &Path) {
    if let Ok(entries) = std::fs::read_dir(dir) {
        for entry in entries.flatten() {
            let path = entry.path();
            if path.is_file() {
                if let Ok(data) = std::fs::read(&path) {
                    let format = FileFormat::from_bytes(&data);
                    println!("{}: {}", path.display(), format.name());
                }
            }
        }
    }
}

// Checks whether the data is a PDF file.
fn is_pdf_file(data: &[u8]) -> bool {
    let format = FileFormat::from_bytes(data);
    format == FileFormat::PortableDocumentFormat
}

// Checks whether the data is an image, based on its media type.
fn is_image_file(data: &[u8]) -> bool {
    let format = FileFormat::from_bytes(data);
    format.media_type().starts_with("image/")
}

// The crate does not appear to provide a `CustomFormat` type or a
// registration function, so the custom signature is checked by hand
// before falling back to the built-in detection.
fn identify_with_custom(data: &[u8]) -> String {
    if data.starts_with(CUSTOM_SIGNATURE) {
        "My Custom Format".to_string()
    } else {
        FileFormat::from_bytes(data).name().to_string()
    }
}

// Identifies a file from at most its first 512 bytes.
fn efficient_identification(path: &str) -> std::io::Result<FileFormat> {
    let file = std::fs::File::open(path)?;
    let mut buffer = Vec::with_capacity(512);
    // Unlike `read_exact`, this does not fail on files shorter than 512 bytes.
    file.take(512).read_to_end(&mut buffer)?;
    Ok(FileFormat::from_bytes(&buffer))
}

// Accepts only PDF, JPEG and PNG uploads.
fn validate_uploaded_file(data: &[u8]) -> Result<(), String> {
    let format = FileFormat::from_bytes(data);

    match format {
        FileFormat::PortableDocumentFormat
        | FileFormat::JointPhotographicExpertsGroup
        | FileFormat::PortableNetworkGraphics => Ok(()),
        _ => Err("unsupported file format".to_string()),
    }
}

// Groups files into coarse categories by media type.
struct FileCategorizer;

impl FileCategorizer {
    fn categorize(data: &[u8]) -> &'static str {
        let format = FileFormat::from_bytes(data);

        match format.media_type() {
            s if s.starts_with("image/") => "image file",
            s if s.starts_with("audio/") => "audio file",
            s if s.starts_with("video/") => "video file",
            s if s.starts_with("text/") => "text file",
            _ => "other file",
        }
    }
}

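Notes

  1. For very large files, prefer FileFormat::from_file or FileFormat::from_reader over reading the whole file into memory for FileFormat::from_bytes.
  2. Some formats have overlapping signatures; the library returns the most likely match.
  3. The crate does not appear to offer a way to register custom formats, so any custom signature check lives in your own code and runs before or after FileFormat::from_bytes, as you decide.
  4. The library is meant for identification, not for deep parsing of file contents.

This library gives file-handling applications reliable format identification and suits file managers, security scanners, data processing pipelines and similar tools.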
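
Note 2 can be illustrated with ZIP-based containers: an EPUB file starts with the same local file header signature as any ZIP archive and is told apart by what follows. The sketch below is a simplified, standard-library illustration of that idea rather than the crate's actual logic; `classify_zip_like` and `sample_epub_prefix` are hypothetical names. It relies on the EPUB container rule that the first entry is an uncompressed file named `mimetype` containing `application/epub+zip`, which places those strings at byte offsets 30 and 38 when the entry has no extra field.

```rust
const ZIP_LOCAL_HEADER: &[u8] = b"PK\x03\x04";

/// Classifies a byte prefix as "EPUB", "ZIP" or "BIN" (simplified sketch).
pub fn classify_zip_like(bytes: &[u8]) -> &'static str {
    if !bytes.starts_with(ZIP_LOCAL_HEADER) {
        return "BIN";
    }
    // In a ZIP local file header the file name starts at offset 30.
    let name_is_mimetype = bytes.get(30..38) == Some(&b"mimetype"[..]);
    // With an 8-byte name and no extra field, the stored content starts at 38.
    let is_epub_type = bytes.get(38..58) == Some(&b"application/epub+zip"[..]);
    if name_is_mimetype && is_epub_type {
        "EPUB" // the more specific match wins over plain ZIP
    } else {
        "ZIP"
    }
}

/// Builds a fake 58-byte EPUB prefix for demonstration (other fields zeroed).
pub fn sample_epub_prefix() -> Vec<u8> {
    let mut v = vec![0u8; 58];
    v[..4].copy_from_slice(ZIP_LOCAL_HEADER);
    v[30..38].copy_from_slice(b"mimetype");
    v[38..58].copy_from_slice(b"application/epub+zip");
    v
}
```

Calling `classify_zip_like(&sample_epub_prefix())` takes the EPUB branch, while a buffer that only carries the `PK\x03\x04` signature is reported as plain ZIP.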
