How do I decode a string of unknown encoding in Python?

Can you post the raw bytes so we can take a look?

import chardet
from typing import Optional

def decode_unknown_string(byte_data: bytes) -> Optional[str]:
    """
    Decode bytes whose encoding is unknown.

    Args:
        byte_data: the raw bytes.

    Returns:
        The decoded string, or None if decoding fails.
    """
    if not isinstance(byte_data, bytes):
        raise TypeError("input must be of type bytes")
    
    # Detect the encoding
    result = chardet.detect(byte_data)
    encoding = result['encoding']
    confidence = result['confidence']
    
    if encoding is None:
        print("Could not detect an encoding")
        return None

    print(f"Detected encoding: {encoding} (confidence: {confidence:.2%})")
    
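    try:
        # Try the detected encoding first
        decoded_str = byte_data.decode(encoding)
        return decoded_str
    except (UnicodeDecodeError, LookupError):
        # LookupError: chardet can report a name Python has no codec for
        print(f"Decoding with {encoding} failed, trying common encodings...")

        # Fallback encodings; iso-8859-1 accepts any byte, so keep it last
        common_encodings = ['utf-8', 'gbk', 'gb2312', 'gb18030',
                           'big5', 'shift_jis', 'euc-jp', 'iso-8859-1']

        for enc in common_encodings:
            try:
                decoded_str = byte_data.decode(enc)
                print(f"Decoded successfully with fallback encoding {enc}")
                return decoded_str
            except UnicodeDecodeError:
                continue

        print("All attempted encodings failed")
        return None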

# Usage examples
if __name__ == "__main__":
    # Example 1: a UTF-8 encoded string
    utf8_bytes = "你好，世界！".encode('utf-8')
    result1 = decode_unknown_string(utf8_bytes)
    print(f"Decoded result 1: {result1}\n")

    # Example 2: a GBK encoded string
    gbk_bytes = "测试文本".encode('gbk')
    result2 = decode_unknown_string(gbk_bytes)
    print(f"Decoded result 2: {result2}\n")

    # Example 3: raw bytes supplied directly (very short, so detection may be unreliable)
    gbk_raw_bytes = b'\xc4\xe3\xba\xc3'  # "你好" encoded as GBK
    result3 = decode_unknown_string(gbk_raw_bytes)
    print(f"Decoded result 3: {result3}")

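Core idea:

First, let the chardet library detect the encoding automatically (it guesses the encoding by statistical analysis of byte patterns).
If detection fails or decoding raises an error, fall back to trying a list of common encodings one by one.
Prefer the detected result; the fallback list covers common Chinese, Japanese, and Western European encodings.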

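Notes:

The input must be bytes; a str has to be encoded to bytes first.
chardet is not 100% accurate, so treat low-confidence results with caution.
Adjust the fallback list to fit your actual use case.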

One-line advice: detect with chardet first, and if that fails, try the common encodings in turn.
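
One caveat about the fallback loop: a decode that succeeds is not necessarily a correct decode. The sketch below uses only the standard library, and first_successful is a hypothetical helper, not part of chardet. It illustrates two points: iso-8859-1 accepts any byte sequence, so it only makes sense as the last entry, and Shift_JIS bytes can decode under gbk without raising, which would silently return the wrong text.

```python
from typing import Optional, Sequence


def first_successful(raw: bytes, encodings: Sequence[str]) -> Optional[str]:
    """Return the first encoding that decodes raw without error (hypothetical helper)."""
    for enc in encodings:
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None


# iso-8859-1 maps every byte value to a character, so it never raises.
every_byte = bytes(range(256))
print(first_successful(every_byte, ["utf-8", "iso-8859-1"]))

# The Shift_JIS bytes for "テスト" are also structurally valid GBK,
# so the loop stops at gbk before it ever reaches shift_jis.
sjis_bytes = "テスト".encode("shift_jis")
print(first_successful(sjis_bytes, ["utf-8", "gbk", "shift_jis"]))
```

If the likely language of the data is known, putting that language's encodings first in the list is probably the simplest mitigation.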

eggper #3

If you know the text is Chinese, it's easy: just iterate over all the encodings.
https://imgur.com/YCPKD04
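
The screenshot is not reproduced here, so the following is only a guess at what such a loop might look like, not the poster's actual code. It tries every codec reachable through encodings.aliases.aliases and keeps the decodes that consist purely of CJK ideographs. The names is_cjk and chinese_candidates are made up for this sketch, and the U+4E00..U+9FFF check is a deliberately crude filter.

```python
import encodings.aliases
from typing import Dict


def is_cjk(text: str) -> bool:
    """True if every character is in the CJK Unified Ideographs block (U+4E00..U+9FFF)."""
    return bool(text) and all("\u4e00" <= ch <= "\u9fff" for ch in text)


def chinese_candidates(raw: bytes) -> Dict[str, str]:
    """Try every aliased codec and keep decodes made only of CJK ideographs."""
    hits = {}
    for codec in sorted(set(encodings.aliases.aliases.values())):
        try:
            text = raw.decode(codec)
        except (LookupError, ValueError):
            # LookupError: codec missing on this platform, or not a text encoding.
            # ValueError: covers UnicodeDecodeError and other UnicodeError cases.
            continue
        if is_cjk(text):
            hits[codec] = text
    return hits


for codec, text in chinese_candidates(b"\xc4\xe3\xba\xc3").items():
    print(codec, text)
```

Several codecs usually survive the filter, so a person still has to read the candidates and pick the one that makes sense.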

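eggper #4

Thanks for all the replies. After posting I looked into it some more and finally found that it was simply cp437 😂. The #2 reply is impressive; I'll go take a look at that library.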
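
For anyone hitting the same thing: when text has already been decoded as cp437 by mistake, re-encoding it with cp437 gives back the original bytes, which can then be decoded properly. The sketch below assumes the real encoding was GBK; the thread does not say what it was, so treat real_encoding as a placeholder. undo_cp437 is a hypothetical helper name.

```python
def undo_cp437(mojibake: str, real_encoding: str = "gbk") -> str:
    """Reverse a wrong cp437 decode. real_encoding is an assumption, not from the thread."""
    return mojibake.encode("cp437").decode(real_encoding)


original = "你好"
# Simulate the damage: GBK bytes read as if they were cp437.
garbled = original.encode("gbk").decode("cp437")
print(repr(garbled))
print(undo_cp437(garbled))
```

The round trip works because cp437 assigns a character to every byte value, so no information is lost by the wrong decode.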

nodeper #5

Bump!

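yibo5220 #6

FYI, [Standard Encodings](https://docs.python.org/3.6/library/codecs.html#standard-encodings) lists every codec. It has a few more than the encodings.aliases.aliases mapping you used, namely the ones that have no alias, and it also shows the languages each codec is commonly used for.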
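
A rough way to see the gap, assuming a regular CPython install where the encodings package sits on disk: compare the module names in the package with the values of the alias table. Not every module in the package is necessarily a usable text codec, so read the output as a starting point rather than a definitive list.

```python
import encodings
import encodings.aliases
import pkgutil

# Every module shipped in the encodings package, minus the alias table itself.
modules = {m.name for m in pkgutil.iter_modules(encodings.__path__)} - {"aliases"}
aliased = set(encodings.aliases.aliases.values())

# Modules that no alias points to, so a loop over aliases.values() skips them.
unaliased = sorted(modules - aliased)
print(len(modules), len(aliased), len(unaliased))
print(unaliased)
```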


eggper #7

Hey, I found a string that even your method can't decode: 08.��ű�

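htzhanglong #8

That's because the bytes in what you posted were already unrecognized by some software or the browser and replaced, so the original data is gone. See Wikipedia: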
https://en.wikipedia.org/wiki/Specials_(Unicode_block)
� REPLACEMENT CHARACTER used to replace an unknown, unrecognized or unrepresentable character
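
To illustrate why such a string cannot be recovered, here is a small sketch using Python's built-in errors="replace" handler. lossy_decode is a hypothetical helper, and the sample bytes are only an illustration, not the bytes behind the string posted above.

```python
def lossy_decode(raw: bytes) -> str:
    """Decode as UTF-8, substituting U+FFFD for every undecodable byte."""
    return raw.decode("utf-8", errors="replace")


a = lossy_decode(b"\xc4\xe3")  # GBK bytes for "你", invalid as UTF-8
b = lossy_decode(b"\xba\xc3")  # GBK bytes for "好", invalid as UTF-8

print(a == b)             # different inputs end up as the same text
print(a.encode("utf-8"))  # re-encoding does not give back the original bytes
```

Once the replacement has happened, no choice of codec can bring the original bytes back, so the fix has to start from the raw file or network data.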