How do I read a section of a text file in Python?

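Python beginner here.

I have a text file that looks roughly like this:

some content....
###START RECORD
some content....
###END
some content...

I want to read the text from ###START RECORD up to ###END.

Is there a good way to do this?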


Use find to locate the two patterns, then slice between them: string[find1 + len(pattern1):find2]
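The find-and-slice idea above could be sketched roughly as follows. The function name extract_between and the sample string are made up for illustration, and the sketch assumes the whole file has already been read into one string.

```python
def extract_between(text, start_marker="###START RECORD", end_marker="###END"):
    """Return the text between the two markers, or None if either is missing."""
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)          # skip past the start marker itself
    end = text.find(end_marker, start)  # only look for the end marker after the start
    if end == -1:
        return None
    return text[start:end].strip("\n")  # drop the newlines next to the markers


# Hypothetical sample mirroring the format in the question
sample = "some content\n###START RECORD\nline a\nline b\n###END\nmore content\n"
print(extract_between(sample))
```

Passing the start position as the second argument to find keeps an earlier ###END from being picked up by mistake.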

zlyuanteng #2

def read_file_section(filename, start_line=1, end_line=None):
    """
    Read a range of lines from a text file.

    Args:
        filename: path to the file
        start_line: first line number (1-based)
        end_line: last line number (inclusive); None means read to the end of the file

    Returns:
        list: the lines in the requested range
    """
    result = []

    try:
        with open(filename, 'r', encoding='utf-8') as file:
            for current_line_num, line in enumerate(file, start=1):
                # Lines before start_line are skipped; the loop simply moves on
                if current_line_num >= start_line:
                    if end_line is None or current_line_num <= end_line:
                        result.append(line.rstrip('\n'))
                    else:
                        break  # past the end line, stop reading
    except FileNotFoundError:
        print(f"Error: file '{filename}' does not exist")
        return []
    except Exception as e:
        print(f"Error while reading file: {e}")
        return []

    return result

# Usage examples
if __name__ == "__main__":
    # Example 1: read lines 3 to 7
    lines = read_file_section("example.txt", start_line=3, end_line=7)
    print("Lines 3-7:")
    for line in lines:
        print(line)

    # Example 2: read from line 5 to the end of the file
    lines = read_file_section("example.txt", start_line=5)
    print("\nFrom line 5 to the end:")
    for line in lines:
        print(line)

    # Example 3: read the whole file
    lines = read_file_section("example.txt")
    print("\nWhole file:")
    for line in lines:
        print(line)

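The core idea of this function is to read the file line by line and use a line counter to decide whether the current line falls within the requested range. enumerate(file, start=1) makes the count start at 1, which matches how people usually number lines. with open() ensures the file is closed properly, and rstrip('\n') removes the trailing newline from each line.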
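As a possible alternative to counting lines by hand, the same range could be selected with itertools.islice from the standard library. This is only a sketch: read_lines_islice is a hypothetical name, and unlike the function above it does no error handling.

```python
from itertools import islice


def read_lines_islice(filename, start_line=1, end_line=None):
    """Return lines start_line..end_line (1-based, inclusive) as a list."""
    with open(filename, "r", encoding="utf-8") as f:
        # islice uses 0-based, end-exclusive positions, so the 1-based
        # inclusive range [start_line, end_line] becomes [start_line - 1, end_line)
        selected = islice(f, start_line - 1, end_line)
        return [line.rstrip("\n") for line in selected]
```

islice stops consuming the file once end_line is reached, so the rest of the file is never read.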

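If you need to read the content between specific markers (for example, the paragraph between two special strings), you can adapt the approach like this: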

def read_between_markers(filename, start_marker, end_marker):
    """Read the content between two markers."""
    result = []
    in_section = False
    
    with open(filename, 'r', encoding='utf-8') as file:
        for line in file:
            line = line.rstrip('\n')
            
            if line == start_marker:
                in_section = True
                continue
            elif line == end_marker:
                break
            
            if in_section:
                result.append(line)
    
    return result

Locating by line number is simple and direct, while locating by markers is more flexible; choose whichever fits your needs.

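h691938207 #3

Probably not directly. The operating system's file-reading system calls can't select data based on file content, so you have to read it into memory and process it there.

Just read it line by line and check each line.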

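bupafengyu #5

Read every line first and, as #3 said, store the lines in a list. Then find the first index of the start line and of the end line, copy the content between them into another list, delete the corresponding content from the old list, and continue the loop.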
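One way to read the list-based description above is the sketch below, which also handles files containing more than one record. The name extract_records is hypothetical, and the input is assumed to be a list of lines with newlines already removed (for example from f.read().splitlines()).

```python
def extract_records(lines, start_marker="###START RECORD", end_marker="###END"):
    """Return a list of records, each a list of the lines between one marker pair."""
    remaining = list(lines)  # work on a copy so the caller's list is untouched
    records = []
    while start_marker in remaining:
        start = remaining.index(start_marker)
        try:
            # look for the end marker only after the start marker
            end = remaining.index(end_marker, start + 1)
        except ValueError:
            break  # start marker without a matching end marker: stop
        records.append(remaining[start + 1:end])
        del remaining[start:end + 1]  # remove the processed record, markers included
    return records


# Hypothetical input with two records
lines = ["x", "###START RECORD", "a", "b", "###END", "y",
         "###START RECORD", "c", "###END"]
print(extract_records(lines))
```

This keeps the whole file in memory and rescans the list on every pass, so it trades efficiency for simplicity.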

Regular expressions.
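A regular-expression version might look something like this. The pattern and the name extract_with_regex are assumptions made for illustration; the sketch expects each marker to sit on a line of its own and the whole file to be available as one string.

```python
import re

# DOTALL lets "." match newlines; MULTILINE lets ^ and $ match at line boundaries.
# The lazy (.*?) stops at the first ###END line after the start marker.
RECORD_RE = re.compile(r"^###START RECORD\n(.*?)^###END$", re.DOTALL | re.MULTILINE)


def extract_with_regex(text):
    """Return the text between the marker lines, or None if there is no match."""
    match = RECORD_RE.search(text)
    if match is None:
        return None
    return match.group(1).rstrip("\n")


# Hypothetical sample mirroring the format in the question
sample_text = "some content\n###START RECORD\nline a\nline b\n###END\nmore content\n"
print(extract_with_regex(sample_text))
```

RECORD_RE.findall(text) would return every record if the file contains several.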

h691938207 #7

I think the more efficient approach is to read line by line and check each line against the format.
Regular expressions feel like overkill here.

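yuanlaile #8 (original poster)

Are regular expressions less efficient?

Read the file line by line. Once you hit start, append each line to a list; when you hit end, break.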

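songsunli #10

With a regex, performance can drop once there is a lot of text, since the whole file usually has to be read into memory first.
If efficiency matters, processing line by line should be faster.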

zlyuanteng #11

def read_part(filename, start='###START RECORD', end='###END'):
    content = []
    recording = False

    with open(filename) as f:
        for line in f:
            line = line.strip()

            # Stop at the end marker, but only once recording has started
            if recording and line == end:
                break

            if recording:
                content.append(line)

            # Checked last so the start marker itself is not recorded
            if line == start:
                recording = True
    return '\n'.join(content)

phonegap100 #12

Find out which line it starts on, then seek to it.
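Note that seek works on byte offsets, not line numbers, so the position has to be found once by scanning. The sketch below assumes that interpretation: it records the byte offsets of the section in a first pass, after which later reads can jump straight there. Both function names are hypothetical.

```python
def find_section_offsets(filename, start_marker=b"###START RECORD", end_marker=b"###END"):
    """Scan once and return (start_offset, end_offset) in bytes, or None if not found."""
    start_offset = None
    with open(filename, "rb") as f:  # binary mode so tell() gives plain byte offsets
        while True:
            pos = f.tell()           # offset of the line about to be read
            line = f.readline()
            if not line:
                return None          # reached end of file without a complete section
            stripped = line.rstrip(b"\r\n")
            if start_offset is None:
                if stripped == start_marker:
                    start_offset = f.tell()  # section begins right after the marker line
            elif stripped == end_marker:
                return start_offset, pos     # section ends where the end marker line begins


def read_section_at(filename, offsets, encoding="utf-8"):
    """Jump straight to a previously located section and read it."""
    start_offset, end_offset = offsets
    with open(filename, "rb") as f:
        f.seek(start_offset)
        return f.read(end_offset - start_offset).decode(encoding)
```

This only pays off when the same file is read repeatedly and has not changed since the offsets were recorded; for a single read, the line-by-line loop is simpler.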

zlyuanteng #13

Thanks a lot, I get it now.

bupafengyu #14

mmap
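For reference, an mmap-based version could look roughly like this. It is a sketch under the assumption that the file is non-empty (mmap raises ValueError when asked to map an empty file) and that the markers are plain ASCII; extract_with_mmap is a hypothetical name.

```python
import mmap


def extract_with_mmap(filename, start_marker=b"###START RECORD",
                      end_marker=b"###END", encoding="utf-8"):
    """Return the text between the markers using a memory-mapped file, or None."""
    with open(filename, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            start = mm.find(start_marker)
            if start == -1:
                return None
            start += len(start_marker)        # skip past the start marker
            end = mm.find(end_marker, start)  # search only after the start marker
            if end == -1:
                return None
            # Slicing copies just the section into a bytes object
            return mm[start:end].decode(encoding).strip("\r\n")
```

The operating system pages the file in on demand, so only the section itself is copied into a Python object.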
