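What is the right way to process file contents in Python?

Hi experts,

I want to save everything between the first <link and the second <link in an .htm file as a new .htm file. What is the most concise way to write this?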

<meta http-equiv="X-UA-Compatible" content="IE=edge">

<link rel="prefetch" href="https://ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js">

<meta name="application-name" content="Python.org">
<meta name="msapplication-tooltip" content="The official home of the Python Programming Language">
<meta name="apple-mobile-web-app-title" content="Python.org">
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black">

<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="HandheldFriendly" content="True">
<meta name="format-detection" content="telephone=no">
<meta http-equiv="cleartype" content="on">
<meta http-equiv="imagetoolbar" content="false">

<script type="text/javascript" async="" src="https://ssl.google-analytics.com/ga.js"></script><script src="./Welcome to Python.org_files/modernizr.js.下载"></script><style type="text/css" adt="123"></style>

<link href="./Welcome to Python.org_files/style.css" rel="stylesheet" type="text/css" title="default">
<link href="./Welcome to Python.org_files/mq.css" rel="stylesheet" type="text/css" media="not print, braille, embossed, speech, tty">

The extracted content should be:

<link rel="prefetch" href="https://ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js">

<meta name="application-name" content="Python.org">
<meta name="msapplication-tooltip" content="The official home of the Python Programming Language">
<meta name="apple-mobile-web-app-title" content="Python.org">
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black">

<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="HandheldFriendly" content="True">
<meta name="format-detection" content="telephone=no">
<meta http-equiv="cleartype" content="on">
<meta http-equiv="imagetoolbar" content="false">

<script type="text/javascript" async="" src="https://ssl.google-analytics.com/ga.js"></script><script src="./Welcome to Python.org_files/modernizr.js.下载"></script><style type="text/css" adt="123"></style>

<link
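phonegap100 #1

Processing file contents comes down to two things: use a with statement so the file is always closed, and pick the right read/write mode.

Basic operations: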

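# Read the whole file into one string
with open('file.txt', 'r', encoding='utf-8') as f:
    content = f.read()

# Read line by line (recommended for large files)
with open('file.txt', 'r', encoding='utf-8') as f:
    for line in f:
        process(line)  # process() stands in for your own handler

# Write to a file (overwrites any existing content)
with open('output.txt', 'w', encoding='utf-8') as f:
    f.write('Hello\nWorld')

# Append to the end of a file
with open('output.txt', 'a', encoding='utf-8') as f:
    f.write('\nNew line')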

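Key details:

UTF-8 is the safest encoding choice
'r' is read-only, 'w' overwrites, 'a' appends, 'r+' reads and writes
For large files, iterate line by line instead of reading everything into memory

Practical tips: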

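# Read all lines into a list
with open('file.txt', 'r', encoding='utf-8') as f:
    lines = f.readlines()

# Read one file and write another in the same with statement
with open('input.txt', 'r', encoding='utf-8') as fin, open('output.txt', 'w', encoding='utf-8') as fout:
    for line in fin:
        fout.write(line.upper())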

Remember: with open() as f is the standard idiom; a manual close() is easy to forget.

wuwangju #2

xpath or regex
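
For the regex route, a minimal sketch might look like the following. The function name between_first_two_links is made up for illustration, and it assumes the file has already been read into a string. It returns the text from the first <link up to, but not including, the second <link, or None when there are fewer than two.

```python
import re
from typing import Optional

# Lazy match starting at the first "<link" and stopping right before the next one.
# DOTALL lets "." cross line breaks; IGNORECASE also accepts "<LINK".
LINK_SPAN = re.compile(r"<link\b.*?(?=<link\b)", re.DOTALL | re.IGNORECASE)


def between_first_two_links(html: str) -> Optional[str]:
    """Return the text from the first <link up to the second, or None."""
    match = LINK_SPAN.search(html)
    return match.group(0) if match else None
```

A regex like this treats the HTML as plain text, so a <link inside a comment or a script string would also count. For anything less regular than the sample above, an HTML parser is probably the safer choice.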

phonegap100 #3

'<link' + html.split('<link')[1]
Typed on my phone, untested.
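
Fleshing out the split idea into something that reads one file and writes another, here is a rough sketch. The helper names extract_between_links and save_fragment are hypothetical, as are any file names you pass in, and it assumes the page is UTF-8.

```python
from pathlib import Path

MARKER = "<link"


def extract_between_links(html):
    """Return the text from the first '<link' up to (not including) the second."""
    parts = html.split(MARKER)
    # Two occurrences of the marker produce at least three pieces.
    if len(parts) < 3:
        raise ValueError("expected at least two '<link' occurrences")
    # parts[0] is everything before the first marker; parts[1] is what follows it.
    return MARKER + parts[1]


def save_fragment(src, dst):
    """Read src, extract the fragment, and write it to dst (UTF-8 assumed)."""
    html = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(extract_between_links(html), encoding="utf-8")
```

Usage would be something like save_fragment('page.htm', 'fragment.htm'). Note that split is case-sensitive, so '<LINK' is not matched, and the result stops just before the second '<link'; append MARKER yourself if the output should end with it.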