How do I get the filename when downloading a file with Python?

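I was reading this post:
某 HR 业务网站逻辑漏洞挖掘案例以及 POC 编写思路分享 - FreeBuf 互联网安全新媒体平台
Search the page for [打开 IDM ] to jump to the relevant spot.

I have never run into this situation before; until now I always named downloaded files myself, based on the id or something similar.
So I wanted to ask.

For example, say the link is:
http://www.123.com/file?id=1234

A download manager will show the filename for a link like this.
With requests, does this kind of link redirect, or what happens?
How do I get the filename directly in Python?
Is there a test page (file) I can try this on?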

eggper, reply 1

There is a response header that carries the name; I can't remember what it is called.

eggper, reply 2

Case 1: you know the download link and the server supplies the filename in a response header:

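import re
import requests
from urllib.parse import unquote, urlparse

def get_filename_from_url_and_headers(url, headers=None):
    """
    Extract a filename from the response headers, falling back to the URL path.
    """
    # stream=True defers the body download; the with-block releases the connection.
    # requests follows redirects by default, so response.url is the final URL.
    with requests.get(url, headers=headers, stream=True) as response:
        content_disposition = response.headers.get('Content-Disposition', '')
        final_url = response.url

    # 1. Try the Content-Disposition header first (most reliable).
    # Extended form (RFC 5987), e.g. filename*=UTF-8''%E6%96%87%E4%BB%B6.zip
    match = re.search(r"filename\*\s*=\s*([^';\s]*)'[^']*'([^;\s]+)", content_disposition, re.I)
    if match:
        return unquote(match.group(2), encoding=match.group(1) or 'utf-8')
    # Plain form, e.g. attachment; filename="example.zip"
    match = re.search(r'filename\s*=\s*("[^"]*"|[^;]+)', content_disposition, re.I)
    if match:
        # Strip surrounding quotes, then URL-decode (some servers percent-encode the name)
        return unquote(match.group(1).strip().strip('"\''))

    # 2. No usable header: take the last segment of the URL path.
    #    urlparse() already splits off the query string, so "?id=1234" is not in .path
    filename = unquote(urlparse(final_url).path.split('/')[-1])
    if filename:
        return filename

    # 3. Nothing found: return None and let the caller pick a default name
    return None

# Usage example
url = "https://example.com/files/document.pdf"
filename = get_filename_from_url_and_headers(url)
print(f"Filename found: {filename}")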
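For a query-style link like the one in the question, the last path segment is just "file", with no extension. One possible refinement, sketched here, is to borrow an extension from the response's Content-Type. The helper name fallback_name and the default name "download" are made up for illustration, and the Content-Type values are assumed examples, not something the site in the question is known to send.

```python
import mimetypes
import os
from urllib.parse import unquote, urlparse

def fallback_name(url, content_type=None, default="download"):
    """Build a name from the URL path, adding an extension from Content-Type if missing."""
    # Last path segment, URL-decoded; the query string is not part of .path
    name = unquote(urlparse(url).path.rsplit("/", 1)[-1]) or default
    if os.path.splitext(name)[1]:
        return name  # the name already has an extension
    if content_type:
        # Drop parameters such as "; charset=utf-8" before the lookup
        mime = content_type.split(";", 1)[0].strip().lower()
        name += mimetypes.guess_extension(mime) or ""
    return name

print(fallback_name("http://www.123.com/file?id=1234", "application/pdf"))
print(fallback_name("https://example.com/files/document.pdf"))
```

With requests, the content_type argument would come from response.headers.get('Content-Type').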

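Case 2: you have already downloaded the file but do not know its original name:

import mimetypes
import os

def guess_filename_from_content(file_path):
    """
    Guess a filename for an already-downloaded file (unreliable, last resort).

    Note: mimetypes.guess_type() looks only at the extension in file_path;
    it does not read the file's contents.
    """
    mime_type, _ = mimetypes.guess_type(file_path)
    if mime_type:
        # guess_extension() can return None, so fall back to no extension
        extension = mimetypes.guess_extension(mime_type) or ""
        return f"downloaded_file{extension}"
    return "downloaded_file"

# Usage example
file_path = "/path/to/your/downloaded/file"
if not os.path.exists(file_path):
    print("File does not exist")
else:
    guessed_name = guess_filename_from_content(file_path)
    print(f"Guessed filename: {guessed_name}")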
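mimetypes works from the file name, so guessing from the actual content means looking at the first bytes of the file. Here is a minimal sketch, assuming only a handful of well-known signatures matter; the SIGNATURES table and the name sniff_extension are illustrative, and a real project would more likely rely on a dedicated file-type library.

```python
import os
import tempfile

# A few widely documented file signatures ("magic bytes"); far from complete.
SIGNATURES = [
    (b"%PDF-", ".pdf"),
    (b"PK\x03\x04", ".zip"),  # also the container format behind docx/xlsx/jar
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"\xff\xd8\xff", ".jpg"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
]

def sniff_extension(file_path):
    """Return an extension guessed from the file's leading bytes, or ''."""
    with open(file_path, "rb") as f:
        head = f.read(16)
    for magic, extension in SIGNATURES:
        if head.startswith(magic):
            return extension
    return ""

# Demo on a throwaway file that starts like a PDF
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"%PDF-1.7\n...")
print("downloaded_file" + sniff_extension(tmp.name))
os.remove(tmp.name)
```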

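Key points:

Check the Content-Disposition response header first; it is the standard way for a server to specify the download filename.
The fallback is to parse the URL path, which may be inaccurate (especially after redirects or URL rewriting).
Worst case: if neither is available, you have to choose a name yourself or guess from the content.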
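To make the first point concrete, here is a standard-library-only sketch of pulling the name out of a Content-Disposition value. It handles the plain filename= form and the percent-encoded filename*= form and prefers the latter when both are present. The function name and the sample header values are made up for illustration, and the regular expressions are a simplification, not a full RFC 6266 parser.

```python
import re
from urllib.parse import unquote

def filename_from_content_disposition(value):
    """Return the filename carried by a Content-Disposition value, or None."""
    if not value:
        return None
    # Extended form: filename*=charset'language'percent-encoded-name
    ext = re.search(r"filename\*\s*=\s*([^';\s]*)'[^']*'([^;\s]+)", value, re.IGNORECASE)
    if ext:
        charset = ext.group(1) or "utf-8"
        try:
            return unquote(ext.group(2), encoding=charset, errors="strict")
        except (LookupError, UnicodeDecodeError):
            pass  # unknown charset or bad bytes: fall back to the plain form
    # Plain form: filename="quoted name" or filename=token
    plain = re.search(r'filename\s*=\s*(?:"((?:[^"\\]|\\.)*)"|([^;\s]+))', value, re.IGNORECASE)
    if plain:
        quoted, token = plain.groups()
        if quoted is not None:
            return re.sub(r"\\(.)", r"\1", quoted)  # undo backslash escapes
        return token
    return None

print(filename_from_content_disposition('attachment; filename="example.zip"'))
print(filename_from_content_disposition("attachment; filename*=UTF-8''%E6%96%87%E4%BB%B6.zip"))
```

With requests, the value to pass in would be response.headers.get('Content-Disposition').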

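Practical usage:

# Complete download example
def download_file_with_proper_name(url, save_dir='.'):
    import os

    # Get the filename. Note: the helper issues its own request, so the URL is
    # requested twice; with stream=True the first request does not read the body.
    filename = get_filename_from_url_and_headers(url)

    # Keep only the final path component so a server-supplied name such as
    # "../x" cannot escape save_dir; use a default name if nothing is left.
    filename = os.path.basename(filename or "") or "downloaded_file"

    # Make sure the target directory exists
    os.makedirs(save_dir, exist_ok=True)

    # Stream the body to disk in chunks
    save_path = os.path.join(save_dir, filename)
    with requests.get(url, stream=True) as response:
        response.raise_for_status()  # do not save an error page as the file
        with open(save_path, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)

    return save_path

# Usage
downloaded_file = download_file_with_proper_name("https://example.com/file.zip")
print(f"File saved to: {downloaded_file}")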
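Because the name in Content-Disposition is chosen by the server, it is worth cleaning before it is joined onto save_dir; a value like "../../etc/passwd" would otherwise write outside the target directory. A small sketch of one way to do that follows. The helper name safe_filename and the exact set of stripped characters are assumptions for illustration, not a complete sanitizer.

```python
import os
import re

def safe_filename(name, default="downloaded_file"):
    """Reduce a server-supplied name to a single harmless path component."""
    # Treat both slash styles as separators and keep only the last component
    name = (name or "").replace("\\", "/").rsplit("/", 1)[-1]
    # Drop control characters and characters Windows forbids in file names,
    # then trim spaces and dots from both ends (this also rejects "." and "..")
    name = re.sub(r'[\x00-\x1f<>:"|?*]', "", name).strip(" .")
    return name or default

print(os.path.join("downloads", safe_filename("../../etc/passwd")))
```

Note that trimming leading dots also turns a name like ".bashrc" into "bashrc", which seems an acceptable trade-off for downloads.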

In one sentence: check the Content-Disposition header first, and if it is missing, take the name from the URL path.
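As for the test page asked about in the question, a local stand-in is easy to set up with the standard library. The sketch below starts a throwaway server on localhost where /go redirects to /file?id=1234, which is then served as an attachment, and fetches it with urllib. The paths, the handler and the filename report_1234.pdf are all invented for the demo. It also shows the redirect being followed automatically; with requests the same information should be available as response.history, response.url and response.headers.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class DemoHandler(BaseHTTPRequestHandler):
    """/go redirects to /file?id=1234, which is served as an attachment."""

    def do_GET(self):
        if self.path.startswith("/go"):
            self.send_response(302)
            self.send_header("Location", "/file?id=1234")
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = b"demo file body"
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        # Hypothetical filename, chosen only for this demo
        self.send_header("Content-Disposition", 'attachment; filename="report_1234.pdf"')
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging

def fetch_headers(url):
    """Follow redirects and return (final_url, Content-Disposition value)."""
    # An empty ProxyHandler keeps the localhost request away from any system proxy
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as resp:
        return resp.geturl(), resp.headers.get("Content-Disposition")

def run_demo():
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_headers(f"http://127.0.0.1:{server.server_port}/go")
    finally:
        server.shutdown()
        server.server_close()

final_url, disposition = run_demo()
print(final_url)
print(disposition)
```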

gougou168, reply 3

I see, that explains it. Thanks a lot.