How to use a regular expression in Python to match the file name at the end of a URL

For example: https://cn.bing.com/az/hprichbg/rb/SphinxObservatory_ZH-CN7733546261_1920x1080.jpg
Target match: SphinxObservatory_ZH-CN7733546261_1920x1080.jpg

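This is the pattern I used: r'/(.*?.jpg)'
What it actually matches: /cn.bing.com/az/hprichbg/rb/SphinxObservatory_ZH-CN7733546261_1920x1080.jpg
I looked into non-greedy matching, but the result is the same with or without the "?". Can anyone point me in the right direction?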

h691938207, reply #1

import re

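def extract_filename_from_url(url):
    """
    Extract the file name (including its extension) from a URL.
    
    Args:
        url (str): The full URL string.
    
    Returns:
        str: The file name if one is found, otherwise None.
    """
    # Regular expression pattern
    pattern = r'/([^/?#]+)(?=[?#]|$)'
    
    # Search for the first (leftmost) match
    match = re.search(pattern, url)
    
    if match:
        return match.group(1)
    return None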

# Test cases
test_urls = [
    "https://example.com/path/to/file.txt",
    "https://example.com/image.jpg?width=200&height=300",
    "https://example.com/document.pdf#page=2",
    "https://example.com/",
    "https://example.com/path/to/file.tar.gz",
    "https://example.com/file_with_underscores.py",
    "https://example.com/file-with-dashes.html",
    "https://example.com/file123_v2.1.0.zip"
]

print("URL file name extraction test:")
for url in test_urls:
    filename = extract_filename_from_url(url)
    print(f"URL: {url}")
    print(f"File name: {filename}")
    print("-" * 50)

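Code explanation:

The regular expression pattern r'/([^/?#]+)(?=[?#]|$)':
- /: matches a slash (the path separator)
- ([^/?#]+): a capturing group that matches one or more characters other than /, ?, and #
- (?=[?#]|$): a positive lookahead that requires the next character to be ? or #, or the end of the string. Because the captured segment cannot contain a slash, only the last path segment can satisfy this condition.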
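To tie this back to the original question, here is a small sketch comparing the pattern from the question with the one above. The regex engine always takes the leftmost position where a match can start, so "?" only limits how far the match extends to the right, not where it begins. The variable names are just for illustration.

```python
import re

url = "https://cn.bing.com/az/hprichbg/rb/SphinxObservatory_ZH-CN7733546261_1920x1080.jpg"

# Pattern from the question: the match starts at the first "/" in "https://",
# and the lazy ".*?" then has to grow until ".jpg" finally matches.
lazy = re.search(r'/(.*?.jpg)', url)

# Greedy version: it starts at the same "/" and ends at the same ".jpg",
# which is why adding or removing "?" made no difference.
greedy = re.search(r'/(.*.jpg)', url)

# Excluding "/" from the captured text forces the match to start
# at the last slash instead.
fixed = re.search(r'/([^/?#]+)(?=[?#]|$)', url)

print(lazy.group(1))
print(greedy.group(1))
print(fixed.group(1))
```

The first two lines printed are identical and begin with /cn.bing.com; the third is just the file name.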
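Key points:
- Handles query parameters (everything after ?)
- Handles fragment identifiers (everything after #)
- Supports a range of file name formats (containing dots, underscores, hyphens, and so on)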
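If a regex is not a hard requirement, the standard library can do the same job. The sketch below is an alternative, not the approach used in this reply: it splits the URL with urllib.parse.urlsplit and takes the last path segment with posixpath.basename. The function name filename_from_url is made up for this example. One difference worth knowing is that it returns None for a URL with no path at all, where the regex above would capture the host name.

```python
import posixpath
import re
from urllib.parse import unquote, urlsplit


def filename_from_url(url):
    """Return the last path segment of a URL, or None if there is none."""
    # urlsplit separates the path from the query string and the fragment.
    path = urlsplit(url).path
    # basename returns "" when the path is empty or ends with "/".
    name = posixpath.basename(path)
    # Decode percent-escapes such as %20; an empty name becomes None.
    return unquote(name) or None


print(filename_from_url("https://example.com/image.jpg?width=200&height=300"))
print(filename_from_url("https://example.com/document.pdf#page=2"))
print(filename_from_url("https://example.com/"))
print(filename_from_url("https://example.com"))

# For comparison, the regex on a URL without any path captures the host.
host_match = re.search(r'/([^/?#]+)(?=[?#]|$)', "https://example.com")
print(host_match.group(1))
```

The percent-decoding step is optional; drop unquote if the raw, still-encoded name is what you need.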
Usage example:

# Basic usage
url = "https://example.com/docs/report.pdf"
filename = extract_filename_from_url(url)  # returns "report.pdf"

# URL with query parameters
url2 = "https://example.com/image.png?size=large"
filename2 = extract_filename_from_url(url2)  # returns "image.png"

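In short: the regex /([^/?#]+)(?=[?#]|$) extracts the last path segment of a URL and ignores any query string or fragment. One caveat: for a URL with no path at all, such as https://example.com, it captures the host name rather than returning None.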