What should I do when a regular expression in Python extracts nothing?

I am scraping the Maoyan Top 100 movies, but when I test the regular expression, the extraction comes back empty.

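import re

def parse_one_page(html):
    # Assumes each captured value is followed by its closing tag (</i> or </p>).
    pattern = re.compile(
        r'<dd>.*?board-index.*?>(.*?)</i>.*?data-src="(.*?)".*?name.*?a.*?>(.*?)</a>'
        r'.*?star.*?>(.*?)</p>.*?releasetime.*?>(.*?)</p>'
        r'.*?integer.*?>(.*?)</i>.*?fraction.*?>(.*?)</i>.*?</dd>',
        re.S)
    items = re.findall(pattern, html)
    print(items)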
That is the first version.

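import re

def parse_one_page(html):
    # Assumes each captured value is followed by its closing tag (</i> or </p>).
    pattern = re.compile(
        r'<dd>.*?board-index.*?>(\d+)</i>.*?data-src="(.*?)".*?name">'
        + r'<a.*?>(.*?)</a>.*?"star">(.*?)</p>.*?releasetime">(.*?)</p>'
        + r'.*?integer">(.*?)</i>.*?fraction">(.*?)</i>.*?</dd>', re.S)

    items = re.findall(pattern, html)

    for item in items:
        yield {
            'index': item[0],
            'image': item[1],
            'title': item[2],
            'actor': item[3].strip()[3:],   # drop the 3-character label before the cast list
            'time': item[4].strip()[5:],    # drop the 5-character label before the release date
            'score': item[5] + item[6]      # integer part + fractional part
        }

def main():
    url = 'http://maoyan.com/board/4'
    html = get_one_page(url)  # get_one_page() is not shown in this post
    for item in parse_one_page(html):
        print(item)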
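That is the second version.
Neither version extracts anything, yet when I run the complete code, the results are displayed correctly at the end...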

zlyuanteng (reply #1)

https://regex101.com
I recommend testing your regex with this site; it shows a syntax breakdown at the top right.
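
The same kind of check can also be done locally. Below is a minimal sketch, assuming a hand-written HTML fragment (`SAMPLE_HTML`, a hypothetical stand-in for one `<dd>` entry, not real Maoyan output) and a simplified three-group pattern. The helper name `check_pattern` is made up for illustration. If the pattern matches the sample but returns `[]` on the downloaded page, the downloaded HTML is a more likely culprit than the regex.

```python
import re

# Hypothetical fragment shaped like one board entry; not real site output.
SAMPLE_HTML = """
<dd>
  <i class="board-index board-index-1">1</i>
  <img data-src="https://example.com/poster.jpg" alt="Example" class="board-img" />
  <p class="name"><a href="/films/1" title="Example">Example Movie</a></p>
</dd>
"""

# Simplified pattern with three groups: index, image URL, title.
PATTERN = re.compile(
    r'<dd>.*?board-index.*?>(\d+)</i>.*?data-src="(.*?)".*?name"><a.*?>(.*?)</a>.*?</dd>',
    re.S,
)

def check_pattern(html):
    """Return every (index, image, title) tuple found in html."""
    return PATTERN.findall(html)

print(check_pattern(SAMPLE_HTML))                       # one tuple for the sample entry
print(check_pattern("<html>some other page</html>"))    # [] because no <dd> entry exists
```

Printing the first few hundred characters of the real `html` before calling `findall()` is a quick way to see which of the two cases applies.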

wuwangju (reply #2)

When a regular expression extracts nothing, these are the common causes and ways to troubleshoot them:

1. Check the regex pattern

import re

text = "Price: ¥199.99"
pattern = r"¥(\d+\.?\d*)"  # match the number that follows the ¥ sign
match = re.search(pattern, text)
if match:
    print(f"Extracted: {match.group(1)}")  # prints: Extracted: 199.99
else:
    print("No match found")
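
One pattern-level cause worth ruling out for the question above: `.?` matches at most one character, `.*?` matches as few characters as needed, and `.*` matches as many as possible. A small sketch on a made-up string (the tag contents are placeholders) shows how differently the three behave:

```python
import re

html = '<p class="star">Actor A</p><p class="releasetime">1993-01-01</p>'

# ".?" allows at most one character between "<p" and ">", so nothing matches.
one_char = re.findall(r'<p.?>(.*?)</p>', html)
# ".*?" is lazy: each match stops at the nearest ">" and the nearest "</p>".
lazy = re.findall(r'<p.*?>(.*?)</p>', html)
# ".*" is greedy: the first ".*" swallows everything up to the last usable ">".
greedy = re.findall(r'<p.*>(.*)</p>', html)

print(one_char)  # []
print(lazy)      # ['Actor A', '1993-01-01']
print(greedy)    # ['1993-01-01']
```

If a pattern copied from a web page contains `.?` everywhere, the `*` characters may have been lost in formatting, and the pattern will almost never match real HTML.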

2. Confirm the text encoding and special characters

import re

# Handling newlines
text = "line one\nline two"
pattern = r"one.*two"
match = re.search(pattern, text, re.DOTALL)  # DOTALL lets . match newlines; without it this returns None
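
On the encoding side, a page read as raw bytes has to be decoded before a `str` pattern is applied; otherwise `re` raises `TypeError` instead of returning matches. A sketch under stated assumptions: `raw` is a made-up byte string standing in for a downloaded response body, UTF-8 is assumed as the encoding, and `extract_names` is a hypothetical helper.

```python
import re

# Stand-in for a response body received as bytes (UTF-8 is an assumption).
raw = '<p class="name">Amélie</p>'.encode("utf-8")

def extract_names(body, encoding="utf-8"):
    """Decode bytes if needed, then return the text of <p class="name"> tags."""
    if isinstance(body, bytes):
        body = body.decode(encoding)
    return re.findall(r'<p class="name">(.*?)</p>', body)

try:
    re.findall(r'<p class="name">(.*?)</p>', raw)  # str pattern applied to bytes
except TypeError as exc:
    print(f"TypeError: {exc}")

print(extract_names(raw))  # ['Amélie']
```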

3. Use the right matching method

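import re

# search() vs match()
text = "start middle end"
print(re.match(r"middle", text))    # None: match() only matches at the start of the string
print(re.search(r"middle", text))   # match found

# findall() extracts every match
text = "apples 10 yuan, bananas 20 yuan"
prices = re.findall(r"\d+ yuan", text)  # ['10 yuan', '20 yuan']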
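
One detail that matters for the code in the question: when a pattern contains several capture groups, `findall()` returns one tuple per match rather than the whole matched text, which is why the question's code indexes `item[0]`, `item[1]`, and so on. A short sketch on made-up text, with `finditer()` and named groups shown as an alternative:

```python
import re

text = "apples 10 yuan, bananas 20 yuan"

# No groups: findall() returns the whole matches.
whole = re.findall(r"\w+ \d+ yuan", text)
# Two groups: findall() returns one (name, price) tuple per match.
pairs = re.findall(r"(\w+) (\d+) yuan", text)
# finditer() yields match objects; groupdict() exposes the named groups.
records = [m.groupdict() for m in re.finditer(r"(?P<name>\w+) (?P<price>\d+) yuan", text)]

print(whole)    # ['apples 10 yuan', 'bananas 20 yuan']
print(pairs)    # [('apples', '10'), ('bananas', '20')]
print(records)  # [{'name': 'apples', 'price': '10'}, {'name': 'bananas', 'price': '20'}]
```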

4. Debugging tips

import re

def debug_regex(pattern, text):
    try:
        matches = re.findall(pattern, text)
        print(f"Pattern: {pattern}")
        print(f"Text: {text}")
        print(f"Matches: {matches}")
        return matches
    except re.error as e:
        print(f"Regex error: {e}")
        return []

# Test
debug_regex(r"\d+", "abc123def")
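
For a long pattern like the one in the question, it can help to build the pattern up piece by piece and see where it stops matching. This is a rough sketch, not a general tool: `first_failing_piece` is a hypothetical helper, each piece must be a valid regex on its own, and the HTML string is made up.

```python
import re

def first_failing_piece(pieces, text, flags=re.S):
    """Return the index of the first piece that makes the search fail, or None."""
    for i in range(1, len(pieces) + 1):
        prefix = "".join(pieces[:i])  # pattern built from the first i pieces
        if re.search(prefix, text, flags) is None:
            return i - 1
    return None

# Made-up fragment that has no data-src attribute.
html = '<dd><i class="board-index">1</i><p class="name"><a href="#">Title</a></p></dd>'
pieces = [
    r'<dd>',
    r'.*?board-index.*?>(\d+)</i>',
    r'.*?data-src="(.*?)"',
    r'.*?name"><a.*?>(.*?)</a>',
]

print(first_failing_piece(pieces, html))      # 2: the data-src piece is where matching stops
print(first_failing_piece(pieces[:2], html))  # None: the first two pieces match
```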

5. Common pitfalls