What is the difference between the two ways of writing regular expressions in Python?

First style

import re
line = "Cats are smarter than dogs"
m = re.match( r'(.*) are (.*?) .*', line)
print m.group()
print m.group(1)
print m.group(2)

Second style

import re
pattern = re.compile(r'(.*) are (.*?) .*')
m = pattern.match("Cats are smarter than dogs")
print m.group()
print m.group(1)
print m.group(2)

What is the difference between these two styles, and why does Python provide both?

zlyuanteng, reply #1

def match(pattern, string, flags=0):
    """Try to apply the pattern at the start of the string, returning
    a Match object, or None if no match was found."""
    return _compile(pattern, flags).match(string)

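This is the definition in the standard library, so your two styles are essentially the same: re.match() resolves the pattern through _compile() and then calls match() on the resulting object.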
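To make the equivalence concrete, here is a small sketch in Python 3 syntax that runs the pattern from the question both ways and compares the results. The variable names (regex, compiled, m1, m2) are mine and not from the original post.

```python
import re

line = "Cats are smarter than dogs"
regex = r'(.*) are (.*?) .*'

# Style 1: module-level function; re resolves the pattern string internally.
m1 = re.match(regex, line)

# Style 2: compile explicitly, then call the method on the Pattern object.
compiled = re.compile(regex)
m2 = compiled.match(line)

# Both styles should give the same overall match and the same groups.
print(m1.group() == m2.group())
print(m1.groups(), m2.groups())
```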

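When using regular expressions in Python there are two main styles: precompiling with re.compile(), and calling the re module functions directly. The core differences are performance and code reuse.

1. Direct call (for example re.search(pattern, string))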

import re
result = re.search(r'\d+', 'abc123def')

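Each call passes the pattern string back to re, which must turn it into a compiled object before matching. If this happens in a loop or on a hot path, that work (a full compile, or at least a lookup in re's internal cache) is repeated every time, which is unnecessary overhead.

2. Precompiling (using re.compile())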

import re
pattern = re.compile(r'\d+')
result1 = pattern.search('abc123def')
result2 = pattern.findall('xyz456')

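The regex string is first compiled into a regular expression object (a Pattern object). After that you can call that object's search(), findall(), sub() and other methods as often as you like. Compile once, use many times: this performs better in loops and other high-frequency call sites.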
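As a rough illustration of "compile once, use many times", the sketch below keeps one module-level Pattern object and reuses it for every input line. The names NUMBER, extract_numbers and log_lines are hypothetical, chosen only for this example.

```python
import re

# Compiled once when the module is loaded, then reused for every line.
NUMBER = re.compile(r'\d+')

def extract_numbers(lines):
    """Return the digit runs found in each line, as lists of strings."""
    return [NUMBER.findall(text) for text in lines]

log_lines = ['abc123def', 'xyz456', 'no digits here', 'a1b22c333']
print(extract_numbers(log_lines))
```

Keeping the compiled pattern in a named constant also gives the regex a readable name, which is a reuse benefit independent of any speed difference.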

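Quick summary:

Single or occasional use: calling re.search() directly is more concise.
Reusing the same pattern many times (especially in a loop): precompile it with re.compile(); it is a good habit.

One-line advice: compile when you need reuse, call directly for one-offs.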

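It's OOP, so of course it is for reuse.
Did you think every regex gets used once and thrown away?

With compile: one style reuses the pattern, the other does not.

The original poster is clearly still writing py2 syntax; the second style is slightly more efficient.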

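sinazl, reply #6 (thread author)

With the first style, every call to match pays a regex compilation cost. "Compilation" here means that a regex is effectively a small language: it has to be parsed into some structure, such as a syntax tree, before it can be matched against a string. The second style compiles the regex ahead of time rather than on every call, so it is somewhat more efficient.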
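One way to see that compilation produces a structured object rather than a plain string is to inspect a few documented attributes of the result. This is only a sketch: the named groups subject and adjective are my addition to the pattern from the question, used here so that groupindex has something to show.

```python
import re

# Same shape as the pattern in the question, with hypothetical group names.
compiled = re.compile(r'(?P<subject>.*) are (?P<adjective>.*?) .*')

print(compiled.pattern)           # the original pattern string
print(compiled.groups)            # number of capturing groups found by the parser
print(dict(compiled.groupindex))  # mapping of group names to group numbers
```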

h691938207, reply #7

Actually, in python3's re.py the _compile() function keeps an internal _cache:
https://github.com/python/cpython/blob/3.7/Lib/re.py#L268
So the pattern is not recompiled on every call.
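The cache can be observed from the outside, at least on CPython. The sketch below relies on an implementation detail (a cache hit returns the very same Pattern object), so treat it as a demonstration rather than a guarantee; re.purge() is the documented function for clearing the cache.

```python
import re

regex = r'(.*) are (.*?) .*'

first = re.compile(regex)
second = re.compile(regex)
# On CPython the second call is served from re's internal cache,
# so both names are expected to refer to the same Pattern object.
print(first is second)

# Clear the cache; the next compile has to build a new object.
re.purge()
third = re.compile(regex)
print(third is first)
```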

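bupafengyu, reply #8

Actually python2 already had it, but back then eviction was a simple count-based scheme: once the count limit was reached, every compiled regex in the cache was thrown away 0 0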

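I've seen both styles in a book; I think it recommended only one of them.

itying888, reply #10

The second one is better.

zlyuanteng, reply #11

For this particular example there is no real difference, because both styles compile once and match once. But if the same regex is used repeatedly, you can call compile once and then keep calling pattern.match, which saves the cost of compiling multiple times.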

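h691938207, reply #12

With the first style, every time you call match it has to run pattern = re.compile(r'(.*) are (.*?) .*') again.
With the second style, once you have compiled the pattern you do not need to compile it on every call, so it is slightly more efficient.

If you use the regex only once there is no difference; the difference shows up when you use it many times.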
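If you want to measure the difference on your own machine, a sketch along these lines with timeit should do. The function names and the repeat count are arbitrary choices of mine, and the absolute numbers will vary by Python version and hardware. Because re caches compiled patterns, the gap is likely to reflect mostly the extra function call and cache lookup rather than a full recompile.

```python
import re
import timeit

line = "Cats are smarter than dogs"
regex = r'(.*) are (.*?) .*'
compiled = re.compile(regex)

def module_level():
    # Style 1: pattern string is resolved by re on every call.
    return re.match(regex, line)

def precompiled():
    # Style 2: the Pattern object is reused directly.
    return compiled.match(line)

repeats = 20000
t_module = timeit.timeit(module_level, number=repeats)
t_compiled = timeit.timeit(precompiled, number=repeats)
print(f"re.match:      {t_module:.4f} s for {repeats} calls")
print(f"pattern.match: {t_compiled:.4f} s for {repeats} calls")
```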
