[Newbie help] How to use a regular expression in Python to extract the JSON string inside gallery: JSON.parse so it can be parsed with json.loads

BASE_DATA.galleryInfo = {
title: '图虫街拍摄影：街拍',
isOriginal: false,
mediaInfo: BASE_DATA.mediaInfo,
gallery: JSON.parse("{"count":5,"sub_images":[{"url":"http:\/\/p3.pstatp.com\/origin\/tuchong.fullscreen\/22261904_tt","width":797,"url_list":[{"url":"http:\/\/p3.pstatp.com\/origin\/tuchong.fullscreen\/22261904_tt"},{"url":"http:\/\/pb9.pstatp.com\/origin\/tuchong.fullscreen\/22261904_tt"},{"url":"http:\/\/pb1.pstatp.com\/origin\/tuchong.fullscreen\/22261904_tt"}],"uri":"origin\/tuchong.fullscreen\/22261904_tt","height":1200},{"url":"http:\/\/p99.pstatp.com\/origin\/tuchong.fullscreen\/22261905_tt","width":1200,"url_list":[{"url":"http:\/\/p99.pstatp.com\/origin\/tuchong.fullscreen\/22261905_tt"},{"url":"http:\/\/pb3.pstatp.com\/origin\/tuchong.fullscreen\/22261905_tt"},{"url":"http:\/\/pb1.pstatp.com\/origin\/tuchong.fullscreen\/22261905_tt"}],"uri":"origin\/tuchong.fullscreen\/22261905_tt","height":797},{"url":"http:\/\/p99.pstatp.com\/origin\/tuchong.fullscreen\/22261906_tt","width":1200,"url_list":[{"url":"http:\/\/p99.pstatp.com\/origin\/tuchong.fullscreen\/22261906_tt"},{"url":"http:\/\/pb3.pstatp.com\/origin\/tuchong.fullscreen\/22261906_tt"},{"url":"http:\/\/pb1.pstatp.com\/origin\/tuchong.fullscreen\/22261906_tt"}],"uri":"origin\/tuchong.fullscreen\/22261906_tt","height":797},{"url":"http:\/\/p99.pstatp.com\/origin\/tuchong.fullscreen\/22261914_tt","width":1200,"url_list":[{"url":"http:\/\/p99.pstatp.com\/origin\/tuchong.fullscreen\/22261914_tt"},{"url":"http:\/\/pb3.pstatp.com\/origin\/tuchong.fullscreen\/22261914_tt"},{"url":"http:\/\/pb1.pstatp.com\/origin\/tuchong.fullscreen\/22261914_tt"}],"uri":"origin\/tuchong.fullscreen\/22261914_tt","height":797},{"url":"http:\/\/p99.pstatp.com\/origin\/tuchong.fullscreen\/22261925_tt","width":1200,"url_list":[{"url":"http:\/\/p99.pstatp.com\/origin\/tuchong.fullscreen\/22261925_tt"},{"url":"http:\/\/pb3.pstatp.com\/origin\/tuchong.fullscreen\/22261925_tt"},{"url":"http:\/\/pb1.pstatp.com\/origin\/tuchong.fullscreen\/22261925_tt"}],"uri":"origin\/tuchong.fullscreen\/22261925_tt","height":797}],"max_img_width":1200
,"labels":["\u6444\u5f71"],"sub_abstracts":[" \u6444\u5f71\uff1a\u6df1\u84dd1970"," "," "," "," "],"sub_titles":["\u56fe\u866b\u8857\u62cd\u6444\u5f71\uff1a\u8857\u62cd","\u56fe\u866b\u8857\u62cd\u6444\u5f71\uff1a\u8857\u62cd","\u56fe\u866b\u8857\u62cd\u6444\u5f71\uff1a\u8857\u62cd","\u56fe\u866b\u8857\u62cd\u6444\u5f71\uff1a\u8857\u62cd","\u56fe\u866b\u8857\u62cd\u6444\u5f71\uff1a\u8857\u62cd"]}"),
siblingList: [],
publish_time: '2018-03-10 09:08:40',
group_id: '6531116766146331139',
item_id: '6531116766146331139',
share_url: 'https://m.toutiao.com/group/6531116766146331139/',
abstract: ''.replace(/<br />/ig, ''),
repin: 0

gougou168 #1

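I don't need the gallery: JSON.parse() wrapper, only the content inside the parentheses. I've been at this for a long time and can't get it to work. Could someone please help?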

yibo5220 #2

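Answer: This is a common task. You have a larger string containing the pattern gallery: JSON.parse('...') and want to pull out the JSON string between the quotes.

Here is the code, assuming your raw string looks like this: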

import re
import json

# Your raw string, e.g. taken from the page source
raw_string = "some html... gallery: JSON.parse('{\"id\": 123, \"images\": [\"a.jpg\", \"b.jpg\"]}') ...more html"

# Key idea: match the single-quoted content that follows `JSON.parse(`, up to the first `')`
# Pattern breakdown:
# JSON\.parse\(  matches the literal text "JSON.parse("
# '             matches the opening single quote
# (.*?)         lazily matches any characters and captures them; this is the JSON string we want
# '             matches the closing single quote
# \)            matches the closing parenthesis
pattern = r"JSON\.parse\('(.*?)'\)"

match = re.search(pattern, raw_string)
if match:
    # match.group(1) is the captured JSON string
    json_string = match.group(1)
    print("Extracted JSON string:", json_string)

    # Now it can be parsed with json.loads
    try:
        data = json.loads(json_string)
        print("Parsed Python object:", data)
        # For example, read the id
        print("id:", data.get('id'))
    except json.JSONDecodeError as e:
        print("JSON parsing failed:", e)
else:
    print("No JSON.parse('...') pattern found")

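Notes:

By default, . in a regex does not match newlines. If your JSON string spans multiple lines, pass the re.DOTALL flag, like this: re.search(pattern, raw_string, re.DOTALL).
This pattern assumes the JSON string is wrapped in single quotes. If the real data uses double quotes, as in JSON.parse("..."), replace the single quotes ' in the regex with double quotes ".
If the extracted json_string has a backslash before each double quote (\"), it is still an escaped JavaScript string literal rather than JSON, and json.loads will reject it as-is. Unescape it first, then call json.loads on the result.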
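For the double-quoted, backslash-escaped form shown in the original post, one possible approach is to decode in two steps. This is a sketch only: it assumes the page source keeps the backslashes (for example JSON.parse("{\"count\":5}")) and that the literal uses only escapes that are also valid inside a JSON string. The function name extract_gallery and the sample page_source are made up for illustration.

```python
import json
import re

# Hypothetical sample: a double-quoted JS string literal holding escaped JSON.
page_source = r'''
    gallery: JSON.parse("{\"count\":2,\"labels\":[\"\u6444\u5f71\"],\"url\":\"http:\\/\\/example.com\\/a\"}"),
    siblingList: [],
'''

# Capture everything between JSON.parse(" and the first ") that follows.
PATTERN = re.compile(r'JSON\.parse\("(.*?)"\)', re.DOTALL)


def extract_gallery(source):
    """Return the parsed gallery object, or None if the pattern is absent."""
    match = PATTERN.search(source)
    if match is None:
        return None
    literal = match.group(1)
    # Step 1: wrap the capture in quotes and decode it as a JSON string,
    # which undoes the escaping (\" becomes ", \\/ becomes \/).
    # strict=False tolerates a raw line break inside the capture.
    json_text = json.loads('"' + literal + '"', strict=False)
    # Step 2: parse the unescaped text as ordinary JSON.
    return json.loads(json_text)


gallery = extract_gallery(page_source)
print(gallery["count"], gallery["url"])
```

If the literal contains JavaScript-only escapes such as \x41 or \', step 1 fails and a different unescaping routine is needed.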

Summary: the regex JSON\.parse\('(.*?)'\) will do the job.

gougou168 #3

Whenever a regex is called for I get scared and just slice out the text between a prefix and a suffix, hahaha
preg_match_all('/"{[\s\S]*}"/', $input_lines, $output_array);
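The prefix/suffix slicing mentioned above also works in Python without any regex. A minimal sketch; slice_between and the sample string are illustrative and not from the thread:

```python
def slice_between(text, prefix, suffix):
    """Return the text between the first prefix and the next suffix, or None."""
    start = text.find(prefix)
    if start == -1:
        return None
    start += len(prefix)
    end = text.find(suffix, start)
    if end == -1:
        return None
    return text[start:end]


# Hypothetical input shaped like the page source in the question.
sample = 'x, gallery: JSON.parse("{\\"count\\":5}"), siblingList: []'
inner = slice_between(sample, 'JSON.parse("', '")')
print(inner)
```

The slice stops at the first occurrence of the suffix, so it behaves like a lazy regex match.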

yuanlaile #4 (thread author)

An online regex tool:
https://www.phpliveregex.com/#tab-preg-match-all

"{[\s\S]*}"

wuwangju #5

(?<=JSON\.parse\(")([\s\S]*?)(?="\))
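A lazy group with nothing after it, such as a trailing (.*?), is satisfied by zero characters, so a lookbehind-based pattern also needs something on the right to stop at. A small sketch of the difference, using a made-up sample string:

```python
import re

# Hypothetical input shaped like the page source in the question.
sample = 'gallery: JSON.parse("{\\"count\\":5}"), siblingList: []'

# Lookbehind only: the lazy group matches the empty string.
bare = re.search(r'(?<=JSON\.parse)(.*?)', sample)

# Lookbehind for the opening JSON.parse(" plus lookahead for the closing ").
bounded = re.search(r'(?<=JSON\.parse\(")[\s\S]*?(?="\))', sample)

print(repr(bare.group(0)))
print(bounded.group(0))
```

With lookarounds on both sides, the whole match (group 0) is already the inner string, so no capture group is needed.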

vueper #6

JSON\.parse\("([\s\S]*?)"\) works too
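The lazy quantifier *? is what separates this pattern from the greedy [\s\S]* posted earlier in the thread. A short sketch of the difference on a made-up input with two JSON.parse calls (the braces are escaped here only to make it explicit that they are literals):

```python
import re

# Hypothetical input holding two JSON.parse("...") calls.
sample = 'a: JSON.parse("{1}"), b: JSON.parse("{2}")'

# Greedy: runs from the first "{ to the last }" in the whole input.
greedy = re.search(r'"\{[\s\S]*\}"', sample).group(0)

# Lazy: stops at the first ") after the opening JSON.parse(".
lazy = re.search(r'JSON\.parse\("([\s\S]*?)"\)', sample).group(1)

print(greedy)
print(lazy)
```

On a page with a single JSON.parse call both give a usable result. The greedy form only goes wrong when a later }" appears in the input.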

gougou168 #7

That site is too slow, I can't get it to open

phonegap100 #8

It works now, man. Thanks a lot!