How do I resolve an error when formatting JSON in Python?
I call json.loads(response.text) directly
and get this error:
Traceback (most recent call last):
  File "/home/shenjianlin/.local/lib/python3.4/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/shenjianlin/my_project/Espider/Espider/spiders/xxgkmiit.py", line 31, in parse
    _origin=json.loads(response.text)
  File "/usr/lib64/python3.4/json/__init__.py", line 318, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.4/json/decoder.py", line 343, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.4/json/decoder.py", line 361, in raw_decode
    raise ValueError(errmsg("Expecting value", s, err.value)) from None
ValueError: Expecting value: line 1 column 1 (char 0)
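Don't panic when JSON formatting fails; most problems come down to the data format. Python's json module requires the object being serialized to be JSON-compatible: dicts, lists, strings, numbers, booleans and None. There are two common pitfalls:
1. The object contains types that are not JSON-compatible
Examples are datetime objects, instances of custom classes, and sets. The fix is to convert them to basic types.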
import json
from datetime import datetime

data = {
    'name': 'test',
    'time': datetime.now(),  # a raw datetime makes json.dumps raise TypeError
    'numbers': {1, 2, 3}     # a set is not serializable either
}

# Fix: supply a custom serialization function
def custom_serializer(obj):
    if isinstance(obj, datetime):
        return obj.isoformat()  # convert to an ISO 8601 string
    elif isinstance(obj, set):
        return list(obj)  # convert to a list
    raise TypeError(f"Object of type {type(obj)} is not JSON serializable")

# Pass it through the default parameter
json_str = json.dumps(data, default=custom_serializer)
print(json_str)  # serializes without error
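2. String encoding problems
Strings with special characters, or binary data, tend to cause trouble. Make sure every string is valid UTF-8.

import json
import re

# Handling special characters
data = {'text': 'a string containing the special character \x00'}

# Option 1: drop characters that cannot be encoded as UTF-8
# (json.dumps has no errors parameter, so clean the string first)
data['text'] = data['text'].encode('utf-8', errors='ignore').decode('utf-8')
json_str = json.dumps(data, ensure_ascii=False)

# Option 2: strip control characters manually
# (json.dumps would otherwise escape them, e.g. as \u0000)
cleaned_text = re.sub(r'[\x00-\x1f\x7f-\x9f]', '', data['text'])
data['text'] = cleaned_text
json_str = json.dumps(data, ensure_ascii=False)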
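Quick troubleshooting steps:
- Start with print(type(your_data)) to check the overall structure
- Recursively check that every element in the data is a basic type
- If the data is complex, try json.dumps(data, default=str) so unknown values are converted to strings, then look at which values were causing the failure
Summary: make sure all of the data consists of JSON-compatible basic types.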
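The recursive check in the steps above could be sketched roughly as follows. The names find_unserializable and JSON_SCALARS are made up for this illustration and are not part of the json module; the sample data mirrors the earlier example with a fixed date.

```python
import json
from datetime import datetime

# Scalar types the json module serializes out of the box (hypothetical helper constant).
JSON_SCALARS = (str, int, float, bool, type(None))


def find_unserializable(obj, path="root"):
    """Return (path, type name) pairs for values json.dumps would reject by default."""
    problems = []
    if isinstance(obj, JSON_SCALARS):
        return problems
    if isinstance(obj, dict):
        for key, value in obj.items():
            if not isinstance(key, JSON_SCALARS):
                problems.append(("{} key {!r}".format(path, key), type(key).__name__))
            problems.extend(find_unserializable(value, "{}[{!r}]".format(path, key)))
    elif isinstance(obj, (list, tuple)):
        for index, value in enumerate(obj):
            problems.extend(find_unserializable(value, "{}[{}]".format(path, index)))
    else:
        # Anything else (datetime, set, custom classes, bytes...) needs converting first.
        problems.append((path, type(obj).__name__))
    return problems


sample = {"name": "test", "time": datetime(2019, 1, 9), "numbers": {1, 2, 3}}
report = find_unserializable(sample)
for where, type_name in report:
    print(where, "->", type_name)

# Data that passes the check should also pass json.dumps.
clean = {"name": "test", "numbers": [1, 2, 3]}
clean_report = find_unserializable(clean)
clean_json = json.dumps(clean)
```

Reporting the path to each offending value makes it easier to decide whether to convert it up front or to handle it in a default function.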
Strip the outer jQuery111102456514014162614_1546997791362(); wrapper and what is left is valid JSON.
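A minimal sketch of removing that wrapper before parsing. strip_jsonp is a hypothetical helper name, and the response body below is invented for illustration, since the thread does not show the real payload.

```python
import json


def strip_jsonp(text):
    """Return the JSON payload inside a wrapper such as name({...}); (hypothetical helper)."""
    stripped = text.strip()
    if stripped.startswith(("{", "[")):
        # Already plain JSON: leave it alone.
        return stripped
    start = stripped.find("(")
    end = stripped.rfind(")")
    if start == -1 or end <= start:
        # No recognizable wrapper: return the text and let json.loads report the problem.
        return stripped
    return stripped[start + 1:end]


# Invented body; only the callback name comes from the thread.
body = 'jQuery111102456514014162614_1546997791362({"total": 2, "items": ["a", "b"]});\r\n'
data = json.loads(strip_jsonp(body))
print(data["total"])  # 2
```

Slicing between the first "(" and the last ")" does not depend on the callback name or on the exact line ending after the closing parenthesis.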
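This suggests the programmers who built the MIIT site added protection against JSON hijacking. The reason is explained here: http://www.10tiao.com/html/788/201811/2247489959/1.html
Please see #1 and #2.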
http://xxgk.miit.gov.cn/gdnps/searchIndex.jsp?params=%257B%2522goPage%2522%253A4%252C%2522orderBy%2522%253A%255B%257B%2522orderBy%2522%253A%2522publishTime%2522%252C%2522reverse%2522%253Atrue%257D%252C%257B%2522orderBy%2522%253A%2522orderTime%2522%252C%2522reverse%2522%253Atrue%257D%255D%252C%2522pageSize%2522%253A10%252C%2522queryParam%2522%253A%255B%257B%257D%252C%257B%257D%252C%257B%2522shortName%2522%253A%2522fbjg%2522%252C%2522value%2522%253A%2522%252F1%252F29%252F1146295%252F1652858%252F1652930%2522%257D%255D%257D
You can fetch the data with this URL,
and then
https://ae01.alicdn.com/kf/HTB1K9ExasnrK1RkHFrdq6xCoFXaZ.jpg
import requests
import json

url = 'http://xxgk.miit.gov.cn/gdnps/searchIndex.jsp?params=%257B%2522goPage%2522%253A4%252C%2522orderBy%2522%253A%255B%257B%2522orderBy%2522%253A%2522publishTime%2522%252C%2522reverse%2522%253Atrue%257D%252C%257B%2522orderBy%2522%253A%2522orderTime%2522%252C%2522reverse%2522%253Atrue%257D%255D%252C%2522pageSize%2522%253A10%252C%2522queryParam%2522%253A%255B%257B%257D%252C%257B%257D%252C%257B%2522shortName%2522%253A%2522fbjg%2522%252C%2522value%2522%253A%2522%252F1%252F29%252F1146295%252F1652858%252F1652930%2522%257D%255D%257D_=1546997791366'
r = requests.get(url)
# Keep the text before ');\r\n', drop the first character (presumably the opening parenthesis), then parse
js = json.loads(r.text.split(');\r\n')[0][1:])
print(js)
Traceback (most recent call last):
  File "/home/shenjianlin/.local/lib/python3.4/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/shenjianlin/my_project/Espider/Espider/spiders/xxgkmiit.py", line 30, in parse
    _origin=json.loads(response.text.split(');\r\n')[0][1:])
  File "/usr/lib64/python3.4/json/__init__.py", line 318, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.4/json/decoder.py", line 343, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.4/json/decoder.py", line 361, in raw_decode
    raise ValueError(errmsg("Expecting value", s, err.value)) from None
ValueError: Expecting value: line 1 column 1 (char 0)
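"Expecting value ... (char 0)" only says that the very first character is not JSON, so it may help to look at what the spider actually received before slicing it. A rough sketch, assuming a hypothetical helper loads_with_preview (not part of json or Scrapy) that attaches the start of the input to the error; it avoids f-strings so it also runs on Python 3.4:

```python
import json


def loads_with_preview(text, preview_len=80):
    """Parse JSON; on failure, re-raise with the first characters of the input attached."""
    try:
        return json.loads(text)
    except ValueError as exc:  # json decoding errors are ValueError (or a subclass)
        preview = text[:preview_len]
        raise ValueError("{}; input starts with {!r}".format(exc, preview)) from None


message = ""
try:
    # Invented sample: a body that still carries a callback wrapper.
    loads_with_preview('jQuery123({"a": 1});')
except ValueError as exc:
    message = str(exc)
    print(message)
```

If the preview shows a callback name, an HTML error page, or an empty string, the slicing logic can be adjusted to match what is really there.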
That wrapper is obviously there for JSONP... callback(json)
How do I remove it?
URL-decode the address and you will see a callback parameter:
http://xxgk.miit.gov.cn/gdnps/searchIndex.jsp?params=%7B%22goPage%22%3A4%2C%22orderBy%22%3A%5B%7B%22orderBy%22%3A%22publishTime%22%2C%22reverse%22%3Atrue%7D%2C%7B%22orderBy%22%3A%22orderTime%22%2C%22reverse%22%3Atrue%7D%5D%2C%22pageSize%22%3A10%2C%22queryParam%22%3A%5B%7B%7D%2C%7B%7D%2C%7B%22shortName%22%3A%22fbjg%22%2C%22value%22%3A%22%2F1%2F29%2F1146295%2F1652858%2F1652930%22%7D%5D%7D&callback=jQuery111102456514014162614_1546997791362&_=1546997791366
callback=jQuery111102456514014162614_1546997791362
You can change this value to whatever you need.
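One way to inspect and change that parameter with the standard library is sketched below. The URL is a shortened stand-in (params trimmed to just goPage for readability), and the replacement name cb is an arbitrary choice; how the server responds to a different callback value is not verified here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Shortened stand-in for the URL in the thread (params trimmed for readability).
url = ("http://xxgk.miit.gov.cn/gdnps/searchIndex.jsp"
       "?params=%7B%22goPage%22%3A4%7D"
       "&callback=jQuery111102456514014162614_1546997791362"
       "&_=1546997791366")

parts = urlsplit(url)
query = parse_qs(parts.query)  # values are lists and already percent-decoded
decoded_params = query["params"][0]
original_callback = query["callback"][0]
print(decoded_params)      # {"goPage":4}
print(original_callback)   # jQuery111102456514014162614_1546997791362

# Swap in a callback name of our own choosing, then rebuild the URL.
query["callback"] = ["cb"]
new_url = urlunsplit(parts._replace(query=urlencode(query, doseq=True)))
print(new_url)
```

With a known callback name, the prefix to strip from the response is predictable instead of a generated jQuery identifier.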
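If you can't even do string manipulation, why are you writing code....
The idea is to load the API through a script tag to get around the cross-origin restriction. The returned js simply calls a function, and the function name is whatever callback you defined yourself.
Once the js has loaded, your callback runs and you get the data.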
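To make that concrete from the Python side, here is a rough sketch of what a JSONP endpoint conceptually sends back and why json.loads rejects it as-is. render_jsonp and the payload are invented for illustration and say nothing about how the MIIT server is actually implemented.

```python
import json


def render_jsonp(callback_name, payload):
    """Build a JSONP-style body: a JavaScript call whose argument is JSON (hypothetical helper)."""
    return "{}({});".format(callback_name, json.dumps(payload))


script = render_jsonp("handleData", {"pageSize": 10})
print(script)  # handleData({"pageSize": 10});

# A browser would execute this as JavaScript; json.loads sees a bare identifier
# at position 0 and fails before it ever reaches the JSON inside.
try:
    json.loads(script)
    parsed_directly = True
except ValueError:
    parsed_directly = False
print(parsed_directly)  # False
```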
Read up on JSONP:
https://blog.csdn.net/hansexploration/article/details/80314948
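Replied to the wrong person
It should have been
If you don't know how to strip the surrounding string, I suggest you stop and properly study Python and programming basics first.
I understand it is there for JSONP, but what the OP needs is to read this JSON from Python. The most direct way is to strip the callback() wrapper and then call json.loads; the OP clearly doesn't know JSONP and doesn't need to learn it for this.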

