How do I read a SQL file with comments in Python and write it to a database?

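I have a SQL file that another module needs during installation.
Its content looks like a database export, 1000+ lines.
I looked at many examples; most assume there are no comments and split on the statement terminator to insert one statement at a time.
On SO I also found an official MySQL example: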

import mysql.connector
cnx = mysql.connector.connect(database='world')
cursor = cnx.cursor()
cursor.execute(operation, params=None, multi=False)
iterator = cursor.execute(operation, params=None, multi=True)

https://stackoverflow.com/a/51809025/11691764

But the official example also uses consecutive SQL statements with no comments:
https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-execute.html

Is there a good solution,
other than preprocessing the SQL file to strip the comments first?

h691938207 #1

Call MySQL's mysqldump command,
or find the mysqldump code and write a Python interface for it.

import re
import sqlite3
from pathlib import Path

def execute_sql_file_with_comments(file_path, db_connection):
    """
    Read a SQL file that contains comments and execute it against the database.
    Supports:
    - single-line comments (-- or #)
    - multi-line comments (/* */)
    - semicolons as the statement separator
    """
    # Read the file contents
    sql_content = Path(file_path).read_text(encoding='utf-8')

    # Note: the patterns below also match inside string literals
    # Remove multi-line comments /* */
    sql_content = re.sub(r'/\*.*?\*/', '', sql_content, flags=re.DOTALL)

    # Remove single-line comments starting with --
    sql_content = re.sub(r'--.*$', '', sql_content, flags=re.MULTILINE)

    # Remove single-line comments starting with #
    sql_content = re.sub(r'#.*$', '', sql_content, flags=re.MULTILINE)

    # Split on semicolons and drop empty statements
    statements = [
        stmt.strip()
        for stmt in sql_content.split(';')
        if stmt.strip()
    ]

    # Execute each SQL statement
    cursor = db_connection.cursor()
    for stmt in statements:
        try:
            cursor.execute(stmt)
            print(f"Succeeded: {stmt[:50]}...")
        except Exception as e:
            print(f"Failed: {stmt[:50]}... Error: {e}")

    db_connection.commit()
    cursor.close()

# Usage example
if __name__ == "__main__":
    # Create an in-memory SQLite database (for demonstration)
    conn = sqlite3.connect(':memory:')

    # Sample SQL file content (in real use this comes from the file)
    sample_sql = """
    -- Create the users table
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        /* username column */
        username TEXT NOT NULL,
        email TEXT UNIQUE,  # email address
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    );

    /*
      Bulk-insert test data
      This is a multi-line comment
    */
    INSERT INTO users (username, email)
    VALUES ('alice', 'alice@example.com');

    INSERT INTO users (username, email)
    VALUES ('bob', 'bob@example.com'); -- this is Bob's user
    """

    # Write the sample SQL to a temporary file, using the same encoding it is read with
    temp_file = 'temp_sample.sql'
    Path(temp_file).write_text(sample_sql, encoding='utf-8')

    # Execute the SQL file
    execute_sql_file_with_comments(temp_file, conn)

    # Verify the result
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM users")
    print("\nInserted rows:", cursor.fetchall())

    # Clean up
    Path(temp_file).unlink(missing_ok=True)
    conn.close()

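Core idea:

Remove comments with regular expressions in three steps: multi-line comments /* */ first, then the single-line comments -- and #
Split on semicolons to get individual SQL statements
Execute them one by one and handle exceptions

Notes:

re.DOTALL makes . match newlines, so multi-line comments are removed correctly
Remove comments before splitting, so semicolons inside comments do not interfere with the split
A real production environment needs more thorough error handling and transaction management

One-line advice: filter out comments with regex, then split on semicolons and execute; this is reliable as long as no string literal contains a comment marker or a semicolon.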
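
One caveat on the regex approach above: the patterns cannot tell a comment marker from the same characters inside a string literal. The sketch below uses a made-up table `notes` to show a value containing `--` being cut off by the `--.*$` pattern. For comparison it runs the same script, comments included, through `sqlite3`'s built-in `executescript`; that method is specific to SQLite and is not something MySQL drivers provide.

```python
import re
import sqlite3

script = """
-- hypothetical table used only for this demonstration
CREATE TABLE notes (body TEXT);
/* the literal below contains a comment marker and a semicolon */
INSERT INTO notes (body) VALUES ('a -- b; c');
"""

# The single-line pattern from the answer above cuts the literal off at "--".
stripped = re.sub(r'--.*$', '', script, flags=re.MULTILINE)
literal_survives = "'a -- b; c'" in stripped

# SQLite's own tokenizer understands "--" and "/* */" comments and quoted strings.
conn = sqlite3.connect(':memory:')
conn.executescript(script)
rows = conn.execute("SELECT body FROM notes").fetchall()
conn.close()

print(literal_survives)
print(rows)
```

If the target database is not SQLite, the same pitfall applies and the splitting has to be done by something that tracks quoting.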

htzhanglong #3

#1 Right, I'll go take a look at the pyMSQL source code.

bupafengyu #4

Following this. What I did a couple of days ago was simply to delete the comments by brute force.

#3 But don't SQL files exported from a database contain a lot of comments that carry variables?

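I've dealt with this... In the end someone suggested dropping regex
and approaching it the way you would write a parser:

push and pop according to the syntax rules, lol.

My skills weren't up to it, so the result came out really ugly.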
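
A minimal sketch of that parser-style idea, assuming the input is plain SQL text. The function name `split_sql_statements` is hypothetical. Instead of a full stack it keeps a single "current quote" state, which is enough to ignore semicolons and comment markers inside literals. It is a simplification: it treats any `--` as a comment start, and it does not handle `DELIMITER` directives or `/*! ... */` conditional comments.

```python
def split_sql_statements(sql):
    """Split a SQL script on top-level semicolons, dropping comments.

    Handles -- and # line comments, /* */ block comments, and
    '...', "...", `...` quoting with backslash escapes.
    """
    statements, buf = [], []
    i, n = 0, len(sql)
    quote = None  # the open quote character, or None outside a literal
    while i < n:
        ch = sql[i]
        nxt = sql[i + 1] if i + 1 < n else ''
        if quote:
            buf.append(ch)
            if ch == '\\' and nxt:
                # keep the escaped character verbatim
                buf.append(nxt)
                i += 1
            elif ch == quote:
                quote = None
        elif ch in ("'", '"', '`'):
            quote = ch
            buf.append(ch)
        elif ch == '#' or (ch == '-' and nxt == '-'):
            # line comment: skip up to (not including) the newline
            while i < n and sql[i] != '\n':
                i += 1
            continue
        elif ch == '/' and nxt == '*':
            # block comment: skip past the closing */ (or to the end)
            end = sql.find('*/', i + 2)
            i = n if end == -1 else end + 2
            buf.append(' ')
            continue
        elif ch == ';':
            stmt = ''.join(buf).strip()
            if stmt:
                statements.append(stmt)
            buf = []
        else:
            buf.append(ch)
        i += 1
    tail = ''.join(buf).strip()
    if tail:
        statements.append(tail)
    return statements


if __name__ == "__main__":
    demo = "-- note; here\nINSERT INTO t VALUES ('a;b -- c'); /* x; y */\nSELECT 1"
    for statement in split_sql_statements(demo):
        print(statement)
```

Each returned string can then be passed to `cursor.execute` one at a time.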

zlyuanteng #7

Is this what you mean? I don't understand what problem you ran into.
Forms that work:
In [7]: cursor.execute("select * from test")
Out[7]: 35

In [8]: cursor.execute("select * #123 \n from test")
Out[8]: 35

In [10]: cursor.execute("select * /*123*/ from test")
Out[10]: 35

This one does not work:
In [9]: cursor.execute("select * #123 from 访问详单")
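
The failing form is consistent with how line comments work: everything from the marker to the end of the line is ignored, so without a newline the FROM clause is swallowed as well. A small sketch using the standard-library `sqlite3` module as a stand-in; note that SQLite uses `--` rather than `#` for line comments, and the table `test` here is created only for the demo.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE test (n INTEGER)")
conn.execute("INSERT INTO test VALUES (1)")

# The newline ends the comment, so the FROM clause is still parsed.
ok_rows = conn.execute("SELECT * -- 123 \n FROM test").fetchall()

# No newline: the comment runs to the end of the string, and what
# remains is just "SELECT *", which is not a complete statement.
try:
    conn.execute("SELECT * -- 123 FROM test")
    failed = False
except sqlite3.OperationalError:
    failed = True
conn.close()

print(ok_rows, failed)
```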

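#6 No. I just want to write the statements in a .sql file into the database. Such a file usually contains all kinds of line breaks, escape characters, format characters, and comments.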

Just call the mysql command-line client.

#9 Do you mean this?
import shlex
from subprocess import Popen, PIPE

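# This does not work: the command fails to parse, and adding shell=True does not help either
res = Popen(shlex.split('mysql -uroot -padmin < test.sql'), stdout=PIPE, stderr=PIPE).communicate()

# You might think of feeding the SQL file in as stdin instead
# (read as bytes, because communicate() expects bytes for a binary pipe)
res = Popen(shlex.split('mysql -uroot -padmin'), stdout=PIPE, stderr=PIPE, stdin=PIPE).communicate(open('test.sql', 'rb').read())
# This raises an error when it reaches variables inside comments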
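
For reference, the first attempt fails because `<` is shell syntax; after `shlex.split` it reaches the program as an ordinary argument. A hedged sketch of the usual alternative is to open the file yourself and hand it over as the child's stdin. The helper name `pipe_file_to_command` is made up for illustration, and whether the mysql client then accepts a particular dump is a separate question this sketch does not settle.

```python
import subprocess
import sys


def pipe_file_to_command(sql_path, command):
    """Run `command` (a list of arguments) with the file as its stdin.

    This mirrors `command < sql_path` in a shell, but no shell is
    involved, so nothing in the argument list needs shell quoting.
    """
    with open(sql_path, 'rb') as sql_file:
        return subprocess.run(command, stdin=sql_file,
                              capture_output=True, check=False)


# With a real client this might look like (credentials are placeholders):
#   result = pipe_file_to_command('test.sql', ['mysql', '-uroot', '-padmin'])
#   print(result.returncode, result.stderr.decode(errors='replace'))
```

Checking `returncode` and `stderr` on the returned object shows whether the client reported an error.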

itying888 #11

https://github.com/lolizeppelin/simpleservice/blob/master/simpleservice/ormdb/tools/sqldecode.py

I tested it a little back then, and have not used or maintained it since.

There is a ready-made SQL parsing library. Is this the kind of thing you need?
https://github.com/andialbrecht/sqlparse

#12 I'm trying yours now. There is a problem decoding Chinese text; I'll let you know once I have results.

bupafengyu #14

#11
#12
In the end, pressed for time,
I went with the most brute-force approach:
with open(src + 'sql.sh', 'w') as f:
    f.write('mysql -u{0} -p{1} -h {2} -P 3306 < {3}ve.sql'.format(db_user, db_password, db_host, src))
res = Popen(shlex.split('/usr/bin/sh ' + src + 'sql.sh'), stdout=PIPE, stderr=PIPE).communicate()
os.remove(src + 'sql.sh')
