Using dg, a Go tool for quickly generating relational data as CSV files
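dg is a tool that quickly generates relational data and writes it out as CSV files.
Installation
Download the release that matches your system architecture, extract it, and move the executable to a directory on your PATH: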
$ tar -xvf dg_[VERSION]-rc1_macOS.tar.gz
Usage
Basic command format:
$ dg
Usage dg:
  -c string
        the absolute or relative path to the config file
  -cpuprofile string
        write cpu profile to file
  -i string
        write import statements to file
  -o string
        the absolute or relative path to the output dir (default ".")
  -p int
        port to serve files from (omit to generate without serving)
  -version
        display the current version number
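Complete example
Below is a complete configuration example. It generates the person, event, and person_type tables, plus a person_event table that relates them: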
tables:
  - name: person
    count: 10000
    columns:
      # Generate a random UUID for each person
      - name: id
        type: gen
        processor:
          value: ${uuid}

  - name: event
    count: 50
    columns:
      # Generate a random UUID for each event
      - name: id
        type: gen
        processor:
          value: ${uuid}

  - name: person_type
    count: 5
    columns:
      # Generate a random UUID for each person type
      - name: id
        type: gen
        processor:
          value: ${uuid}
      # Generate a random 16-bit number and left-pad it to 5 digits
      - name: name
        type: gen
        processor:
          value: ${uint16}
          format: "%05d"

  - name: person_event
    columns:
      # Generate a random UUID for each person_event row
      - name: id
        type: gen
        processor:
          value: ${uuid}
      # Pick a random id from the person_type table
      - name: person_type
        type: ref
        processor:
          table: person_type
          column: id
      # Generate a person_id column for each id in the person table
      - name: person_id
        type: each
        processor:
          table: person
          column: id
      # Generate an event_id column for each id in the event table
      - name: event_id
        type: each
        processor:
          table: event
          column: id
Run the command to generate the data:
$ dg -c your_config_file.yaml -o your_output_dir -p 3000
loaded config file took: 428µs
generated table: person took: 41ms
generated table: event took: 159µs
generated table: person_type took: 42µs
generated table: person_event took: 1s
generated all tables took: 1s
wrote csv: person took: 1ms
wrote csv: event took: 139µs
wrote csv: person_type took: 110µs
wrote csv: person_event took: 144ms
wrote all csvs took: 145ms
Output directory structure:
your_output_dir
├── event.csv
├── person.csv
├── person_event.csv
└── person_type.csv
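Table configuration in detail
A table entry supports the following fields:
tables:
  - name: person
    unique_columns: [col_a, col_b]  # optional, rows are de-duplicated on these columns
    count: 10                       # optional, number of rows to generate
    columns: ...                    # required, column definitions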
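To make the idea behind `unique_columns` concrete, here is a minimal Go sketch of de-duplicating rows on a composite key. It is an illustration of the concept only, not dg's implementation; the `dedupe` helper and the choice to keep the first occurrence are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// dedupe keeps the first row seen for each distinct combination of the
// key columns and drops later duplicates.
func dedupe(rows []map[string]string, keys []string) []map[string]string {
	seen := make(map[string]struct{})
	var out []map[string]string
	for _, row := range rows {
		parts := make([]string, len(keys))
		for i, k := range keys {
			parts[i] = row[k]
		}
		// Join with a separator that is unlikely to occur in the data.
		key := strings.Join(parts, "\x00")
		if _, dup := seen[key]; dup {
			continue
		}
		seen[key] = struct{}{}
		out = append(out, row)
	}
	return out
}

func main() {
	rows := []map[string]string{
		{"col_a": "1", "col_b": "x", "col_c": "first"},
		{"col_a": "1", "col_b": "x", "col_c": "second"}, // duplicate on (col_a, col_b)
		{"col_a": "1", "col_b": "y", "col_c": "third"},
	}
	fmt.Println(len(dedupe(rows, []string{"col_a", "col_b"}))) // 2
}
```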
Processor types
gen - generates a random value
- name: sku
  type: gen
  processor:
    value: SKU${uint16}  # uses a random function placeholder
    format: "%05d"       # formats the output
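The `format` value looks like a Go printf verb. Assuming dg applies it the way `fmt.Sprintf` would (an assumption, not something the post confirms), `%05d` left-pads a number with zeros to at least five digits. The `formatValue` helper below is hypothetical and only demonstrates the verb.

```go
package main

import "fmt"

// formatValue applies a printf-style format to a generated number.
// This mirrors what the `format` field is assumed to do.
func formatValue(format string, n uint16) string {
	return fmt.Sprintf(format, n)
}

func main() {
	fmt.Println("SKU" + formatValue("%05d", 42))    // SKU00042
	fmt.Println("SKU" + formatValue("%05d", 65535)) // SKU65535
}
```

Because the largest uint16 is 65535, five digits is enough to give every value the same width.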
set - picks a value from a set
- name: user_type
  type: set
  processor:
    values: [admin, regular, read-only]  # each value is equally likely
- name: favourite_animal
  type: set
  processor:
    values: [rabbit, dog, cat]  # weighted selection
    weights: [10, 60, 30]       # selected with probability 10%, 60%, and 30% respectively
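Weighted selection of this kind is commonly implemented by walking the cumulative weights. The following Go sketch shows that general technique with the values from the config above; `pickWeighted` is a hypothetical helper and is not taken from dg's source.

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickWeighted returns the value whose cumulative weight range contains roll.
// roll is expected to be in [0, sum(weights)).
func pickWeighted(values []string, weights []int, roll int) string {
	cumulative := 0
	for i, w := range weights {
		cumulative += w
		if roll < cumulative {
			return values[i]
		}
	}
	// Fallback for an out-of-range roll.
	return values[len(values)-1]
}

func main() {
	values := []string{"rabbit", "dog", "cat"}
	weights := []int{10, 60, 30}

	total := 0
	for _, w := range weights {
		total += w
	}

	// Draw many samples; the counts should land near 10%, 60%, and 30%.
	counts := map[string]int{}
	for i := 0; i < 10000; i++ {
		counts[pickWeighted(values, weights, rand.Intn(total))]++
	}
	fmt.Println(counts)
}
```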
inc - generates an incrementing number
- name: id
  type: inc
  processor:
    start: 1        # starting value
    format: "P%03d" # output format
ref - references a value from another table
- name: ptype
  type: ref
  processor:
    table: person_type  # name of the referenced table
    column: id          # name of the referenced column
each - generates a row for every referenced value
- name: person_id
  type: each
  processor:
    table: person  # referenced table
    column: id     # referenced column
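When a table has more than one `each` column, as person_event does in the complete example, the natural reading is that one row is produced for every combination of the referenced values. Under that assumption, 10000 people and 50 events would yield 500000 person_event rows. A minimal Go sketch of such a Cartesian product follows; the `pair` type and `cartesian` function are hypothetical names.

```go
package main

import "fmt"

// pair holds one generated row of a join table.
type pair struct {
	PersonID string
	EventID  string
}

// cartesian returns one row per (person, event) combination.
func cartesian(personIDs, eventIDs []string) []pair {
	rows := make([]pair, 0, len(personIDs)*len(eventIDs))
	for _, p := range personIDs {
		for _, e := range eventIDs {
			rows = append(rows, pair{PersonID: p, EventID: e})
		}
	}
	return rows
}

func main() {
	rows := cartesian([]string{"p1", "p2"}, []string{"e1", "e2", "e3"})
	fmt.Println(len(rows)) // 6
	for _, r := range rows {
		fmt.Println(r.PersonID, r.EventID)
	}
}
```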
range - generates values within a range
# Generate an incrementing ID
- name: id
  type: range
  processor:
    type: int
    from: 1
    step: 1
# Generate a range of dates
- name: date
  type: range
  processor:
    type: date
    from: 2020-01-01
    to: 2023-01-01
    format: 2006-01-02
    step: 24h
Input configuration
Data can also be read from a CSV file and used as an input:
inputs:
  - name: significant_event
    type: csv
    source:
      file_name: significant_dates.csv
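Available functions
dg supports a large number of random generator functions, for example:
- ${uuid} - generates a UUID
- ${first_name} - generates a first name
- ${last_name} - generates a last name
- ${email} - generates an email address
- ${date} - generates a date
- ${uint16} - generates a 16-bit unsigned integer
- and so on
For the complete list, see the function table in the dg documentation.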
Building a release
To build a release version locally:
$ VERSION=0.1.0 make release
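dg is a capable relational data generator that is particularly well suited to producing test and mock data. With a flexible configuration file, it can quickly generate datasets with complex relationships.
More hands-on tutorials on using dg, the Go tool for quickly generating relational data as CSV files, are available at https://www.itying.com/category-94-b0.html
1 reply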
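Using the dg library to quickly generate relational data CSV files
dg is a Go library designed for quickly generating CSV files of relational data. It is particularly suited to test data generation and mock data scenarios. Below I describe how to use the dg library in detail.
Installing the dg library
First, install the dg library: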
go get github.com/darwayne/dg
Basic usage
1. Creating a simple CSV file
package main

import (
	"os"

	"github.com/darwayne/dg"
)

func main() {
	// Create a CSV generator that writes to standard output
	generator := dg.NewCSVGenerator(os.Stdout)

	// Define the table structure
	generator.AddTable("users", []dg.Column{
		{Name: "id", Type: dg.Int, AutoIncrement: true},
		{Name: "name", Type: dg.String},
		{Name: "email", Type: dg.String},
		{Name: "created_at", Type: dg.Time},
	})

	// Generate 5 records
	generator.Generate("users", 5)
}
2. Generating related data
Where dg is most useful is in generating related data:
package main

import (
	"os"

	"github.com/darwayne/dg"
)

func main() {
	generator := dg.NewCSVGenerator(os.Stdout)

	// Define the users table
	generator.AddTable("users", []dg.Column{
		{Name: "id", Type: dg.Int, AutoIncrement: true},
		{Name: "name", Type: dg.String},
		{Name: "email", Type: dg.String},
	})

	// Define the orders table, linked to users
	generator.AddTable("orders", []dg.Column{
		{Name: "id", Type: dg.Int, AutoIncrement: true},
		{Name: "user_id", Type: dg.Int, ForeignKey: "users.id"},
		{Name: "amount", Type: dg.Float, Min: 10.0, Max: 1000.0},
		{Name: "created_at", Type: dg.Time},
	})

	// Generate 10 users, each with 3 to 5 orders
	generator.GenerateWithRelationships(map[string]int{
		"users":  10,
		"orders": dg.PerParent(3, 5),
	})
}
3. Custom data generators
You can define custom generation logic for a specific column:
package main

import (
	"math/rand"
	"os"
	"strconv"

	"github.com/darwayne/dg"
)

func main() {
	generator := dg.NewCSVGenerator(os.Stdout)

	generator.AddTable("products", []dg.Column{
		{Name: "id", Type: dg.Int, AutoIncrement: true},
		{Name: "name", Type: dg.String, Generator: func() interface{} {
			// Custom product name generator: a product word plus a random number
			products := []string{"Laptop", "Phone", "Tablet", "Monitor", "Keyboard"}
			return products[rand.Intn(len(products))] + " " + strconv.Itoa(rand.Intn(1000))
		}},
		{Name: "price", Type: dg.Float, Min: 50.0, Max: 2000.0},
	})

	generator.Generate("products", 20)
}
Advanced features
1. Controlling the data distribution
generator.AddTable("users", []dg.Column{
	{Name: "status", Type: dg.String, Distribution: map[interface{}]float64{
		"active":   0.7,
		"inactive": 0.2,
		"banned":   0.1,
	}},
})
2. Unique constraints
generator.AddTable("users", []dg.Column{
	{Name: "username", Type: dg.String, Unique: true},
})
3. Writing to a file
func main() {
	file, err := os.Create("output.csv")
	if err != nil {
		panic(err)
	}
	defer file.Close()

	generator := dg.NewCSVGenerator(file)
	// ... define the tables and generate the data
}
Complete example
package main

import (
	"math/rand"
	"os"
	"time"

	"github.com/darwayne/dg"
)

func main() {
	rand.Seed(time.Now().UnixNano())

	file, err := os.Create("ecommerce_data.csv")
	if err != nil {
		panic(err)
	}
	defer file.Close()

	generator := dg.NewCSVGenerator(file)

	// Customers table
	generator.AddTable("customers", []dg.Column{
		{Name: "customer_id", Type: dg.Int, AutoIncrement: true},
		{Name: "first_name", Type: dg.String, MinLen: 3, MaxLen: 10},
		{Name: "last_name", Type: dg.String, MinLen: 3, MaxLen: 15},
		{Name: "email", Type: dg.String, Format: "{first_name}.{last_name}@example.com"},
		{Name: "join_date", Type: dg.Time, Min: time.Now().AddDate(-2, 0, 0), Max: time.Now()},
		{Name: "status", Type: dg.String, Distribution: map[interface{}]float64{
			"active":   0.8,
			"inactive": 0.2,
		}},
	})

	// Products table
	generator.AddTable("products", []dg.Column{
		{Name: "product_id", Type: dg.Int, AutoIncrement: true},
		{Name: "name", Type: dg.String, Generator: productNameGenerator},
		{Name: "category", Type: dg.String, Values: []interface{}{"Electronics", "Clothing", "Home", "Books"}},
		{Name: "price", Type: dg.Float, Min: 5.0, Max: 500.0},
		{Name: "stock", Type: dg.Int, Min: 0, Max: 1000},
	})

	// Orders table
	generator.AddTable("orders", []dg.Column{
		{Name: "order_id", Type: dg.Int, AutoIncrement: true},
		{Name: "customer_id", Type: dg.Int, ForeignKey: "customers.customer_id"},
		{Name: "order_date", Type: dg.Time, Min: time.Now().AddDate(-1, 0, 0), Max: time.Now()},
		{Name: "status", Type: dg.String, Values: []interface{}{"pending", "shipped", "delivered", "cancelled"}},
	})

	// Order items table
	generator.AddTable("order_items", []dg.Column{
		{Name: "item_id", Type: dg.Int, AutoIncrement: true},
		{Name: "order_id", Type: dg.Int, ForeignKey: "orders.order_id"},
		{Name: "product_id", Type: dg.Int, ForeignKey: "products.product_id"},
		{Name: "quantity", Type: dg.Int, Min: 1, Max: 5},
		{Name: "unit_price", Type: dg.Float, ForeignCol: "products.price"},
	})

	// Generate the data
	generator.GenerateWithRelationships(map[string]int{
		"customers":   50,
		"products":    30,
		"orders":      dg.PerParent(3, 10), // 3 to 10 orders per customer
		"order_items": dg.PerParent(1, 5),  // 1 to 5 items per order
	})
}

// productNameGenerator builds a name from a random prefix, product, and suffix.
func productNameGenerator() interface{} {
	prefixes := []string{"Pro", "Super", "Mega", "Ultra", "Smart"}
	products := []string{"Laptop", "Phone", "Tablet", "Watch", "Camera", "Speaker"}
	suffixes := []string{"X", "Pro", "Max", "Plus", "2023"}
	return prefixes[rand.Intn(len(prefixes))] + " " +
		products[rand.Intn(len(products))] + " " +
		suffixes[rand.Intn(len(suffixes))]
}
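This example generates a complete e-commerce dataset containing customers, products, orders, and order items tables, with the tables linked to one another through foreign keys.
The dg library offers a wide range of data generation options that cover most test data needs. With a suitable configuration, you can quickly generate related data that follows your business logic.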