Using dg, a Golang tool for quickly generating relational data as CSV files

dg is a tool that quickly generates relational data and writes it out as CSV files.

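Installation

Download the release that matches your system architecture, extract it, and move the executable into a directory on your PATH: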

$ tar -xvf dg_[VERSION]-rc1_macOS.tar.gz

Usage

Running dg without arguments prints the available flags:

$ dg
Usage dg:
  -c string
        the absolute or relative path to the config file
  -cpuprofile string
        write cpu profile to file
  -i string
        write import statements to file
  -o string
        the absolute or relative path to the output dir (default ".")
  -p int
        port to serve files from (omit to generate without serving)
  -version
        display the current version number

Complete example

The following configuration generates person, event, and person_type tables, plus a person_event table that relates them:

tables:
  - name: person
    count: 10000
    columns:
      # Generate a random UUID for each person
      - name: id
        type: gen
        processor:
          value: ${uuid}

  - name: event
    count: 50
    columns:
      # Generate a random UUID for each event
      - name: id
        type: gen
        processor:
          value: ${uuid}

  - name: person_type
    count: 5
    columns:
      # Generate a random UUID for each person type
      - name: id
        type: gen
        processor:
          value: ${uuid}

      # Generate a random 16-bit number, left-padded with zeros to 5 digits
      - name: name
        type: gen
        processor:
          value: ${uint16}
          format: "%05d"

  - name: person_event
    columns:
      # Generate a random UUID for each person/event relationship
      - name: id
        type: gen
        processor:
          value: ${uuid}

      # Pick a random id from the person_type table
      - name: person_type
        type: ref
        processor:
          table: person_type
          column: id

      # Generate a person_id value for each id in the person table
      - name: person_id
        type: each
        processor:
          table: person
          column: id

      # Generate an event_id value for each id in the event table
      - name: event_id
        type: each
        processor:
          table: event
          column: id

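Run dg against the config to generate the data (the -p flag additionally serves the generated files on port 3000):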

$ dg -c your_config_file.yaml -o your_output_dir -p 3000
loaded config file                       took: 428µs
generated table: person                  took: 41ms
generated table: event                   took: 159µs
generated table: person_type             took: 42µs
generated table: person_event            took: 1s
generated all tables                     took: 1s
wrote csv: person                        took: 1ms
wrote csv: event                         took: 139µs
wrote csv: person_type                   took: 110µs
wrote csv: person_event                  took: 144ms
wrote all csvs                           took: 145ms

Output directory structure:

your_output_dir
├── event.csv
├── person.csv
├── person_event.csv
└── person_type.csv

Table configuration

A table entry supports the following fields:

tables:
  - name: person
    unique_columns: [col_a, col_b]  # optional: deduplicate rows on these columns
    count: 10                       # optional: number of rows to generate
    columns: ...                    # required: column definitions
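
To make the effect of unique_columns concrete, here is a minimal Go sketch of deduplicating rows on a subset of columns. It is an illustration only, not dg's implementation: the dedupe helper, the row representation, and the choice to keep the first row of each duplicate group are all assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// dedupe keeps the first row seen for each distinct combination of the key
// columns. Hypothetical helper: dg's own behaviour may differ.
func dedupe(rows []map[string]string, keys []string) []map[string]string {
	seen := make(map[string]bool)
	var out []map[string]string
	for _, row := range rows {
		parts := make([]string, len(keys))
		for i, k := range keys {
			parts[i] = row[k]
		}
		// Join with a NUL byte so "a"+"bc" and "ab"+"c" stay distinct.
		key := strings.Join(parts, "\x00")
		if seen[key] {
			continue
		}
		seen[key] = true
		out = append(out, row)
	}
	return out
}

func main() {
	rows := []map[string]string{
		{"col_a": "1", "col_b": "x", "col_c": "first"},
		{"col_a": "1", "col_b": "x", "col_c": "duplicate"},
		{"col_a": "1", "col_b": "y", "col_c": "second"},
	}
	for _, row := range dedupe(rows, []string{"col_a", "col_b"}) {
		fmt.Println(row["col_a"], row["col_b"], row["col_c"])
	}
}
```

Rows are compared only on the listed columns, so the second row above is dropped even though its col_c value differs.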

Processor types

gen - generate a random value

- name: sku
  type: gen
  processor:
    value: SKU${uint16}  # placeholder expanded by a random function
    format: "%05d"       # printf-style output format
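
The format string uses Go's printf verbs, where "%05d" means "decimal, at least 5 characters wide, padded with leading zeros". The sketch below shows that padding on its own. It assumes the generated number is passed to a printf-style formatter, and formatValue is a hypothetical helper, not a dg function.

```go
package main

import "fmt"

// formatValue applies a printf-style format such as "%05d" to a number.
// Hypothetical helper used only to illustrate the padding.
func formatValue(format string, n uint16) string {
	return fmt.Sprintf(format, n)
}

func main() {
	fmt.Println(formatValue("%05d", 42))    // 00042
	fmt.Println(formatValue("%05d", 65535)) // 65535
}
```

Because a uint16 never exceeds 65535, a width of 5 is enough to give every value the same length.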

set - pick a value from a set

- name: user_type
  type: set
  processor:
    values: [admin, regular, read-only]  # each value is equally likely

- name: favourite_animal
  type: set
  processor:
    values: [rabbit, dog, cat]  # weighted selection
    weights: [10, 60, 30]       # selected with probability 10%, 60%, and 30% respectively
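
One common way to turn weights into probabilities is to walk the cumulative weights and stop at the first range that contains a random roll. The Go sketch below illustrates that technique with the weights from the example. pickWeighted is a hypothetical helper, and nothing here is taken from dg's source.

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickWeighted returns the value whose cumulative weight range contains roll.
// roll is expected to be in [0, sum(weights)).
func pickWeighted(values []string, weights []int, roll int) string {
	cumulative := 0
	for i, w := range weights {
		cumulative += w
		if roll < cumulative {
			return values[i]
		}
	}
	// Only reached if roll is out of range; fall back to the last value.
	return values[len(values)-1]
}

func main() {
	values := []string{"rabbit", "dog", "cat"}
	weights := []int{10, 60, 30}

	total := 0
	for _, w := range weights {
		total += w
	}

	// Draw many samples; the counts should roughly follow 10% / 60% / 30%.
	counts := map[string]int{}
	for i := 0; i < 10000; i++ {
		counts[pickWeighted(values, weights, rand.Intn(total))]++
	}
	fmt.Println(counts)
}
```

With weights summing to 100, rolls 0-9 map to rabbit, 10-69 to dog, and 70-99 to cat.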

inc - generate an incrementing number

- name: id
  type: inc
  processor:
    start: 1          # starting value
    format: "P%03d"   # output format
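
As a rough illustration of what this configuration describes, the Go sketch below counts up from a start value and applies the same "P%03d" format to each number. incValues is a hypothetical helper, and a step of 1 is assumed.

```go
package main

import "fmt"

// incValues returns count formatted values, starting at start and increasing
// by 1 each time. Hypothetical helper, not part of dg.
func incValues(start, count int, format string) []string {
	out := make([]string, 0, count)
	for i := 0; i < count; i++ {
		out = append(out, fmt.Sprintf(format, start+i))
	}
	return out
}

func main() {
	fmt.Println(incValues(1, 3, "P%03d")) // [P001 P002 P003]
}
```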

ref - reference a value from another table

- name: ptype
  type: ref
  processor:
    table: person_type  # referenced table
    column: id          # referenced column

each - generate a row for each referenced value

- name: person_id
  type: each
  processor:
    table: person  # referenced table
    column: id     # referenced column
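
When a table has more than one each column, as person_event does in the complete example, one plausible reading is that every combination of the referenced values gets its own row (a Cartesian product). The text above does not state this explicitly, so treat the Go sketch below as an assumption about the behaviour. crossJoin is a hypothetical helper.

```go
package main

import "fmt"

// crossJoin pairs every value of a with every value of b, producing
// len(a) * len(b) rows. Hypothetical helper illustrating a Cartesian product.
func crossJoin(a, b []string) [][2]string {
	rows := make([][2]string, 0, len(a)*len(b))
	for _, x := range a {
		for _, y := range b {
			rows = append(rows, [2]string{x, y})
		}
	}
	return rows
}

func main() {
	people := []string{"p1", "p2", "p3"}
	events := []string{"e1", "e2"}
	for _, row := range crossJoin(people, events) {
		fmt.Println(row[0], row[1])
	}
}
```

Under that assumption, 10000 people and 50 events would yield 500000 person_event rows, which is worth keeping in mind when choosing row counts.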

range - generate values within a range

# Generate an incrementing ID
- name: id
  type: range
  processor:
    type: int
    from: 1
    step: 1

# Generate a range of dates
- name: date
  type: range
  processor:
    type: date
    from: 2020-01-01
    to: 2023-01-01
    format: 2006-01-02
    step: 24h
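
The format value 2006-01-02 is Go's reference-time layout for a year-month-day date, and 24h is a duration string that Go's time.ParseDuration accepts. The sketch below walks a date range with those two inputs. dateRange is a hypothetical helper, and including the end date is a choice made for this sketch, since the text above does not say how dg treats the upper bound.

```go
package main

import (
	"fmt"
	"time"
)

// layout is Go's reference time written as year-month-day.
const layout = "2006-01-02"

// dateRange returns every date from 'from' to 'to' (inclusive in this sketch),
// advancing by step, which is a duration string such as "24h".
func dateRange(from, to, step string) ([]string, error) {
	start, err := time.Parse(layout, from)
	if err != nil {
		return nil, err
	}
	end, err := time.Parse(layout, to)
	if err != nil {
		return nil, err
	}
	d, err := time.ParseDuration(step)
	if err != nil {
		return nil, err
	}
	if d <= 0 {
		return nil, fmt.Errorf("step must be positive, got %v", d)
	}

	var out []string
	for t := start; !t.After(end); t = t.Add(d) {
		out = append(out, t.Format(layout))
	}
	return out, nil
}

func main() {
	dates, err := dateRange("2020-01-01", "2020-01-05", "24h")
	if err != nil {
		panic(err)
	}
	fmt.Println(dates)
}
```

time.Parse returns UTC times when the layout has no zone, so a 24h step always lands on the next calendar day here.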

Inputs

Data can be read from a CSV file and used as an input:

inputs:
  - name: significant_event
    type: csv
    source:
      file_name: significant_dates.csv
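
For a sense of what reading such an input involves, here is a minimal Go sketch that pulls one named column out of CSV text using the standard encoding/csv package. The column names and sample rows are invented for illustration (the contents of significant_dates.csv are not shown above), and columnValues is a hypothetical helper rather than anything dg exposes.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// columnValues parses CSV text whose first row is a header and returns the
// values found under the named column.
func columnValues(csvText, column string) ([]string, error) {
	records, err := csv.NewReader(strings.NewReader(csvText)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) == 0 {
		return nil, fmt.Errorf("empty CSV")
	}

	// Locate the requested column in the header row.
	idx := -1
	for i, name := range records[0] {
		if name == column {
			idx = i
			break
		}
	}
	if idx < 0 {
		return nil, fmt.Errorf("column %q not found", column)
	}

	values := make([]string, 0, len(records)-1)
	for _, rec := range records[1:] {
		values = append(values, rec[idx])
	}
	return values, nil
}

func main() {
	// Hypothetical file contents.
	data := "date,event\n2020-01-01,launch\n2021-06-15,relaunch\n"
	dates, err := columnValues(data, "date")
	if err != nil {
		panic(err)
	}
	fmt.Println(dates)
}
```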

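Available functions

dg supports a large number of random generator functions, for example:

  • ${uuid} - generates a UUID
  • ${first_name} - generates a first name
  • ${last_name} - generates a last name
  • ${email} - generates an email address
  • ${date} - generates a date
  • ${uint16} - generates a 16-bit unsigned integer
  • and more

For the complete list, see the function table in the dg documentation.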
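
The ${name} placeholders can be thought of as template slots that are filled in once per generated value. The Go sketch below shows one simple way to expand such placeholders with a regular expression and a map of generator functions. It is not how dg is implemented: expand, the generators map, and the seq generator are all hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholder matches tokens of the form ${name}.
var placeholder = regexp.MustCompile(`\$\{(\w+)\}`)

// expand replaces each ${name} in template with the output of
// generators[name]. Unknown names are left untouched.
func expand(template string, generators map[string]func() string) string {
	return placeholder.ReplaceAllStringFunc(template, func(match string) string {
		name := match[2 : len(match)-1] // strip the leading "${" and trailing "}"
		if gen, ok := generators[name]; ok {
			return gen()
		}
		return match
	})
}

func main() {
	counter := 0
	generators := map[string]func() string{
		// A deterministic stand-in for a random function.
		"seq": func() string { counter++; return fmt.Sprint(counter) },
	}
	fmt.Println(expand("SKU${seq}-${seq}", generators)) // SKU1-2
	fmt.Println(expand("${unknown}", generators))       // ${unknown}
}
```

Each placeholder calls its generator separately, so two occurrences of the same name in one template can produce different values.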

Building a release

To build a release version locally:

$ VERSION=0.1.0 make release

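dg generates relational data from a declarative config, which makes it well suited to producing test data and mock data. By combining the processors above, you can quickly build datasets with complex relationships between tables.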


More hands-on tutorials on using dg to generate relational CSV data in Golang are available at https://www.itying.com/category-94-b0.html

1 reply

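Quickly generating relational CSV data with the dg library

dg is a Go library for quickly generating CSV files of relational data. It is particularly suited to test data generation and data simulation. The walkthrough below covers how to use it.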

Installing dg

First, install the library:

go get github.com/darwayne/dg

Basic usage

1. Creating a simple CSV file

package main

import (
	"os"

	"github.com/darwayne/dg"
)

func main() {
	// Create a CSV generator that writes to standard output
	generator := dg.NewCSVGenerator(os.Stdout)

	// Define the table structure
	generator.AddTable("users", []dg.Column{
		{Name: "id", Type: dg.Int, AutoIncrement: true},
		{Name: "name", Type: dg.String},
		{Name: "email", Type: dg.String},
		{Name: "created_at", Type: dg.Time},
	})

	// Generate 5 records
	generator.Generate("users", 5)
}

2. Generating related data

dg's main strength is that it can generate related data:

package main

import (
	"os"

	"github.com/darwayne/dg"
)

func main() {
	generator := dg.NewCSVGenerator(os.Stdout)

	// Define the users table
	generator.AddTable("users", []dg.Column{
		{Name: "id", Type: dg.Int, AutoIncrement: true},
		{Name: "name", Type: dg.String},
		{Name: "email", Type: dg.String},
	})

	// Define the orders table, linked to users through user_id
	generator.AddTable("orders", []dg.Column{
		{Name: "id", Type: dg.Int, AutoIncrement: true},
		{Name: "user_id", Type: dg.Int, ForeignKey: "users.id"},
		{Name: "amount", Type: dg.Float, Min: 10.0, Max: 1000.0},
		{Name: "created_at", Type: dg.Time},
	})

	// Generate 10 users, each with 3-5 orders
	generator.GenerateWithRelationships(map[string]int{
		"users": 10,
		"orders": dg.PerParent(3, 5),
	})
}

3. Custom data generators

You can define custom generation logic for specific columns:

package main

import (
	"math/rand"
	"os"
	"strconv"

	"github.com/darwayne/dg"
)

func main() {
	generator := dg.NewCSVGenerator(os.Stdout)

	generator.AddTable("products", []dg.Column{
		{Name: "id", Type: dg.Int, AutoIncrement: true},
		{Name: "name", Type: dg.String, Generator: func() interface{} {
			// Custom product name generator: a product type plus a random number
			products := []string{"Laptop", "Phone", "Tablet", "Monitor", "Keyboard"}
			return products[rand.Intn(len(products))] + " " + strconv.Itoa(rand.Intn(1000))
		}},
		{Name: "price", Type: dg.Float, Min: 50.0, Max: 2000.0},
	})

	generator.Generate("products", 20)
}

Advanced features

1. Controlling data distribution

generator.AddTable("users", []dg.Column{
	{Name: "status", Type: dg.String, Distribution: map[interface{}]float64{
		"active": 0.7,
		"inactive": 0.2,
		"banned": 0.1,
	}},
})

2. Unique constraints

generator.AddTable("users", []dg.Column{
	{Name: "username", Type: dg.String, Unique: true},
})

3. Writing to a file

func main() {
	file, err := os.Create("output.csv")
	if err != nil {
		panic(err)
	}
	defer file.Close()
	
	generator := dg.NewCSVGenerator(file)
	// ... configure the tables and generate the data
}

Complete example

package main

import (
	"github.com/darwayne/dg"
	"os"
	"math/rand"
	"time"
)

func main() {
	rand.Seed(time.Now().UnixNano())
	
	file, err := os.Create("ecommerce_data.csv")
	if err != nil {
		panic(err)
	}
	defer file.Close()
	
	generator := dg.NewCSVGenerator(file)
	
	// Customers table
	generator.AddTable("customers", []dg.Column{
		{Name: "customer_id", Type: dg.Int, AutoIncrement: true},
		{Name: "first_name", Type: dg.String, MinLen: 3, MaxLen: 10},
		{Name: "last_name", Type: dg.String, MinLen: 3, MaxLen: 15},
		{Name: "email", Type: dg.String, Format: "{first_name}.{last_name}@example.com"},
		{Name: "join_date", Type: dg.Time, Min: time.Now().AddDate(-2, 0, 0), Max: time.Now()},
		{Name: "status", Type: dg.String, Distribution: map[interface{}]float64{
			"active": 0.8,
			"inactive": 0.2,
		}},
	})
	
	// Products table
	generator.AddTable("products", []dg.Column{
		{Name: "product_id", Type: dg.Int, AutoIncrement: true},
		{Name: "name", Type: dg.String, Generator: productNameGenerator},
		{Name: "category", Type: dg.String, Values: []interface{}{"Electronics", "Clothing", "Home", "Books"}},
		{Name: "price", Type: dg.Float, Min: 5.0, Max: 500.0},
		{Name: "stock", Type: dg.Int, Min: 0, Max: 1000},
	})
	
	// Orders table
	generator.AddTable("orders", []dg.Column{
		{Name: "order_id", Type: dg.Int, AutoIncrement: true},
		{Name: "customer_id", Type: dg.Int, ForeignKey: "customers.customer_id"},
		{Name: "order_date", Type: dg.Time, Min: time.Now().AddDate(-1, 0, 0), Max: time.Now()},
		{Name: "status", Type: dg.String, Values: []interface{}{"pending", "shipped", "delivered", "cancelled"}},
	})
	
	// Order items table
	generator.AddTable("order_items", []dg.Column{
		{Name: "item_id", Type: dg.Int, AutoIncrement: true},
		{Name: "order_id", Type: dg.Int, ForeignKey: "orders.order_id"},
		{Name: "product_id", Type: dg.Int, ForeignKey: "products.product_id"},
		{Name: "quantity", Type: dg.Int, Min: 1, Max: 5},
		{Name: "unit_price", Type: dg.Float, ForeignCol: "products.price"},
	})
	
	// Generate the data
	generator.GenerateWithRelationships(map[string]int{
		"customers": 50,
		"products": 30,
		"orders": dg.PerParent(3, 10), // 3-10 orders per customer
		"order_items": dg.PerParent(1, 5), // 1-5 items per order
	})
}

func productNameGenerator() interface{} {
	prefixes := []string{"Pro", "Super", "Mega", "Ultra", "Smart"}
	products := []string{"Laptop", "Phone", "Tablet", "Watch", "Camera", "Speaker"}
	suffixes := []string{"X", "Pro", "Max", "Plus", "2023"}
	
	return prefixes[rand.Intn(len(prefixes))] + " " + 
		products[rand.Intn(len(products))] + " " + 
		suffixes[rand.Intn(len(suffixes))]
}

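This example generates a complete e-commerce dataset with customers, products, orders, and order items tables, with every table linked to the others through its foreign keys.

dg offers a wide range of data generation options that cover most test data needs. With suitable configuration, you can quickly generate related data that matches your business logic.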
