In Go, the github.com/xitongsys/parquet-go library can be used to read and write Parquet files. Basic usage examples follow:
1. Install the dependencies
go get github.com/xitongsys/parquet-go
go get github.com/xitongsys/parquet-go-source
2. Write a Parquet file
package main

import (
	"log"

	"github.com/xitongsys/parquet-go-source/local"
	"github.com/xitongsys/parquet-go/parquet"
	"github.com/xitongsys/parquet-go/writer"
)

// Student describes one row; every field to be written needs a parquet tag.
// The string tag below assumes parquet-go v1.6 or later; earlier releases
// used type=UTF8 instead.
type Student struct {
	Name   string  `parquet:"name=name, type=BYTE_ARRAY, convertedtype=UTF8"`
	Age    int32   `parquet:"name=age, type=INT32"`
	Weight float64 `parquet:"name=weight, type=DOUBLE"`
}

func main() {
	fw, err := local.NewLocalFileWriter("output.parquet")
	if err != nil {
		log.Fatal(err)
	}
	defer fw.Close()

	// The third argument is the number of goroutines used by the writer.
	pw, err := writer.NewParquetWriter(fw, new(Student), 4)
	if err != nil {
		log.Fatal(err)
	}
	pw.CompressionType = parquet.CompressionCodec_SNAPPY

	students := []Student{
		{"Alice", 20, 55.5},
		{"Bob", 22, 65.0},
		{"Charlie", 21, 60.5},
	}
	for _, student := range students {
		if err = pw.Write(student); err != nil {
			log.Fatal(err)
		}
	}

	// WriteStop flushes buffered rows and writes the file footer.
	if err = pw.WriteStop(); err != nil {
		log.Fatal(err)
	}
}
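The writer above relies on a deferred fw.Close(), which discards the close error and is skipped entirely when log.Fatal exits. A minimal sketch of an explicit shutdown order might look like the following. The names stopper, finish, recorder, fakeWriter, fakeFile and errDiskFull are hypothetical helpers for this illustration, not part of parquet-go; the sketch only assumes that the Parquet writer exposes WriteStop() error and that the file handle satisfies io.Closer.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// stopper is a hypothetical interface for this sketch. parquet-go's
// *writer.ParquetWriter is assumed to satisfy it via WriteStop() error.
type stopper interface {
	WriteStop() error
}

// errDiskFull is a made-up error used only to demonstrate the failure path.
var errDiskFull = errors.New("disk full")

// finish flushes the Parquet writer first and then closes the file.
// The file is closed even if WriteStop fails; the WriteStop error takes priority.
func finish(pw stopper, fw io.Closer) error {
	stopErr := pw.WriteStop()
	closeErr := fw.Close()
	if stopErr != nil {
		return fmt.Errorf("write stop: %w", stopErr)
	}
	if closeErr != nil {
		return fmt.Errorf("close file: %w", closeErr)
	}
	return nil
}

// recorder, fakeWriter and fakeFile are in-memory stand-ins that record call order.
type recorder struct{ calls []string }

type fakeWriter struct {
	rec *recorder
	err error
}

func (w fakeWriter) WriteStop() error {
	w.rec.calls = append(w.rec.calls, "WriteStop")
	return w.err
}

type fakeFile struct{ rec *recorder }

func (f fakeFile) Close() error {
	f.rec.calls = append(f.rec.calls, "Close")
	return nil
}

func main() {
	rec := &recorder{}
	err := finish(fakeWriter{rec: rec, err: errDiskFull}, fakeFile{rec: rec})
	fmt.Println(rec.calls, err)
}
```

With the real library, the call would presumably be finish(pw, fw) in place of the separate WriteStop call and the deferred Close.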
3. Read a Parquet file
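package main

import (
	"log"

	"github.com/xitongsys/parquet-go-source/local"
	"github.com/xitongsys/parquet-go/reader"
)

// Student must match the schema used when the file was written.
// It is repeated here because this is a separate program.
type Student struct {
	Name   string  `parquet:"name=name, type=BYTE_ARRAY, convertedtype=UTF8"`
	Age    int32   `parquet:"name=age, type=INT32"`
	Weight float64 `parquet:"name=weight, type=DOUBLE"`
}

func main() {
	fr, err := local.NewLocalFileReader("output.parquet")
	if err != nil {
		log.Fatal(err)
	}
	defer fr.Close()

	// Pass the row type so the reader can map columns to struct fields.
	pr, err := reader.NewParquetReader(fr, new(Student), 4)
	if err != nil {
		log.Fatal(err)
	}
	defer pr.ReadStop()

	// Read all rows at once; the slice length sets how many rows are read.
	num := int(pr.GetNumRows())
	students := make([]Student, num)
	if err = pr.Read(&students); err != nil {
		log.Fatal(err)
	}
	for _, student := range students {
		log.Printf("Name: %s, Age: %d, Weight: %.1f",
			student.Name, student.Age, student.Weight)
	}
}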
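The reader above loads every row into memory in one call, which may not be practical for large files. A minimal sketch of reading in fixed-size batches follows. The names rowReader, readInBatches and memReader are hypothetical and introduced only for this illustration. The sketch assumes that the library's reader exposes Read(dst interface{}) error and shortens the destination slice when fewer rows remain; the in-memory stand-in mimics that behavior so the example runs without the library. The tag syntax assumes parquet-go v1.6 or later.

```go
package main

import "fmt"

// Student mirrors the row type used in the examples above.
type Student struct {
	Name   string  `parquet:"name=name, type=BYTE_ARRAY, convertedtype=UTF8"`
	Age    int32   `parquet:"name=age, type=INT32"`
	Weight float64 `parquet:"name=weight, type=DOUBLE"`
}

// rowReader is a hypothetical interface for this sketch. parquet-go's
// *reader.ParquetReader is assumed to satisfy it via its Read method.
type rowReader interface {
	Read(dst interface{}) error
}

// readInBatches reads total rows in slices of at most batchSize rows
// and passes each slice to handle.
func readInBatches(r rowReader, total, batchSize int, handle func([]Student) error) error {
	if batchSize <= 0 {
		return fmt.Errorf("batch size must be positive, got %d", batchSize)
	}
	for remaining := total; remaining > 0; {
		n := batchSize
		if remaining < n {
			n = remaining
		}
		batch := make([]Student, n)
		if err := r.Read(&batch); err != nil {
			return err
		}
		if len(batch) == 0 {
			break // the reader ran out of rows earlier than expected
		}
		if err := handle(batch); err != nil {
			return err
		}
		remaining -= len(batch)
	}
	return nil
}

// memReader is an in-memory stand-in so the sketch runs without a Parquet file.
type memReader struct{ rows []Student }

func (m *memReader) Read(dst interface{}) error {
	out, ok := dst.(*[]Student)
	if !ok {
		return fmt.Errorf("unsupported destination %T", dst)
	}
	n := copy(*out, m.rows)
	*out = (*out)[:n]
	m.rows = m.rows[n:]
	return nil
}

func main() {
	r := &memReader{rows: []Student{
		{"Alice", 20, 55.5},
		{"Bob", 22, 65.0},
		{"Charlie", 21, 60.5},
	}}
	err := readInBatches(r, 3, 2, func(batch []Student) error {
		fmt.Println(len(batch), "rows, first:", batch[0].Name)
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```

With the real reader, the call would presumably look like readInBatches(pr, int(pr.GetNumRows()), 1000, handle), where 1000 is an arbitrary batch size chosen for illustration.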
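Main features:
- Supports the Parquet data types, both primitive and logical
- Supports compression codecs such as Snappy and GZIP
- Supports nested data structures
- Supports parallel reading and writing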
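Notes:
- Struct fields must carry a parquet tag that defines their metadata (column name and type)
- Release resources properly when done: call WriteStop on the writer, ReadStop on the reader, and Close on the file
- Performance can be tuned by adjusting the number of goroutines (the third argument of NewParquetWriter and NewParquetReader)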
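One way to choose the goroutine count is to derive it from the number of CPUs and cap it. The following is a small sketch under that assumption; workerCount is a hypothetical helper, not a parquet-go function, and the cap of 8 is an arbitrary value for illustration. The right number depends on the workload and is best found by measuring.

```go
package main

import (
	"fmt"
	"runtime"
)

// workerCount returns a goroutine count based on the number of CPUs,
// capped at limit when limit is positive and never less than 1.
// The result is an int64 because the constructors take the count as int64.
func workerCount(limit int64) int64 {
	n := int64(runtime.NumCPU())
	if limit > 0 && n > limit {
		n = limit
	}
	if n < 1 {
		n = 1
	}
	return n
}

func main() {
	np := workerCount(8)
	fmt.Println("np =", np)
}
```

The value could then be passed in place of the literal 4, for example writer.NewParquetWriter(fw, new(Student), workerCount(8)).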
This library provides full read and write support for Parquet files and suits columnar storage of large datasets.