Golang Service Monitoring System Development

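I've recently been building a service monitoring system in Golang and have a few questions:

  1. Which Golang monitoring libraries or frameworks do you recommend? For a small or medium-sized project, which fits better: Prometheus or OpenTelemetry?
  2. How can I efficiently collect service metrics such as CPU and memory usage? Should I implement this myself or use an existing library?
  3. Any advice on choosing storage for monitoring data? Is there a big performance difference between InfluxDB and TimescaleDB?
  4. Are there best practices you can share, for example on designing alerting strategies or on visualization?

Any pointers from experienced folks would be much appreciated, thanks!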

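I recommend the Prometheus + Grafana combination: Prometheus scrapes and stores the metrics, and Grafana visualizes them. Use the client_golang library to expose custom metrics; this covers service health, performance monitoring and alerting.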

For more hands-on tutorials on building a Golang service monitoring system, see https://www.itying.com/category-94-b0.html


Golang Service Monitoring System Development Guide

Core Component Design

1. Metrics Collection

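package metrics

import (
    "github.com/prometheus/client_golang/prometheus"
)

var (
    // RequestCount counts handled HTTP requests by method, endpoint and status code.
    RequestCount = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "http_requests_total",
            Help: "Total number of HTTP requests",
        },
        []string{"method", "endpoint", "status"},
    )

    // ResponseTime records request latency in seconds.
    // prometheus.DefBuckets spans 5ms to 10s; tune it to your latency targets.
    ResponseTime = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "http_response_time_seconds",
            Help:    "HTTP response time in seconds",
            Buckets: prometheus.DefBuckets,
        },
        []string{"method", "endpoint"},
    )
)

// Init registers the collectors with the default registry.
// MustRegister panics on duplicate registration, so call Init once at startup.
func Init() {
    prometheus.MustRegister(RequestCount, ResponseTime)
}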
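ResponseTime uses the default bucket layout, prometheus.DefBuckets (0.005s to 10s). The sketch below is a simplified, standard-library model of how a Prometheus histogram's cumulative "le" buckets, _count and _sum behave when you call Observe. The cumulativeHistogram type is hypothetical and does not reflect how client_golang stores buckets internally; it only illustrates the exposed semantics.

```go
package main

import (
	"fmt"
	"sort"
)

// defBuckets mirrors the values of prometheus.DefBuckets (upper bounds in seconds).
var defBuckets = []float64{.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10}

// cumulativeHistogram models the exposed histogram series:
// counts[i] is the number of observations <= bounds[i] (the "le" label),
// count is the le="+Inf" bucket (same as _count), and sum is _sum.
type cumulativeHistogram struct {
	bounds []float64
	counts []uint64
	count  uint64
	sum    float64
}

func newCumulativeHistogram(bounds []float64) *cumulativeHistogram {
	return &cumulativeHistogram{bounds: bounds, counts: make([]uint64, len(bounds))}
}

// Observe records one value. Every bucket whose upper bound is >= v is
// incremented, which is what makes the buckets cumulative.
func (h *cumulativeHistogram) Observe(v float64) {
	i := sort.SearchFloat64s(h.bounds, v) // index of the first bound >= v
	for ; i < len(h.bounds); i++ {
		h.counts[i]++
	}
	h.count++
	h.sum += v
}

func main() {
	h := newCumulativeHistogram(defBuckets)
	for _, d := range []float64{0.003, 0.04, 0.04, 0.3, 12} {
		h.Observe(d)
	}
	for i, b := range h.bounds {
		fmt.Printf("le=%g: %d\n", b, h.counts[i])
	}
	fmt.Printf("le=+Inf: %d, sum=%g\n", h.count, h.sum)
}
```

Percentiles such as p95 are then estimated from these buckets (in Prometheus, with histogram_quantile), so the bucket boundaries should bracket the latencies you care about.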

2. HTTP Middleware

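package middleware

import (
    "net/http"
    "strconv"
    "time"

    "your-app/metrics"
)

// MetricsMiddleware records a request counter and a latency histogram for every request.
func MetricsMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()

        // Wrap the ResponseWriter to capture the status code.
        // Default to 200, which net/http sends if WriteHeader is never called.
        rw := &responseWriter{ResponseWriter: w, statusCode: http.StatusOK}

        next.ServeHTTP(rw, r)

        duration := time.Since(start).Seconds()

        // Record the metrics. Note: r.URL.Path as a label value creates one
        // time series per distinct path (e.g. /api/users/42), so normalize
        // paths or use route patterns in production.
        metrics.RequestCount.WithLabelValues(
            r.Method,
            r.URL.Path,
            strconv.Itoa(rw.statusCode), // numeric code such as "200" or "500"
        ).Inc()

        metrics.ResponseTime.WithLabelValues(
            r.Method,
            r.URL.Path,
        ).Observe(duration)
    })
}

// responseWriter records the status code passed to WriteHeader.
type responseWriter struct {
    http.ResponseWriter
    statusCode int
}

func (rw *responseWriter) WriteHeader(code int) {
    rw.statusCode = code
    rw.ResponseWriter.WriteHeader(code)
}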
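Using r.URL.Path directly as the endpoint label can create one time series per distinct URL when paths embed IDs. One common mitigation is to normalize the path before using it as a label. The sketch below assumes IDs are purely numeric path segments; normalizePath is a hypothetical helper, and labeling by the router's route pattern is often more robust when your router exposes it.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// normalizePath (hypothetical) collapses purely numeric path segments into
// ":id", so "/api/users/42" and "/api/users/43" share one label value
// instead of each creating a new time series.
func normalizePath(path string) string {
	segments := strings.Split(path, "/")
	for i, seg := range segments {
		if seg == "" {
			continue // leading/trailing slashes produce empty segments
		}
		if _, err := strconv.ParseUint(seg, 10, 64); err == nil {
			segments[i] = ":id"
		}
	}
	return strings.Join(segments, "/")
}

func main() {
	fmt.Println(normalizePath("/api/users/42/orders/7")) // prints /api/users/:id/orders/:id
}
```

In the middleware, you would pass normalizePath(r.URL.Path) instead of r.URL.Path to WithLabelValues.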

3. Health Check Endpoint

package health

import (
    "encoding/json"
    "net/http"
    "time"
)

// HealthStatus is the JSON body returned by the /health endpoint.
type HealthStatus struct {
    Status    string            `json:"status"`
    Timestamp string            `json:"timestamp"`
    Details   map[string]string `json:"details,omitempty"`
}

// HealthHandler reports service health. The Details values are static
// placeholders; a real implementation should probe each dependency.
func HealthHandler(w http.ResponseWriter, r *http.Request) {
    status := HealthStatus{
        Status:    "healthy",
        Timestamp: time.Now().Format(time.RFC3339),
        Details: map[string]string{
            "database": "connected",
            "cache":    "active",
        },
    }

    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(status)
}

4. Main Program Integration

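package main

import (
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus/promhttp"

    "your-app/health"
    "your-app/metrics"
    "your-app/middleware"
)

// Placeholder business handlers; replace with your real implementations.
func usersHandler(w http.ResponseWriter, r *http.Request)  { w.Write([]byte("users")) }
func ordersHandler(w http.ResponseWriter, r *http.Request) { w.Write([]byte("orders")) }

func main() {
    // Register the metrics
    metrics.Init()

    mux := http.NewServeMux()

    // Business routes
    mux.HandleFunc("/api/users", usersHandler)
    mux.HandleFunc("/api/orders", ordersHandler)

    // Monitoring endpoints
    mux.Handle("/metrics", promhttp.Handler())
    mux.HandleFunc("/health", health.HealthHandler) // a plain func, so HandleFunc rather than Handle

    // Wrap the whole mux with the metrics middleware
    handler := middleware.MetricsMiddleware(mux)

    log.Println("Server starting on :8080")
    log.Fatal(http.ListenAndServe(":8080", handler))
}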

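Key Features

  1. Real-time metrics collection: request count, response time, and error rate (derived from the status label)
  2. Prometheus integration: standard metrics format that plugs easily into monitoring systems
  3. Health checks: service status and dependency status reporting
  4. Performance monitoring: response time distribution via a histogram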

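Deployment Recommendations

  1. Configure Prometheus to scrape the /metrics endpoint
  2. Use Grafana for data visualization
  3. Set up alerting rules (e.g., error rate > 5%)
  4. Integrate log collection (e.g., the ELK Stack)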
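To make the "error rate > 5%" rule concrete: in Prometheus it is usually written as a ratio of rates, e.g. sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05, which assumes the status label holds numeric codes such as "500". The Go sketch below computes the same ratio from two snapshots of cumulative counters to show the arithmetic. counterSnapshot, errorRatio and shouldAlert are hypothetical names, and counter resets are ignored for brevity.

```go
package main

import "fmt"

// counterSnapshot holds cumulative request counts at one point in time,
// keyed by numeric HTTP status code, like http_requests_total by "status".
type counterSnapshot map[string]float64

// errorRatio returns the share of 5xx responses among requests that arrived
// between prev and curr. Taking deltas of cumulative counters is the same
// idea as rate() in PromQL; counter resets are not handled here.
func errorRatio(prev, curr counterSnapshot) float64 {
	var total, errs float64
	for code, v := range curr {
		delta := v - prev[code]
		total += delta
		if len(code) == 3 && code[0] == '5' {
			errs += delta
		}
	}
	if total == 0 {
		return 0 // no traffic in the window
	}
	return errs / total
}

// shouldAlert applies a threshold such as 0.05 for "error rate > 5%".
func shouldAlert(prev, curr counterSnapshot, threshold float64) bool {
	return errorRatio(prev, curr) > threshold
}

func main() {
	prev := counterSnapshot{"200": 1000, "500": 10}
	curr := counterSnapshot{"200": 1180, "500": 30}
	// 20 errors out of 200 new requests
	fmt.Printf("ratio=%.3f alert=%v\n", errorRatio(prev, curr), shouldAlert(prev, curr, 0.05)) // prints ratio=0.100 alert=true
}
```

Evaluating a ratio over a window, rather than alerting on individual errors, keeps a single failed request from paging anyone during low traffic.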

This monitoring system provides basic observability; extend it with more monitoring dimensions and alerting as your needs require.
