How to fix CPU usage rising day by day because of a cron job in Golang

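I am running a cron job written in Golang, with MongoDB (version 4.4) as the database. The cron job runs every 5 minutes. I have noticed that after I start the cron service, CPU usage climbs from 0% to 60% or more within a week, so I have to restart the service to keep it from affecting later cron runs. This suggests a memory leak, with resources not being released properly. I cannot pin down the cause. The whole cron run uses a single database connection; assume one cron run serves 2,000 merchants. I run the job through a worker pool fed by a buffered channel. Here is the code that runs the cron job with the worker pool: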

func RunCronForDbNames(cronType, subsType string) {
	cronStatus := GetCronStatusConditions(cronType)
	if cronStatus {
		dbNames, _ := controller.GetDbNamesForCron(subsType, cronType)
		if len(dbNames) == 0 {
			return
		}
		client, ctx := ConnectDb("ip")
		defer client.Disconnect(ctx)
		cronDataChannel := make(chan view.CronChannelData, len(dbNames))
		var wg sync.WaitGroup
		startTime := time.Now().Unix()
		var cronChData view.CronChannelData
		cronChData.Database.Client = client
		cronChData.Database.Ctx = ctx
		cronChData.StartTime = startTime
		for _, dbName := range dbNames {
			isContinue := db.CheckGlobalCronStatus(cronType, dbNames)
			if !isContinue {
				continue
			}
			wg.Add(1)
			contextKeys := make(map[string]interface{})
			contextKeys["db_name"] = dbName
			contextKeys["role"] = ""
			var c gin.Context
			c.Keys = make(map[string]any)
			c.Keys = contextKeys
			cronChData.C = c
			cronChData.Database.MainDatabase = dbName
			cronDataChannel <- cronChData
		}
		close(cronDataChannel)
		for w := 1; w <= 10; w++ {
			go Worker(w, cronDataChannel, &wg, cronType)
		}
		wg.Wait()
	}
	return
}

My workers run 10 at a time:

func Worker(id int, jobs <-chan view.CronChannelData, wg *sync.WaitGroup, cronType string) {
    switch cronType {
    case config.CompleteBookingCron:
        for job := range jobs {
            controller.CompleteBookingsCron(job, wg)
        }
    }
    return
}

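Inside CompleteBookingsCron I call wg.Done() to decrement the WaitGroup counter, mark the bookings as completed, and then send email and SMS according to each customer's settings. The email and SMS are sent from goroutines.

Can anyone help me find out why CPU usage keeps climbing? What practices should I follow to make sure resources are released properly, so that CPU usage does not rise?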

More hands-on tutorials on fixing day-by-day CPU growth caused by cron in Golang are available at https://www.itying.com/category-94-b0.html

yuanlaile, reply #1


Judging from the code, there are several possible reasons for CPU usage rising day by day:

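1. Goroutine leak

CompleteBookingsCron sends email and SMS from goroutines. A goroutine that returns is cleaned up even if nothing waits for it, but one that blocks forever (for example, on a network call with no timeout) never exits, and such goroutines pile up run after run:

// Example of potentially problematic code
func CompleteBookingsCron(job view.CronChannelData, wg *sync.WaitGroup) {
    defer wg.Done()

    // Send email: fire-and-forget; if sendEmail hangs, this goroutine never exits
    go sendEmail()

    // Send SMS: same problem
    go sendSMS()

    // Main logic...
}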

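Fix: use a WaitGroup to wait for all the goroutines to finish. The senders also need a timeout, because waiting on a call that never returns would simply hang the worker instead:

func CompleteBookingsCron(job view.CronChannelData, wg *sync.WaitGroup) {
    defer wg.Done()

    // A single WaitGroup is enough to track both notification goroutines
    var notifyWg sync.WaitGroup
    notifyWg.Add(2)

    go func() {
        defer notifyWg.Done()
        sendEmail()
    }()

    go func() {
        defer notifyWg.Done()
        sendSMS()
    }()

    // Block until both sends have finished
    notifyWg.Wait()
    // Main logic...
}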

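2. MongoDB connection not closed properly

The code does call defer client.Disconnect(ctx), but the workers may be creating extra connections or leaving cursors open:

// Check whether CompleteBookingsCron opens any new database connections
func CompleteBookingsCron(job view.CronChannelData, wg *sync.WaitGroup) {
    defer wg.Done()

    // Use the connection that was passed in instead of creating a new one
    collection := job.Database.Client.Database(job.Database.MainDatabase).Collection("bookings")

    // For any query, make sure the cursor gets closed
    cursor, err := collection.Find(job.Database.Ctx, bson.M{})
    if err != nil {
        return
    }
    defer cursor.Close(job.Database.Ctx) // the cursor must be closed

    // Process the data...
}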

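3. Context not propagated or cancelled

Without a cancellation mechanism, a goroutine can hang indefinitely:

func RunCronForDbNames(cronType, subsType string) {
    cronStatus := GetCronStatusConditions(cronType)
    if cronStatus {
        dbNames, _ := controller.GetDbNamesForCron(subsType, cronType)
        if len(dbNames) == 0 {
            return
        }

        // Create a context that is cancelled after 4 minutes,
        // which is before the next 5-minute run starts
        ctx, cancel := context.WithTimeout(context.Background(), 4*time.Minute)
        defer cancel()

        // ConnectDb also returns a context; it is discarded here in favour of ctx
        client, _ := ConnectDb("ip")
        // Disconnect with a fresh context: ctx may already have expired by then
        defer client.Disconnect(context.Background())

        cronDataChannel := make(chan view.CronChannelData, len(dbNames))
        var wg sync.WaitGroup

        // Pass the context to the workers
        // (Worker must be changed to take ctx as its first parameter)
        for w := 1; w <= 10; w++ {
            go Worker(ctx, w, cronDataChannel, &wg, cronType)
        }

        // ... rest of the code
    }
}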

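4. Resource-intensive operations not rate limited

With a large number of merchants, you may need to cap concurrency. The 10 workers already limit how many merchants are processed at once, so the semaphore is most useful for the goroutines that each job spawns:

func RunCronForDbNames(cronType, subsType string) {
    // ... earlier code

    // Semaphore shared by all workers: at most 50 background sends in flight
    sem := make(chan struct{}, 50)

    for w := 1; w <= 10; w++ {
        go func(workerID int) {
            for job := range cronDataChannel {
                controller.CompleteBookingsCron(job, &wg, sem)
            }
        }(w)
    }

    // ... rest of the code
}

func CompleteBookingsCron(job view.CronChannelData, wg *sync.WaitGroup, sem chan struct{}) {
    defer wg.Done()

    sem <- struct{}{} // acquire a slot; blocks while 50 sends are in flight
    go func() {
        defer func() { <-sem }() // release the slot when the send finishes
        sendEmail()
    }()

    // ... business logic
}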

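5. Detecting memory leaks

Add pprof monitoring to identify the leak:

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof handlers on the default mux
)

func main() {
    // Start the pprof server
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()

    // Main program logic...
}

Once it is running, go tool pprof http://localhost:6060/debug/pprof/heap analyses memory usage. Since the symptom here is CPU, the CPU profile (/debug/pprof/profile?seconds=30) and the goroutine profile (/debug/pprof/goroutine) are worth checking as well.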

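6. Timers not cleaned up

If you use time.Ticker or time.Timer, make sure they are stopped:

func Worker(id int, jobs <-chan view.CronChannelData, wg *sync.WaitGroup, cronType string) {
    // If a ticker is used, stop it when the function exits
    ticker := time.NewTicker(30 * time.Second)
    defer ticker.Stop() // important: releases the ticker's resources

    for job := range jobs {
        // Non-blocking check for a tick
        select {
        case <-ticker.C:
            // Periodic work
        default:
        }
        // Always process the job; skipping it would leave wg.Wait() blocked forever
        controller.CompleteBookingsCron(job, wg)
    }
}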

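Key points:

Make sure every goroutine has a way to exit
Database connections and cursors must be closed properly
Use context to control timeouts and cancellation
Monitor the number of goroutines: runtime.NumGoroutine()
Periodic restarts may mask the problem; the root cause still needs to be found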