A Go binary behaves differently on two identical Linux servers

Hi! I have a service written in Go that uses gRPC to connect one grpc-client to a set of grpc-servers.

I'm running into a problem with the grpc-server instances, which run on completely identical nodes rented from linode.com.

The node specs are shown below; server-1 and server-2 have identical configurations:

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 16.04.5 LTS
Release:	16.04
Codename:	xenial

$ arch
x86_64

My grpc-server holds a pool of cryptocurrency exchanges, all of which implement a common interface. When the grpc-server receives a request from the grpc-client over gRPC, it starts every exchange in the pool in its own goroutine to gather information from the internet over HTTP. Each goroutine continuously collects data from its particular exchange and sends it into a shared channel, DataChan. All the data is read from that channel and sent to the grpc-client over a gRPC stream.
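To make that setup concrete, here is a minimal, self-contained sketch of the fan-in pattern described above. The `Exchange` interface, `Data` type, and the fake fetch logic are illustrative stand-ins, not the actual service code:

```go
package main

import (
	"fmt"
	"sync"
)

// Data is a stand-in for whatever each exchange produces.
type Data struct {
	Exchange string
	Price    float64
}

// Exchange is the common interface the pool implements.
type Exchange interface {
	Name() string
	Fetch() Data // in the real service this would be an HTTP call
}

type fakeExchange struct{ name string }

func (f fakeExchange) Name() string { return f.name }
func (f fakeExchange) Fetch() Data  { return Data{Exchange: f.name, Price: 42.0} }

// startPool launches one goroutine per exchange; each sends its
// result into the shared channel, mirroring DataChan in the post.
func startPool(exchanges []Exchange) <-chan Data {
	dataChan := make(chan Data)
	var wg sync.WaitGroup
	for _, ex := range exchanges {
		wg.Add(1)
		go func(e Exchange) {
			defer wg.Done()
			dataChan <- e.Fetch() // one shot here; the real collectors loop
		}(ex)
	}
	// close the channel once every collector goroutine is done
	go func() {
		wg.Wait()
		close(dataChan)
	}()
	return dataChan
}

func main() {
	pool := []Exchange{fakeExchange{"exchange-a"}, fakeExchange{"exchange-b"}}
	for d := range startPool(pool) {
		// in the real service this is where data goes to the gRPC stream
		fmt.Printf("%s: %.2f\n", d.Exchange, d.Price)
	}
}
```

In the real service the reader side would forward each `Data` to the gRPC stream instead of printing it.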

On server-1 everything works: all the data is sent continuously. On server-2, however, my program starts consuming more than 90% CPU before any of the goroutines even gets a chance to make its first request to the internet.

My binary is built on a Bamboo server with the same architecture and OS version.

There are no connectivity problems; I checked with curl.

On server-1 you can see that collector-ser (my grpc-server) has established many connections, so my requests to the exchanges are working as expected:

@server-1:~$ netstat -nputw
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 172.104.157.111:44722   104.20.147.108:443      TIME_WAIT   -               
tcp        0      0 172.104.157.111:32844   104.17.177.152:443      ESTABLISHED 21056/collector-ser
tcp        0      0 172.104.157.111:52996   104.20.199.74:443       ESTABLISHED 21056/collector-ser
tcp        0      0 127.0.0.1:9001          127.0.0.1:35250         ESTABLISHED -               
tcp        0      0 172.104.157.111:57636   149.154.167.220:443     TIME_WAIT   -               
tcp        0      0 172.104.157.111:33708   104.17.154.108:443      ESTABLISHED 21056/collector-ser
tcp        0      0 172.104.157.111:7888    89.33.219.167:37860     ESTABLISHED -               
tcp        0    215 172.104.157.111:42774   47.56.56.151:80         ESTABLISHED 21056/collector-ser
tcp        0      0 172.104.157.111:33068   104.17.6.188:443        TIME_WAIT   -               
tcp        0      0 172.104.157.111:34906   104.16.233.188:443      ESTABLISHED 21056/collector-ser
tcp        0      0 172.104.157.111:54140   143.204.101.87:443      ESTABLISHED 21056/collector-ser
tcp        0      0 172.104.157.111:39856   104.17.179.152:443      TIME_WAIT   -               
tcp        0      0 172.104.157.111:50390   104.20.190.108:443      ESTABLISHED 21056/collector-ser
tcp        0    155 172.104.157.111:57586   103.206.42.112:443      ESTABLISHED 21056/collector-ser
tcp        0      0 172.104.157.111:37740   107.154.248.133:443     ESTABLISHED 21056/collector-ser
tcp        0      0 172.104.157.111:59366   104.19.245.31:443       TIME_WAIT   -               
tcp        0      0 172.104.157.111:54814   143.204.101.74:443      TIME_WAIT   -               
tcp        0      0 172.104.157.111:46198   104.25.8.112:443        ESTABLISHED 21056/collector-ser
tcp        0      0 172.104.157.111:35940   143.204.214.30:443      TIME_WAIT   -               
tcp        0      0 172.104.157.111:53358   104.18.216.39:443       ESTABLISHED 21056/collector-ser
tcp        0      0 127.0.0.1:35250         127.0.0.1:9001          ESTABLISHED 17524/darvin-courie
tcp        0      0 172.104.157.111:49054   13.225.78.21:443        ESTABLISHED 21056/collector-ser
tcp        0      0 172.104.157.111:57634   149.154.167.220:443     TIME_WAIT   -               
tcp        0      0 172.104.157.111:33484   2.16.106.88:443         ESTABLISHED 21056/collector-ser
tcp        0      0 172.104.157.111:57144   13.224.196.108:443      ESTABLISHED 21056/collector-ser
tcp        0      0 172.104.157.111:48702   47.244.38.215:443       TIME_WAIT   -               
tcp        0      0 172.104.157.111:49260   47.244.38.215:443       ESTABLISHED 21056/collector-ser
tcp        0     44 172.104.157.111:38022   13.230.49.199:443       ESTABLISHED 21056/collector-ser
tcp        0     44 172.104.157.111:52372   15.164.106.249:443      ESTABLISHED 21056/collector-ser
tcp        0      0 172.104.157.111:43276   217.182.199.239:443     ESTABLISHED 21056/collector-ser
tcp        0      0 172.104.157.111:55928   13.32.158.212:443       TIME_WAIT   -               
tcp        0      0 172.104.157.111:50886   104.19.246.31:443       ESTABLISHED 21056/collector-ser
tcp        0      0 172.104.157.111:49956   104.20.22.137:443       ESTABLISHED 21056/collector-ser
tcp        0      0 172.104.157.111:54632   104.17.185.234:443      ESTABLISHED 21056/collector-ser
tcp        0      0 172.104.157.111:44468   104.25.7.112:443        TIME_WAIT   -               
tcp        0      0 172.104.157.111:59174   104.17.7.188:443        ESTABLISHED 21056/collector-ser
tcp        0      0 172.104.157.111:50304   52.17.153.78:443        ESTABLISHED 21056/collector-ser
tcp        0      0 172.104.157.111:50274   52.17.153.78:443        TIME_WAIT   -               
tcp        0      0 172.104.157.111:47538   104.17.184.234:443      TIME_WAIT   -               
tcp        0      0 172.104.157.111:37416   104.16.127.19:443       TIME_WAIT   -               
tcp        0      0 172.104.157.111:37970   104.16.127.19:443       ESTABLISHED 21056/collector-ser
tcp        0    126 172.104.157.111:33780   13.112.137.99:80        ESTABLISHED 21056/collector-ser
tcp        0      0 172.104.157.111:51914   104.19.212.87:443       ESTABLISHED 21056/collector-ser
tcp        0      0 172.104.157.111:34344   104.16.233.188:443      TIME_WAIT   -               
tcp        0      0 172.104.157.111:51358   104.19.212.87:443       TIME_WAIT   -               
tcp        0      0 172.104.157.111:49832   104.20.190.108:443      TIME_WAIT   -               
tcp        0      0 172.104.157.111:36180   104.16.144.70:443       TIME_WAIT   -               
tcp        0      0 172.104.157.111:45286   104.20.147.108:443      ESTABLISHED 21056/collector-ser
tcp        0    368 172.104.157.111:57412   47.52.123.127:443       ESTABLISHED 21056/collector-ser
tcp        0      0 172.104.157.111:51166   104.17.156.108:443      TIME_WAIT   -               
tcp        0      0 172.104.157.111:52434   104.20.199.74:443       TIME_WAIT   -               
tcp        0      0 172.104.157.111:36730   104.16.144.70:443       ESTABLISHED 21056/collector-ser
tcp        0      0 172.104.157.111:57452   104.20.21.137:443       TIME_WAIT   -    

But on server-2 you can see only one established connection for it (and that is the connection to the grpc-client):

server-2:~$ netstat -nputw
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 172.104.246.194:7888    89.33.219.167:45392     ESTABLISHED -               
tcp6       0      0 192.168.129.202:9888    192.168.128.90:44900    ESTABLISHED -               
tcp6       0      0 192.168.129.202:9888    192.168.128.90:46058    ESTABLISHED -               
tcp6       0      0 192.168.129.202:9888    192.168.128.90:44912    ESTABLISHED -               
tcp6       0      0 192.168.129.202:9888    192.168.128.90:48092    ESTABLISHED -               
tcp6       0      0 192.168.129.202:9888    192.168.128.90:46878    ESTABLISHED -               
tcp6       0      0 192.168.129.202:9888    192.168.128.90:48084    ESTABLISHED -               
tcp6       0      0 192.168.129.202:9888    192.168.128.90:44902    ESTABLISHED -               
tcp6       0      0 192.168.129.202:9888    192.168.128.90:50540    ESTABLISHED -               
tcp6      30      0 192.168.129.202:10555   192.168.139.182:58262   ESTABLISHED 28591/collector-ser
tcp6       0      0 192.168.129.202:9888    192.168.128.90:44916    ESTABLISHED -         

Here is the top output from both servers:

server-1:~$ top -bn 1 | head -15
top - 11:50:23 up 23 min,  1 user,  load average: 0.33, 0.47, 0.33
Tasks: 125 total,   2 running,  59 sleeping,   0 stopped,   0 zombie
%Cpu(s):  8.5 us,  1.2 sy,  0.0 ni, 81.3 id,  0.1 wa,  0.7 hi,  0.4 si,  7.9 st
KiB Mem :  4022248 total,  3398316 free,   130056 used,   493876 buff/cache
KiB Swap:   262140 total,   262140 free,        0 used.  3640232 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1159 plutos    20   0  395252  30816  14236 S  12.5  0.8   4:18.49 collector-serve
 1137 root      20   0   60152  20048   7340 R   6.2  0.5   0:33.70 supervisord
    1 root      20   0   37844   5744   3908 S   0.0  0.1   0:01.65 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd
    3 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 rcu_gp
    4 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 rcu_par_gp
    5 root      20   0       0      0      0 I   0.0  0.0   0:00.04 kworker/0:0-eve
    6 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 kworker/0:0H-kb
server-2:~$ top -bn 1 | head -15
top - 11:50:35 up 18 min,  1 user,  load average: 1.00, 0.95, 0.61
Tasks: 126 total,   2 running,  59 sleeping,   0 stopped,   0 zombie
%Cpu(s): 25.9 us,  0.4 sy,  0.0 ni, 54.8 id,  0.1 wa,  1.0 hi,  0.0 si, 17.7 st
KiB Mem :  4022248 total,  3765628 free,   119012 used,   137608 buff/cache
KiB Swap:   262140 total,   262140 free,        0 used.  3696904 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1247 plutos    20   0  378092  16524  10972 R  93.8  0.4   7:16.65 collector-serve
 1367 plutos    20   0   41664   3584   3068 R   6.2  0.1   0:00.01 top
    1 root      20   0   37804   5788   3956 S   0.0  0.1   0:01.76 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.01 kthreadd
    3 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 rcu_gp
    4 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 rcu_par_gp
    6 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 kworker/0:0H-kb
    8 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 mm_percpu_wq

Below is part of my code (I haven't provided the full example, to keep things simple).

My entry point, main.go:

package main

import (
	server2 "path-to-grpc-server-implementation/server"
	pb "path-to-proto-files/proto"
	"fmt"
	"log"
	"net"

	"google.golang.org/grpc"
)

func main() {
	lis, err := net.Listen("tcp", fmt.Sprintf("0.0.0.0:%d", myPort))
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}

	grpcServer := grpc.NewServer()
	// ... register the server2 implementation on grpcServer via the pb bindings
	grpcServer.Serve(lis)
}


1 reply



Based on your description, the problem most likely lies in goroutine scheduling or concurrency control. The high CPU usage you observe on server-2 points to a goroutine leak, an infinite loop, or a channel-blocking problem.

Here are several possible causes and how to investigate each one:

1. Check for goroutine leaks

Add goroutine monitoring to your grpc-server:

package main

import (
	"fmt"
	"net/http"
	_ "net/http/pprof" // registers pprof handlers on the default mux
	"runtime"
	"time"
)

func monitorGoroutines() {
	go func() {
		for {
			time.Sleep(30 * time.Second)
			num := runtime.NumGoroutine()
			fmt.Printf("current goroutine count: %d\n", num)
		}
	}()
}

func main() {
	// start pprof for debugging
	go func() {
		http.ListenAndServe(":6060", nil)
	}()

	monitorGoroutines()
	// ... the rest of your code
}

2. Check for channel blocking

In your data-collection logic, make sure channel operations cannot block a goroutine indefinitely:

func (e *Exchange) collectData(dataChan chan<- Data) {
	ticker := time.NewTicker(1 * time.Second)
	defer ticker.Stop()

	for {
		select {
		case <-ticker.C:
			data, err := e.fetchData()
			if err != nil {
				fmt.Printf("exchange %s: failed to fetch data: %v\n", e.Name, err)
				continue
			}

			// use select to avoid blocking on the channel forever
			select {
			case dataChan <- data:
				// data sent successfully
			case <-time.After(5 * time.Second):
				fmt.Printf("warning: timed out sending to channel, exchange: %s\n", e.Name)
			}
		case <-e.ctx.Done():
			return
		}
	}
}

3. Add timeout control

Add timeouts to your HTTP requests:

func (e *Exchange) fetchData() (Data, error) {
	client := &http.Client{
		Timeout: 10 * time.Second, // hard cap on the whole request
	}

	req, err := http.NewRequest("GET", e.APIEndpoint, nil)
	if err != nil {
		return Data{}, err
	}

	resp, err := client.Do(req)
	if err != nil {
		return Data{}, err
	}
	defer resp.Body.Close()

	// decode the response body into a Data value
	// (adapt the decoding to whatever format your API returns)
	var d Data
	if err := json.NewDecoder(resp.Body).Decode(&d); err != nil {
		return Data{}, err
	}
	return d, nil
}

4. Check context propagation

Make sure the context is propagated correctly to all goroutines:

func startExchanges(ctx context.Context, exchanges []Exchange, dataChan chan Data) {
	for _, exchange := range exchanges {
		go func(e Exchange) {
			e.collectData(ctx, dataChan)
		}(exchange)
	}
}

func (e *Exchange) collectData(ctx context.Context, dataChan chan<- Data) {
	// use the context passed in by the caller
	ticker := time.NewTicker(1 * time.Second)
	defer ticker.Stop()
	
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// data-fetching logic
		}
	}
}

5. Inspect server-2's state right now

Run the following commands on server-2 to get more detail:

# dump the goroutine stacks
curl http://localhost:6060/debug/pprof/goroutine?debug=2

# take a CPU profile
go tool pprof http://localhost:6060/debug/pprof/profile

# count the process's threads
ps -eLf | grep collector-serve | wc -l

6. Check resource limits

Check whether the resource limits on the two servers match:

# check the file-descriptor limit
ulimit -n

# check all of the process's limits
cat /proc/$(pgrep collector-serve)/limits

The problem most likely lies in goroutine management or channel operations. I'd suggest enabling pprof on server-2 first, then examining the goroutine states and stacks while CPU usage is high.
