Golang binary behaves differently on two identical Linux servers
Hi! I have a service written in Go that uses gRPC to connect a grpc-client to a set of grpc-servers.
I'm running into a problem with a grpc-server running on two completely identical nodes, which I rent from linode.com.
The node specs are shown below (using server-1 and server-2 as the examples):
server-1 and server-2 have the same configuration:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.5 LTS
Release: 16.04
Codename: xenial
$ arch
x86_64
My grpc-server holds a pool of cryptocurrency exchanges that all implement a common interface. The grpc-server receives a request from the grpc-client over gRPC and starts every exchange in the pool in its own goroutine, collecting information from the internet over HTTP. Each goroutine continuously gathers data from its particular exchange and sends it into a shared channel, DataChan. Everything is read from that channel and sent to the grpc-client over a gRPC stream.
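To make that concrete, here is a stripped-down sketch of the pattern (the Data struct, the Exchange interface and the send callback are simplified stand-ins for my real types and for the gRPC stream's Send):

import (
	"context"
	"time"
)

// Data is a stand-in for the values the exchanges push onto DataChan.
type Data struct {
	Exchange string
	Payload  []byte
}

// Exchange is a stand-in for the common interface the exchanges implement.
type Exchange interface {
	Name() string
	Fetch(ctx context.Context) (Data, error)
}

// runPool starts one goroutine per exchange; every goroutine writes into the
// shared channel, and the loop at the bottom fans the data in and forwards it
// (in the real service that forwarding is a Send on the gRPC stream).
func runPool(ctx context.Context, exchanges []Exchange, send func(Data) error) error {
	dataChan := make(chan Data, 100)

	for _, ex := range exchanges {
		go func(e Exchange) {
			ticker := time.NewTicker(time.Second)
			defer ticker.Stop()
			for {
				select {
				case <-ctx.Done():
					return
				case <-ticker.C:
					d, err := e.Fetch(ctx)
					if err != nil {
						continue // the real code logs the error and retries
					}
					dataChan <- d
				}
			}
		}(ex)
	}

	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case d := <-dataChan:
			if err := send(d); err != nil {
				return err
			}
		}
	}
}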
Yet while everything works fine on server-1 and all the data is sent continuously, on server-2 my program starts consuming more than 90% CPU before any of the goroutines has even had a chance to make its first request to the internet.
My binary is built on a bamboo server with the same architecture and OS version.
There are no connectivity problems; I checked with curl.
You can see that on server-1, collector-ser (my grpc-server) has many established connections (so my requests to the exchanges work as expected):
@server-1:~$ netstat -nputw
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 172.104.157.111:44722 104.20.147.108:443 TIME_WAIT -
tcp 0 0 172.104.157.111:32844 104.17.177.152:443 ESTABLISHED 21056/collector-ser
tcp 0 0 172.104.157.111:52996 104.20.199.74:443 ESTABLISHED 21056/collector-ser
tcp 0 0 127.0.0.1:9001 127.0.0.1:35250 ESTABLISHED -
tcp 0 0 172.104.157.111:57636 149.154.167.220:443 TIME_WAIT -
tcp 0 0 172.104.157.111:33708 104.17.154.108:443 ESTABLISHED 21056/collector-ser
tcp 0 0 172.104.157.111:7888 89.33.219.167:37860 ESTABLISHED -
tcp 0 215 172.104.157.111:42774 47.56.56.151:80 ESTABLISHED 21056/collector-ser
tcp 0 0 172.104.157.111:33068 104.17.6.188:443 TIME_WAIT -
tcp 0 0 172.104.157.111:34906 104.16.233.188:443 ESTABLISHED 21056/collector-ser
tcp 0 0 172.104.157.111:54140 143.204.101.87:443 ESTABLISHED 21056/collector-ser
tcp 0 0 172.104.157.111:39856 104.17.179.152:443 TIME_WAIT -
tcp 0 0 172.104.157.111:50390 104.20.190.108:443 ESTABLISHED 21056/collector-ser
tcp 0 155 172.104.157.111:57586 103.206.42.112:443 ESTABLISHED 21056/collector-ser
tcp 0 0 172.104.157.111:37740 107.154.248.133:443 ESTABLISHED 21056/collector-ser
tcp 0 0 172.104.157.111:59366 104.19.245.31:443 TIME_WAIT -
tcp 0 0 172.104.157.111:54814 143.204.101.74:443 TIME_WAIT -
tcp 0 0 172.104.157.111:46198 104.25.8.112:443 ESTABLISHED 21056/collector-ser
tcp 0 0 172.104.157.111:35940 143.204.214.30:443 TIME_WAIT -
tcp 0 0 172.104.157.111:53358 104.18.216.39:443 ESTABLISHED 21056/collector-ser
tcp 0 0 127.0.0.1:35250 127.0.0.1:9001 ESTABLISHED 17524/darvin-courie
tcp 0 0 172.104.157.111:49054 13.225.78.21:443 ESTABLISHED 21056/collector-ser
tcp 0 0 172.104.157.111:57634 149.154.167.220:443 TIME_WAIT -
tcp 0 0 172.104.157.111:33484 2.16.106.88:443 ESTABLISHED 21056/collector-ser
tcp 0 0 172.104.157.111:57144 13.224.196.108:443 ESTABLISHED 21056/collector-ser
tcp 0 0 172.104.157.111:48702 47.244.38.215:443 TIME_WAIT -
tcp 0 0 172.104.157.111:49260 47.244.38.215:443 ESTABLISHED 21056/collector-ser
tcp 0 44 172.104.157.111:38022 13.230.49.199:443 ESTABLISHED 21056/collector-ser
tcp 0 44 172.104.157.111:52372 15.164.106.249:443 ESTABLISHED 21056/collector-ser
tcp 0 0 172.104.157.111:43276 217.182.199.239:443 ESTABLISHED 21056/collector-ser
tcp 0 0 172.104.157.111:55928 13.32.158.212:443 TIME_WAIT -
tcp 0 0 172.104.157.111:50886 104.19.246.31:443 ESTABLISHED 21056/collector-ser
tcp 0 0 172.104.157.111:49956 104.20.22.137:443 ESTABLISHED 21056/collector-ser
tcp 0 0 172.104.157.111:54632 104.17.185.234:443 ESTABLISHED 21056/collector-ser
tcp 0 0 172.104.157.111:44468 104.25.7.112:443 TIME_WAIT -
tcp 0 0 172.104.157.111:59174 104.17.7.188:443 ESTABLISHED 21056/collector-ser
tcp 0 0 172.104.157.111:50304 52.17.153.78:443 ESTABLISHED 21056/collector-ser
tcp 0 0 172.104.157.111:50274 52.17.153.78:443 TIME_WAIT -
tcp 0 0 172.104.157.111:47538 104.17.184.234:443 TIME_WAIT -
tcp 0 0 172.104.157.111:37416 104.16.127.19:443 TIME_WAIT -
tcp 0 0 172.104.157.111:37970 104.16.127.19:443 ESTABLISHED 21056/collector-ser
tcp 0 126 172.104.157.111:33780 13.112.137.99:80 ESTABLISHED 21056/collector-ser
tcp 0 0 172.104.157.111:51914 104.19.212.87:443 ESTABLISHED 21056/collector-ser
tcp 0 0 172.104.157.111:34344 104.16.233.188:443 TIME_WAIT -
tcp 0 0 172.104.157.111:51358 104.19.212.87:443 TIME_WAIT -
tcp 0 0 172.104.157.111:49832 104.20.190.108:443 TIME_WAIT -
tcp 0 0 172.104.157.111:36180 104.16.144.70:443 TIME_WAIT -
tcp 0 0 172.104.157.111:45286 104.20.147.108:443 ESTABLISHED 21056/collector-ser
tcp 0 368 172.104.157.111:57412 47.52.123.127:443 ESTABLISHED 21056/collector-ser
tcp 0 0 172.104.157.111:51166 104.17.156.108:443 TIME_WAIT -
tcp 0 0 172.104.157.111:52434 104.20.199.74:443 TIME_WAIT -
tcp 0 0 172.104.157.111:36730 104.16.144.70:443 ESTABLISHED 21056/collector-ser
tcp 0 0 172.104.157.111:57452 104.20.21.137:443 TIME_WAIT -
But on server-2 you can see only one established connection belonging to it (the connection to the grpc-client):
server-2:~$ netstat -nputw
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 172.104.246.194:7888 89.33.219.167:45392 ESTABLISHED -
tcp6 0 0 192.168.129.202:9888 192.168.128.90:44900 ESTABLISHED -
tcp6 0 0 192.168.129.202:9888 192.168.128.90:46058 ESTABLISHED -
tcp6 0 0 192.168.129.202:9888 192.168.128.90:44912 ESTABLISHED -
tcp6 0 0 192.168.129.202:9888 192.168.128.90:48092 ESTABLISHED -
tcp6 0 0 192.168.129.202:9888 192.168.128.90:46878 ESTABLISHED -
tcp6 0 0 192.168.129.202:9888 192.168.128.90:48084 ESTABLISHED -
tcp6 0 0 192.168.129.202:9888 192.168.128.90:44902 ESTABLISHED -
tcp6 0 0 192.168.129.202:9888 192.168.128.90:50540 ESTABLISHED -
tcp6 30 0 192.168.129.202:10555 192.168.139.182:58262 ESTABLISHED 28591/collector-ser
tcp6 0 0 192.168.129.202:9888 192.168.128.90:44916 ESTABLISHED -
Here is the top output from both servers:
server-1:~$ top -bn 1 | head -15
top - 11:50:23 up 23 min, 1 user, load average: 0.33, 0.47, 0.33
Tasks: 125 total, 2 running, 59 sleeping, 0 stopped, 0 zombie
%Cpu(s): 8.5 us, 1.2 sy, 0.0 ni, 81.3 id, 0.1 wa, 0.7 hi, 0.4 si, 7.9 st
KiB Mem : 4022248 total, 3398316 free, 130056 used, 493876 buff/cache
KiB Swap: 262140 total, 262140 free, 0 used. 3640232 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1159 plutos 20 0 395252 30816 14236 S 12.5 0.8 4:18.49 collector-serve
1137 root 20 0 60152 20048 7340 R 6.2 0.5 0:33.70 supervisord
1 root 20 0 37844 5744 3908 S 0.0 0.1 0:01.65 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_par_gp
5 root 20 0 0 0 0 I 0.0 0.0 0:00.04 kworker/0:0-eve
6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0H-kb
server-2:~$ top -bn 1 | head -15
top - 11:50:35 up 18 min, 1 user, load average: 1.00, 0.95, 0.61
Tasks: 126 total, 2 running, 59 sleeping, 0 stopped, 0 zombie
%Cpu(s): 25.9 us, 0.4 sy, 0.0 ni, 54.8 id, 0.1 wa, 1.0 hi, 0.0 si, 17.7 st
KiB Mem : 4022248 total, 3765628 free, 119012 used, 137608 buff/cache
KiB Swap: 262140 total, 262140 free, 0 used. 3696904 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1247 plutos 20 0 378092 16524 10972 R 93.8 0.4 7:16.65 collector-serve
1367 plutos 20 0 41664 3584 3068 R 6.2 0.1 0:00.01 top
1 root 20 0 37804 5788 3956 S 0.0 0.1 0:01.76 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.01 kthreadd
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_par_gp
6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0H-kb
8 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 mm_percpu_wq
Below is part of my code (I haven't included a complete example, to keep things simple):
My entry point, main.go:
package main

import (
	"fmt"
	"log"
	"net"

	"google.golang.org/grpc"

	server2 "path-to-grpc-server-implementation/server"
	pb "path-to-proto-files/proto"
)

func main() {
	lis, err := net.Listen("tcp", fmt.Sprintf("0.0.0.0:%d", myPort))
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}
	// ... gRPC server setup (grpc.NewServer, service registration via server2/pb)
	// and Serve(lis) omitted for brevity
}
From your description, the problem most likely lies in goroutine scheduling or concurrency control. The high CPU usage you observe on server-2 points to a possible goroutine leak, an infinite loop, or a blocked channel.
Here are several possible causes, along with how to investigate each one:
1. Check for goroutine leaks
Add goroutine monitoring to your grpc-server:
package main

import (
	"fmt"
	"net/http"
	_ "net/http/pprof"
	"runtime"
	"time"
)

func monitorGoroutines() {
	go func() {
		for {
			time.Sleep(30 * time.Second)
			num := runtime.NumGoroutine()
			fmt.Printf("current number of goroutines: %d\n", num)
		}
	}()
}

func main() {
	// start pprof for debugging
	go func() {
		http.ListenAndServe(":6060", nil)
	}()
	monitorGoroutines()
	// ... the rest of your code
}
2. Check for channel blocking
In your data-collection logic, make sure channel operations cannot block goroutines indefinitely:
func (e *Exchange) collectData(dataChan chan<- Data) {
	ticker := time.NewTicker(1 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			data, err := e.fetchData()
			if err != nil {
				fmt.Printf("exchange %s: failed to fetch data: %v\n", e.Name, err)
				continue
			}
			// use a select to avoid blocking on the channel
			select {
			case dataChan <- data:
				// data sent successfully
			case <-time.After(5 * time.Second):
				fmt.Printf("warning: timed out sending data to the channel, exchange: %s\n", e.Name)
			}
		case <-e.ctx.Done():
			return
		}
	}
}
3. Add timeouts
Put a timeout on your HTTP requests:
func (e *Exchange) fetchData() (Data, error) {
	client := &http.Client{
		Timeout: 10 * time.Second,
	}
	req, err := http.NewRequest("GET", e.APIEndpoint, nil)
	if err != nil {
		return Data{}, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return Data{}, err
	}
	defer resp.Body.Close()
	// parse the response body into a Data value here
	// ...
	return Data{}, nil // placeholder: return the parsed data
}
4. Check context propagation
Make sure the context is propagated correctly to every goroutine:
func startExchanges(ctx context.Context, exchanges []Exchange, dataChan chan Data) {
	for _, exchange := range exchanges {
		go func(e Exchange) {
			e.collectData(ctx, dataChan)
		}(exchange)
	}
}

func (e *Exchange) collectData(ctx context.Context, dataChan chan<- Data) {
	// use the context passed in by the caller
	ticker := time.NewTicker(1 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// fetch and forward the data here
		}
	}
}
5. Inspect the state of server-2 right away
Run the following commands on server-2 to gather more detail:
# dump the goroutine stacks
curl "http://localhost:6060/debug/pprof/goroutine?debug=2"
# capture a CPU profile
go tool pprof http://localhost:6060/debug/pprof/profile
# count the threads of the process
ps -eLf | grep collector-serve | wc -l
6. Check resource limits
Check whether the resource limits on the two servers are the same:
# check the file descriptor limit
ulimit -n
# check the process's resource limits (memory and more)
cat /proc/$(pgrep collector-serve)/limits
The problem most likely lies in goroutine management or channel operations. I'd recommend enabling pprof on server-2 first, then analyzing the goroutines' state and stack traces while the CPU usage is high.
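If exposing the pprof HTTP endpoint on server-2 is inconvenient, here is a minimal sketch of capturing a CPU profile and a goroutine dump to files with the standard runtime/pprof package (the file names and the 30-second window are arbitrary choices):

import (
	"log"
	"os"
	"runtime/pprof"
	"time"
)

// captureProfiles writes a 30-second CPU profile and a goroutine dump to disk.
// Call it (for example from a goroutine in main) once the CPU spike starts,
// then inspect the results with `go tool pprof cpu.prof` and a text editor.
func captureProfiles() {
	cpuFile, err := os.Create("cpu.prof")
	if err != nil {
		log.Printf("create cpu.prof: %v", err)
		return
	}
	defer cpuFile.Close()

	if err := pprof.StartCPUProfile(cpuFile); err != nil {
		log.Printf("start cpu profile: %v", err)
		return
	}
	time.Sleep(30 * time.Second) // profile while the CPU usage is high
	pprof.StopCPUProfile()

	goroutineFile, err := os.Create("goroutines.txt")
	if err != nil {
		log.Printf("create goroutines.txt: %v", err)
		return
	}
	defer goroutineFile.Close()

	// debug=2 prints the full stack trace of every goroutine
	pprof.Lookup("goroutine").WriteTo(goroutineFile, 2)
}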

