Passing arrays by reference vs. by value in Golang

I have a working program whose memory usage I'm trying to reduce.

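I have a primes slice that I pass to a go function for parallel processing. The slice can be very large (up to hundreds of millions of primes). It is read-only after it is created and never changes.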

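Compared with a Rust version of the same program, the go functions consume a lot of memory. The Rust version passes primes only by reference, not by value. (Originally I passed a copy of the primes array, and memory usage dropped significantly only once I learned how to pass it by reference.)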

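The Go version's memory behavior resembles that of the Rust version before that change, so I looked through the documentation and tried to pass primes by reference following this page: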

Arrays in Go: examples - golangprograms.com


Here is a snippet of the original code.

  var wg sync.WaitGroup
  for i, r_hi := range rescousins {
    wg.Add(1)
    go func(i, r_hi int) {
      defer wg.Done()
      l, c := cousins_sieve(r_hi, kmin, kmax, kb, start_num, end_num, modpg, primes, resinvrs)
      lastcousins[i] = l; ■■■■[i] = c
      fmt.Printf("\r%d of %d cousinpairs done", (i + 1), pairscnt)
    }(i, r_hi)
  }
  wg.Wait()

And here is the version modified to pass primes by reference.

  refprimes := &primes
  var wg sync.WaitGroup
  for i, r_hi := range rescousins {
    wg.Add(1)
    go func(i, r_hi int) {
      defer wg.Done()
      l, c := cousins_sieve(r_hi, kmin, kmax, kb, start_num, end_num, modpg, *refprimes, resinvrs)
      lastcousins[i] = l; ■■■■[i] = c
      fmt.Printf("\r%d of %d cousinpairs done", (i + 1), pairscnt)
    }(i, r_hi)
  }
  wg.Wait()

In both cases, the primes data still seems to be copied into each thread that uses it.

Is there a way to rewrite the code so this doesn't happen, so that the primes data is shared instead of having to be copied before it can be used?


gougou168 #1

Can you share the definition of the primes variable, or the full script? I suspect you defined it as an array rather than a slice.
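A minimal sketch of the distinction being asked about: a Go array is a value, so passing one to a function copies every element, whereas passing a slice copies only its small header (pointer, length, capacity). The function names are hypothetical, chosen for illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bumpArray receives a full copy of the array, so the caller never sees the change.
func bumpArray(a [1000]int) { a[0]++ }

// bumpSlice receives a copy of the slice header only; the elements are shared.
func bumpSlice(s []int) { s[0]++ }

func main() {
	var arr [1000]int
	sl := make([]int, 1000)

	bumpArray(arr)
	bumpSlice(sl)

	fmt.Println("array after call:", arr[0]) // 0: the callee changed its own copy
	fmt.Println("slice after call:", sl[0])  // 1: the callee changed the shared backing array

	// Bytes copied per call (platform dependent; the slice header is 3 words).
	fmt.Println("array argument size:", unsafe.Sizeof(arr))
	fmt.Println("slice header size:", unsafe.Sizeof(sl))
}
```

With hundreds of millions of primes, the difference is between copying gigabytes per call and copying a few dozen bytes.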


sinazl #2

Yes! *&primes is the same as the primes variable.

In your code:
- refprimes is &primes
- *refprimes is *&primes
- therefore *refprimes is simply primes
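This can be checked directly. In the sketch below (variable and function names are illustrative), *refprimes is the same slice value as primes, and a slice handed to a goroutine still points at the same first element:

```go
package main

import "fmt"

// sameBacking reports whether two non-empty slices start at the same element,
// i.e. whether they view the same underlying array from the same offset.
func sameBacking(a, b []int) bool {
	return len(a) > 0 && len(b) > 0 && &a[0] == &b[0]
}

func main() {
	primes := []int{2, 3, 5, 7, 11}
	refprimes := &primes

	// *refprimes yields a copy of the slice header (pointer, length, capacity);
	// the elements themselves are not copied.
	fmt.Println(sameBacking(*refprimes, primes)) // true

	// Passing the slice to a goroutine behaves the same way (read-only here).
	done := make(chan bool)
	go func(s []int) { done <- sameBacking(s, primes) }(primes)
	fmt.Println(<-done) // true
}
```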

wuwangju #3

I'm using Go version 1.17.5.

None of these approaches helps: however I get the program to compile, it uses the same amount of memory.

So it looks like, no matter what, the data in primes is not shared but copied in full into each thread, which is not what I want.

So the bottom-line question is: does Go allow the primes data to be shared across threads? If so, how exactly?

nodeper #4

If I understand correctly, you are not passing the array by reference.

Here is an example:

package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup

	arr := [5]int{1, 2, 4, 6, 10}
	fmt.Println(arr)

	wg.Add(1)
	go func() {
		defer wg.Done()
		toPrimes(&arr)
	}()

	wg.Wait()
	fmt.Println(arr)
}

func toPrimes(ptr *[5]int) {
	for i := 0; i < len(*ptr); i++ {
		ptr[i] += 1
	}
}

A better approach is probably to use a slice of ints instead of an array of ints.

package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup

	si := []int{1, 2, 4, 6, 10}
	fmt.Println(si)

	wg.Add(1)
	go func() {
		defer wg.Done()
		toPrimes(si)
	}()

	wg.Wait()
	fmt.Println(si)
}

func toPrimes(primes []int) {
	for i := 0; i < len(primes); i++ {
		primes[i] += 1
	}
}

Hope this helps!

yibo5220 #5

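In Go, a slice already behaves like a reference: it is a small header holding a pointer to an underlying array plus a length and a capacity. Your primes is already a slice, so taking its address is unnecessary. Dereferencing *refprimes inside the goroutine just yields that same slice value (the pointer/length/capacity struct) again, and in both versions the underlying array is shared.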

Analysis and correction:

// Original code (the slice already shares its underlying array)
var wg sync.WaitGroup
for i, r_hi := range rescousins {
    wg.Add(1)
    go func(i, r_hi int) {
        defer wg.Done()
        // The primes slice is passed directly; its underlying array is not copied
        l, c := cousins_sieve(r_hi, kmin, kmax, kb, start_num, end_num, modpg, primes, resinvrs)
        lastcousins[i] = l; ■■■■[i] = c
        fmt.Printf("\r%d of %d cousinpairs done", (i+1), pairscnt)
    }(i, r_hi)
}
wg.Wait()

Key points:

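- Passing a slice in Go does not copy its underlying array.
- Each goroutine gets a copy of the slice header (pointer, length, capacity), but all copies point to the same underlying array.
- The extra memory usage may come from somewhere else, for example:
  - cousins_sieve may create a copy internally
  - the slice may be modified outside the goroutines, causing a reallocation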
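The reallocation case in the list above can be seen in a few lines: once append goes past a slice's capacity, Go allocates a new array and copies every element into it, so the old and new slices stop sharing memory. A minimal sketch with illustrative names:

```go
package main

import "fmt"

// sameBacking reports whether two non-empty slices start at the same element.
func sameBacking(a, b []int) bool {
	return len(a) > 0 && len(b) > 0 && &a[0] == &b[0]
}

func main() {
	primes := make([]int, 4) // len 4, cap 4: no spare capacity
	view := primes           // copies only the header

	fmt.Println(sameBacking(view, primes)) // true

	// Appending past the capacity allocates a larger array and copies all
	// elements into it, so primes no longer shares memory with view.
	primes = append(primes, 13)
	fmt.Println(sameBacking(view, primes)) // false
}
```

For a slice holding hundreds of millions of values, each such reallocation briefly needs room for both the old and the new array.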

An example verifying that the slice is shared:

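package main

import (
    "fmt"
    "sync"
)

// processSlice writes to its own element of the shared backing array.
// Each goroutine uses a different index, so there is no data race.
func processSlice(id int, s []int, wg *sync.WaitGroup) {
    defer wg.Done()
    if id < len(s) {
        s[id] = id + 1000 // the write goes to the array that main also sees
        fmt.Printf("goroutine %d: s[%d] = %d\n", id, id, s[id])
    }
}

func main() {
    primes := make([]int, 100)
    for i := range primes {
        primes[i] = i
    }

    var wg sync.WaitGroup
    for i := 0; i < 5; i++ {
        wg.Add(1)
        go processSlice(i, primes, &wg)
    }
    wg.Wait()

    // Every goroutine's write is visible here: [1000 1001 1002 1003 1004]
    fmt.Println("First five elements after all goroutines:", primes[:5])
}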

If you really need to cut memory, check the following:

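- Whether cousins_sieve creates a copy of the slice internally
- That nothing appends to primes (which can trigger a reallocation)
- Use sync.Pool to reuse temporary slices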
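The sync.Pool suggestion might look like the following minimal sketch. sumWithScratch and the 4096-element buffer size are hypothetical, chosen only for illustration; sync.Pool helps with short-lived scratch buffers, not with the read-only primes slice itself. It uses interface{} rather than any so it also builds on Go 1.17.

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool hands out reusable []uint64 scratch buffers. It stores *[]uint64
// so that putting a buffer back does not allocate a new interface value.
var bufPool = sync.Pool{
	New: func() interface{} {
		b := make([]uint64, 0, 4096)
		return &b
	},
}

// sumWithScratch fills a pooled buffer with 0..n-1 and returns the sum.
// The buffer goes back to the pool instead of being garbage-collected.
func sumWithScratch(n int) uint64 {
	bp := bufPool.Get().(*[]uint64)
	buf := (*bp)[:0] // reuse the capacity, start empty
	for i := 0; i < n; i++ {
		buf = append(buf, uint64(i))
	}
	var total uint64
	for _, v := range buf {
		total += v
	}
	*bp = buf // keep any growth for the next user
	bufPool.Put(bp)
	return total
}

func main() {
	var wg sync.WaitGroup
	results := make([]uint64, 8)
	for g := 0; g < len(results); g++ {
		wg.Add(1)
		go func(g int) {
			defer wg.Done()
			results[g] = sumWithScratch(1000)
		}(g)
	}
	wg.Wait()
	fmt.Println(results[0]) // 499500
}
```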

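// If cousins_sieve must modify its input, copy only the part it uses locally.
// Copying the whole primes slice in every goroutine is exactly what multiplies
// memory use. lo and hi are hypothetical bounds, for illustration only.
func cousins_sieve(primes []int, lo, hi int) (int, int) {
    localPrimes := make([]int, hi-lo)
    copy(localPrimes, primes[lo:hi])
    // ... process localPrimes ...
    return len(localPrimes), 0 // placeholder results
}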

Passing a pointer to the slice header (usually unnecessary):

// Rarely needed: only when you must change the slice itself (its length/capacity)
func processByReference(primes *[]int) {
    // Access the slice through the pointer
    s := *primes
    _ = s[0]
}
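To illustrate when the pointer form above is actually needed: a function that appends must either return the new slice or receive a *[]int, because the length lives in the caller's header. A minimal sketch with hypothetical function names:

```go
package main

import "fmt"

// appendByValue appends to its own copy of the slice header; the caller's
// slice keeps its old length.
func appendByValue(s []int, v int) {
	s = append(s, v)
	_ = s
}

// appendByPointer writes the updated header back through the pointer,
// so the caller sees the new length.
func appendByPointer(s *[]int, v int) {
	*s = append(*s, v)
}

func main() {
	primes := []int{2, 3, 5}

	appendByValue(primes, 7)
	fmt.Println(len(primes)) // 3

	appendByPointer(&primes, 7)
	fmt.Println(len(primes), primes) // 4 [2 3 5 7]
}
```

For a read-only slice like primes, neither is needed; passing the slice value is enough.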

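The refprimes := &primes / *refprimes combination in your code is redundant; passing the primes slice directly already shares the underlying array. The memory problem probably comes from a slice copy or reallocation somewhere else.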

gougou168 #6

Here is the full code.

gist.github.com

cousinprimes_ssoz.go

// This Go source file is a multiple threaded implementation to perform an
// extremely fast Segmented Sieve of Zakiya (SSoZ) to find Cousin Primes <= N.

// Inputs are single values N, or ranges N1 and N2, of 64-bits, 0 -- 2^64 - 1.
// Output is the number of cousin primes <= N, or in range N1 to N2; the last
// cousin prime value for the range; and the total time of execution.

// This code was developed on a System76 laptop with an Intel I7 6700HQ cpu,
// 2.6-3.5 GHz clock, with 8 threads, and 16GB of memory. Parameter tuning
// probably needed to optimize for other hardware systems (ARM, PowerPC, etc).

This file has been truncated.

I create primes in func sozpg and use it as an input to cousins_sieve, which runs in a go func.

My system has 16 GB of memory, so I can run the following example.

➜  go-projects echo 11844600000000000 11844601500991000 | ./cousinprimes_ssoz
threads = 8
using Prime Generator parameters for P11
segment size = 262144 resgroups; seg array is [1 x 4096] 64-bits
cousinprime candidates = 87720435; resgroups = 649781
each of 135 threads has nextp[2 x 6240199] array
setup time = 261.943245ms 
perform cousinprimes ssoz sieve
135 of 135 cousinpairs done
sieve time = 14.021829717s
total time = 14.283803133s
last segment = 125493 resgroups; segment slices = 3
total cousins = 1446744; last cousin = 11844601500989267/-4%

I use htop to monitor threads and memory usage (on a Linux laptop with an i7-6700HQ, 4 cores / 8 threads, 2.6-3.5 GHz).

For this input the Go version peaks at about 14.5 GB of memory, while the Rust version peaks at under 4 GB. Here is the Rust code.

gist.github.com

cousinprimes_ssoz.rs

// This Rust source file is a multiple threaded implementation to perform an
// extremely fast Segmented Sieve of Zakiya (SSoZ) to find Cousin Primes <= N.

// Inputs are single values N, or ranges N1 and N2, of 64-bits, 0 -- 2^64 - 1.
// Output is the number of cousin primes <= N, or in range N1 to N2; the last
// cousin prime value for the range; and the total time of execution.

// This code was developed on a System76 laptop with an Intel I7 6700HQ cpu,
// 2.6-3.5 GHz clock, with 8 threads, and 16GB of memory. Parameter tuning
// probably needed to optimize for other hardware systems (ARM, PowerPC, etc).

This file has been truncated.
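One hedged reading of the Go log above: it reports "each of 135 threads has nextp[2 x 6240199] array". Assuming each nextp element is a 64-bit value (an assumption, since the listings are truncated), one such array takes 2 × 6,240,199 × 8 B ≈ 99.8 MB. If all 135 goroutines are launched at once, as in the loop at the top, up to 135 × 99.8 MB ≈ 13.5 GB of nextp arrays can be live at the same time, close to the observed 14.5 GB peak. A common Go way to cap this is a counting semaphore (a buffered channel) that lets only as many goroutines as there are threads run at once. The names runBounded and peakConcurrency, and the small scratch buffer standing in for nextp, are hypothetical:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// runBounded calls work(i) for every i in [0, jobs) on its own goroutine,
// but lets at most limit of them run at once. The buffered channel sem is
// a counting semaphore: a send takes a slot, a receive frees it.
func runBounded(jobs, limit int, work func(i int)) {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	for i := 0; i < jobs; i++ {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit goroutines are still busy
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when work returns
			work(i)
		}(i)
	}
	wg.Wait()
}

// peakConcurrency runs jobs tasks through runBounded, each allocating a
// per-goroutine scratch array (a small stand-in for nextp), and returns the
// highest number of tasks observed running at the same time.
func peakConcurrency(jobs, limit int) int64 {
	var running, peak int64
	runBounded(jobs, limit, func(i int) {
		n := atomic.AddInt64(&running, 1)
		for { // record n if it is a new maximum
			p := atomic.LoadInt64(&peak)
			if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
				break
			}
		}
		scratch := make([]uint64, 1<<16) // hypothetical nextp stand-in
		scratch[0] = uint64(i)
		_ = scratch
		atomic.AddInt64(&running, -1)
	})
	return peak
}

func main() {
	limit := runtime.NumCPU()
	// 135 jobs, matching the number of residue pairs in the log above.
	fmt.Println("peak concurrency:", peakConcurrency(135, limit), "limit:", limit)
}
```

With the limit set to the thread count (8 in the log), only about 8 nextp-sized arrays would be live at once instead of 135, while the shared primes slice is still passed to each goroutine without copying.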