Why is single-threaded atomic operation performance so poor in Go?

Posted 1 week ago by itying888 in Go
package main

import (
    "fmt"
    "sync/atomic"
    "time"
)

func main() {
    var t1 uint64 = 0
    var t2 uint64 = 0

    endChan := make(chan int)
    for i := 0; i < 1000; i++ {
        go func() {
            for i := 0; i < 10000; i++ {
                atomic.AddUint64(&t1, 1)
                t2 += 1 // deliberately non-atomic: this is a data race
            }
            endChan <- 1
        }()
    }

    // Wait for all 1000 goroutines to finish.
    for i := 0; i < 1000; i++ {
        <-endChan
    }

    // Show that the non-atomic increment produces a wrong value.
    // t1= 10000000
    // t2= 8513393
    fmt.Println("t1=", t1)
    fmt.Println("t2=", t2)

    // Performance test
    func() {
        var t1 uint64 = 0

        startTime := time.Now()
        for i := 0; i < 1000000000; i++ {
            t1 += 1
        }
        endTime := time.Now()
        fmt.Println("Non-atomic elapsed:", endTime.Sub(startTime))
        // Non-atomic elapsed: 535.0303ms
    }()

    func() {
        var t1 uint64 = 0

        startTime := time.Now()
        for i := 0; i < 1000000000; i++ {
            atomic.AddUint64(&t1, 1)
        }
        endTime := time.Now()
        fmt.Println("Atomic elapsed:", endTime.Sub(startTime))
        // Atomic elapsed: 14.7758413s
    }()
}

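Aren't atomic operations implemented by locking the bus? In a single thread, locking the bus shouldn't affect performance, should it?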


11 replies

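Nice, an interesting test. I suspect a couple of things:
1. When timing the non-atomic operation, I'm not sure whether the Go compiler simply optimized it away. If you have the time, you could try (a) replacing the for loop with if/else, and (b) wrapping t+=1 in a function.
2. Even if the gap really is this large, it is easy to explain in terms of instruction pipelining.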


Correction: that should read "whether the Go compiler simply optimized away the for loop". I garbled the sentence while editing.

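By the way, for a real use case, you could consider having one goroutine own the variable t: to increment, write a delta into a buffered chan, which usually won't block. As for reads, if you don't need an exact value, just read t directly; if you do need an exact value, it gets trickier.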
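The goroutine-owned counter described above could be sketched roughly as follows. The names `chanCounter`, `newChanCounter`, and `countViaChannel` are hypothetical, and the exact-read approach (a request channel plus draining the queued deltas) is one possible answer to the "trickier" case, not something the reply specified.

```go
package main

import (
	"fmt"
	"sync"
)

// chanCounter is a hypothetical type: a single goroutine owns total,
// so the counter itself needs no atomics or locks.
type chanCounter struct {
	deltas chan uint64      // buffered: writers block only when it is full
	reads  chan chan uint64 // each read request carries its own reply channel
	stop   chan struct{}
}

func newChanCounter(buffer int) *chanCounter {
	c := &chanCounter{
		deltas: make(chan uint64, buffer),
		reads:  make(chan chan uint64),
		stop:   make(chan struct{}),
	}
	go c.run()
	return c
}

// run is the only code that ever touches total.
func (c *chanCounter) run() {
	var total uint64
	for {
		select {
		case d := <-c.deltas:
			total += d
		case reply := <-c.reads:
			// Apply every delta already queued, so the answer covers all
			// sends that completed before the read was requested.
			for drained := false; !drained; {
				select {
				case d := <-c.deltas:
					total += d
				default:
					drained = true
				}
			}
			reply <- total
		case <-c.stop:
			return
		}
	}
}

func (c *chanCounter) Add(d uint64) { c.deltas <- d }

func (c *chanCounter) Value() uint64 {
	reply := make(chan uint64)
	c.reads <- reply
	return <-reply
}

func (c *chanCounter) Close() { close(c.stop) }

// countViaChannel runs workers goroutines that each add 1 perWorker times.
func countViaChannel(workers, perWorker int) uint64 {
	c := newChanCounter(1024)
	defer c.Close()
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				c.Add(1)
			}
		}()
	}
	wg.Wait()
	return c.Value()
}

func main() {
	fmt.Println("total =", countViaChannel(100, 1000))
}
```

A channel send generally costs more than a single atomic add, so this layout is more about giving the variable one clear owner than about raw speed.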

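I don't know Go's memory model well, but the [implementation](https://github.com/golang/go/blob/master/src/sync/atomic/64bit_arm.go#L27) of the atomic operation here, atomic.AddUint64, essentially comes down to a single LOCK-prefixed instruction such as [CMPXCHGQ](https://github.com/golang/go/blob/master/src/sync/atomic/asm_amd64.s#L55), i.e. CAS, where the Q stands for quadword.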

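I agree with the guess that, in the non-atomic case, the compiler may have optimized the for loop. For the atomic operation, however, t1 has to be visible to every thread, so it cannot be held in a register or anywhere else that other cores cannot see. And for thread safety, operations on t1 are not reordered with other memory operations either.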


After wrapping t+=1 in a function: non-atomic took 3.168774395s, atomic took 11.310976061s.

I tried a version that uses a lock; it is about two times slower than the atomic version.
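The reply above did not include its code, so the following is only a guess at what such a comparison might look like. `atomicCount`, `mutexCount`, and `timeIt` are hypothetical names, and the iteration count is reduced to keep the run short.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// atomicCount increments a counter n times with atomic.AddUint64.
func atomicCount(n int) uint64 {
	var t uint64
	for i := 0; i < n; i++ {
		atomic.AddUint64(&t, 1)
	}
	return t
}

// mutexCount increments a counter n times under a sync.Mutex.
func mutexCount(n int) uint64 {
	var mu sync.Mutex
	var t uint64
	for i := 0; i < n; i++ {
		mu.Lock()
		t++
		mu.Unlock()
	}
	return t
}

// timeIt runs f(n) once and prints the result and the wall-clock time.
func timeIt(name string, n int, f func(int) uint64) {
	start := time.Now()
	got := f(n)
	fmt.Printf("%-7s n=%d result=%d elapsed=%v\n", name, n, got, time.Since(start))
}

func main() {
	const n = 10000000
	timeIt("atomic", n, atomicCount)
	timeIt("mutex", n, mutexCount)
}
```

One plausible reason for the gap: even an uncontended `Lock`/`Unlock` pair performs an atomic operation on each side, so every iteration pays for at least two atomic operations instead of one.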

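Atomic operations are slow in C too. Using OS X's OSAtomicAdd64, compiled with -Os, the same test takes a bit over 8s; the Go version takes 10s on my machine. The non-atomic C version is extremely fast, though, presumably because the compiler optimized it.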

After wrapping it in a function, you also need to pass -gcflags '-l' to turn off inlining.
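As a per-function alternative to the build flag, the Go compiler also honors the `//go:noinline` directive. A minimal sketch, where `addOne` and `plainCount` are hypothetical names for the wrapped increment and its loop:

```go
package main

import (
	"fmt"
	"time"
)

// addOne is a hypothetical helper wrapping t += 1.
//
//go:noinline
func addOne(p *uint64) {
	*p += 1
}

// plainCount calls addOne n times; because addOne is not inlined,
// each iteration performs a real call and a real memory update.
func plainCount(n int) uint64 {
	var t uint64
	for i := 0; i < n; i++ {
		addOne(&t)
	}
	return t
}

func main() {
	const n = 100000000
	start := time.Now()
	got := plainCount(n)
	fmt.Println("result:", got, "elapsed:", time.Since(start))
}
```

Note that a timing taken this way includes function-call overhead on every iteration, so it is an upper bound on the cost of the plain increment rather than a measurement of the increment alone.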

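Regarding the single-threaded atomic operation performance question you raised, here is my answer:

Go's atomic operations are generally regarded as an efficient concurrency tool, because they guarantee data consistency without a lock and so avoid the costs of locking: acquiring the lock, releasing it, and possibly blocking a thread. Observed performance can still depend on several factors:

  1. Hardware and platform differences: the cost of an atomic operation depends on the specific hardware platform and operating system. On some architectures it may be higher than expected.
  2. Context switching: an atomic operation itself completes in user space and is cheap, but code that starts a large number of goroutines may cause excessive context switching, which hurts overall performance.
  3. Memory consistency: an atomic operation must keep memory consistent, which may require extra memory barriers and therefore cost performance.

In short, Go's atomic operations are efficient in most cases, but the actual numbers depend on several factors. If you hit a performance problem in a single-threaded setting, check the code for other bottlenecks, or diagnose it with a profiling tool. Testing on different hardware and operating systems will also give a fuller picture.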
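For measuring rather than guessing, the standard `testing` package can drive a micro-benchmark even from an ordinary program via `testing.Benchmark`. The sketch below is one way to set that up; `benchPlain`, `benchAtomic`, and `sink` are hypothetical names, and storing the result in a package-level variable is only an attempt to keep the plain loop's result observable, not a guarantee about what the compiler will do.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"testing"
)

// sink is package-level so each loop's final value is actually used.
var sink uint64

// benchPlain times b.N plain increments.
func benchPlain(b *testing.B) {
	var t uint64
	for i := 0; i < b.N; i++ {
		t++
	}
	sink = t
}

// benchAtomic times b.N atomic increments.
func benchAtomic(b *testing.B) {
	var t uint64
	for i := 0; i < b.N; i++ {
		atomic.AddUint64(&t, 1)
	}
	sink = t
}

func main() {
	// BenchmarkResult prints as "<iterations> <ns/op>".
	fmt.Println("plain: ", testing.Benchmark(benchPlain))
	fmt.Println("atomic:", testing.Benchmark(benchAtomic))
}
```

The same two functions, renamed with a `Benchmark` prefix and placed in a `_test.go` file, could be run with `go test -bench=.` instead.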
