Golang Go语言 memorycache v1.1.5 更新:写入速度达到 Ristretto 五倍
https://github.com/lxzan/memorycache
https://pkg.go.dev/github.com/lxzan/memorycache
本次更新, 引入了时间戳缓存, 优势进一步扩大:
// 白嫖 github action 做个测试
goos: linux
goarch: amd64
pkg: github.com/lxzan/memorycache/benchmark
cpu: AMD EPYC 7763 64-Core Processor
BenchmarkMemoryCache_Set-4 11106261 100.6 ns/op 18 B/op 0 allocs/op
BenchmarkMemoryCache_Get-4 635988 77.30 ns/op 0 B/op 0 allocs/op
BenchmarkRistretto_Set-4 7933663 491.8 ns/op 170 B/op 2 allocs/op
BenchmarkRistretto_Get-4 11085688 98.92 ns/op 18 B/op 1 allocs/op
PASS
Golang Go语言 memorycache v1.1.5 更新:写入速度达到 Ristretto 五倍
更多关于Golang Go语言 memorycache v1.1.5 更新:写入速度达到 Ristretto 五倍的实战系列教程也可以访问 https://www.itying.com/category-94-b0.html
不愧是卷王之王😀
更多关于Golang Go语言 memorycache v1.1.5 更新:写入速度达到 Ristretto 五倍的实战系列教程也可以访问 https://www.itying.com/category-94-b0.html
有个 gws 粉丝在鞭策我 🤣
无意中看到了另一个快 5 倍的 https://github.com/maypok86/otter (看 bench 写入比我的 Theine 也快 5 倍), 可以研究一下
源码比我的 MemoryCache 复杂许多, 没看明白它的 TTL 是怎么实现的
在 mac 上测了下<br>goos: darwin<br>goarch: arm64<br>pkg: <a target="_blank" href="http://github.com/lxzan/memorycache/benchmark" rel="nofollow noopener">github.com/lxzan/memorycache/benchmark</a><br>BenchmarkMemoryCache_Set-8 13383870 81.57 ns/op 15 B/op 0 allocs/op<br>BenchmarkMemoryCache_Get-8 14529171 69.02 ns/op 0 B/op 0 allocs/op<br>BenchmarkRistretto_Set-8 13331672 286.1 ns/op 139 B/op 2 allocs/op<br>BenchmarkRistretto_Get-8 15165708 81.48 ns/op 18 B/op 1 allocs/op<br>BenchmarkOtter_Set-8 10669878 114.8 ns/op 39 B/op 0 allocs/op<br>BenchmarkOtter_Get-8 14930769 114.3 ns/op 0 B/op 0 allocs/op<br>PASS<br>ok <a target="_blank" href="http://github.com/lxzan/memorycache/benchmark" rel="nofollow noopener">github.com/lxzan/memorycache/benchmark</a> 34.781s<br>
我测了测纯 GET 确实很快,因为用的是 xsync map( https://github.com/puzpuzpuz/xsync). 不过按照 xsync 作者的说法 GC 压力会更高( https://github.com/puzpuzpuz/xsync/issues/94),这点从 benchmark 的 allocs 上倒是不太能看出来
GC 压力高那还不如用内置 map 了,纯 Get 大家表现都差不多。
我自己也实现过 map ,但是扩容那块写得太简单粗暴,后面干粹用内置 map 了
The only competitor to MemoryCache (MC) is Ristretto, neither of which is GC optimized. GC optimization is not all positive gain, Codec overhead is not small. MC seeks to replace Redis in light-use scenarios, where indexed priority queues are the best data structure, simple and efficient.
1.) any cache library is capable of replacing redis in light-use scenarios and it doesn’t give MC any advantage :)
2.) The claim that MC is a rival to Ristretto at all is very funny because MC simply doesn’t have an eviction policy (MC just removes the element with the smallest expiration time, which isn’t even LRU as written in the README). And in fact the main problem that RTOs solve. As a result we have that MC loses to RTO by hit ratio, loses by performance on real traces Otter and has serious pressure on GC. It seems that a cache library with no eviction policy should offer something in return (like no pressure on GC like bigcache https://github.com/allegro/bigcache). In the end we have that MC is slightly faster than Ristretto but still has a nasty hit ratio because of which the system will spend much more time. And if we compare it to Otter, it will lose on all parameters.
3.) Do you really think that Dgraph labs, PostgreSQL authors and Ben Manes & CO couldn’t think of slicing the map into shards with mutex and bolting a heap on there? That’s just stupid
It’s not like ristretto didn’t consider using a minimal heap https://github.com/dgraph-io/ristretto/blob/5239be55a219459d7ce4f6e9f99c8cea2c0b9d3a/policy.go#L163
lol, the heap isn’t there for what you’re using. The heap is there just to avoid going through all the values in lfu to find the most rare element. You’re using the heap for ttl (no question about that) and as an eviction policy you’re just removing the element with the lowest expiry time, which is terrible. This gives you the speed to beat ristretto (since ristretto has to completely lock the eviction policy for all shards) but the hit ratio of such a cache will be horrible, because such actions are not really different from removing the first inserted item
It’s stupid to try and convince people tirelessly. I don’t see a problem with using indexed priority queues to implement memory caching with TTL, MC works well in my company’s projects. I also don’t see a problem with benchmarking based on random strings, redis keys often contain data ids. In MC, if you only use GetWithTTL and SetWithTTL, it’s just LRU.
Ok, maybe I’m misunderstanding something then I’d like to understand how it works in general.
1.) Suppose you use heap for the list in lru. Then why do it at all for O(logn) when you can do it for O(1)? And then how does deleting expirated items even work if the heap is not ordered by expiry time? It seems to me there is no way and memorycache leaves a large number of items alive until it reaches them in lru.
2.) Cutting lru into shards looks questionable in terms of hit ratio because one of the shards may simply have more items than the others and will constantly evict good items.
3.) Yes, redis can store id’s but that’s not what I was writing about. In a trace of random strings, all elements occur only once. And querying each of them is a cache miss but in reality there is a set of most frequent elements, which is what we want to store. And in memorycache benchmarks, any new item causes the caches to constantly evict items which would not happen in reality. It would also be cool to see benchmarks with a combination of set and get operations. And hit ratio comparison too.
Anyway, I fiddled around and tested benchmarks of mine and memorycache and the verdict is pretty interesting. Here I won’t take trace types into account, because on pure set and get they really don’t affect much, but only on mixed operations.
1.) There is a bug in memorycache benchmarks: https://github.com/lxzan/memorycache/blob/main/benchmark/benchmark_test.go#L31 this add eats about half of the benchmark time, since a bunch of goroutines compete for this atomic, it is better to use a local counter per goroutine or fastrand. And on such benchmarks the result is about the same:
goos: darwin
goarch: arm64
pkg: github.com/lxzan/memorycache/benchmark
BenchmarkMemoryCache_Set-8 27370962 83.20 ns/op
BenchmarkMemoryCache_Get-8 62720229 18.51 ns/op
BenchmarkRistretto_Set-8 19263478 101.4 ns/op
BenchmarkRistretto_Get-8 39636226 26.51 ns/op
BenchmarkOtter_Set-8 26425458 39.04 ns/op
BenchmarkOtter_Get-8 40814108 28.50 ns/op
PASS
ok github.com/lxzan/memorycache/benchmark 22.620s
Also on my benchmarks I got proportional results, memorycache was about a third faster than otter on pure get and about twice as slow on pure set, but things got sad when mixed load was applied
BenchmarkCache/Zipf_Memory_reads=75%,writes=25%-8 4453213 289.6 ns/op 13 B/op 0 allocs/op
BenchmarkCache/Zipf_Otter_reads=75%,writes=25%-8 13881976 75.53 ns/op 6 B/op 0 allocs/op
BenchmarkCache/Zipf_Theine_reads=75%,writes=25%-8 2853026 469.0 ns/op 0 B/op 0 allocs/op
BenchmarkCache/Zipf_Ristretto_reads=75%,writes=25%-8 3112142 390.5 ns/op 46 B/op 1 allocs/op
2.) I really wondered how regular locking with map was able to defeat wait-free ristretto and otter with rare request blocking. I went to the memorycache code and saw this https://github.com/lxzan/memorycache/blob/main/cache.go#L124 that is memorycache in no way accounts for reads of non-expired items in the eviction policy. This is not LRU at all and I don’t know what it is because any eviction policy has to take into account both reads and writes. Like for example here https://github.com/hashicorp/golang-lru/blob/main/simplelru/lru.go#L72. I’ll try to check the hit ratio tomorrow, but I suspect it will be very small
I’d like some kind of answer to this, to be honest, because I’ve been told so much about how only memorycache can fight ristretto, but so far it’s been disappointing.
The heap is used to quickly remove expired elements. Redis seems to randomly check for expired elements, which is not as efficient as the heap. In fact, users don’t care if a library strictly implements the LRU algorithm, all they want is a KV store with TTL.
I’ve updated the benchmarks. My local cachebench hit test was too much of a pain in the ass to run, so I gave up on it.
goos: linux
goarch: amd64
pkg: github.com/lxzan/memorycache/benchmark
cpu: AMD EPYC 7763 64-Core Processor
BenchmarkMemoryCache_Set-4 7657497 133.3 ns/op 27 B/op 0 allocs/op
BenchmarkMemoryCache_Get-4 23179834 58.10 ns/op 0 B/op 0 allocs/op
BenchmarkMemoryCache_SetAndGet-4 20667798 59.09 ns/op 0 B/op 0 allocs/op
BenchmarkRistretto_Set-4 7739505 321.4 ns/op 135 B/op 2 allocs/op
BenchmarkRistretto_Get-4 12482400 97.67 ns/op 18 B/op 1 allocs/op
BenchmarkRistretto_SetAndGet-4 7265832 140.4 ns/op 31 B/op 1 allocs/op
PASS
ok github.com/lxzan/memorycache/benchmark 31.137s
> I’d like some kind of answer to this, to be honest, because I’ve been told so much about how only memorycache can fight ristretto, but so far it’s been disappointing.
MC is just an obscure library, and I’m sure hardly anyone would say that.
It seems to have triggered a bug in the otter, and it’s slowing it down terribly.
go test -benchmem -run=^$ -bench . github.com/lxzan/memorycache/benchmark
goos: linux
goarch: amd64
pkg: github.com/lxzan/memorycache/benchmark
cpu: AMD Ryzen 5 PRO 4650G with Radeon Graphics
BenchmarkMemoryCache_Set-12 21004170 72.60 ns/op 9 B/op 0 allocs/op
BenchmarkMemoryCache_Get-12 43787251 40.11 ns/op 0 B/op 0 allocs/op
BenchmarkMemoryCache_SetAndGet-12 45939994 45.35 ns/op 0 B/op 0 allocs/op
BenchmarkRistretto_Set-12 12190314 122.2 ns/op 112 B/op 2 allocs/op
BenchmarkRistretto_Get-12 25565082 44.60 ns/op 16 B/op 1 allocs/op
BenchmarkRistretto_SetAndGet-12 11713868 97.06 ns/op 27 B/op 1 allocs/op
BenchmarkOtter_SetAndGet-12 13760 89816 ns/op 13887 B/op 0 allocs/op
PASS
ok github.com/lxzan/memorycache/benchmark 44.081s
func BenchmarkOtter_SetAndGet(b *testing.B) {
var builder, _ = otter.NewBuilderstring, int
builder.ShardCount(128)
mc, _ := builder.Build()
for i := 0; i < benchcount; i++ {
mc.SetWithTTL(benchkeys[i%benchcount], 1, time.Hour)
}
b.ResetTimer()
b.RunParallel(func(pb *testing.PB) {
var i = atomic.Int64{}
for pb.Next() {
index := i.Add(1) % benchcount
if index&7 == 0 {
mc.SetWithTTL(benchkeys[index], 1, time.Hour)
} else {
mc.Get(benchkeys[index])
}
}
})
}
1.) Yes, users of cache libraries usually want a map with limited size, ttl and maybe some other set of features and don’t want to care about preemptive policies within that library. But if this library doesn’t show a good hit ratio, then the rest of its features are of no use at all, because cache misses will waste a huge amount of time on the system. And lru sharding and ignoring get operations in the eviction policy looks like a very small hit ratio (many times smaller than the others). Also as I wrote before, it’s not very clear how lru and ttl coexist together in the same heap.
2.) Yes, measuring hit ratio is not a very nice thing to do. But if you don’t want to write these checks yourself, you can check them on ristretto, theine or otter benchmarks. All these benchmarks have approximately the same values.
3.) Yes, I once got such strange memorycache benchmarks on otter and it was on pure get when benchcount was very large. I’ll try to look at this today tentatively looks like a gc loop, or some bug when accounting for expiration. It would be cool if you could share the full benchmark code so I can profile easier.
4.) I’ve noticed here that they unironically take off some balance for posts, so let’s continue in this issue https://github.com/lxzan/memorycache/issues/13
otter 作者被你炸出来了
好家伙,这个展开出乎我意料。可能这是 V2EX 为数不多的外国朋友
swiss table 的 gc 压力相比内置 map 怎么样?
知识盲区,也许这个能参考一下: https://github.com/golang/go/issues/54766
可以简单说下 Theine 是怎么维护过期时间和 LFU 吗?
MC 使用最小四叉堆高效地维护了过期时间, 但是只实现了 LRU, 命中率方面不如 LFU
我想到了, 再加一个堆, 以查询次数作为比较基准
Hierarchical Timing Wheels, 我是照着 caffeine 的 java 代码翻译的,也可以 google 。LFU 就复杂些了, 建议去看 W-TinyLFU 的论文。简单来说 frequency 数据是存在 Count-Min Sketch 这种概率类数据结构里的,所以占用空间很小
缓存策略相关的论文很多,包括各种改进版的 lru 策略也很多
针对“Golang Go语言 memorycache v1.1.5 更新:写入速度达到 Ristretto 五倍”的帖子,作为IT领域Go语言方面的专家,我认为这一更新体现了Go语言在缓存技术方面的显著进步。以下是我的专业回复:
首先,memorycache v1.1.5版本能够实现如此高的写入速度,无疑是对Go语言性能优化能力的一次有力证明。在当前的大数据和云计算时代,高效的缓存机制对于提升系统性能和响应速度至关重要。
其次,与Ristretto相比,memorycache v1.1.5在写入速度上实现了五倍的提升,这意味着在相同的硬件条件下,memorycache可以处理更多的写入请求,从而提高了系统的吞吐量和并发能力。这对于需要处理大量数据的应用场景来说,无疑是一个巨大的优势。
最后,值得注意的是,虽然写入速度的提升是memorycache v1.1.5版本的一大亮点,但我们也需要关注其在其他方面的表现,如缓存命中率、内存利用率等。只有在这些方面都表现出色,才能真正称得上是一款优秀的缓存解决方案。
综上所述,memorycache v1.1.5版本的更新无疑为Go语言在缓存技术方面树立了新的标杆。