Running mlc -idle_latency, I get a figure of around 95-100 ns on my Core i7-6700HQ, which seems a bit high. If I just spin up another process that does nothing but hot loop (e.g., `while true; do true; done` in bash, or just a tight loop in C; see the sketch after this thread) in another terminal window, my latency numbers improve dramatically, to 55 ns or so.

I've seen it with other benchmarks: running something else concurrently speeds up the benchmark, but never the nearly 2x speedup this shows. Unlike earlier threads about "spinners" helping out latency (and, to a lesser extent, bandwidth), this is a single-socket laptop. I guess maybe the uncore is ramping down or something between accesses, but having something hot on another core keeps it active. It's weird because it kind of violates one of the main inequalities of multi-threading: running something on N cores is going to speed it up by a factor of at most N, never more. Here, by running things on 2 cores you might get stuff done 3.5 times as fast as on 1 core.

Thanks, Dr. McCalpin, for the useful answer as always (Intel should really give you a stipend or something). It could indeed be the uncore frequency staying low. The difference is quite significant, and as I recall there were a lot of benchmarks released around the time of the Skylake launch showing poorer latency on Skylake and blaming it on DDR4. In fact, a large component of that might be that the tests run into this behavior.

I forgot the easiest way to see this effect - simply note that the "loaded" latency (for moderate load levels) is much better than the "idle" latency! Here's a test I ran just now, where I got a 99.2 ns idle latency, but loaded latencies as good as 60 ns.
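For concreteness, here is a minimal C version of the "spinner" the thread describes. This is only an illustrative sketch, not anything from Intel MLC; the file name and build line are my own. The `volatile` counter keeps the compiler from deleting the loop.

```c
/* spinner.c - a do-nothing hot loop, per the thread above.
 * Keeping one core busy appears to hold the uncore clock up, which
 * improves the latency measured on another core.
 * Build: cc -o spinner spinner.c
 */
int main(void) {
    volatile unsigned long counter = 0; /* volatile: the loop can't be optimized away */
    for (;;)
        counter++;
    /* never returns; stop it with Ctrl-C */
}
```

Run `./spinner &` in one terminal, then re-run `mlc -idle_latency` in another; per the numbers above, this kind of background load dropped the reported figure from roughly 100 ns to about 55 ns.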
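For anyone who wants to reproduce the effect without MLC, below is a rough sketch of the pointer-chasing technique that idle-latency tests are generally built on. To be clear, this is not Intel MLC's actual implementation, and the buffer size and iteration count are arbitrary assumptions; it also assumes a POSIX system for `clock_gettime`. Each load depends on the previous one, so the time per iteration approximates round-trip memory latency. Run it alone, then again with the spinner going, and compare.

```c
/* chase.c - sketch of a pointer-chasing latency measurement (not MLC's code).
 * Builds one random cycle through a buffer much larger than the LLC,
 * then times a chain of dependent loads.
 * Build: cc -O2 -o chase chase.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (64u * 1024 * 1024 / sizeof(void *)) /* 64 MiB buffer, well past the LLC */
#define ITERS 10000000UL                       /* ~1 s of runtime at ~100 ns per load */

int main(void) {
    void **buf = malloc(N * sizeof(void *));
    size_t *perm = malloc(N * sizeof(size_t));
    if (!buf || !perm) return 1;

    /* Fisher-Yates shuffle of the indices, then link them into a single cycle. */
    for (size_t i = 0; i < N; i++) perm[i] = i;
    srand(1);
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1); /* slightly biased; fine for a sketch */
        size_t t = perm[i]; perm[i] = perm[j]; perm[j] = t;
    }
    for (size_t i = 0; i < N; i++)
        buf[perm[i]] = &buf[perm[(i + 1) % N]];

    /* Dependent loads: each one must complete before the next can start. */
    void **p = &buf[perm[0]];
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned long i = 0; i < ITERS; i++)
        p = (void **)*p;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
    printf("%.1f ns per load (%p)\n", ns / ITERS, (void *)p); /* print p so the chase isn't elided */
    free(buf);
    free(perm);
    return 0;
}
```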