BearOso 3 days ago
For function multiversioning, the intrinsic headers in both GCC and Clang already have logic to handle target selection. You also don't need to write the dispatch by hand when doing manual optimizations: defining the same function name with different targets is supported, and calls are dispatched automatically.
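A minimal sketch of that automatic dispatch, assuming GCC or a recent Clang on x86-64 Linux with glibc (the generated resolver relies on ifunc). It uses the `target_clones` attribute, a close relative of writing the same function several times with different `target` attributes; the `brighten` function and the `MULTIVERSION` macro are hypothetical. The compiler emits one copy per listed target plus a resolver that picks a copy when the program is loaded. On other platforms the macro expands to nothing and a single generic version is built.

```c
#include <stddef.h>
#include <stdint.h> /* also pulls in glibc's features.h, which defines __GLIBC__ */

/* Hypothetical macro: request an AVX2 clone plus a baseline ("default") clone
 * only where target_clones/ifunc are expected to work. */
#if defined(__x86_64__) && defined(__GLIBC__) && defined(__has_attribute)
#if __has_attribute(target_clones)
#define MULTIVERSION __attribute__((target_clones("avx2", "default")))
#endif
#endif
#ifndef MULTIVERSION
#define MULTIVERSION /* single generic version elsewhere */
#endif

/* Hypothetical image-style kernel: brighten 8-bit pixels, saturating at 255.
 * Callers just call brighten(); no CPU checks or function pointers needed. */
MULTIVERSION
void brighten(uint8_t *px, size_t n, uint8_t amount) {
    for (size_t i = 0; i < n; i++) {
        unsigned v = (unsigned)px[i] + amount;
        px[i] = (uint8_t)(v > 255u ? 255u : v);
    }
}
```

Whether the AVX2 clone is actually vectorized, and by how much it helps, depends on the compiler version and optimization level.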
pseudohadamard 2 days ago
Is it actually better/faster, though? To see the difference between -O and -O2/-O3, compile some code for an x64 target on Godbolt and look at the output. -O produces optimised x86 code. -O2/-O3 produces enormous amounts of incomprehensible SSE/AVX/whatever code for even the simplest stuff, leading to a huge blowout in code size that can potentially interact badly with the instruction cache.
We looked at this in embedded, where you don't have infinite memory to play with. At the moment it's OK because there are no advanced instructions available to use, but it'll get ugly in the future when gcc realises it can use new instructions and produces five times as much object code for the same source.
ranger_danger 23 hours ago
> Is it actually better/faster though?
For their use case, I would say yes. The article isn't about general program optimization like -O2/-O3; it's about selecting different versions of specific functions depending on which CPU the application is running on.
For example, if your program is heavy on image/video processing, with functions that iterate over your buffers, you typically want the fastest method available. A function that can only use MMX/SSE instructions instead of, say, AVX2 or AVX-512 can be several times slower, which translates into significant real-world differences in FPS.
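The same per-CPU selection can also be done by hand. A minimal sketch, assuming x86-64 and GCC or Clang: `__builtin_cpu_supports` queries the CPU the program is running on, and the `target("avx2")` attribute lets one function be compiled with AVX2 enabled while the rest of the program stays at the baseline. The byte-summing kernel, `sum_fn`, and `select_sum_u8` are hypothetical names for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel signature: sum the bytes of a buffer (e.g. one image plane). */
typedef uint64_t (*sum_fn)(const uint8_t *buf, size_t n);

/* Baseline version: built for the default target, so it runs on any CPU. */
static uint64_t sum_u8_generic(const uint8_t *buf, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += buf[i];
    return total;
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define HAVE_AVX2_PATH 1
/* Same loop, but the compiler may use AVX2 instructions in this function only.
 * It must never be called on a CPU without AVX2. */
__attribute__((target("avx2")))
static uint64_t sum_u8_avx2(const uint8_t *buf, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += buf[i];
    return total;
}
#endif

/* Choose an implementation for the running CPU.
 * Call once (e.g. at startup) and keep the returned pointer. */
sum_fn select_sum_u8(void) {
#ifdef HAVE_AVX2_PATH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_u8_avx2;
#endif
    return sum_u8_generic;
}
```

Resolving the pointer once and reusing it keeps the CPU check out of the hot loop; the actual speedup of the AVX2 path depends on whether the compiler vectorizes it.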
pjmlp 3 days ago
All while using C extensions. And yes, Microslop would rather have you use C++.
https://herbsutter.com/2012/05/03/reader-qa-what-about-vc-an...
Although in recent years, after that post, they did add support for C11 and C17, minus some things like aligned mallocs.
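On the aligned-malloc gap: the Microsoft CRT does not provide C11 `aligned_alloc`, and its counterparts are `_aligned_malloc`/`_aligned_free` from `<malloc.h>`. A minimal portable wrapper might look like the sketch below; `alloc_aligned`, `free_aligned`, and `is_aligned` are hypothetical helper names, and the `_WIN32` check assumes a Microsoft-compatible CRT on Windows.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#ifdef _WIN32
#include <malloc.h> /* _aligned_malloc, _aligned_free */
#endif

/* Hypothetical helper: allocate `size` bytes aligned to `alignment`
 * (a power of two), e.g. 32 or 64 for AVX2/AVX-512 buffers. */
void *alloc_aligned(size_t alignment, size_t size) {
#ifdef _WIN32
    /* No C11 aligned_alloc here; use the CRT extension (note argument order). */
    return _aligned_malloc(size, alignment);
#else
    /* C11 requires size to be a multiple of alignment, so round up
     * (overflow for huge sizes is not checked in this sketch). */
    size_t rounded = (size + alignment - 1) / alignment * alignment;
    return aligned_alloc(alignment, rounded);
#endif
}

/* Must be used instead of free(): _aligned_malloc blocks cannot go to free(). */
void free_aligned(void *p) {
#ifdef _WIN32
    _aligned_free(p);
#else
    free(p);
#endif
}

/* Hypothetical check that a pointer meets the requested alignment. */
int is_aligned(const void *p, size_t alignment) {
    return ((uintptr_t)p % alignment) == 0;
}
```

The separate free function is the price of the Windows path: the CRT's `free` cannot release memory from `_aligned_malloc`, which is the reason usually given for MSVC leaving out `aligned_alloc`.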