rurban 2 days ago [-]
It can still be done much cheaper. I just fixed the excessively cheap register allocator in my C compiler, rcc.
The register allocator is a simple first-fit bitmask with no spilling to the stack except for the two predefined spill slots. Only when all 8 registers are in use does it spill the additional registers to the stack.
Those are what they call guest registers.
No SSA and no basic blocks (BB) needed. No crazy mem2reg or graph coloring. It runs only once per function.
Only in very big functions is a register spilled, usually just rsi.
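For readers who want to see the shape of the idea, here is a minimal sketch of a first-fit bitmask allocator with 8 registers and two predefined spill slots. It is not rcc's actual code: the names `RegAlloc`, `Loc`, `ra_alloc`, and `ra_free` are made up for illustration, and returning `LOC_NONE` once both spill slots are taken is an assumption of the sketch, not something stated above.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS 8        /* allocatable registers, as described above */
#define NUM_SPILL_SLOTS 2 /* predefined spill slots, as described above */

typedef enum { LOC_REG, LOC_SPILL, LOC_NONE } LocKind;

typedef struct {
    LocKind kind;
    int index; /* register number or spill slot number; -1 for LOC_NONE */
} Loc;

typedef struct {
    uint8_t regs_in_use;   /* bit i set => register i is taken */
    uint8_t spills_in_use; /* bit i set => spill slot i is taken */
} RegAlloc;

/* First fit: return the lowest clear bit among the low n bits, or -1. */
static int first_free(uint8_t mask, int n) {
    for (int i = 0; i < n; i++)
        if (!(mask & (1u << i)))
            return i;
    return -1;
}

/* Hand out a register if one is free, otherwise fall back to a spill slot. */
static Loc ra_alloc(RegAlloc *ra) {
    int r = first_free(ra->regs_in_use, NUM_REGS);
    if (r >= 0) {
        ra->regs_in_use |= (uint8_t)(1u << r);
        return (Loc){LOC_REG, r};
    }
    int s = first_free(ra->spills_in_use, NUM_SPILL_SLOTS);
    if (s >= 0) {
        ra->spills_in_use |= (uint8_t)(1u << s);
        return (Loc){LOC_SPILL, s};
    }
    /* Assumption of this sketch: the caller handles running out of slots. */
    return (Loc){LOC_NONE, -1};
}

/* Release a location so the next first-fit scan can reuse it. */
static void ra_free(RegAlloc *ra, Loc loc) {
    if (loc.kind == LOC_NONE)
        return;
    assert(loc.index >= 0);
    if (loc.kind == LOC_REG)
        ra->regs_in_use &= (uint8_t)~(1u << loc.index);
    else
        ra->spills_in_use &= (uint8_t)~(1u << loc.index);
}
```

The whole allocator state is two bytes, and both allocating and freeing are a handful of bit operations, which is where the "cheap" comes from. The trade-off is that nothing here looks at live ranges or usage frequency, so which value ends up in a spill slot is decided purely by allocation order.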
> It can still be done much cheaper. I just fixed the excessively cheap register allocator in my C compiler, rcc.
From that data, “GCCO2” is about twice as fast on compilation + single execution of some program as your C compiler rcc. That difference will get even larger if the compiled binary is run more than once. Why, then, do you imply your compiler is even cheaper?
More importantly, this paper is about binary translation, that is: taking a binary, reverting its register allocation, and then doing register allocation again for an architecture with a different register configuration. So, I don’t see how benchmarks of C compilers matter much for discussing it.
wffurr 2 days ago [-]
Seems like tcc, slimcc, and gcc -O0 are all better than rcc in this table: faster compilation and better runtime. Only kefir and clang -O0 (which is unfortunately what we use) are worse.
rurban 2 days ago [-]
Because I haven't benchmarked -O1, the optimization pass, yet. It has to do much less work on pure functions, so even compile time is faster then.
Still a work in progress...