Convert many FP divides to FP multiplies

Mahesh Madhav requested to merge convert_fdivs into master

Hotspot analysis turned up opportunities to speed up computation by precomputing inverse constants and turning FP divides into FP multiplies. On most CPUs an FP divide takes longer than an FP multiply. The gain depends on the microarchitecture of the target machine; I see about +4% on my Ampere Altra aarch64 machine, depending on the inputs.
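A minimal sketch of the pattern, for readers who have not seen the diff. The function and variable names (`normalize_div`, `normalize_mul`, `inv_scale`) are hypothetical and are not taken from this merge request; the sketch only assumes a loop that divides every element by a loop-invariant value.

```c
#include <stddef.h>

/* Baseline: one FP divide per element. */
void normalize_div(double *out, const double *in, size_t n, double scale)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] / scale;
}

/* Rewritten: a single divide hoisted out of the loop, then one FP
 * multiply per element. `scale` must be loop-invariant for this to
 * be a valid rewrite. */
void normalize_mul(double *out, const double *in, size_t n, double scale)
{
    const double inv_scale = 1.0 / scale;  /* precomputed inverse constant */
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * inv_scale;
}
```

Compilers generally do not make this substitution on their own at default optimization settings, because the two forms can round differently, which is why it has to be done by hand in the source.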

Please check that these optimizations are safe. My SPEC CPU inputs still verify correctly and produce the same answers.
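For reviewers, one way to probe safety: `x / c` and `x * (1.0 / c)` are not bitwise identical in general, because `1.0 / c` is itself rounded before the multiply. The helper below is a hypothetical check, not part of this merge request; it counts how many sample inputs give a different result under the two forms for a given divisor.

```c
#include <stddef.h>

/* Hypothetical review helper: count the inputs for which dividing by c
 * and multiplying by the precomputed reciprocal of c disagree. */
size_t count_mismatches(const double *xs, size_t n, double c)
{
    const double inv_c = 1.0 / c;  /* rounded once, here */
    size_t mismatches = 0;
    for (size_t i = 0; i < n; i++) {
        const double by_divide   = xs[i] / c;
        const double by_multiply = xs[i] * inv_c;
        if (by_divide != by_multiply)
            mismatches++;
    }
    return mismatches;
}
```

With a divisor of 10.0, the input 3.0 is a known mismatch (0.3 versus 0.30000000000000004), while a power-of-two divisor such as 4.0 has an exactly representable reciprocal and gives identical results. A nonzero count does not by itself make a change wrong; it means the output can differ in the last bit, so it matters whether the benchmark's verification tolerance absorbs that.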
