bin/linpacksp
Enter array size (q to quit) [100]:
LINPACK benchmark, Single precision.
Machine precision: 6 digits.
Array size 100 X 100.
Memory required: 40K.
Average rolled and unrolled performance:
So single precision is SLOWER than double precision on the emulated G4?! (Maybe gcc autovectorizes by default, and AltiVec gets accelerated on an AVX-capable host CPU?)
I wonder how the older hardfloat patches behaved. Does anyone still have a QEMU build with them applied?
But the result was basically the same (or are those parameters no longer accepted? QEMU's build system has used Meson/Ninja for some time now, and that line dates from the 5.x.x days...)
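LINPACK spends most of its time in the `axpy` kernel (y := a*x + y), so one hedged way to probe the surprising single-vs-double result is to time that kernel alone in both precisions inside the guest. This is a minimal micro-benchmark sketch under stated assumptions, not the linpacksp source: the names `axpy_f32`, `axpy_f64`, `mflops_f32` and `mflops_f64` are hypothetical, and `clock()`-based timing is coarse.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* y := a*x + y in single precision (the BLAS "saxpy" operation). */
void axpy_f32(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* The same kernel in double precision (the BLAS "daxpy" operation). */
void axpy_f64(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Hypothetical helper: rough MFLOPS for `reps` calls of axpy_f32 over n
   elements. Each element costs 2 flops (one multiply, one add).
   clock() measures processor time and is coarse, so use a large n * reps. */
double mflops_f32(size_t n, int reps, const float *x, float *y)
{
    assert(reps > 0);
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        axpy_f32(n, 1.0001f, x, y);
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
    return secs > 0.0 ? 2.0 * (double)n * reps / secs / 1e6 : 0.0;
}

/* Hypothetical helper: the same measurement for the double-precision kernel. */
double mflops_f64(size_t n, int reps, const double *x, double *y)
{
    assert(reps > 0);
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        axpy_f64(n, 1.0001, x, y);
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
    return secs > 0.0 ? 2.0 * (double)n * reps / secs / 1e6 : 0.0;
}
```

Calling `mflops_f32` and `mflops_f64` with the same n (for example 100, matching the array size above), and comparing builds with and without GCC's `-ftree-vectorize` (plus `-maltivec` on PowerPC), might show whether autovectorization explains the gap. Older GCC releases also need `-std=c99` for the loop-variable declarations.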
Also, purely out of "because it still works" curiosity, I tried using the Mac OS X 10.4.11 Xdarwin server (XFree86 4.4.0) to display a program we maintain, Cinelerra-GG:
I used the "-ac" flag to disable access control on the X server side (because I am too lazy to set up xauth), and on the host side I used e16 as the window manager:
DISPLAY=":1" starte16
Xlib: extension "RANDR" missing on display ":1.0".
X connection to :1.0 broken (explicit kill or server shutdown).
For a small (320x240) video it was even quite fast! Sadly, this program is quite FP-intensive AND has a few places in the code where the /proc filesystem is queried for 1) the number of CPUs and 2) the name of the executable. I think I found a solution to both problems; I'm just waiting for MacPorts to fix the PulseAudio build...
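Mac OS X has no /proc, but both lookups have long-standing Darwin equivalents. A minimal sketch, assuming the code only needs to build on Darwin and Linux: `sysctl()` with `CTL_HW`/`HW_NCPU` for the CPU count and `_NSGetExecutablePath()` from `<mach-o/dyld.h>` for the executable path, with the /proc-based code kept for Linux. The helper names `portable_cpu_count` and `portable_exe_path` are hypothetical, and this is not necessarily the fix mentioned above.

```c
#ifndef __APPLE__
#define _DEFAULT_SOURCE /* glibc: expose readlink() even under strict -std= modes */
#endif

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>
#ifdef __APPLE__
#include <sys/types.h>
#include <sys/sysctl.h>
#include <mach-o/dyld.h>
#endif

/* Hypothetical helper: number of CPUs without reading /proc/cpuinfo.
   Falls back to 1 if the query fails. */
long portable_cpu_count(void)
{
#ifdef __APPLE__
    int mib[2] = { CTL_HW, HW_NCPU };
    int ncpu = 0;
    size_t len = sizeof ncpu;
    if (sysctl(mib, 2, &ncpu, &len, NULL, 0) == 0 && ncpu > 0)
        return ncpu;
    return 1;
#else
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
#endif
}

/* Hypothetical helper: path of the running executable, NUL-terminated.
   Returns 0 on success, -1 on failure or if the buffer is too small. */
int portable_exe_path(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
#ifdef __APPLE__
    uint32_t len = (uint32_t)size;
    /* May contain symlinks or "..", realpath() can canonicalize it. */
    return _NSGetExecutablePath(buf, &len) == 0 ? 0 : -1;
#else
    ssize_t n = readlink("/proc/self/exe", buf, size - 1);
    /* readlink() does not NUL-terminate; a full buffer may mean truncation. */
    if (n < 0 || (size_t)n >= size - 1)
        return -1;
    buf[n] = '\0';
    return 0;
#endif
}
```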
Andrew_R wrote: Mon May 13, 2024 7:02 am
(quote of the Xdarwin / Cinelerra-GG post above)
Let us know how that goes... There have been a number of FPU improvements in QEMU from 5.x to 9.x, but there are still a few issues with Screamer emulation on 10.x, and also with audio emulation of pre-Lion guests on x86. On the x86 side, OpenCore can be used to tweak some of these items, but on PPC we're dealing with the Mac99 virtual machine, so if you find potential fixes, we should get them added to the default machine definition.
Andrew_R wrote: Mon May 13, 2024 6:28 am
Compiled 9.0.0 just to be sure, and performance is back to 10-15 Mflops...
Note: this is an i686 (!) build made with gcc 11.2.0.
Note that i686 builds give much worse performance than x86_64 builds because i686 has only half as many host CPU registers as x86_64 (I believe only 5 or 6 host registers are available to the JIT on i686). So I'd be interested to see the results of the same benchmark on an x86_64 build.