Work in progress: here you can find my recent efforts in 64-bit ARM coding on 64-bit Linux.
FracNEON_sp_opt V1.0 (11/06/2021)
Building on my earlier work on 32-bit x86, I coded 64-bit ARMv8 assembler versions of my Mandelbrot benchmark. The archive contains 3 executables for Linux (tested on 64-bit Kali Linux and Ubuntu) and the sources. The results and the graphical output are handled by some C++ and SDL2 code. The computation time displayed as the result covers only the calculation, not the graphical output, although the time spent on the output is negligible anyway. The 3 versions calculate exactly the same result with the same number of iterations, but differ in their assembler implementation, which aims to make full use of the multiple execution ports (where available), the out-of-order architecture and especially the out-of-order windows of modern ARM cores. This code uses the NEON extension in single precision on a single core only. Future versions are planned to use multiple cores, and I also plan to add double precision and VFP versions. A brief description of the 3 versions:
  • opt1 - 1 instruction block, loop unrolling 3 times
  • opt2 - 2 independent instruction blocks, loop unrolling 1 time
  • opt3 - 3 independent instruction blocks, loop unrolling 1 time
If you have any questions about it, just contact me. This is also my first Linux application, so there may be better ways to code the C++ part and the SDL2 implementation, and I may have missed some possible speed-ups in the assembler code as well. Benchmark results in table and graph: