Benchmarks
The question of speed often comes up, so I’ve listed some comparisons of programming languages and MCUs below.
Programming Language Benchmark
Computing the same 100-term polynomial 500,000 times; smaller is faster.
I’ve borrowed the table below from http://dan.corlan.net/bench.html, which you will need to read for the full story. Markers such as 1), * and $ in the table are carried over from that page.
Note
These measurements were made on a 300 MHz Pentium running Debian GNU/Linux, so they are fairly old.
Language | single body (s) | with call (s) |
---|---|---|
FORTRAN, g77 V2.95.4 | 2.73 | 2.73 |
Ada 95, gnat V3.13p | 2.73 | 2.74 |
C, hand optimized, gcc V2.95.4 | 2.73 | |
Java, gcj V3.0 | 3.03 | 15.53 |
D, gcc V4.0.3+ | 1)3.43 | 1)3.98 |
C, gcc V2.95.4 | 3.61 | 3.57 |
R translated to lisp using R2cl v0.1 and compiled with cmucl | 3.69 | |
Lisp, CMU Common Lisp V3.0.8 18c+, build 3030 | 4.69 | 10.69 |
Java, jikes V1.15 (bytecompiled) | 8.23 | 13.54 |
FORTH, hand optimized Gforth 0.6.1 | 1)18.21 | |
FORTH,** Gforth 0.6.1 | 1)27.26 | |
Python** +psyco (interpreted) | 1)168.50 | |
Perl, more optimized$ V5.6.1 (natively compiled) | 209.20 | |
Perl, more optimized$ V5.6.1 (interpreted) | 258.64 | |
Perl, hand optimized*** V5.6.1 (bytecompiled) | 306.18 | |
Perl* V5.6.1 (natively compiled) | 367.23 | |
Python** V2.1.2 (interpreted) | 505.50 | |
Perl* V5.6.1 (bytecompiled) | 515.04 | |
RUBY*** (interpreted) | 1074.52 | |
R V1.5.1 (interpreted) | 5662.64 | |
Power Of A Language Or Performance Of An Implementation?
For me, this is the big question:
What if the project doesn’t need the maximum processing speed of the target MCU, but instead needs the shortest development time? I prefer the interactivity of Forth because it lets me quickly determine hardware characteristics at the start of a new project. This saves me a lot of time that I would otherwise spend correcting invalid assumptions in my code later on.
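Benchmarking Different Forths/MCUs
This benchmark calculates the greatest common divisor of every pair of integers from 0 to 199 (two nested loops of 200 iterations each, 40,000 calls in total).
Download: GCD
Inspired by: http://weblambdazero.blogspot.com.au/search/label/forth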
Benchmark: Calculate The Greatest Common Divisor For 0 to 200
Lower times are better; rows are ordered with the fastest at the top.
MCU | Clock (MHz) | Time (sec) | Comments |
---|---|---|---|
STM32F103 | 75 | 0.1505 | Mecrisp-Stellaris RA 2.5.1 with M3 core for STM32F103 |
STM32F051 | 96 (overclocked) | 0.3 | Mecrisp-Stellaris RA 2.3.7 with M0 core for STM32F051 |
STM32F051 | 72 (overclocked) | 0.516 | Mecrisp-Stellaris RA 2.5.3 with M0 core for STM32F051 |
STM32F051 | 48 | 0.6 | Mecrisp-Stellaris RA 2.3.7 with M0 core for STM32F051 |
STM32F051 | 8 | 2.59 | Mecrisp-Stellaris RA 2.3.7 with M0 core for STM32F051 |
ATmega328 | 16 | 4 | Flash Forth 5 |
STM8S003F3P6 (from Flash) | 16 | 6.4 | optimized for size, not speed |
STM8S003F3P6 (from RAM) | 16 | 6.9 | optimized for size, not speed |
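Because the boards run at different clock speeds, one rough way to compare the Forth implementations themselves is to multiply the elapsed time by the clock frequency, which gives an approximate cycle count for the whole run. The following is only a minimal sketch: `cycles` is an illustrative word name that does not come from any of the Forths listed, and times are entered in microseconds so that everything stays in integer arithmetic.

```forth
\ Illustrative helper (not part of any listed Forth):
\ elapsed microseconds times clock in MHz gives approximate clock cycles.
: cycles ( us mhz -- cycles ) * ;

150500 75 cycles .    \ STM32F103, 75 MHz, 0.1505 s: prints 11287500
4000000 16 cycles .   \ ATmega328, 16 MHz, 4 s: prints 64000000
```

The results also depend on the core and on how each Forth compiles code, so these figures are best treated as a rough guide rather than a precise ranking.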
ATmega328
Forth | Clock (MHz) | Time (sec) |
---|---|---|
Flash Forth 5 | 16 | 4 |
AmForth 6.3 | 16 | 8 |
Yaffa Forth 0.6.1 | 16 | 70 |
Code
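    \ Euclid's algorithm: replace ( a b ) with ( b  a mod b ) until b is zero.
    : gcd ( a b -- gcd )  \ example: 15 25 gcd .  prints 5
      begin
        dup
      while
        swap over mod
      repeat
      drop ;

    \ Time 200 x 200 = 40,000 gcd calls. ms.counter.reset and ms.print are
    \ timing words that are not defined in this listing.
    : bench ( -- )
      ms.counter.reset
      200 dup 0 do        \ outer loop: j = 0 .. 199; the limit 200 stays on the stack
        dup 0 do          \ inner loop: i = 0 .. 199
          j i gcd
          drop
        loop
      loop drop           \ discard the leftover 200
      ms.print cr
    ;

    bench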
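The benchmark relies on `ms.counter.reset` and `ms.print`, which are not defined in the listing above. To try it on a desktop, stand-ins along the following lines might work. This is only a sketch that assumes Gforth and its `utime` word (microseconds as a double-cell number); the name `bench-start` is made up for illustration, and the original timing words on the MCU targets may work quite differently. These definitions would need to be loaded before `bench` is compiled.

```forth
\ Hypothetical stand-ins for the timing words, assuming Gforth.
\ utime ( -- dtime ) is a Gforth word: microseconds since some epoch.
2variable bench-start            \ illustrative name: holds the start timestamp

: ms.counter.reset ( -- )        \ remember the current time
  utime bench-start 2! ;

: ms.print ( -- )                \ print whole milliseconds since the last reset
  utime bench-start 2@ d-        \ elapsed microseconds as a double
  1000 um/mod nip                \ divide by 1000 and keep the quotient
  . ." ms" ;
```

On a modern desktop the 40,000 `gcd` calls are likely to finish in a few milliseconds at most, so the printed value may well be 0; a microsecond printout would be a reasonable variation.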