Obsoleting the Memory Map

This will be a longer blog post, so grab a coffee. I know you want one :)

Warning

Using Forth changes you. It slowly alters the way you look at problem solving and the way you look at programming; after a few years your mental bindings begin to fall away and anything becomes possible.

Firstly, I’m an electronics technician, but Forth brought out my inner toolmaker, starting with SVD2FORTH, where I sought to automate the generation of the massive memory maps needed by Cortex-M microprocessors.

That was in 2014, when I first started using Forth. As my knowledge has grown, so has my ability to imagine what could be: ways to make development easier and less tedious, and to get the most mileage out of the small embedded flash memory in a Cortex-M0.

My first SVD2FORTH was based on work done by Ralf Doering (https://github.com/ralfdoering/cmsis-svd-fth) with CMSIS-SVD and XSLT 1.0. Initially, like Ralf’s, mine was pretty basic, but it grew in capability as I added intelligence to the transforms. Intelligent bitmap templates based on whether the bitfield was read-only, write-only or read-write became possible, as did labelling single-bit and multi-bit bitfields differently to assist the coder.

Smart Bitfields

The “access type” of the RCC_CR_PLLRDY bitfield is read-only, and because it is only one bit wide, the only useful transform is the bit-test Word “bit@”. A “?” is therefore appended to the name to indicate this. The stack comment shows that it returns either a “1” or a “0”, and “RCC_CR bit@” is appended to the end of the definition. This makes it ready to use, as shown in the example below, which makes the MCU wait until the PLL clock is ready for use.

: RCC_CR_PLLRDY? ( -- 1|0 ) 25 bit RCC_CR bit@ ; \ [read-only] RCC_CR_PLLRDY, PLL clock ready flag

example

begin RCC_CR_PLLRDY? until
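
The template choice described above can be sketched on the PC side. This is a minimal Python illustration, not the XSLT that SVD2FORTH actually uses; the function name `bitfield_word` and its parameters are hypothetical. It only covers the two one-bit templates that appear in this post (the writable form matches the RCC_CR_PLLON definition in the Forth source further down).

```python
def bitfield_word(register, field, bit_offset, bit_width, access):
    """Return a Forth definition for one bitfield, or None if no template applies.

    Hypothetical helper: the real SVD2FORTH does this with XSLT transforms.
    """
    name = f"{register}_{field}"
    if bit_width == 1 and access == "read-only":
        # One-bit, read-only: the only useful operation is a bit test, so the
        # Word gets a "?" suffix and ends in "bit@".
        return f": {name}? ( -- 1|0 ) {bit_offset} bit {register} bit@ ;"
    if bit_width == 1:
        # One-bit, writable: leave mask and address on the stack for bis!/bic!.
        return f": {name} ( -- x addr ) {bit_offset} bit {register} ;"
    # Multi-bit templates are not shown in the post, so none is guessed here.
    return None


print(bitfield_word("RCC_CR", "PLLRDY", 25, 1, "read-only"))
print(bitfield_word("RCC_CR", "PLLON", 24, 1, "read-write"))
```

The point of the sketch is only that the access type and bit width in the SVD file are enough to pick the Word name, stack comment and body automatically.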

I wanted to add pre-calculated hex addressing in V4, but ran into problems because XSLT 1.0 can’t handle hexadecimal math, so I had to keep using the “base” plus “offset” format of V3. This left the MCU to do the math and save the constants, and V4 became a dead-end branch in my Fossil SCM.

V3 Memmap

STM32F051

$48000000 constant GPIOA \ General-purpose I/Os
GPIOA $0 + constant GPIOA_MODER ( read-write )  \ GPIO port mode register
GPIOA $4 + constant GPIOA_OTYPER ( read-write )  \ GPIO port output type register
GPIOA $8 + constant GPIOA_OSPEEDR ( read-write )  \ GPIO port output speed  register
GPIOA $C + constant GPIOA_PUPDR ( read-write )  \ GPIO port pull-up/pull-down  register
GPIOA $10 + constant GPIOA_IDR ( read-only )  \ GPIO port input data register
GPIOA $14 + constant GPIOA_ODR ( read-write )  \ GPIO port output data register
GPIOA $18 + constant GPIOA_BSRR ( write-only )  \ GPIO port bit set/reset  register
GPIOA $1C + constant GPIOA_LCKR ( read-write )  \ GPIO port configuration lock  register
GPIOA $20 + constant GPIOA_AFRL ( read-write )  \ GPIO alternate function low  register
GPIOA $24 + constant GPIOA_AFRH ( read-write )  \ GPIO alternate function high  register
GPIOA $28 + constant GPIOA_BRR ( write-only )  \ Port bit reset register

Version 5 used the same memmap as V3, but made it possible to configure multiple bitfields of a register at once, on the same line, instead of V3’s one-bitfield-per-line strategy.

V3 Bitfields

STM32F051

RCC_CR_HSION bis!
RCC_CR_HSEON bis!
RCC_CR_CSSON bis!
RCC_CR_PLLON bis!
%10100  RCC_CR_HSITRIM bis!

V5 Bitfields

STM32F051

V5 also made a distinction between one-bit bitfields (the bit is always “1”) and bitfields wider than one bit, for which a “<<” is appended to the name. For the latter, the “<<” indicates that an input parameter is expected, so the Reference Manual should be consulted to find the correct value for the application. The bitfields are all summed, and no mask is needed because the Word “bis!” only sets the bits it is given.

RCC_CR_HSION RCC_CR_HSEON RCC_CR_CSSON RCC_CR_PLLON
                           %10100  RCC_CR_HSITRIM<<
                                + + + + RCC_CR bis!
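
To see what the summing does, here is the same arithmetic as a small Python sketch. The bit positions are my reading of the STM32F0 reference manual for RCC_CR and should be treated as assumptions, and `hsitrim` is a hypothetical stand-in for what “RCC_CR_HSITRIM<<” is assumed to do (shift its parameter into place).

```python
# Bit positions assumed from the STM32F0 reference manual (RCC_CR register).
HSION = 1 << 0
HSEON = 1 << 16
CSSON = 1 << 19
PLLON = 1 << 24
HSITRIM_SHIFT = 3  # HSITRIM assumed to be a 5-bit field starting at bit 3


def hsitrim(value):
    """Hypothetical stand-in for 'value RCC_CR_HSITRIM<<': shift into place."""
    return value << HSITRIM_SHIFT


fields = [HSION, HSEON, CSSON, PLLON, hsitrim(0b10100)]
summed = sum(fields)  # what the four "+" do on the Forth stack

# Because no two fields share a bit, adding gives the same result as OR-ing.
ored = 0
for f in fields:
    ored |= f
assert summed == ored

print(hex(summed))  # 0x10900a1
```

A single value is then handed to “bis!”, so the register is written once instead of five times.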

Finally we get to V6 where I solved my XSLT 1.0 pre-calculated hexadecimal problem as you can see in “V6 Memmap” below.

V6 Memmap

RP2040

\ =========================== XIP_SSI =========================== \
$18000000 constant XIP_SSI_CTRLR0 \ Control register 0
$18000004 constant XIP_SSI_CTRLR1 \ Master Control register 1
$18000008 constant XIP_SSI_SSIENR \ SSI Enable
$1800000C constant XIP_SSI_MWCR \ Microwire Control
$18000010 constant XIP_SSI_SER \ Slave enable
$18000014 constant XIP_SSI_BAUDR \ Baud rate
$18000018 constant XIP_SSI_TXFTLR \ TX FIFO threshold level
$1800001C constant XIP_SSI_RXFTLR \ RX FIFO threshold level
$18000020 constant XIP_SSI_TXFLR \ TX FIFO level
$18000024 constant XIP_SSI_RXFLR \ RX FIFO level
$18000028 constant XIP_SSI_SR \ Status register
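
The pre-calculation itself is simple once the tool can do hexadecimal arithmetic. The following Python sketch is not how SVD2FORTH V6 does it (that is still XSLT); the function name `absolute_memmap` and the tuple layout are hypothetical, and only the first three XIP_SSI registers from the listing above are used.

```python
def absolute_memmap(peripheral, base, registers):
    """Yield V6-style constant lines with base + offset already added."""
    for name, offset, description in registers:
        yield f"${base + offset:08X} constant {peripheral}_{name} \\ {description}"


# (name, offset, description) taken from the RP2040 listing above.
XIP_SSI = [
    ("CTRLR0", 0x00, "Control register 0"),
    ("CTRLR1", 0x04, "Master Control register 1"),
    ("SSIENR", 0x08, "SSI Enable"),
]

for line in absolute_memmap("XIP_SSI", 0x18000000, XIP_SSI):
    print(line)
```

Each emitted line carries a finished address, so nothing is left for the MCU to add up.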

Ok, you say, that’s nice but so what ?

I had to solve this issue before I could get to my next MAJOR improvement as you’ll soon see.

Remember C ?

The C programming language has always had some advantages over a non-Tethered Forth: it takes a symbolic, (barely) human-friendly language as input and crunches it into a binary that only a Tethered Forth can compete with in size and speed. Those of us using a hosted Forth on Cortex-M (because there is no OSS Tether) have to contend with the 20K Forth binary and fit our program into what’s left of our Cortex-M0 flash.

C users don’t have to preload a memory map like we Forth users do. Their memory map lives on their PC in a header file, and the C compiler uses it as required.

Seeing where I’m going yet ?

The V6 pre-calculated hex addressing capability now allows the next MAJOR improvement.

Using “Gema” http://gema.sourceforge.net/new/index.shtml, V6 can now take my Forth source code, strip out the symbols that were previously dependent upon a preloaded memory map, and replace them with hexadecimal addresses. The preloaded memory map is no longer necessary!

The source is unaltered and still human friendly; only the upload to the hosted Forth target is altered.
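
The substitution step can be sketched in a few lines of Python. This is only an illustration of the idea, not the Gema rules used by V6; the `MEMMAP` table and the function `preprocess_line` are hypothetical, and the addresses are the STM32F407 ones that appear in the listings further down. It matches whole, whitespace-delimited Forth tokens, is case sensitive, and leaves anything after a “\” comment marker alone.

```python
import re

# Hypothetical symbol table; the addresses are the ones shown in this post.
MEMMAP = {
    "RCC_CR": "$40023800",
    "RCC_PLLCFGR": "$40023804",
    "RCC_AHB2ENR": "$40023834",
    "RNG_CR": "$50060800",
    "RNG_SR": "$50060804",
    "RNG_DR": "$50060808",
}


def preprocess_line(line, memmap=MEMMAP):
    """Replace whole-token register names with hex; skip '\\' comments."""
    out = []
    in_comment = False
    # Splitting with a capture group keeps the whitespace runs, so the
    # layout of the line survives the substitution.
    for token in re.split(r"(\s+)", line):
        if token == "\\":
            in_comment = True
        if not in_comment and token in memmap:
            token = memmap[token]
        out.append(token)
    return "".join(out)


print(preprocess_line(": RCC_CR_PLLON ( -- x addr ) 24 bit RCC_CR ; \\ Main PLL enable"))
```

Because only complete tokens are looked up, “RCC_CR_PLLON” is left alone while the bare “RCC_CR” becomes “$40023800”. Register names inside “( ... )” stack comments are not handled specially in this sketch.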

Note

Although not shown here, my system has always stripped comments, blank lines and multiple spaces from the source before uploads. The result looks quite human unfriendly, but then it’s talking to the MCU, and that’s all the MCU understands. Otherwise the MCU must strip any ‘garbage’ from the input stream itself, which takes time.
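
For readers who want to try the same kind of clean-up, here is a minimal Python sketch. It is not my actual stripping code; `strip_for_upload` is a hypothetical name, and the sketch assumes that “stripping” means dropping “\” comments and empty lines and collapsing runs of whitespace.

```python
def strip_for_upload(source):
    """Drop '\\' comments and empty lines, and collapse runs of whitespace."""
    out = []
    for line in source.splitlines():
        tokens = line.split()
        if "\\" in tokens:
            # Cut the line at the first standalone comment marker.
            tokens = tokens[: tokens.index("\\")]
        if tokens:
            out.append(" ".join(tokens))
    return "\n".join(out)


example = (
    "\\ a comment line\n"
    "\n"
    ": init.rng ( -- )\n"
    "  pll-on\n"
    "  RNG_CR_RNGEN bis!      \\ Enable Random number generator\n"
    ";\n"
)
print(strip_for_upload(example))
```

Two caveats: “( ... )” comments are kept, and a string inside `." ..."` that contains several spaces in a row or a standalone backslash would be altered, so a real tool needs to treat strings separately.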

Forth Source

This is for an STM32F407 Discovery Board and generates random numbers using the RNG peripheral. See the commented-out ‘memmap’ section at the top of the file? If these constants have been loaded previously they stay commented out, and if not they can be uncommented here. Yes, it’s time consuming to have to do this, but the pre-processor will put a stop to it.

Note

Compare the following files. “Uploaded Source” is what I get after pre-processing “Forth Source” thru my new SVD2FORTH V6. This is what is uploaded to the hosted Forth on the target MCU.

\ comment out any of these previously loaded memmaps
\   $40023800 constant RCC ( Reset and clock control )
\   RCC $34 + constant RCC_AHB2ENR ( read-write )    \ AHB2 peripheral clock enable  register
\   RCC $0 + constant RCC_CR (  )  \ clock control register
\   RCC $4 + constant RCC_PLLCFGR ( read-write )  \ PLL configuration register
\   $50060800 constant RNG ( Random number generator )
\   RNG $0 + constant RNG_CR ( read-write ) \ control register
\   RNG $4 + constant RNG_SR ( )            \ status register
\   RNG $8 + constant RNG_DR ( read-only )  \ data register

: bit ( u -- u ) 1 swap lshift 1-foldable ;
: RCC_CR_PLLON ( -- x addr ) 24 bit RCC_CR ;            \ Main PLL PLL enable
: RCC_CR_PLLRDY? ( -- 1|0 ) 25 bit RCC_CR bit@ ;        \ Main PLL PLL clock ready  flag
: RCC_PLLCFGR_PLLSRC ( -- x addr ) 22 bit RCC_PLLCFGR ; \ 0: HSI clock selected as PLL, 1: HSE oscillator clock selected as PLL

: PLLCFGR-CNF ( -- )
  1 RCC_PLLCFGR_PLLSRC bis!     \ HSI clock selected as PLL clock source which is Mecrisp-Stellaris USART default.
;

: pll-on ( -- ) PLLCFGR-CNF RCC_CR_PLLON bis! begin RCC_CR_PLLRDY? until ;

0 variable rnd-sample
0 variable rnd-flag

: RCC_AHB2ENR_RNGEN ( -- x addr ) 6 bit RCC_AHB2ENR ;   \ RCC_AHB2ENR_RNGEN, Random number generator clock  enable
: RNG_CR_RNGEN ( -- x addr ) 2 bit RNG_CR ;             \ RNG_CR_RNGEN, Random number generator  enable
: RNG_SR_DRDY? ( -- 1|0 ) 0 bit RNG_SR bit@ ;           \ RNG_SR_DRDY, Random data ready yet ?
: RNG_SR_SECS? ( -- 1|0 ) 2 bit RNG_SR bit@ ;           \ RNG_SR_SECS, Seed error current status
: RNG_SR_CECS? ( -- 1|0 ) 1 bit RNG_SR bit@ ;           \ RNG_SR_CECS, Clock error current status
: RNG_DR_RNDATA? ( --  x ) RNG_DR @ ;                   \ RNG_DR_RNDATA, Random data output register

: init.rng ( -- )
  pll-on
  RCC_AHB2ENR_RNGEN bis!                                \ Enable Random number generator clock
  RNG_CR_RNGEN bis!                                     \ Enable Random number generator
;

: rng-check-errors ( -- )
  RNG_SR_SECS? if ." There has been a seed error, reinitializing the RNG " cr
     RNG_CR_RNGEN bic!
     RNG_CR_RNGEN bis!
     0 rnd-flag !
     then
  RNG_SR_CECS? if ." There is a RNG CLOCK problem, see page 768 of RM0090 Rev 18" cr exit
     then
 ;

: print-random ( -- )                          \ but check it's valid first.
  rng-check-errors
  RNG_DR_RNDATA?                               \ get the new random number
  rnd-flag @ 0= if                             \ is this the first random number generated since RNG_CR_RNGEN ?
     RNG_DR_RNDATA? rnd-sample !               \ yes, save it
     then                                      \ no, proceed to testing the new number against the old
  begin
     begin RNG_SR_DRDY? until                  \ Is a new random number ready ?
     RNG_DR_RNDATA? dup rnd-sample @ - 0= if   \ compare old and new random numbers
     rnd-sample !                              \ same! save new as old, delete new
  else drop then                               \ different, drop the copy
  dup                                          \ valid RN on the stack, make a copy for final printing
  0<>                                          \ is it greater than zero ? ( question: will the RNG generate 0 as a valid RN ?)
  until
  u.                                           \ yes, Print it
;

init.rng
print-random

Uploaded Source

\ comment out any of these previously loaded
\   $40023800 constant RCC ( Reset and clock control )
\   RCC $34 + constant RCC_AHB2ENR ( read-write )    \ AHB2 peripheral clock enable  register
\   RCC $0 + constant RCC_CR (  )  \ clock control register
\   RCC $4 + constant RCC_PLLCFGR ( read-write )  \ PLL configuration register
\   $50060800 constant RNG ( Random number generator )
\   RNG $0 + constant RNG_CR ( read-write ) \ control register
\   RNG $4 + constant RNG_SR ( )            \ status register
\   RNG $8 + constant RNG_DR ( read-only )  \ data register


: bit ( u -- u ) 1 swap lshift 1-foldable ;
: RCC_CR_PLLON ( -- x addr ) 24 bit $40023800 ;         \ Main PLL PLL enable
: RCC_CR_PLLRDY? ( -- 1|0 ) 25 bit $40023800 bit@ ;     \ Main PLL PLL clock ready  flag
: RCC_PLLCFGR_PLLSRC ( -- x addr ) 22 bit $40023804 ;   \ 0: HSI clock selected as PLL, 1: HSE oscillator clock selected as PLL

: PLLCFGR-CNF ( -- )
  1 RCC_PLLCFGR_PLLSRC bis!     \ HSI clock selected as PLL clock source which is Mecrisp-Stellaris USART default.
;

: pll-on ( -- ) PLLCFGR-CNF RCC_CR_PLLON bis! begin RCC_CR_PLLRDY? until ;

0 variable rnd-sample
0 variable rnd-flag

: RCC_AHB2ENR_RNGEN ( -- x addr ) 6 bit $40023834 ;     \ RCC_AHB2ENR_RNGEN, Random number generator clock  enable
: RNG_CR_RNGEN ( -- x addr ) 2 bit $50060800 ;          \ RNG_CR_RNGEN, Random number generator  enable
: RNG_SR_DRDY? ( -- 1|0 ) 0 bit $50060804 bit@ ;        \ RNG_SR_DRDY, Random data ready yet ?
: RNG_SR_SECS? ( -- 1|0 ) 2 bit $50060804 bit@ ;        \ RNG_SR_SECS, Seed error current status
: RNG_SR_CECS? ( -- 1|0 ) 1 bit $50060804 bit@ ;        \ RNG_SR_CECS, Clock error current status
: RNG_DR_RNDATA? ( --  x ) $50060808 @ ;                \ RNG_DR_RNDATA, Random data output register

: init.rng ( -- )
  pll-on
  RCC_AHB2ENR_RNGEN bis!                                \ Enable Random number generator clock
  RNG_CR_RNGEN bis!                                     \ Enable Random number generator
;

: rng-check-errors ( -- )
  RNG_SR_SECS? if ." There has been a seed error, reinitializing the RNG " cr
     RNG_CR_RNGEN bic!
     RNG_CR_RNGEN bis!
     0 rnd-flag !
     then
  RNG_SR_CECS? if ." There is a RNG CLOCK problem, see page 768 of RM0090 Rev 18" cr exit
     then
 ;

: print-random ( -- )                          \ but check it's valid first.
  rng-check-errors
  RNG_DR_RNDATA?                               \ get the new random number
  rnd-flag @ 0= if                             \ is this the first random number generated since RNG_CR_RNGEN ?
     RNG_DR_RNDATA? rnd-sample !               \ yes, save it
     then                                      \ no, proceed to testing the new number against the old
  begin
     begin RNG_SR_DRDY? until                  \ Is a new random number ready ?
     RNG_DR_RNDATA? dup rnd-sample @ - 0= if   \ compare old and new random numbers
     rnd-sample !                              \ same! save new as old, delete new
  else drop then                               \ different, drop the copy
  dup                                          \ valid RN on the stack, make a copy for final printing
  0<>                                          \ is it greater than zero ? ( question: will the RNG generate 0 as a valid RN ?)
  until
  u.                                           \ yes, Print it
;

init.rng
print-random

SUCCESS!!

I ran my current Blue Pill Diags source code (64,768 bytes) thru the pre-processor. The result was a saving of 5128 bytes, which is significant when you only have 64KB !

THE NEW BINARY IS SMALLER BY 5% !

Advantages

  • Convenience and time saving: Previously, before uploading source, I had to anticipate which memory maps I needed and load those into flash, but ONLY THE ONES I NEEDED, because loading the entire STM32F051 memory map would use most of my available flash! If I later added a peripheral to the program, a new memmap had to be created and flashed, and the process started again. Now that all the memmaps have been moved to the PC, the entire memory map is available to my programs for the first time since 2016. I used to dream of a day when my Cortex-M0 would have enough flash to hold the entire memory map!

  • No more ‘redefined’ warnings caused by a constant that was loaded earlier by some other program.

  • Nothing changes for the coder: The source isn’t altered, the preprocessor only changes what is sent to the hosted target Forth.

  • Flash is no longer wasted saving constants, which were only ever needed for their hexadecimal addresses anyway. Testing shows a 5% flash saving!

  • The basic hosted Forth (Mecrisp-Stellaris in this case) isn’t altered. You can still use a manual terminal and type code into it to your heart’s content.

Disadvantages

Yes, sadly you can’t have advantages without disadvantages in the real world!

  • Symbolic register names such as “RCC_CR” typed at the terminal will result in a “not found” warning, unless you add them as constants first.

Note

Use lower case for the “rcc_cr” constant or the pre-processor will convert it into hex for the upload. Mecrisp-Stellaris doesn’t care about case, but the pre-processor does.

  • It’s more complex than V3.

Where to From Here ?

The pre-processor will be changed to M4, which is available by default on most Unix-like systems.

I can also use the pre-processor in reverse, replacing hex addresses in Forth source with register labels and commented-out descriptions, to help me better understand code written by others.
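
As a rough idea of what the reverse direction could look like, here is a Python sketch. Nothing like this exists in V6 yet; the `LABELS` table and `annotate_line` are hypothetical, and the addresses and descriptions are the STM32F407 ones used earlier in this post.

```python
import re

# Hypothetical reverse table: address -> (label, description).
LABELS = {
    0x40023800: ("RCC_CR", "clock control register"),
    0x50060804: ("RNG_SR", "status register"),
    0x50060808: ("RNG_DR", "data register"),
}


def annotate_line(line, labels=LABELS):
    """Swap known '$hex' tokens for labels and append their descriptions."""
    notes = []

    def swap(match):
        entry = labels.get(int(match.group(1), 16))
        if entry is None:
            return match.group(0)  # unknown address: leave it untouched
        notes.append(f"{entry[0]}: {entry[1]}")
        return entry[0]

    # Only whole, whitespace-delimited tokens of the form $<hex> are considered.
    new_line = re.sub(r"(?<!\S)\$([0-9A-Fa-f]+)(?!\S)", swap, line)
    if notes:
        new_line += "  \\ " + "; ".join(notes)
    return new_line


print(annotate_line("$50060808 @"))
```

One thing to watch for: a hex literal that happens to equal a register address but is really just data would also be relabelled, so the output still needs a human eye.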

Once V6 is completed, and the readme is finished, I’ll release it as always. Realistically that’s probably a few months away at this stage.

Thanks for reading this far!