2024-12-20 • 1710 words • 30 mins • Arm • Stm32 • Embedded

STM32: Setting up an IDE-free environment

ℹ️ The problem

I needed to write my own startup code for the microcontroller I was working with, the STM32G0B1RE.

❗ Why?

I don’t like working with IDEs, and I want my workflow to be as straightforward as possible. I also don’t want to write a wrapper over the CMSIS headers, as my intention is to use only the necessary peripherals.

💡 How?

In what follows I will describe the steps needed to reproduce my setup, along with the documentation required to understand the concepts. We will build a linker script from scratch, following the memory organization and register address boundaries specified in the reference manual. We will then create the table of interrupt vectors and define the code to be executed by the Reset_Handler. Finally, to automate the build process, I will provide a custom Makefile.

References 🔗

Dependencies 🔗

ARM Cortex-M0+ Reset Behavior 🔗

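According to this well-documented presentation, the Cortex-M0+ has the following reset behavior: the core loads the initial main stack pointer (MSP) from the first word of the vector table and the program counter from the second word (the reset vector), then starts executing the Reset_Handler.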
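
To make the reset sequence concrete, here is a small host-side sketch (not code that runs on the board) that decodes the first two words of a vector table image the way the core consumes them. The `reset_state_t` type and the `decode_reset` helper are hypothetical names I made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* What the core takes from the vector table on reset. */
typedef struct {
    uint32_t initial_sp; /* word 0: initial main stack pointer (MSP) */
    uint32_t reset_pc;   /* word 1 with bit 0 cleared: address of the first instruction */
    bool     thumb;      /* bit 0 of word 1: must be set, Cortex-M only executes Thumb code */
} reset_state_t;

/* Hypothetical helper: decode the first two words of a vector table image. */
reset_state_t decode_reset(const uint32_t *vector_table)
{
    assert(vector_table != NULL);

    reset_state_t state;
    state.initial_sp = vector_table[0];
    state.reset_pc   = vector_table[1] & ~1u;
    state.thumb      = (vector_table[1] & 1u) != 0u;
    return state;
}
```

This is also why the first entry of the table has to be the `_estack` value rather than a handler address, and why the toolchain stores the Reset_Handler address with its lowest bit set.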

Reset_Handler 🔗

A strong reference point was this forum post and also this implementation for the STM32H7xx series.

  1. Define the interrupt vector table for the NVIC according to the Cortex-M0+ Manual and the STM32G0x1xx series Reference Manual.
void (* g_pfnVectors[])(void) __attribute__((section (".isr_vector"))) = {...}
  2. Set the SP (Stack Pointer) to the initial MSP value using inline assembly and the LDR instruction (see page 141 of the ARMv6-M Architecture Manual). It uses the _estack symbol defined in the linker script.
__asm (
      "LDR R0, =_estack       \n\t"
      "MOV SP, R0             \n\t"
  );
  3. Initialize the .data section. Initialized variables are stored in Flash at their LMA but are accessed in SRAM at their VMA, so the startup code must copy them from one to the other before any C code reads them. It uses the _data_start and _data_end symbols for the virtual addresses and the _sidata symbol for the start of the .data section's LMA.
uint32_t *dataSrc = &_sidata, *dataDest = &_data_start;
  while (dataDest < &_data_end) {
      *dataDest++ = *dataSrc++;
  }
  4. Initialize the .bss section to zero using inline assembly. It uses the _bss_start and _bss_end symbols defined in the linker script.
  __asm (
      "LDR R0, =_bss_start      \n\t"
      "LDR R1, =_bss_end        \n\t"
      "MOV R2, #0               \n\t"
      "loop_zero:               \n\t"
      "   CMP 	R0, R1          \n\t"
      "   BGE 	end_loop        \n\t"
      "   STR	R2, [R0]        \n\t"
      "   ADD   R0, R0, #4      \n\t"
      "   B 	loop_zero       \n\t"
      "end_loop:                \n\t"
  );
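
The copy and zero loops from steps 3 and 4 can also be written as two plain C helpers, which makes them easy to try on a host machine before flashing anything. This is only a sketch: `copy_data` and `zero_bss` are names I chose for the illustration, and the `assert` calls are host-side sanity checks that you would likely drop in the real startup code.

```c
#include <assert.h>
#include <stdint.h>

/* Copy initialized data word by word from its load address (Flash)
 * to its run address (SRAM). */
void copy_data(const uint32_t *src, uint32_t *dst, const uint32_t *dst_end)
{
    assert(dst <= dst_end);
    while (dst < dst_end) {
        *dst++ = *src++;
    }
}

/* Zero a word-aligned region, which is what the inline-assembly loop
 * does for .bss. */
void zero_bss(uint32_t *start, const uint32_t *end)
{
    assert(start <= end);
    while (start < end) {
        *start++ = 0u;
    }
}
```

Inside the Reset_Handler the calls would presumably look like `copy_data(&_sidata, &_data_start, &_data_end);` and `zero_bss(&_bss_start, &_bss_end);`, using the same linker symbols as above.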

Linker Script 🔗

Linking is the last step in the compilation of a program. It takes a number of object files and merges them into a single executable or binary file.

The linker script is a file made up of a series of linker directives that tell the linker (arm-none-eabi-ld, which arm-none-eabi-gcc invokes in our case) which sections to include in the output file, in which order to place them, what type of file to produce, and what the address of the first instruction is.

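A linker script (.ld) consists of four parts: the memory layout (MEMORY), the section definitions (SECTIONS), the entry point (ENTRY), and the remaining options and symbols (here OUTPUT_FORMAT and _estack):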

STM32G0B1RE Memory Organization

MEMORY {
  FLASH (rx)      : ORIGIN = 0x08000000, LENGTH = 512K   
  SRAM  (xrw)     : ORIGIN = 0x20000000, LENGTH = 144K
}
SECTIONS {
  .isr_vector : {
      KEEP(*(.isr_vector))
  } >FLASH

  .text : {
      . = ALIGN(4);
      *(.text)
      *(.text*)
      . = ALIGN(4);
      _etext = .;
  } >FLASH

  ...
}
ENTRY(Reset_Handler)

OUTPUT_FORMAT ("elf32-littlearm")
_estack = ORIGIN(SRAM) + LENGTH(SRAM);

The entire linker script can be found in my GitHub repository.
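
As a quick sanity check of the MEMORY block, the addresses the linker derives from it can be reproduced with plain arithmetic. The sketch below is host-side C, and the macro and function names are mine, not something the linker or the toolchain provides.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the MEMORY block of the linker script. */
#define FLASH_ORIGIN 0x08000000u
#define FLASH_LENGTH (512u * 1024u) /* 512K */
#define SRAM_ORIGIN  0x20000000u
#define SRAM_LENGTH  (144u * 1024u) /* 144K */

/* Mirrors `_estack = ORIGIN(SRAM) + LENGTH(SRAM);`. */
uint32_t estack_address(void)
{
    return SRAM_ORIGIN + SRAM_LENGTH;
}

/* First address past the end of the FLASH region. */
uint32_t flash_end_address(void)
{
    return FLASH_ORIGIN + FLASH_LENGTH;
}

/* The procedure call standard expects an 8-byte aligned stack pointer. */
static_assert(((SRAM_ORIGIN + SRAM_LENGTH) % 8u) == 0u,
              "initial stack pointer should be 8-byte aligned");
```

With these values `_estack` evaluates to 0x20024000, one byte past the last SRAM address, which works because the stack grows downwards.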

| KEYWORD | DESCRIPTION |
| --- | --- |
| SECTIONS | Defines the mapping between the sections of the input object files and the sections of the linker’s output file, by specifying the memory layout of each output section. |
| KEEP | Tells the linker not to remove the section wrapped by this command. Important when working with ARM microprocessors because the interrupt vector table must be in a predefined memory location and is not referenced directly by code. |
| LMA | Load Memory Address. |
| VMA | Virtual Memory Address. |
| . | Location Counter. |
| ALIGN | Introduces the required amount of padding to align the location counter. |
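
To illustrate what `. = ALIGN(4);` does to the location counter, here is the same rounding expressed in C. `align_up` is a hypothetical helper written for this explanation and assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Round `location` up to the next multiple of `alignment`.
 * `alignment` must be a non-zero power of two. */
uint32_t align_up(uint32_t location, uint32_t alignment)
{
    assert(alignment != 0u && (alignment & (alignment - 1u)) == 0u);
    return (location + alignment - 1u) & ~(alignment - 1u);
}
```

An already aligned location counter is left untouched, while anything in between is padded up to the next boundary. That padding is what keeps the word-by-word copy loops in the startup code from running over a partial word.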

Clock Configuration 🔗

The purpose of the System_Init function is to configure the PLL block of the clock tree to output 64 MHz and to select PLLRCLK as the SYSCLK source.

The board’s clocks are configured through the RCC (Reset and Clock Control) peripheral. This module provides the registers needed for the PLL configuration.

But first, in order to simplify this process, we can reproduce the layout of the module using a struct, with each register being a struct member of type uint32_t.

By casting the peripheral’s base address to a pointer to this struct, every write to a struct member lands on the corresponding hardware register, so each change we make to the struct also changes the behavior of the hardware.

💡 RCC memory address

We can check the board’s memory organization in the STM32G0x1xx Reference Manual, page 64, Table 6, and find that the RCC starts at address 0x4002 1000.

STM32G0x1 Peripheral Register Boundary

typedef struct {
    volatile uint32_t CR;
    volatile uint32_t ICSCR;
    volatile uint32_t CFGR;
    volatile uint32_t PLLCFGR;
    volatile uint32_t RESERVED;
    volatile uint32_t CRRCR;
    volatile uint32_t CIER;
    volatile uint32_t CIFR;
    volatile uint32_t CICR;
    volatile uint32_t IOPRSTR;
    volatile uint32_t AHBRSTR;
    volatile uint32_t APBRSTR1;
    volatile uint32_t APBRSTR2;
    volatile uint32_t IOPENR;
    volatile uint32_t AHBENR;
    volatile uint32_t APBENR1;
    volatile uint32_t APBENR2;
    volatile uint32_t IOPSMENR;
    volatile uint32_t AHBSMENR;
    volatile uint32_t APBSMENR1;
    volatile uint32_t APBSMENR2;
    volatile uint32_t CCIPR;
    volatile uint32_t CCIPR2;
    volatile uint32_t BDCR;
    volatile uint32_t CSR;
} RCC_Def;

#define RCC_BASE                    0x40021000
#define RCC                         ((RCC_Def*) RCC_BASE)

This way, we can use the RCC macro to update the hardware peripheral to our desired state.
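
Because the struct only works if every member sits at the offset the hardware expects, it can be worth letting the compiler verify the layout. The sketch below repeats the struct and adds a few compile-time checks. The offsets in the checks are the ones I read from the register map, so treat them as assumptions to confirm against the reference manual, and `rcc_register_address` is a hypothetical helper used only to show the resulting addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    volatile uint32_t CR;
    volatile uint32_t ICSCR;
    volatile uint32_t CFGR;
    volatile uint32_t PLLCFGR;
    volatile uint32_t RESERVED;
    volatile uint32_t CRRCR;
    volatile uint32_t CIER;
    volatile uint32_t CIFR;
    volatile uint32_t CICR;
    volatile uint32_t IOPRSTR;
    volatile uint32_t AHBRSTR;
    volatile uint32_t APBRSTR1;
    volatile uint32_t APBRSTR2;
    volatile uint32_t IOPENR;
    volatile uint32_t AHBENR;
    volatile uint32_t APBENR1;
    volatile uint32_t APBENR2;
    volatile uint32_t IOPSMENR;
    volatile uint32_t AHBSMENR;
    volatile uint32_t APBSMENR1;
    volatile uint32_t APBSMENR2;
    volatile uint32_t CCIPR;
    volatile uint32_t CCIPR2;
    volatile uint32_t BDCR;
    volatile uint32_t CSR;
} RCC_Def;

#define RCC_BASE 0x40021000u

/* Compile-time layout checks (offsets assumed from the register map). */
static_assert(offsetof(RCC_Def, PLLCFGR) == 0x0C, "PLLCFGR offset");
static_assert(offsetof(RCC_Def, IOPENR) == 0x34, "IOPENR offset");
static_assert(offsetof(RCC_Def, CSR) == 0x60, "CSR offset");
static_assert(sizeof(RCC_Def) == 0x64, "RCC block size");

/* Hypothetical helper: absolute address of a register given its offset. */
uint32_t rcc_register_address(size_t offset)
{
    return RCC_BASE + (uint32_t)offset;
}
```

If a member is ever added or removed by mistake, the build fails instead of the code silently writing to the wrong register.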

Taking a look at the clock tree, we can see which parts are of interest when deriving the PLLRCLK.

PLL Block

The formula by which the output fPLLR is calculated is the following:

 
fPLLR = ((fPLLIN / M) * N) / R,
where:
  - fPLLIN is the frequency of the HSI16 clock source (16 MHz)
  - M, N and R are parameters configurable through the PLLCFGR register
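
The formula is easy to check on a host machine before touching any register. In this sketch `pll_r_frequency` is a hypothetical helper, and M, N and R are the actual divider and multiplier values, not the raw register field contents.

```c
#include <assert.h>
#include <stdint.h>

/* fPLLR = ((fPLLIN / M) * N) / R, all frequencies in Hz. */
uint32_t pll_r_frequency(uint32_t f_pllin_hz, uint32_t m, uint32_t n, uint32_t r)
{
    assert(m != 0u && r != 0u);
    return ((f_pllin_hz / m) * n) / r;
}
```

For example, `pll_r_frequency(16000000u, 2u, 16u, 2u)` gives 64000000, which is the 64 MHz target of this setup.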

To start configuring, we need to make sure that the PLL clock is turned off.

RCC->CR &= ~RCC_CR_PLLON_MASK;
while (RCC->CR & RCC_CR_PLLRDY_MASK) {
}

We need to explicitly select the input clock source as the HSI16.

RCC->PLLCFGR &= ~RCC_PLLCFGR_PLLSRC_MASK;
RCC->PLLCFGR |= RCC_PLLCFGR_PLLSRC(2);

The VCO only works within specific frequency ranges, which have to be met through the choice of M and N. As stated in the reference manual, the input clock frequency after the division by M needs to be between 2.66 and 16 MHz, and the frequency after the multiplication by N needs to be between 96 and 344 MHz.

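I will use M = 2 and N = 16 to obtain fVCO = 128 MHz. Note that the PLLM field encodes M - 1, so the value written for M = 2 is 1, while PLLN holds N directly.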

RCC->PLLCFGR &= ~RCC_PLLCFGR_PLLM_MASK;
RCC->PLLCFGR |= RCC_PLLCFGR_PLLM(1);

RCC->PLLCFGR &= ~RCC_PLLCFGR_PLLN_MASK;
RCC->PLLCFGR |= RCC_PLLCFGR_PLLN(16);
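
The two frequency ranges quoted from the reference manual can be turned into a small check that rejects an invalid M and N pair before it is written to the register. This is a sketch under the limits stated above, and `pll_vco_config_valid` is a name I picked for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when M and N keep both the PLL input and the VCO output
 * inside the ranges quoted from the reference manual. */
bool pll_vco_config_valid(uint32_t f_pllin_hz, uint32_t m, uint32_t n)
{
    assert(m != 0u);

    const uint32_t f_in = f_pllin_hz / m;
    if (f_in < 2660000u || f_in > 16000000u) {
        return false; /* PLL input outside 2.66..16 MHz */
    }
    if (n > 127u) {
        return false; /* assumed upper bound that keeps f_in * n within 32 bits */
    }

    const uint32_t f_vco = f_in * n;
    return f_vco >= 96000000u && f_vco <= 344000000u;
}
```

With HSI16 as the source, M = 2 and N = 16 pass the check (8 MHz input, 128 MHz VCO), while M = 8 fails because the 2 MHz input is below the lower limit.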

Finally, I will also set R = 2 (again written as the field value 1) to obtain a final PLLR frequency of 64 MHz.

RCC->PLLCFGR &= ~RCC_PLLCFGR_PLLR_MASK;
RCC->PLLCFGR |= RCC_PLLCFGR_PLLR(1);
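
The `_MASK` and field macros used in these snippets are not shown in this post, so here is one way they could be defined, applied to a plain variable instead of the real register so that it can be tested on a host. The bit positions are my reading of the PLLCFGR layout and should be treated as assumptions to confirm against the reference manual, and `pllcfgr_configure` is a hypothetical function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout of RCC_PLLCFGR: PLLSRC[1:0], PLLM[6:4],
 * PLLN[14:8], PLLR[31:29]. */
#define RCC_PLLCFGR_PLLSRC_MASK (0x3u << 0)
#define RCC_PLLCFGR_PLLSRC(x)   (((uint32_t)(x) << 0) & RCC_PLLCFGR_PLLSRC_MASK)
#define RCC_PLLCFGR_PLLM_MASK   (0x7u << 4)
#define RCC_PLLCFGR_PLLM(x)     (((uint32_t)(x) << 4) & RCC_PLLCFGR_PLLM_MASK)
#define RCC_PLLCFGR_PLLN_MASK   (0x7Fu << 8)
#define RCC_PLLCFGR_PLLN(x)     (((uint32_t)(x) << 8) & RCC_PLLCFGR_PLLN_MASK)
#define RCC_PLLCFGR_PLLR_MASK   (0x7u << 29)
#define RCC_PLLCFGR_PLLR(x)     (((uint32_t)(x) << 29) & RCC_PLLCFGR_PLLR_MASK)

/* Apply the same clear-then-set sequence as above to a register value. */
uint32_t pllcfgr_configure(uint32_t reg)
{
    assert((RCC_PLLCFGR_PLLM_MASK & RCC_PLLCFGR_PLLN_MASK) == 0u);

    reg &= ~RCC_PLLCFGR_PLLSRC_MASK;
    reg |= RCC_PLLCFGR_PLLSRC(2);  /* HSI16 as PLL input */
    reg &= ~RCC_PLLCFGR_PLLM_MASK;
    reg |= RCC_PLLCFGR_PLLM(1);    /* M = 2 */
    reg &= ~RCC_PLLCFGR_PLLN_MASK;
    reg |= RCC_PLLCFGR_PLLN(16);   /* N = 16 */
    reg &= ~RCC_PLLCFGR_PLLR_MASK;
    reg |= RCC_PLLCFGR_PLLR(1);    /* R = 2 */
    return reg;
}
```

Clearing each field before setting it matters because the fields may hold a previous configuration, and an OR alone can only set bits, never clear them.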

We can now turn on the PLL as we have finished the configuration, and also enable PLLR. After the clock has successfully turned on, I will select it as the SYSCLK.

RCC->CR |= RCC_CR_PLLON_MASK;

RCC->PLLCFGR |= RCC_PLLCFGR_PLLREN_MASK;

while (! (RCC->CR & RCC_CR_PLLRDY_MASK)) {
}

RCC->CFGR |= RCC_CFGR_SW(2);
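
The empty `while` loops above spin forever if the PLL never becomes ready. One possible alternative is a bounded poll that gives up after a number of attempts. This is a sketch, not what my startup code does: `wait_for_field` is a hypothetical helper, and the right poll limit depends on your clock and tolerance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Poll `reg` until the bits selected by `mask` equal `expected`.
 * Returns false if that does not happen within `max_polls` reads. */
bool wait_for_field(const volatile uint32_t *reg, uint32_t mask,
                    uint32_t expected, uint32_t max_polls)
{
    assert(reg != NULL);

    while (max_polls-- > 0u) {
        if ((*reg & mask) == expected) {
            return true;
        }
    }
    return false;
}
```

On the board a call could look like `wait_for_field(&RCC->CR, RCC_CR_PLLRDY_MASK, RCC_CR_PLLRDY_MASK, 100000u)`, with the caller deciding what to do on failure. The same helper could also be pointed at the clock switch status bits in CFGR to confirm that the switch to PLLRCLK actually happened.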

Flash Latency

To read data from flash memory correctly, the number of wait states (LATENCY) must be programmed in the FLASH access control register (FLASH_ACR) according to the frequency of the flash memory clock (HCLK).

For an HCLK of 64 MHz, which is the maximum for this device, 2 wait states are required, and this change needs to be made before setting PLLRCLK as the SYSCLK.

FLASH->ACR |= FLASH_ACR_LATENCY(2);
FLASH->ACR |= FLASH_ACR_PRFTEN_MASK;
while ((FLASH->ACR & FLASH_ACR_LATENCY_MASK) != 2) {
}
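
If the target frequency ever changes, the wait-state choice can be derived from HCLK instead of being hard-coded. The thresholds in this sketch are the ones I recall from the reference manual's wait-state table for the default voltage range, so treat them as assumptions and check them before relying on them. `flash_wait_states` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Number of flash wait states for a given HCLK in Hz.
 * Assumed thresholds: up to 24 MHz -> 0, up to 48 MHz -> 1, up to 64 MHz -> 2. */
uint32_t flash_wait_states(uint32_t hclk_hz)
{
    assert(hclk_hz <= 64000000u); /* above the device maximum */

    if (hclk_hz <= 24000000u) {
        return 0u;
    }
    if (hclk_hz <= 48000000u) {
        return 1u;
    }
    return 2u;
}
```

The result would then be passed to `FLASH_ACR_LATENCY(...)` in place of the literal 2.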

Usage 🔗

To run the rules in the provided Makefile, you also need to install the Arm GNU Toolchain:

  sudo apt-get install gcc-arm-none-eabi
  1. Create the .hex file using the Makefile rule:
make all
  2. Flash the device:
st-flash --format ihex write program.hex

2024-12-22T16:36:26 INFO common.c: STM32G0Bx_G0Cx: 144 KiB SRAM, 512 KiB flash in at least 2 KiB pages.
2024-12-22T16:36:26 WARN common_flash.c: Flash base use default L0 address
2024-12-22T16:36:26 INFO common_flash.c: Attempting to write 508 (0x1fc) bytes to stm32 address: 134217728 (0x8000000)
-> Flash page at 0x8000000 erased (size: 0x800)
2024-12-22T16:36:26 INFO flash_loader.c: Starting Flash write for WB/G0/G4/L5/U5/H5/C0

2024-12-22T16:36:26 INFO common_flash.c: Starting verification of write complete
2024-12-22T16:36:26 INFO common_flash.c: Flash written and verified! jolly good!
2024-12-22T16:36:26 INFO common.c: Go to Thumb mode

  3. (Optional) Start a GDB server using st-util and connect to it locally using gdb-multiarch:

sudo apt-get install gdb-multiarch

# Inside the first terminal window
st-util

# Inside the second terminal window
gdb-multiarch

(gdb) file program.hex
(gdb) target remote localhost:4242
(gdb) ...