Bare metal programs run without an operating system beneath; coding on bare metal is useful to deeply understand how a hardware architecture works and what happens in the lowest levels of an operating system. I wanted to create a simple example of bare metal program for ARM using free open source tools: RealView Development Suite is the state of the art of ARM compilers, but it is expensive for hobbyists; Codesourcery is a company that provides a free version of the GNU gcc toolchain for ARM cores. In particular, the EABI toolchain must be downloaded from their download page; I fetched the IA32 GNU/Linux installer. During the graphical installation, the tools are installed in a sub-folder of the user’s home; this is fine if only a single person wants to use the toolchain on that computer, otherwise it is more efficient to install it system-wide. The path to the toolchain binaries must be added to the PATH environmental variable; usually the installation process does it for you, but if it doesn’t, the standard installation path is “~/CodeSourcery/Sourcery_G++_Lite/bin“.
I created a C file called test.c containing the simplest C code I wanted to compile:
int c_entry() {
return 0;
}
The classic printf(“Hello world!\n”); example is more complex because when coding bare metal the standard input/output must be defined: it could be a physical serial port for example. I called it c_entry instead of main because in this example some things that are usually assumed true when the program reaches the main code are not implemented: for example, variable initialized globally in C code could not be really initialized.
To compile this code into an object file (test.o) run the following command, very similar to compiling code with gcc:
$ arm-none-eabi-gcc -c -mcpu=arm926ej-s -g test.c -o test.o
The -mcpu flag indicates the processor for which the code is compiled. I wanted to target the ARM926EJ-S processor in this example for these reasons:
- It’s a widespread core in common products
- I worked on a project that used this core
- The QEMU emulator supports this core in the form of a VersatilePB
In order to create a bare metal program we must understand what does the processor do when it is switched on. The ARM9 architecture begins to execute code at a determined address, that could be 0 (usually allocated to RAM) or 0xFFFF0000 (usually allocated to Read Only Memory). We must put some special code at that particular address: the interrupt vector table. It is a series of 32-bit instructions that are executed when something special happens: for example when the ARM core is reset, or when the memory contains an unknown instruction that doesn’t belong to the ARM instruction set, or when a peripheral generates an interrupt (the serial port received a byte). The instructions in the interrupt vector table usually make the processor jump to the code that handles the event. The jump can be done with a branch instruction (B in ARM assembly) when the destination address is near.
I created an assembly file called startup.s containing the following code:
.section INTERRUPT_VECTOR, "x" .global _Reset _Reset: B Reset_Handler /* Reset */ B . /* Undefined */ B . /* SWI */ B . /* Prefetch Abort */ B . /* Data Abort */ B . /* reserved */ B . /* IRQ */ B . /* FIQ */ Reset_Handler: LDR sp, =stack_top BL c_entry B .
A brief explanation:
- Line 1 generates a section named INTERRUPT_VECTOR containing executable (“x”) code.
- Line 2 exports the name _Reset to the linker in order to set the program entry point.
- Line 3 to 11 is the interrupt vector table that contains a series of branches. The notation “B .” means that the code branches on itself and stays there forever like an endless for(;;);
- Line 14 initializes the stack pointer, that is necessary when calling C functions. The top of the stack (stack_top) will be defined during linking.
- Line 15 calls the c_entry function, and saves the return address in the link register (lr).
To compile this code into an object file (startup.o) run the following command:
$ arm-none-eabi-as -mcpu=arm926ej-s -g startup.s -o startup.o
Now we have test.o and startup.o, that must be linked together to become a program. The linking process also defines the address where the program is going to be executed and declares the placement of its sections. To give this information to the linker, a linker script is used. I wrote this linker script, called test.ld, following a simple example in the linker manual:
ENTRY(_Reset)
SECTIONS
{
. = 0x0;
.text : {
startup.o (INTERRUPT_VECTOR)
*(.text)
}
.data : { *(.data) }
.bss : { *(.bss COMMON) }
. = ALIGN(8);
. = . + 0x1000; /* 4kB of stack memory */
stack_top = .;
}
The script tells the linker to place the INTERRUPT_VECTOR section at address 0, and then subsequently place the code (.text), initialized data (.data) and zero-initialized and uninitialized data (.bss). Line 11 and 12 tells the linker to move 4kByte from the end of the useful sections and then place the stack_top symbol there. Since the stack grows downwards the stack pointer should not exceed its own zone, otherwise it will corrupt lower sections. The script on line 1 tells the linker also that the entry point is at _Reset. To link the program, execute the following command:
$ arm-none-eabi-ld -T test.ld test.o startup.o -o test.elf
This will generate an ELF binary for ARM that can be executed with a simulator, or it can be loaded inside a real ARM core on a hardware board; for simplicity we can use the Codesourcery version of the gdb debugger:
$ arm-none-eabi-gdb test.elf [...] This GDB was configured as "--host=i686-pc-linux-gnu --target=arm-none-eabi". [...] (gdb) target sim Connected to the simulator. (gdb) load Loading section .text, size 0x50 vma 0x0 Start address 0x0 Transfer rate: 640 bits in <1 sec. (gdb) break c_entry Breakpoint 1 at 0x3c: file test.c, line 24. (gdb) run Starting program: /home/francesco/src/arm-none-eabi/startup/test.elf Breakpoint 1, c_entry () at test.c:24 24 return 0; (gdb) set $pc=0 (gdb) stepi Reset_Handler () at startup.s:34 34 LDR sp, =stack_top
- The target sim command tells the debugger to use its internal ARM simulator,
- the load command fills the simulator memory with the binary code,
- the debugger places a breakpoint at the beginning of the c_entry function,
- the program is executed and stops at the breakpoint,
- the program counter (pc register) of the ARM core is set to 0 to emulate a software reset,
- the execution flow can be examined step-by-step in the debugger.
An easier way to debug is using the ddd graphical front-end with the following command:
$ ddd --debugger arm-none-eabi-gdb test.elf
This program is a starting point to begin to develop more elaborate solutions. The next step I want to take is using QEMU as the development target: with it I can interact with some peripherals, even if emulated, and create bare metal embedded programs more useful in the “real world” using only free open source software.
For a deeper look into bare metal programming for ARM see also:
Entries
benjamin
2010/09/04
Thank you for this tutorial. It’s very helpful.
Gonzalo
2011/11/30
Hello Balau,
I’m trying to compile your code to a cortex-m3 environment, changing to -mcpu=cortex-m3 in the AS and GCC parameters, but for some reason it messes the code (I used “xp /_iw addr” in qemu to watch it). It seems to compile in thumb2 mode all the time, but the startup needs to be in pure ARM so it throws segmentation fault when executed.
Do you know why these tools uses thumb as default?
How can I fix it; just remove the parameter -mcpu on the startup?
Will I need some extra code to jump to thumb2 mode? (I need it because it’s useful as it saves flash memory)
Thank you!
Balau
2011/11/30
Watch out: Cortex-M3 is very different from the ARM926 that I target in my example. Cortex-M3 supports ONLY Thumb2, even in startup code, and the vector table at the beginning of the code is also different. Check also this post for more info on Cortex-M3 compilation: Using CodeSourcery bare metal toolchain for Cortex-M3
Gonzalo
2011/12/02
You are right Balau, Cortex-m3 is quite different, I really like it more than older ARMv7. Just pure C code to make it run! I’ve made a simple test app and it was very easy to make it run. But I’ve found something estrange; as far as I know, cortex-m saves r0-r3, r12, PC, LR and PSR on a exception request. It gives automatically room to run exception routines. But, for some reason, my test app adds push-pop on every exception routine. A real example:
C-code:
void HNMI_Handler()
{
};
ASM code generated by gcc:
00000058 :
58: b480 push {r7}
5a: af00 add r7, sp, #0
5c: 46bd mov sp, r7
5e: bc80 pop {r7}
60: 4770 bx lr
62: bf00 nop
Why is there this code?
Anyway, it runs properly. Thanks to you, Balau.
Balau
2011/12/02
The CodeSourcery Getting Started guide suggests to use the interrupt attribute. Try to write something like this instead:
void
__attribute__ ((interrupt))
HNMI_Handler()
{
/* ... */
};
As an aside, be careful about the terminology: old ARM7 cores are not ARMv7, Cortex processors are ARMv7. In ARMv7 the 7 is the version of the architecture. See this table: ARM cores
dz
2011/12/07
hi, i tried your example, and it is absolutely marvelous!
i found only one problem with my system, i use beagleboard, but nevertheless, everything compiled properly (i had a small problem with linking, but quick googling the error fixed it)
but, when i tried gdb, my version of arm-none-linux-gnueabi didn’t support sim option for the target argument, i got response:
“Undefined target command: “sim”. Try “help target”. ”
and i tried it, but there was no sim option listed, is this a problem with my coidesourcery? should i have installed arm-none-eabi instead?
Thanks!
Balau
2011/12/07
arm-none-linux-gnueabi does not support “target sim”, you need arm-none-eabi fot that.
Alternatively you can use qemu-system-arm as a GDB server and then attach with arm-none-linux-gnueabi-gdb, it should work.
Anyway I suggest using arm-none-eabi for bare-metal programs; the Linux toolchain that you are using is intended to compile Linux user space programs.
If in the future you need to simulate Linux user space programs you can try “qemu-arm” instead.
vivek
2012/01/13
Everything works fine till linking. But when I try to link using -ld the following error occurs. Can u help ?
# arm-none-linux-gnueabi-ld test.ld test.o startup.o -o test.elf
test.o:(.ARM.exidx+0×0): undefined reference to `__aeabi_unwind_cpp_pr0′
Balau
2012/01/13
The unwind functions should be called when an exception occurs, for example a division by zero. If you don’t care about that for now, you can define empty functions such as:
void __aeabi_unwind_cpp_pr0 (void) {}Have you tried using the bare metal toolchain? (arm-none-gnueabi-ld)
chandan
2012/02/09
Thank you so much Balau…you tutorials are very very useful.
chandan
2012/02/11
got an error while using gdb
using lucid lynx and successfully completed your tutorial “hello-world-for-bare-metal-arm-using-qemu”
error:
(gdb) target sim
Undefined target command: “sim”. Try “help target”
chandan
2012/02/11
while using ddd got an error:
root@ubuntu:~# ddd –debugger ./CodeSourcery/Sourcery_G++_Lite/bin/arm-none-linux-gnueabi-gdb test.elf
The program ‘ddd’ is currently not installed. You can install it by typing:
apt-get install ddd
root@ubuntu:~# apt-get install ddd
E: Could not get lock /var/lib/dpkg/lock – open (11: Resource temporarily unavailable)
E: Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?
Please help.
Balau
2012/02/11
The GDB program of the bare-metal toolchain “
arm-none-eabi-gdb” supports the ARM simulator with “target sim“, but the GDB of other toolchains such as “arm-none-linux-gnueabi-gdb” do not support it. In that case you can use an emulator such as QEMU and attach to it with “target remote ...“Balau
2012/02/11
I never encountered that error. You should have more luck searching this error on Google.
farhanhubble
2012/03/28
Hi,
I’m trying your example but arm-none-eabi-ld gives the following error:
/home/farhanhubble/Spirit-RTOS/Source/temp/startup.o: In function `_Reset’:
/home/farhanhubble/Spirit-RTOS/Source/temp/startup.s:4: multiple definition of `_Reset’
startup.o:/home/farhanhubble/Spirit-RTOS/Source/temp/startup.s:4: first defined here
Balau
2012/03/29
Could you provide the commands that you used for compilation?
Jordan
2012/03/29
Thanks a lot for your tutorials.
I’m a student and I have a project to do on ARM arch, and your work is very usefull.
So I have to design a very simple kernel.
It’s ok to boot from the RAM.
But now, I want to to boot from the Flash, so I’va done some modification in the ld script :
SECTIONS
{
. = 0×1000000; ==> Flash begin address
.text : { *(.text) }
.rodata : { *(.rodata) }
Image_RO_Limit = .;
. = 0×20000000; ==> Ram begin address
Image_RW_Base = .;
.data : { *(.data) }
Image_ZI_Base = .;
.bss : { *(.bss) }
Image_ZI_Limit = .;
__bss_start__ = .;
__bss_end__ = .;
__EH_FRAME_BEGIN__ = .;
__EH_FRAME_END__ = .;
PROVIDE (__stack = .);
end = .;
_end = .;
.debug_info 0 : { *(.debug_info) }
.debug_line 0 : { *(.debug_line) }
.debug_abbrev 0 : { *(.debug_abbrev)}
.debug_frame 0 : { *(.debug_frame) }
}
Then, I erase my flash memory, flash the flash memory, power off the JTAG. But when I boot it doesn’t work.
I have a AT91RM200DK board.
Have you got any idea or any solution on an other dev board ?
Thanks,
Jordan.
Balau
2012/03/29
I think the problem is that you should use the “MEMORY” part of linker script, something like this:
MEMORY
{
flash (rx) : ORIGIN = 0×01000000, LENGTH = 256K
ram (rwx) : ORIGIN = 0×20000000, LENGTH = 64K
}
then put the .text and .rodata in Flash with “>flash”, the .data in the RAM with “>ram AT>flash” and the .bss in the RAM with “>ram”.
The initial code should take care of copying the .data section from Flash to RAM and zeroing the .bss section.
Many things could go wrong in the boot procedure and in uploading code, I suggest using a debugger or, if you can’t, you can use some code that turns on some LEDs to understand where the program is running.
farhanhubble
2012/03/29
hi,
i have used the following script :
PATH=$PATH:/home/farhanhubble/Spirit-RTOS/CodeSourcery_Toolchain/bin/
arm-none-eabi-gcc -c -mcpu=arm926ej-s -g /home/farhanhubble/Spirit-RTOS/Source/temp/test.c -o /home/farhanhubble/Spirit-RTOS/Source/temp/test.o
arm-none-eabi-as -mcpu=arm926ej-s -g /home/farhanhubble/Spirit-RTOS/Source/temp/startup.s -o /home/farhanhubble/Spirit-RTOS/Source/temp/startup.o
arm-none-eabi-ld -zmuldefs -T /home/farhanhubble/Spirit-RTOS/Source/temp/test.ld /home/farhanhubble/Spirit-RTOS/Source/temp/test.o /home/farhanhubble/Spirit-RTOS/Source/temp/startup.o -o /home/farhanhubble/Spirit-RTOS/Source/temp/test.elf
////////////////////////////////////////////////
i have somehow able to run that by adding option -zmuldefs while linking (i.e at the third step of the script). Is there any other way to do that.
Thanks
farhanhubble
Balau
2012/03/29
I agree that -zmuldefs is not a good solution but merely a workaround. Maybe the problem is in the linker script. I’m quite sure somehow the startup.o file is linked twice.
Also, you can try to launch the linker with “–verbose” and “-Map test.map” options to discover more information.
shabbir
2012/08/28
Hi,
I have gone through this post,i tried to debug the bare metal program using the tool chain “arm-none-eabi”,but it is failing to connect to target simulatior saying undefined command “sim”.help target list is not showing target sim option.can you please help me what else am i missing?
Balau
2012/08/28
Maybe the new versions of the arm-none-eabi toolchain dropped the support for the simulator.
I suggest using QEMU instead, if you want to execute your code.
See here for a bare metal example on QEMU and how to connect with the debugger:
Hello world for bare metal ARM using QEMU
shabbir
2012/09/03
Thanks Balau.Finally target sim command worked for me.But I want to know why target sim command is only working for arm-none-eabi toolchain,why it is not working for other toolchains? And I have written a simple C program for hello world,without .s and .ld and i tried to compile it with arm-none-eabi-gcc -g hello.c -o hello,but it is showing following error:
/home/shabbir/CodeSourcery/Sourcery_CodeBench_Lite_for_ARM_EABI/bin/../lib/gcc/arm-none-eabi/4.6.3/../../../../arm-none-eabi/bin/ld: warning: cannot find entry symbol _start; defaulting to 00008018
/home/shabbir/CodeSourcery/Sourcery_CodeBench_Lite_for_ARM_EABI/bin/../lib/gcc/arm-none-eabi/4.6.3/../../../../arm-none-eabi/lib/libc.a(lib_a-sbrkr.o): In function `_sbrk_r’:
sbrkr.c:(.text+0×18): undefined reference to `_sbrk’
/home/shabbir/CodeSourcery/Sourcery_CodeBench_Lite_for_ARM_EABI/bin/../lib/gcc/arm-none-eabi/4.6.3/../../../../arm-none-eabi/lib/libc.a(lib_a-writer.o): In function `_write_r’:
writer.c:(.text+0×20): undefined reference to `_write’
/home/shabbir/CodeSourcery/Sourcery_CodeBench_Lite_for_ARM_EABI/bin/../lib/gcc/arm-none-eabi/4.6.3/../../../../arm-none-eabi/lib/libc.a(lib_a-closer.o): In function `_close_r’:
closer.c:(.text+0×18): undefined reference to `_close’
/home/shabbir/CodeSourcery/Sourcery_CodeBench_Lite_for_ARM_EABI/bin/../lib/gcc/arm-none-eabi/4.6.3/../../../../arm-none-eabi/lib/libc.a(lib_a-fstatr.o): In function `_fstat_r’:
fstatr.c:(.text+0x1c): undefined reference to `_fstat’
/home/shabbir/CodeSourcery/Sourcery_CodeBench_Lite_for_ARM_EABI/bin/../lib/gcc/arm-none-eabi/4.6.3/../../../../arm-none-eabi/lib/libc.a(lib_a-isattyr.o): In function `_isatty_r’:
isattyr.c:(.text+0×18): undefined reference to `_isatty’
/home/shabbir/CodeSourcery/Sourcery_CodeBench_Lite_for_ARM_EABI/bin/../lib/gcc/arm-none-eabi/4.6.3/../../../../arm-none-eabi/lib/libc.a(lib_a-lseekr.o): In function `_lseek_r’:
lseekr.c:(.text+0×20): undefined reference to `_lseek’
/home/shabbir/CodeSourcery/Sourcery_CodeBench_Lite_for_ARM_EABI/bin/../lib/gcc/arm-none-eabi/4.6.3/../../../../arm-none-eabi/lib/libc.a(lib_a-readr.o): In function `_read_r’:
readr.c:(.text+0×20): undefined reference to `_read’
collect2: ld returned 1 exit status
Can you Please help me on this?
Balau
2012/09/03
“target sim” must be implemented by the developers that take care of the toolchain. I don’t know the reason why they did not implement the simulator.
About the C program compilation, the “undefined reference” errors are there because bare-metal programs need to implement low-level functions, otherwise for example the program doesn’t know where to send the characters of the “printf” string.
See this post about it:
I don’t know why the “_start” symbol is not found. Usually the toolchain links automatically the “crt” object files that contains the “C Run-Time” necessary to execute C code correctly.
Brijen
2012/10/07
Hi,
Thanks for the detailed post, helped me getting started with bare metal programming the raspberry pi. But at this moment I’m having problems with IRQ handler, do you have an example for it? Or could you let me know in short how it is to be done?
Thank you,
Brijen
Balau
2012/10/07
There’s a detailed explanation in the document that I linked at the end of the post, in Part 6: Building bare metal ARM with GNU [pdf].
From my side, I wrote a post about interrupts here: ARM926 interrupts in QEMU.
Hope this helps.
Nur Hussein
2012/11/03
Hi I’m using the cross-compiler from Debian, and it’s arm-linux-gnueabi-gcc instead of arm-none-eabi-gcc. What’s the difference between “Linux” and “None”, and “gnueabi” vs “eabi”.
Balau
2012/11/03
The toolchains have a triplet consisting of Architecture-OperatingSystem-BinaryInterface.
Architecture is ARM for both.
OperatingSystem is the one where the compiled program will run. In case of Linux, it means that the program will be executed by Linux, and the program libraries will talk with a Linux kernel through system calls. In case of None, the program will be run on the “bare metal”, talking directly with the hardware (usually with newlib as C standard library and runtime).
I think the eabi and gnueabi string in the toolchain name means the same, because EABI is the specification that both toolchains implement, gnueabi is the GNU open source implementation.
Nur Hussein
2012/11/03
Thanks! I tried your examples with arm-linux-gnueabi-gcc and it worked even though it’s a baremetal program. Is it because we’re not making syscalls? If I was to write my own kernel and OS does it matter if the toolchain is “Linux” instead of “none”?
Balau
2012/11/04
Yes the example works because we are supplying the startup code and we are not making system calls.
If you want to write your own kernel and OS different from Linux, then I don’t think you can use either of the two toolchains directly.
It depends on what kind of OS you are writing, but I think for a simple OS you could supply the startup and low-level calls, and use the bare metal toolchain.
Otherwise you could create your own toolchain such as “arm-YOUR_OS-gnueabi-gcc” but I suppose it’s harder.
vbalaji
2013/01/17
Hello Balau,
In bare metal programming how to handle the different exceptions. Can you provide the inputs for handling the different exceptions like undefined,data abort ,software,irq and fiq.
Please do needful.
Thanks
balaji
Balau
2013/01/17
This guide contains some information about exceptions and interrupts: Building Bare-Metal ARM Systems with GNU.
This ARM developer guide contains many details about exceptions handling: ARM Compiler toolchain – Developing Software for ARM Processors.
See also my blog post ARM926 interrupts in QEMU.
shabbir
2013/01/31
HI Balau,
This is irrelevant question from above posts.I want to know how to generate the MMU page table for new ARM based SOC.Please help me with the steps required to create page tables for new SOC.
Balau
2013/02/01
I have no experience of configuring MMU, I am afraid I can’t help you.
balaji
2013/02/28
Hi balau,
How do i know the performance of any SOC with and without L2 cache at the u-boot level. I enabled the L2 cache controller at the u-boot level. I want to see the execution speed of an application at the u-boot level by enabling L2 cache. Please clarify me if it is possible. Sorry If this is not a relevant question to you.
Thanks
balaji
Balau
2013/02/28
I think that it depends on what you have of this SoC. For example you might need one of these:
- Development board
- Target device (for example smartphone)
- RTL simulation
- Cycle-accurate emulator
For example the CodeSourcery ARM simulator or QEMU are not suitable for this, because they don’t model cache with its timings.
Then you need a way to measure the start and end of U-Boot execution, for example raising and lowering a GPIO and measuring the rising/falling edge distance with oscilloscope or an external microcontroller. Or using internal timers if the SoC has any.
Obviously you need to remove dead wait time such as “press any key to stop autoboot” countdown otherwise you won’t get useful results.
Hope this helps.
Andre
2013/05/08
Hi Balau,
great work! Thanks a lot, makes itmore easier to understand the basics of what is going on after reset.
However, I need to put out RAM bytes on UART just _before_ RAM gets initialized by the startup script. Therefore I need to initialize UART in ASM before. Do you know how to do this for omap4460/ARM A9?
Thanks a lot!
Balau
2013/05/08
You need to know how to initialize the UART and how to write ARM assembly code.
If you know neither, I suggest you tackle the two problems separately at first.
To understand how to initialize the UART, the OMAP4460 manual is needed.
You could then create some C code that initializes it and writes data to it.
To study some ARM assembly code, take a look at this:
http://www.eng.auburn.edu/~nelson/courses/elec5260_6260/ARM_AssyLang.pdf
You could also compile C code with “
--save-temps” option to see what kind of assembly is generated, or disassemble from a .o (or ELF) with “arm-none-eabi-objdump -S stuff.o > stuff.txt”Once you know both, add your assembly code just before the “
BL c_entry” line.