A bug that always gets me

Posted on 2012/10/21

3


This week I made an error that I already repeated at least twice in my career; it doesn’t seem to stick in my brain. I’m going to write about it here, so that maybe I will stop doing it, and maybe it will prevent the same kind of error for my readers.

Let’s see if you can spot the bug before I reveal it.

As a C programmer I am used to see code like this:

while(start + index != limit) {
  index++;
  /* other stuff */
}

As an embedded C programmer I also see code that deals with interrupts and events that change the expression result outside the program flow:

volatile int handle_times;

void handler(void) {
  handle_times++;
}

void wait_handle_times(int offset) {
  while(handle_times + offset < MAX_HANDLE_TIMES) {     /* wait */   } }

As an embedded bare-metal C programmer I deal with code that sets, clears and checks bit flags.

 reg = serial->status_reg; /* read status of the serial port */
if(reg & TXERROR) { /* check if TXERROR flag is 1 */
  /* manage error */
  serial->status_reg = reg & ~TXERROR; /* clear flag */
}

The “bitwise” operators such as “&” and “|” are applied on each bit of the expression values, to obtain the desired value as a result. In this sense they are similar to addition, multiplication and so on, because they perform a computation between two expressions and produce another expression which is a data value of 8 bits or 16 bits or 32 bits and so on.

We had to write some code that performed serial communication of a string. The code checks that the serial port data register is empty and then it writes the character to transmit. The status register of the serial port contains many flags, between receive status, transmit status and error status. The code was something like this:

void serial_tx(const char *data) {
  while(*data != '\0') {
    while(serial->status_reg & TXEMPTY == 0) { /* wait */ }
    serial->data_reg = *data++;
  }
}

The program was not working. The terminal at the other end of the serial cable received only part of the string. I thought the problem was the cable, it was not. I checked out with an oscilloscope, and some characters were not transmitted by the device. It seemed the device was eating some of the characters. The first one always got sent, the others did not. It seemed to work “better” with higher baud rates, but many characters were eaten anyway. If we tried to debug the program step-by-step, the problem seemed to go away (Heisenbug?).

Then I printed out the value of the “serial->status_reg” value after the waiting “while” and before writing the character in the data register. It showed that the TXEMPTY flag was zero. So the flag went to 1 and then back to 0 without writing anything into it? Then I wrote this:

void serial_tx(const char *data) {
  while(*data != '\0') {
    uint32_t flag;
    do {
      flag = serial->status_reg & TXEMPTY;
      printf("flag = 0x%08X\n", flag);
    } while(flag == 0);
   printf("flag = 0x%08X\n", flag);
   serial->data_reg = *data++;
 }
}

and the program started to work; it transmitted the characters one by one and the receiver terminal printed them correctly.

Why did it work? I was doing the same thing  as before and just added some printf. I tried to comment out the printf calls thinking it was increasing the execution time, but it worked anyway with the printf commented. What was the difference from the previous program? I simply calculated the flag and then compared with zero… and then it dawned on me. The previous program did not do these two things in the same order I was doing them now. It was first comparing and then performing the bitwise “&“. I added some parentheses to the original code:

void serial_tx(const char *data) {
  while(*data != '\0') {
    while((serial->status_reg & TXEMPTY) == 0) { /* wait */ }
    serial->data_reg = *data++;
  }
}

and it worked. The error was that in C standard the bitwise “&” operator has lower priority than the comparison “==” operator. I was seeing it as similar to additions and multiplications, but instead the bitwise AND operator “&” has similar priority to the logical AND operator “&&“. With logical operators you can do “if (a == 0 && b == 1) { /* ... */ }” to perform two equality comparisons and then return the logical AND between the two. In my program what happened was that “TXEMPTY == 0” was always false, because TXEMPTY was a flag with a non-zero value. Zero bitwise AND with anything returns zero, so the loop was never entered, and the serial port data register was written too early, when the data register was still full with the previous character. The overwritten character was not sent, and the transmission was done only for those lucky characters that were written in the data register while it was emptied.

So, the specific lesson is:

  • In the ANSI/ISO C standard the bitwise operators have lower precedence than comparison operators

A more generic recipe can be:

  • When in slightest doubt, add parentheses to ensure intended behavior (and increase user readability)

Finally, an even more generic insight is:

  • Our brain creates models, assumptions and schema of the things we work on. Sometimes these models do not correspond with reality, but they are so persistent that they interfere with perceiving the truth even when it is in plain sight. We can be aware that the brain works that way, and when things don’t work as expected we can question these models that we have, we can step back, and try to see the reality with a beginner’s mind. It can change perspective and lead us to a better understanding of the world and of ourselves.
About these ads
Posted in: Embedded