Today I encountered a bug regarding strings that was quite difficult to find. C strings must be null-terminated, which means that an array of characters can hold a string whose length is at most the array size minus one, because there must be room for the null character. I found out that, when initializing an array of chars with a string literal, the compiler does not complain if the only thing that doesn’t fit is the null character.
To give an example, I have this file called “string_test.c”:
#include <stdio.h>

char string_1[8] = "12345678";
char string_2[8] = "1234567";
char string_3[8] = "123456789";
char string_4[8] = "OVERFLW";

int main() {
    printf("string_1 = %s\n", string_1);
    printf("string_2 = %s\n", string_2);
    printf("string_3 = %s\n", string_3);
    return 0;
}
All the arrays have a size of 8, but string_1 and string_3 are initialized with strings that don’t fit in 8 characters once the null terminator is counted. Here’s what happens when I compile:
$ gcc -Wall -Wextra string_test.c -o string_test
string_test.c:5: warning: initializer-string for array of chars is too long
GCC complains about string_3 but not about string_1. What happens in both cases is that the strings are not null-terminated, and this could cause overflows; in this case the execution of the program results in this:
$ ./string_test
string_1 = 123456781234567
string_2 = 1234567
string_3 = 12345678OVERFLW
In this simple case the arrays are laid out in memory in the order they are written in the C source, so one string follows the other. Neither string_1 nor string_3 is null-terminated, so printf reads past the end of the array and keeps printing whatever follows, potentially garbage, until it finds a null character.
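As a minimal sketch of how the overflow could be avoided at run time, a buffer can be checked for a terminator with memchr, and printf can be limited with a precision so that it never reads more bytes than the array holds. The helper names is_terminated and print_bounded are made up for this illustration; they are not part of the original program.

```c
#include <stdio.h>
#include <string.h>

char string_1[8] = "12345678"; /* legal C: all 8 bytes used, no terminator */
char string_2[8] = "1234567";  /* 7 characters plus the terminator */

/* Hypothetical helper: returns 1 if a null character occurs
   within the first `size` bytes of buf, 0 otherwise. */
int is_terminated(const char *buf, size_t size) {
    return memchr(buf, '\0', size) != NULL;
}

/* Hypothetical helper: prints at most `size` bytes of buf,
   stopping earlier if a null character is found. */
void print_bounded(const char *name, const char *buf, size_t size) {
    printf("%s = %.*s\n", name, (int)size, buf);
}
```

Calling print_bounded("string_1", string_1, sizeof string_1) from main would print only the eight characters stored in the array, without running into string_2.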
The problem in my personal case was that the compiler did not warn me about a situation similar to that of string_1; luckily splint does give a hand:
$ splint string_test.c
Splint 3.1.2 --- 03 May 2009
string_test.c:3:20: String literal with 9 characters is assigned to char [8]
(no room for null terminator): "12345678"
A string literal is assigned to a char array that is not big enough to hold
the null terminator. (Use -stringliteralnoroom to inhibit warning)
string_test.c:5:20: String literal with 10 characters (counting null
terminator) is assigned to char [8] (insufficient storage available):
"123456789"
A string literal is assigned to a char array too small to hold it. (Use
-stringliteraltoolong to inhibit warning)
The tool gives two slightly different warnings for the two strings, since the programming language behaves differently in the two cases. This is one of the many examples that show the usefulness of static analysis.
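When a tool like splint is not at hand, a similar check can be sketched at compile time, assuming a C11 compiler. The idea relies on the fact that sizeof applied to a string literal counts the null terminator. DEFINE_STRING is a hypothetical macro written for this illustration, not something provided by GCC or splint.

```c
#include <assert.h> /* static_assert (C11) */

/* Hypothetical helper: defines a char array of `size` bytes initialized
   from the string literal `lit`, and refuses to compile unless the
   literal and its null terminator both fit. */
#define DEFINE_STRING(name, size, lit) \
    static_assert(sizeof(lit) <= (size), \
                  "no room for null terminator in " #name); \
    char name[size] = lit

DEFINE_STRING(string_2, 8, "1234567");      /* compiles: 7 characters + '\0' */
/* DEFINE_STRING(string_1, 8, "12345678");     rejected: sizeof is 9, not <= 8 */
```

The trade-off of this approach is that every array has to be declared through the macro, whereas static analysis checks the existing declarations as they are.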
Entries
Matthias Arndt
2011/06/01
Interesting bug – did you file a bug report/complaint for GCC yet?
Balau
2011/06/02
Actually GCC is just following the C standard, which says:
And later it explains an example where
char t[3] = "abc";
is a perfectly legal initialization, and the array is initialized without the null character.
The bug is not in GCC code, it’s in my program; it was enabled by the false assumption that the compiler warns when the null character doesn’t fit.
andrea
2011/06/03
IMHO
declaring
char string_2[8] = "1234567";
is the same as declaring
char string_2[8] = {'1', '2', '3', '4', '5', '6', '7'};
and it means that the compiler will fill the string_2 array only with 7 characters and then fill the remaining char with a 00.
But not because you declared a string but because it fills the remaining non initialized data with zero.
So if you write
char string_2[8] = "123";
the compiler will fill the remaining 5 elements of the array with a zero.
if you write
char string_2[8] = {0x55};
the compiler will fill the remaining 7 elements of the array with a zero.
In fact you didn’t receive a warning about string_1 because gcc never adds a NULL in declarations like that.
Different is if you do this
char *string_2 = "1234567";
where a NULL is added after the 7.
Balau
2011/06/03
It’s true that
char string_2[8] = "1234567";
has the same effect as
char string_2[8] = {'1', '2', '3', '4', '5', '6', '7'};
but for two different reasons: the second example is a common array initialization; the first example is a case defined by the standard where an array of characters can be initialized by a string.
Maybe the rationale behind it is to make it similar to common initialization where, as you said, the non-explicitly-initialized elements are filled with zeros.
Yes but it does when the array is of unknown size: if you write
char string_2[] = "1234567";
then string_2 has a size of 8 characters, if you write
char string_2[] = {'1', '2', '3', '4', '5', '6', '7'};
then string_2 has a size of 7 characters.
I agree that
char *string_2 = "1234567";
is a very different thing.
Andrew Murray
2011/08/07
This is really interesting – nice to see an example of an overflow with so few lines of code!