[Moved an old post from 2006 to my new blog]
A few days back, one of my colleagues asked me to help debug a problem. She had written a program and it was crashing in strcpy. I looked at the code and it looked just fine to me, so I thought let's debug it and see what's going on. I started a debug session: the variables were pointing to the right data, the stack was fine, and she was copying a fixed string into a big enough buffer. I stepped over strcpy and bammm… access violation. Weird, huh? For a second I wondered how code this simple could crash. It was time to dig into the disassembly to see what exactly was going on. But before we do that, let's take a look at the two C functions below:
void foo()
{
    char buffer[16];
    char *p = buffer;

    *p = 1;    // Interesting code
    p[1] = 1;  // Interesting code
    p[2] = 2;  // Interesting code
}

void hoo()
{
    char p[16];

    *p = 1;    // Interesting code
    p[1] = 1;  // Interesting code
    p[2] = 2;  // Interesting code
}
Look at the lines marked “Interesting code” above. They are textually identical in both functions, and they do the same job: *p = 1; sets the first element of the array, p[1] = 1; sets the second element, and so on. But don't trust what you see. Even though the code in “foo” and “hoo” looks the same and does the same thing, the compiler generates different machine instructions for it. The disassembly of the two functions is shown below:
; Disassembly of foo
; void
; foo()
; {
    push        ebp
    mov         ebp,esp
; char buffer[16];
    sub         esp,14h                 ; <-- space for buffer[16] and p (16 + 4 bytes)
; char *p = buffer;
    lea         eax,[buffer]
    mov         dword ptr [p],eax
; *p = 1;  // Interesting code
    mov         ecx,dword ptr [p]       ; <-- Load the address stored in p
                                        ; <-- ecx now holds the address p points to
    mov         byte ptr [ecx],1        ; <-- Move 1 into the address in ecx
; p[1] = 1;  // Interesting code
    mov         edx,dword ptr [p]       ; <-- Load the address stored in p again
                                        ; <-- edx now holds the address p points to
    mov         byte ptr [edx+1],1      ; <-- Move 1 into the address edx+1
; p[2] = 2;  // Interesting code
    mov         eax,dword ptr [p]       ; <-- Load the address stored in p again
                                        ; <-- eax now holds the address p points to
    mov         byte ptr [eax+2],2      ; <-- Move 2 into the address eax+2
; }
    mov         esp,ebp
    pop         ebp
    ret

; Disassembly of hoo
; void
; hoo()
; {
    push        ebp
    mov         ebp,esp
; char p[16];
    sub         esp,10h                 ; <-- allocate space for p[16]
; *p = 1;  // Interesting code
    mov         byte ptr [p],1          ; <-- Move 1 into the address of p
                                        ; <-- p is actually ebp-10h here
; p[1] = 1;  // Interesting code
    mov         byte ptr [ebp-0Fh],1    ; <-- Array elements sit at increasing addresses,
                                        ; <-- so p[1] is at p+1 => ebp-10h+1 => ebp-0Fh.
                                        ; <-- This moves 1 into p[1]
; p[2] = 2;  // Interesting code
    mov         byte ptr [ebp-0Eh],2    ; <-- Similarly, move 2 into p[2] (ebp-10h+2)
; }
    mov         esp,ebp
    pop         ebp
    ret
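As the disassembly shows, in “foo” the compiler reaches the array “buffer” only through p: every access first loads the address stored in p from the stack and then writes relative to that address. In “hoo”, on the other hand, the compiler knows exactly where the array “p” lives (at a fixed offset from ebp), so it writes to each element directly, without dereferencing any other variable. In other words, a pointer is a variable whose contents are an address, while an array name stands for the storage itself. (This is unoptimized code; an optimizing compiler would normally not reload p before every access, but the difference in meaning between the two declarations stays the same.)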
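The same distinction can also be observed from C itself, without reading assembly. The following is a minimal sketch; the helper inspect_layout and the struct layout_facts are hypothetical names introduced here for illustration, not code from the original program. It records what sizeof and the address-of operator report for an array and for a pointer that refers to it.

```c
#include <stddef.h>

/* Hypothetical helper for illustration: a few facts about a local array
   and a pointer that refers to it. */
struct layout_facts {
    size_t array_size;               /* sizeof(buffer): the 16 bytes themselves   */
    size_t pointer_size;             /* sizeof(p): just enough to hold an address */
    int    array_is_its_own_address; /* (char *)&buffer == buffer                 */
    int    pointer_holds_address;    /* p == buffer                               */
    int    pointer_is_elsewhere;     /* (char *)&p != buffer                      */
};

struct layout_facts inspect_layout(void)
{
    char buffer[16];      /* the array IS the storage                  */
    char *p = buffer;     /* p is a separate object holding an address */
    struct layout_facts f;

    f.array_size = sizeof(buffer);
    f.pointer_size = sizeof(p);

    /* The address of the whole array equals the address of its first
       element: there is no hidden variable to load first. */
    f.array_is_its_own_address = ((char *)&buffer == buffer);

    /* p's value is buffer's address... */
    f.pointer_holds_address = (p == buffer);

    /* ...but p itself is stored somewhere else, which is why foo() has to
       load p before every access. */
    f.pointer_is_elsewhere = ((char *)&p != buffer);

    return f;
}
```

Calling inspect_layout() should report an array size of 16, a pointer size of 4 on a 32-bit build (8 on 64-bit), and all three flags set: the array's address is its first element, p holds that address, and p itself lives at a different address.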
Now back to the problem I was debugging. My colleague had made an honest mistake, one I am sure anyone could make. She had a global variable defined as char gBuffer[MAX_PATH]; in one file but declared it as extern char *gBuffer; in another, where she used it like this: strcpy(gBuffer, "test"); Because of that declaration, the compiler treated gBuffer as a pointer variable. Just as in “foo”, it loaded the value stored at gBuffer, that is, the contents of the first 4 bytes of the array (8 bytes on a 64-bit machine), and passed that value to strcpy as the destination address. In this case those bytes were 0x00000000, so the call effectively became strcpy(NULL, "test"). No wonder it crashed. I changed extern char *gBuffer; to extern char gBuffer[]; and the compiler, now knowing that gBuffer is an array of characters, passed the correct address to strcpy.
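The crash can be imitated in a single file without actually writing mismatched declarations, which would be undefined behavior in C. This sketch assumes a 260-byte MAX_PATH, as on Windows, and uses memcpy to mimic what the extern char *gBuffer declaration made the compiler do: read the first pointer-sized bytes of the array and treat them as an address. The function names are hypothetical, chosen for illustration.

```c
#include <string.h>

#ifndef MAX_PATH
#define MAX_PATH 260   /* assumption: the Windows value of MAX_PATH */
#endif

/* The real definition, as in the first file. A global array without an
   initializer is zero-filled before main() runs. */
char gBuffer[MAX_PATH];

/* Mimics a file that declared `extern char *gBuffer;`: the first
   sizeof(char *) bytes of the array are reinterpreted as the destination
   address. memcpy keeps this simulation well-defined. */
char *address_seen_through_pointer_decl(void)
{
    char *dest;
    memcpy(&dest, gBuffer, sizeof dest);
    return dest;
}

/* Mimics a file that declared `extern char gBuffer[];`: the array's own
   address is the destination. */
char *address_seen_through_array_decl(void)
{
    return gBuffer;
}

/* The call from the post, with the correct (array) view of gBuffer. */
void copy_test_string(void)
{
    strcpy(gBuffer, "test");
}
```

Before anything is written, address_seen_through_pointer_decl() returns a null pointer (on mainstream platforms, where an all-zero pointer is null), matching the strcpy(NULL, "test") seen in the debugger. After copy_test_string() runs, the same function returns the bytes 't', 'e', 's', 't' reinterpreted as an address, which is just as wrong. Putting extern char gBuffer[MAX_PATH]; in a shared header included by both files would also let the compiler catch the mismatch, because a definition and a conflicting declaration in the same file are rejected as conflicting types.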
These syntactic twists make C/C++ a bit harder to learn and prone to mistakes. However, they also force programmers to understand what the compiler is doing behind their back. I guess a lot of programmers out there live for C/C++ because of the power it provides. It is a double-edged sword: you can use it properly to do amazing things, or you can cut yourself if you are even a little careless.