Initially, I tried a few obvious things. The debugger claimed that there was
"No symbolic information" at the point of the GPFs, so I went up the call
stack, and found that malloc() was the offending function. (This
is key, as I'll show later, but at this point, no flags went off in my head...)
I rescanned dependencies, rebuilt the world, and tried again. No luck. I
backed out my changes, and things worked again. I added and backed out code
until just one line was pinpointed as the culprit:
void *pFoo;
added as a public member variable in CMainFrame.
What!?! Adding a member variable (one that isn't even referenced anywhere else in the app) shouldn't cause GPFs! And in three different places, depending on who's looking. I examined the other member variables (mostly pointers to other child windows, most of which were NULL at the point of the GPFs) and found nothing wrong.
I then spent some time looking into the code that was calling MFC's
malloc(). MFC's implementation of 'new' was the direct call stack
parent, but above that were other MFC functions. In one GPF, I was below a
call to CBitmap::Create(), in another, I was downwind of
CWinApp::OnIdle() while it was doing some cleaning up. Nothing
that made sense.
I ran Debug Window along with the app, which yielded some diagnostic messages. I fixed the problems they reported, but the app still crashed. I then thought to run BoundsChecker. This yielded more diagnostics, which I also fixed, but still the GPFs kept coming.
On day three of the bug hunt (or more accurately, after two nights of sleep),
something finally dawned on me. What the debuggers were telling me by crashing
inside of malloc() was that the problem was in the data that
malloc() was referencing. But without the malloc()
source code, how does one determine what that might be?
At this point, I recalled that there are Runtime Library routines that will
walk you through the heap's data structures. Hoping that one of these data
structures was what was getting trashed (and thereby causing the GPFs), I
looked up 'heapwalk' and found sample code for checking the heap, which I
modified (my version is listed below).
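To see why walking the heap can catch this kind of damage, here is a minimal sketch in portable C++. It is a toy, not the Visual C++ run-time heap: ToyHeap, BlockHeader and kMagic are made-up names, and the real heap's bookkeeping is certainly laid out differently. The idea it tries to show is that an allocator keeps its own records right next to the blocks it hands out, so a write that runs past the end of one block lands on the record for the next one, and a walk over those records can notice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Bookkeeping record the toy allocator stores in front of every block.
struct BlockHeader {
    unsigned magic;     // fixed marker; any other value means the record was overwritten
    std::size_t size;   // number of payload bytes that follow this record
};

const unsigned kMagic = 0xB10CB10Cu;

class ToyHeap {
public:
    ToyHeap() : m_used(0) {}

    // Hand out `size` bytes from the arena; returns NULL when it does not fit.
    void *Alloc(std::size_t size) {
        assert(m_used <= kArenaSize);
        const std::size_t room = kArenaSize - m_used;
        if (room < sizeof(BlockHeader) || size > room - sizeof(BlockHeader))
            return NULL;
        BlockHeader header = { kMagic, size };
        std::memcpy(m_arena + m_used, &header, sizeof header);
        void *payload = m_arena + m_used + sizeof header;
        m_used += sizeof header + size;
        return payload;
    }

    // Visit every record from the start of the arena, the way a heap walk does.
    // Returns false as soon as a record looks damaged.
    bool Walk() const {
        std::size_t pos = 0;
        while (pos < m_used) {
            BlockHeader header;
            if (m_used - pos < sizeof header)
                return false;                       // truncated record
            std::memcpy(&header, m_arena + pos, sizeof header);
            if (header.magic != kMagic)
                return false;                       // marker overwritten
            if (header.size > m_used - pos - sizeof header)
                return false;                       // size runs off the arena
            pos += sizeof header + header.size;
        }
        return true;
    }

private:
    static const std::size_t kArenaSize = 1024;
    unsigned char m_arena[kArenaSize];
    std::size_t m_used;
};
```

With two 8-byte blocks allocated back to back, writing 9 bytes into the first one is enough to make Walk() return false. That is the same kind of signal I was hoping the real heap-walking routines would give me.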
Sure enough, my heapdump() code reported that shortly before a
GPF, the heap was getting trashed. All I needed to do was sprinkle calls to
heapdump() around my app, and narrow down where the problem was
taking place. Half a dozen recompiles later, I had found the offending lines
of code:
for (int nIndex = 0; nIndex < nNumItems; nIndex++)
    m_clRectArray[nIndex].SetRect(...);
When I inspected the definition of m_clRectArray, it
turned out that it had been declared with 5 fixed elements, while
nNumItems was passed into the object as 6. The loop was therefore writing
one element past the end of the array. Apparently, expanding CMainFrame
by 4 bytes shifted things so that the overwrite now stomped on memory that
malloc() was depending on. I expanded the array to 6 elements, and added an
ASSERT to catch any size passed in that is larger than the array.
(A better solution, perhaps, would have been to allocate the array at
runtime.)
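Both fixes can be sketched in a few lines of portable C++. This is an illustration under assumptions, not the code from the app: Rect stands in for MFC's CRect, FixedRects and DynamicRects are hypothetical names, and the rectangle coordinates are placeholders. std::vector is a modern convenience that the toolchain in this article predates; there, new[] or an MFC collection class would have to play that role.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for MFC's CRect, so the sketch builds without MFC.
struct Rect {
    int left, top, right, bottom;
    void SetRect(int l, int t, int r, int b) {
        left = l; top = t; right = r; bottom = b;
    }
};

// Fixed-size variant: check the count before touching the array.
class FixedRects {
public:
    static const std::size_t kCapacity = 6;

    FixedRects() : m_count(0) {}

    // Returns false and writes nothing when nNumItems does not fit.
    // (The article used ASSERT for this; returning false also covers release builds.)
    bool Fill(std::size_t nNumItems) {
        if (nNumItems > kCapacity)
            return false;
        for (std::size_t nIndex = 0; nIndex < nNumItems; nIndex++)
            m_rects[nIndex].SetRect(0, 0, 10, 10);
        m_count = nNumItems;
        return true;
    }

    std::size_t Count() const { return m_count; }

private:
    Rect m_rects[kCapacity];
    std::size_t m_count;
};

// Run-time variant: the storage is sized from the count, so no fixed limit exists.
class DynamicRects {
public:
    void Fill(std::size_t nNumItems) {
        m_rects.assign(nNumItems, Rect());
        for (std::size_t nIndex = 0; nIndex < nNumItems; nIndex++)
            m_rects[nIndex].SetRect(0, 0, 10, 10);
        assert(m_rects.size() == nNumItems);
    }

    std::size_t Count() const { return m_rects.size(); }

private:
    std::vector<Rect> m_rects;
};
```

The fixed-size version still has a limit, but exceeding it now fails loudly at the call instead of quietly corrupting whatever sits after the array. The run-time version removes the limit altogether.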
The code for heapdump() came pretty much directly from the example
HEAPDUMP.C in the "_heapwalk Functions" article, under Product
Documentation\Languages\Visual C++ 1.5\Run-Time Library Reference\Part 2
Run-Time Functions\H. I changed it slightly to write to a log file, and to
overwrite the log file with every program execution. For purposes of simply
checking the heap's validity, the fprintf() inside the while loop
can be commented out (but not the while loop itself) to reduce the clutter of
the listing. The szTag parameter is useful to identify individual
calls to heapdump() from different locations in the app, and the
HEAPDUMP macro uses this to mark the listing with the file and
line number of the call.
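As an aside, the "file:line" tag can also be assembled by the preprocessor, which avoids formatting into a buffer at run time. The following is a minimal sketch in portable C++, not what the app used: heapdump_stub, HEAP_STR, HEAP_STR2 and HEAPDUMP_TAGGED are hypothetical names, and I have not checked how Visual C++ 1.5 handles this particular macro trick.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for heapdump(): it only remembers the tag it was given.
static std::string g_lastTag;

void heapdump_stub(const char *szTag) {
    assert(szTag != NULL);
    g_lastTag = szTag;
}

// Two-step stringizing so that __LINE__ is expanded to its number first.
#define HEAP_STR2(x) #x
#define HEAP_STR(x) HEAP_STR2(x)

// Adjacent string literals are joined by the compiler, giving "file:line".
#define HEAPDUMP_TAGGED() heapdump_stub(__FILE__ ":" HEAP_STR(__LINE__))
```

A call such as HEAPDUMP_TAGGED(); passes a single string literal, so there is no local buffer whose size has to be guessed.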
Below is a listing of the file HEAP.H:
// Builds a "file:line" tag; %.140s caps the file name so szBuf cannot overflow.
#define HEAPDUMP { char szBuf[160]; sprintf( szBuf, "%.140s:%d", __FILE__, __LINE__ ); heapdump( szBuf ); }
void heapdump( char *szTag );
Below is a listing of the file HEAP.CPP:
#include "stdafx.h"
#include <stdio.h>
#include <malloc.h>
#include "heap.h"
void heapdump( char *szTag )
{
    const char *szMode;
    FILE *fo;
    _HEAPINFO hinfo;
    int heapstatus;
    static int nExecCount = 0;

    // Overwrite the log on the first call of a run, append on later calls.
    nExecCount++;
    if ( nExecCount == 1 )
        szMode = "wt";
    else
        szMode = "at";
    if ( ( fo = fopen( "heap.txt", szMode ) ) == NULL )
    {
        ::MessageBox( NULL, "Can't open heap output file",
                      "Heap Error", MB_ICONSTOP );
        return;
    }
    fprintf( fo, "Exec = %3d at \"%s\"\n", nExecCount, szTag );

    // A NULL _pentry makes _fheapwalk start at the first heap entry.
    hinfo._pentry = NULL;
    while( ( heapstatus = _fheapwalk( &hinfo ) ) == _HEAPOK )
    {
        // This fprintf can be commented out to shorten the log; keep the loop.
        fprintf( fo, "%6s block at %Fp of size %4.4X\n",
                 ( hinfo._useflag == _USEDENTRY ? "USED" : "FREE" ),
                 hinfo._pentry, hinfo._size );
    }

    // Report why the walk stopped.
    switch( heapstatus )
    {
    case _HEAPEMPTY:
        fprintf( fo, "OK - empty heap\n" );
        break;
    case _HEAPEND:
        fprintf( fo, "OK - end of heap\n" );
        break;
    case _HEAPBADPTR:
        fprintf( fo, "ERROR - bad pointer to heap\n" );
        break;
    case _HEAPBADBEGIN:
        fprintf( fo, "ERROR - bad start of heap\n" );
        break;
    case _HEAPBADNODE:
        fprintf( fo, "ERROR - bad node in heap\n" );
        break;
    default:
        fprintf( fo, "ERROR - unexpected heap status %d\n", heapstatus );
        break;
    }
    fprintf( fo, "end\n\n" );
    fclose( fo );
}
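The "sprinkle calls around and narrow it down" step described earlier can be made a little less manual. Here is a minimal sketch in portable C++ of a helper that remembers the last checkpoint where the heap was fine and the first one where it was not, so the damage must have happened between the two. CheckpointLog, HeapCheckFn and FakeHeapCheck are hypothetical names; the fake check exists only so the sketch can be exercised without a real heap walk behind it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Any function that reports whether the heap currently looks intact.
typedef bool (*HeapCheckFn)();

// Hypothetical check used only to exercise the sketch; a real one would walk the heap.
static bool g_fakeHeapOk = true;
bool FakeHeapCheck() { return g_fakeHeapOk; }

class CheckpointLog {
public:
    explicit CheckpointLog(HeapCheckFn check) : m_check(check), m_failed(false) {}

    // Call at each point of interest. Returns true while the heap still checks out.
    bool Check(const char *szTag) {
        assert(szTag != NULL);
        if (m_failed)
            return false;            // keep the first failure, ignore later ones
        if (m_check()) {
            m_lastGood = szTag;
            return true;
        }
        m_firstBad = szTag;
        m_failed = true;
        return false;
    }

    const std::string &LastGood() const { return m_lastGood; }
    const std::string &FirstBad() const { return m_firstBad; }

private:
    HeapCheckFn m_check;
    bool m_failed;
    std::string m_lastGood;
    std::string m_firstBad;
};
```

After a run, LastGood() and FirstBad() bracket the stretch of code worth reading closely. Moving the checkpoints inward and running again repeats the narrowing I did by hand over half a dozen recompiles.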