Tracking down the elusive "Weird GPF"

Using heapwalk() to track down heap stomping

by David Jaquay

My problem was this: I added some very innocent code and started getting GPF's in previously working portions of the program. The GPF's showed up in one of three different places, depending on how the app was run: in the VC++ debugger, under CVW, or with BoundsChecker.

Initially, I tried a few obvious things. The debugger claimed that there was "No symbolic information" at the point of the GPF's, so I went up the call stack, and found that malloc() was the offending function. (This is key, as I'll show later, but at this point, no flags went off in my head...) I rescanned dependencies, rebuilt the world, and tried again. No luck. I backed out my changes, and things worked again. I added and backed out code until just one line was pinpointed as the culprit: void *pFoo; added as a public member variable in CMainFrame.

What!?! Adding a member variable (that isn't even referenced anywhere else in the app) shouldn't cause GPF's! And in three different places, depending on who's looking. I examined the other member variables (mostly pointers to other child windows, most of which were NULL at the point of the GPF's) with no success.

I then spent some time looking into the code that was calling malloc(). MFC's implementation of 'new' was the direct call stack parent, but above that were other MFC functions. In one GPF, I was below a call to CBitmap::Create(); in another, I was downwind of CWinApp::OnIdle() while it was doing some cleaning up. Nothing that made sense.

I ran Debug Window along with the app, which yielded some diagnostic messages. I corrected them, but the app still crashed. I then thought to run BoundsChecker. This yielded more diagnostics, which I then fixed, but still the GPF's kept coming.

On day three of the bug hunt (or, more accurately, after two nights of sleep), something finally dawned on me. What the debuggers were telling me by crashing inside of malloc() was that the problem was in the data that malloc() was referencing. But without the malloc() source code, how does one determine what that might be?

At this point, I recalled that there are Runtime Library routines that will walk you through the heap's data structures. Hoping that one of these data structures was what was getting trashed (and thereby causing the GPF's), I looked up 'heapwalk', and found sample code (which I modified, shown below) to check the heap.

Sure enough, my heapdump() code reported that shortly before a GPF, the heap was getting trashed. All I needed to do was sprinkle calls to heapdump() around my app, and narrow down where the problem was taking place. Half a dozen recompiles later, I had found the offending lines of code:

for (int nIndex = 0; nIndex < nNumItems; nIndex++)
   m_clRectArray[nIndex].SetRect(...);

After inspecting the definition of m_clRectArray, it turned out that it had been declared as having 5 fixed elements, and nNumItems was passed into the object as 6. The overrun had been there all along; apparently, expanding CMainFrame by 4 bytes shifted things just enough that the write past the end of the array now landed on memory that malloc() was depending on. I expanded the array to 6 elements, and added an ASSERT to ensure that any size passed in will be valid. (A better solution, perhaps, would have been to allocate the array at runtime.)
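
The runtime-allocation idea mentioned above might look roughly like the following. This is only a sketch in portable C++, not the code from the app: Rect and RectList are stand-in names invented here (the original array held MFC rectangle objects inside another class), and std::vector is assumed to be available, which was not the case for the compiler used in this article.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the MFC rectangle class; not the article's type.
struct Rect {
    int left, top, right, bottom;
    void SetRect(int l, int t, int r, int b) { left = l; top = t; right = r; bottom = b; }
};

// Hypothetical owner class. The array is sized from the item count at
// runtime, so a caller passing 6 gets 6 elements instead of overrunning 5.
class RectList {
public:
    explicit RectList(std::size_t nNumItems) : m_rects(nNumItems) {}

    std::size_t Count() const { return m_rects.size(); }

    // The index is checked before the write, so a bad index trips the
    // assert (in debug builds) instead of quietly stomping other memory.
    void Set(std::size_t nIndex, int l, int t, int r, int b) {
        assert(nIndex < m_rects.size());
        m_rects[nIndex].SetRect(l, t, r, b);
    }

    const Rect& Get(std::size_t nIndex) const {
        assert(nIndex < m_rects.size());
        return m_rects[nIndex];
    }

private:
    std::vector<Rect> m_rects;  // storage grows with the request, no fixed 5
};
```

With this shape, the loop from the article would run up to Count() rather than a separately passed nNumItems, so the size and the loop bound cannot drift apart.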

The code for heapdump() came pretty much directly from the example HEAPDUMP.C in the "_heapwalk Functions" article, under Product Documentation\Languages\Visual C++ 1.5\Run-Time Library Reference\Part 2 Run-Time Functions\H. I changed it slightly to write to a log file, and to overwrite the log file with every program execution. For purposes of simply checking the heap's validity, the fprintf() inside the while loop can be commented out (but not the while loop itself) to reduce the clutter of the listing. The szTag parameter is useful to identify individual calls to heapdump() from different locations in the app, and the HEAPDUMP macro uses this to mark the listing with the file and line number of the call.

Below is a listing of the file HEAP.H:

#define HEAPDUMP  { char szBuf[80]; sprintf( szBuf, __FILE__ ":%d", __LINE__ ); heapdump( szBuf ); }

void heapdump( char *szTag );

Below is a listing of the file HEAP.CPP:

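#include "stdafx.h"
#include <stdio.h>    // FILE, fopen(), fprintf(), and sprintf() for HEAPDUMP
#include <malloc.h>   // _HEAPINFO, _fheapwalk(), _HEAPxxx status codes
#include "heap.h"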

void heapdump( char *szTag )
{
   char      *szMode;
   FILE      *fo;
   _HEAPINFO  hinfo;
   int        heapstatus;
   static int nExecCount = 0;

   nExecCount++;

   if ( nExecCount == 1 )
      szMode = "wt";
   else
      szMode = "at";
   if ( ( fo = fopen( "heap.txt", szMode ) ) == NULL )
      {
      ::MessageBox( NULL, "Can't open heap output file",
                    "Heap Error", MB_ICONSTOP );
      return;
      }

   fprintf( fo, "Exec = %3d at \"%s\"\n", nExecCount, szTag );
   hinfo._pentry = NULL;
   while( ( heapstatus = _fheapwalk( &hinfo ) ) == _HEAPOK )
      {
      fprintf( fo, "%6s block at %Fp of size %4.4X\n",
         ( hinfo._useflag == _USEDENTRY ? "USED" : "FREE" ),
         hinfo._pentry, hinfo._size );
      }

   switch( heapstatus )
      {
      case _HEAPEMPTY:
         fprintf( fo, "OK - empty heap\n" );
         break;
      case _HEAPEND:
         fprintf( fo, "OK - end of heap\n" );
         break;
      case _HEAPBADPTR:
         fprintf( fo, "ERROR - bad pointer to heap\n" );
         break;
      case _HEAPBADBEGIN:
         fprintf( fo, "ERROR - bad start of heap\n" );
         break;
      case _HEAPBADNODE:
         fprintf( fo, "ERROR - bad node in heap\n" );
         break;
      }

   fprintf( fo, "end\n\n" );
   fclose( fo );
}
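
To illustrate the "sprinkle calls around and narrow it down" step, here is a minimal, portable sketch of the bracketing idea. It does not walk a real heap: checkpoint(), CHECKPOINT, g_slots and GUARD are names made up for this example, and a guard value in a deliberately oversized buffer stands in for the heap bookkeeping that the real heapdump() inspects through _fheapwalk(). std::snprintf is assumed to be available (C++11).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

const int GUARD = 0x5A5A;   // arbitrary sentinel value chosen for this sketch

// Six slots: 0..4 play the role of the 5-element array, slot 5 is a guard
// standing in for the memory that malloc() depended on.
int g_slots[6] = { 0, 0, 0, 0, 0, GUARD };

struct Entry {
    std::string tag;   // "file:line" of the checkpoint
    bool ok;           // was the guard intact at that point?
};
std::vector<Entry> g_log;

// Hypothetical stand-in for heapdump(): records the tag and the guard state.
void checkpoint(const char *szTag)
{
    Entry e;
    e.tag = szTag;
    e.ok = (g_slots[5] == GUARD);
    g_log.push_back(e);
}

// Same idea as the HEAPDUMP macro: tag each call with its file and line.
#define CHECKPOINT { char szBuf[300]; std::snprintf(szBuf, sizeof szBuf, "%s:%d", __FILE__, __LINE__); checkpoint(szBuf); }

void fill(int nNumItems)
{
    assert(nNumItems >= 0 && nNumItems <= 6);  // the sketch never leaves its own buffer
    for (int nIndex = 0; nIndex < nNumItems; nIndex++)
        g_slots[nIndex] = nIndex + 1;
}

// Index of the first checkpoint that reported damage, or -1 if none did.
int first_bad()
{
    for (std::size_t i = 0; i < g_log.size(); i++)
        if (!g_log[i].ok) return (int)i;
    return -1;
}

void run_demo()
{
    CHECKPOINT          // before the suspect code
    fill(5);            // in range: the guard survives
    CHECKPOINT
    fill(6);            // one too many: the guard slot is overwritten
    CHECKPOINT
}
```

After run_demo(), the log reads OK, OK, BAD, so the damage must have happened between the second and third checkpoints. Adding more checkpoints inside that window and rerunning is the same narrowing process described earlier, just with a real heap walk in place of the guard test.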