From: fobits@io.org (Frank Obits)
Newsgroups: rec.games.programmer
Subject: Fast Scrolling in Mode X
Date: 28 Dec 1994 06:52:59 -0500

This article describes a fast and efficient method of scrolling a tiled
world, using 320x200 Mode X. It will scroll an unlimited distance
vertically, horizontally and diagonally. On my system, a 386/40 with an
ancient 8-bit video card, it has enough time to bounce nine small
(16x10) sprites around on the screen while scrolling at a steady 70 fps.

Probably I'll be flamed for posting such kindergarten stuff, but when I
took an interest in game programming (not that long ago) I found such
information surprisingly difficult to find.

The method described by Diana Gruber in the Action Arcade Adventure Set
scrolls in all directions and is easy to implement. The problem is that
whenever it reaches a tile boundary it stops to copy a whole screenful
of pixels back to the center of the page. On my system this causes the
normal 70 fps rate to miss a beat, producing a slight jerk in the
scrolling. Slowing down the framerate would cure this, but it still
struck me as inefficient to move all those pixels from hither to yon.

I had high hopes for Dave Roberts' PC Game Programming Explorer, which
concentrates on such low-level coding. Although it is an excellent book
in most respects, the scrolling method used is rather limited. It moves
beautifully in the vertical direction and can be easily modified to
scroll horizontally, but it can't go in both directions in the same
game - not unless you are willing to give up page-flipping.

This method is derived from some hints in the PCGPE which were confirmed
by a post here from Henric Steen. In an exchange of e-mail he gave me a
few more pointers... Thanks, Henric.

It isn't perfect, though. The big problem is that in its present
version it is incompatible with a split screen. If the user wants to see
a status bar s/he will have to request one in a pop-up window, as in the
Commander Keen series.
Henric says that it's easy to incorporate a split screen, but despite
his help I must admit that I haven't caught on yet... but enough
blather, on with the show!

Not everyone uses the same terminology, so I'll start by explaining
some names I use and a few functions in my Mode X library.

** Definitions **

Window - the "window" is the part of vram which is actually displayed
    on the screen. In this method it is always 320x200 pixels.

Page - this is NOT the same as the window. The width of a page is set
    by writing to the hardware, but the starting point and length are
    simply determined by variables in the program.

Visible Page - the page in which the window is currently located.

Active Page - the page on which we are currently drawing or erasing
    sprites.

Background Page - holds a copy of the tiled background.

Page Flipping - making the active page visible and vice-versa.

** Functions **

void window_at( unsigned pg_off, int x, int y )

    This function writes to the Line Start and HPP registers to set the
    visible window at x,y within the page which starts at pg_off.

void ltile_to_vram( char *tilearray, int tilenum, unsigned pg_off,
                    int x, int y )

    Gets a tile from a linear array of tiles and writes it at position
    x,y within the specified page.

void rect_vram_to_vram( unsigned src_off, unsigned dest_off,
                        int x, int y, int width, int hgt )

    Uses write mode #1 to copy a rectangular area of pixels from x,y on
    the source page to the same position on the destination page.

** Initialization **

By default the length of a Mode X line is 80 addresses, or 320 pixels.
We want a page wide enough so that when the window is centered there is
a buffer on each side to hold one column of tiles, which is a total
width of 352 pixels or 88 addresses.

The height of a page will be 240 lines, which is 16 + 200 + 24. When
the window is in the initial position this leaves a buffer for one tile
row at the top. A whole number of 16x16 tiles won't fit on a 200-line
screen.
After filling the screen with 12 rows we have 8 extra lines hanging off
the bottom. The 24-line buffer at the bottom allows for these 8 lines
plus another complete row of tiles.

Thus each page takes 240 * 88 = 21120 addresses, and three of them use
63360 addresses. You can put them adjacent to each other, but I spaced
them approximately evenly in vram. So now we need to create and
initialize some global variables:

    unsigned visible_off = 360,    /* initial positions of pages */
             active_off  = 22208,
             back_off    = 44052;
    int window_x = 16,             /* position of window within page */
        window_y = 16;
    int world_x = 0,               /* position of upper left tile in world */
        world_y = 0;

After setting Mode X and the page width, fill all three pages with the
initial background. The tiles will extend all the way across the page
(22 tiles) and down 14 rows - one at the top and 12 1/2 for the window.

    set_mode_x();
    set_page_width( 88 );
    set_pallette( pal );
    window_at( visible_off, window_x, window_y );
    for( i=0; i<14; i++ )
        put_tile_row( back_off, i, tiles );
    /* copy to other pages */
    rect_vram_to_vram( back_off, active_off, 0, 0, 352, 224 );
    rect_vram_to_vram( back_off, visible_off, 0, 0, 352, 224 );

When scrolling North, East or West the method is simple. First go as
far as possible by moving the window within the page. When you run out
of valid data move *all* the pages upward or downward in vram far
enough to accommodate a new row or column of tiles. Grab the tiles,
copy to the other pages and reposition the window within the page.

Moving South is just slightly different, because of the odd-sized
buffer. To make it easy to grab rows or columns of tiles we would like
to keep the top of the page aligned at a tile boundary. So after using
eight lines of the buffer ( ++window_y == 25 ) we grab a new row for
the bottom of the page, but don't reposition the pages until we've gone
down a full 16 lines ( window_y == 33 ).

Meanwhile, we're flipping pages and handling sprites.
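
The vertical bookkeeping described above can be sketched in plain C
without touching the hardware. This is only a sketch under some
assumptions: grab_row() is a hypothetical stand-in for fetching and
drawing a row of tiles (it just counts calls), uint16_t models the
16-bit page offsets, and deferred copies and horizontal scrolling are
left out. The thresholds 25 and 33 follow from window_y starting at 16
with 8 spare lines of valid data below the window.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_WIDTH 88                      /* addresses per line (352 pixels) */
#define TILE_H     16                      /* lines per tile row */
#define ROW_SIZE   (TILE_H * PAGE_WIDTH)   /* 1408 addresses per tile row */

uint16_t visible_off = 360, active_off = 22208, back_off = 44052;
int window_y = 16;      /* position of window within page */
int world_y = 0;        /* world row of the tile row at the top of the page */
int rows_grabbed = 0;   /* how many times a new tile row was needed */

/* Hypothetical stand-in for drawing one tile row of the page. */
void grab_row(int page_row)
{
    assert(page_row == 0 || page_row == 14);   /* top or bottom buffer row */
    ++rows_grabbed;
}

/* Move all three pages by delta addresses; the casts wrap at 64K. */
void move_pages(int delta)
{
    visible_off = (uint16_t)(visible_off + delta);
    active_off  = (uint16_t)(active_off + delta);
    back_off    = (uint16_t)(back_off + delta);
}

void scroll_south(void)
{
    ++window_y;
    if (window_y == 25) {           /* the 8 spare lines are used up */
        grab_row(14);               /* fill page lines 224..239 */
    } else if (window_y == 33) {    /* a full tile row has gone by */
        move_pages(ROW_SIZE);       /* page top moves down one tile row */
        window_y -= TILE_H;         /* 33 -> 17, same picture on screen */
        ++world_y;
    }
}

void scroll_north(void)
{
    if (--window_y < 0) {           /* ran off the top of the page */
        move_pages(-ROW_SIZE);      /* page top moves up one tile row */
        window_y += TILE_H;         /* -1 -> 15 */
        --world_y;
        grab_row(0);                /* fill page lines 0..15 */
    }
}
```

Calling scroll_south() 17 times from the initial state grabs one new
row, moves every page down by 1408 addresses and leaves window_y at 17,
so the whole cycle repeats every 16 lines.
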
With a clean copy of the tiled background on the background page,
sprites can be erased with a fast block copy, without regard to tile
boundaries. This is more efficient than the "dirty tile" method, which
requires copying a whole tile if a sprite overlaps even a few pixels in
the corner.

By now it has probably occurred to you that we can't do this very often
before one of the pages goes right off the beginning or end of the
video segment.

If it's the background page that gets split across the ends of the
segment, ignore it. The positions of the pages are held in 16-bit
unsigned ints, which will automatically wrap around between the ends of
the segment. All of the routines that write or copy pixels are
presumably written in assembly. If you're using real mode the index
registers will also wrap, so that address a000:ffff is adjacent to
a000:0000. In protected mode I believe there's an assembly directive to
allow manipulation of the lower 16 bits of the index registers, so you
can have the same effect.

If the visible window becomes split, it's a more serious matter. In
most video cards the hardware which paints the screen also wraps at the
ends of the segment, but some SVGA cards don't do that.

The solution is simple. We are doing all of this in a loop that looks
something like this:

    LOOP:
        wait for retrace
        swap( visible_off, active_off )
        window_at( visible_off, window_x, window_y )
        erase sprites from active page
        check user input
        do scrolling
        if( active_off > 44416 )
            swap( active_off, back_off )
        draw sprites on active page
        goto LOOP

Note that the scrolling is done after the sprites have been erased
from the active page. At this point the active and background pages are
pixel-for-pixel identical, so if the scrolling has split the active
page, just swap it with the background page. If the scrolling has split
the visible page it won't take effect until the *next* time it becomes
visible. Before that it will take its turn as the active page, and
we'll catch it then.
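
The number 44416 in the loop is just the size of the video segment
minus the size of one page (65536 - 21120). Here is a minimal sketch of
that check, assuming the 16-bit offsets are modelled with uint16_t.
The names page_is_split() and fix_split_active() are hypothetical
helpers made up for this illustration, not functions from the library
described above.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_WIDTH   88                          /* addresses per line */
#define PAGE_LINES   240                         /* lines per page */
#define PAGE_SIZE    (PAGE_WIDTH * PAGE_LINES)   /* 21120 addresses */
#define SEG_SIZE     65536L                      /* one 64K video segment */
#define MAX_SAFE_OFF (SEG_SIZE - PAGE_SIZE)      /* 44416 */

/* A page starting above MAX_SAFE_OFF runs past the end of the segment
   and wraps around to the beginning, i.e. it is split. */
int page_is_split(uint16_t off)
{
    return off > MAX_SAFE_OFF;
}

/* If the active page is split, trade places with the background page.
   Only valid while the two pages hold identical pixels, which is the
   case right after the sprites have been erased.
   Returns 1 if a swap took place, 0 otherwise. */
int fix_split_active(uint16_t *active_off, uint16_t *back_off)
{
    if (page_is_split(*active_off)) {
        uint16_t tmp = *active_off;
        *active_off = *back_off;
        *back_off = tmp;
        return 1;
    }
    return 0;
}
```

The pages never overlap and always move together, so at most one of
them can straddle the end of the segment at any moment. That is why the
background page is always safe to swap in when the active page is
split.
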
** Some Improvements **

This is both simple and efficient, but as I've described it so far it
still has one big problem. When it needs more tiles it first consults
the world map and copies them one-by-one to the background page. Then
it uses two block copies to transfer them to the other pages, all in
one frame.

That's a lot to do in one frame - in fact it's almost all the work the
engine does. Out of curiosity I temporarily removed all sprite handling
and the wait-for-retrace function, so it did nothing but scroll the
screen as fast as it could. Under those conditions the Watcom profiler
said that execution time was distributed as follows:

    45.5%   ltile_to_vram()
    49.9%   rect_vram_to_vram()
     4.6%   everything else

Each new row or column is grabbed once and then copied twice, so these
figures show that grabbing it takes about twice as long as a single
copy, and between them the two functions account for almost all of the
work of the scrolling.

So the first 15 pixels of a scroll cost almost nothing, then all of
this work gets dumped into a single frame... and that's not the worst.
The worst case is when it's scrolling diagonally and the tiles are
aligned so that it needs *both* a new row and column at the same time.

Let's tackle these problems one by one. Do we really need to update all
the pages at once? No, we don't. The copy to the visible page can be
put off until the next frame, when it will be the active page. So
instead of doing two copies we can just do one and set a global
variable to indicate that another is needed.
For instance, when scrolling East:

    /* grab new column of tiles */
    put_tile_col( active_off, 21, tiles );
    /* copy to background page; the third page is done next frame */
    rect_vram_to_vram( active_off, back_off, 336, 0, 16, 240 );
    deferred_copy = COPY_RGT;

The deferred copy can be done right after the page flip, so the main
loop starts like this:

    wait for retrace
    swap( visible_off, active_off )
    window_at( visible_off, window_x, window_y )
    if( deferred_copy )
        do copy
    erase sprites from active page

That's a big improvement, but if we take the percentages of execution
time as arbitrary units of time, the worst-case diagonal scroll still
looks like this:

                   frame    frame+1
    Horizontal:     75   |    25   |
    Vertical:       75   |    25   |

In the vast majority of games the user would never notice if the screen
moved a bit horizontally before starting the diagonal scroll, so by
deferring the vertical scroll we could spread it out to one of these:

    Horizontal:     75   |    25   |    --   |             (better)
    Vertical:       --   |    75   |    25   |

    Horizontal:     75   |    25   |    --   |    --   |   (best)
    Vertical:       --   |    --   |    75   |    25   |

To implement this requires some additions to the code. First, when
scrolling diagonally the horizontal scroll must always be done first.
Then in the vertical scrolling functions we need to check if a deferred
copy is pending. If it is, do not import a new row.

    void scroll_north( void )
    {
        if( --window_y < 0 ) {
            /* check if we have deferred copy pending */
            if( deferred_copy ) {
                ++window_y;     /* cancel move */
                return;         /* wait until next time */
            }
            /* get new row, etc */
        }
    }

That will do for a one-frame delay, and it can be put off for one more
frame by packing two flags into one int:
    #define COPY_TOP  0x11
    #define COPY_BOT  0x21
    #define COPY_LFT  0x31
    #define COPY_RGT  0x41
    #define NOT_YET   0x01

    /* in main loop */
    if( deferred_copy ) {
        switch( deferred_copy ) {
            case COPY_TOP :
                rect_vram_to_vram( back_off, active_off, 0, 0, 352, 16 );
                break;
            /* other cases */
            case NOT_YET :
                deferred_copy = FALSE;
                break;
        }
        deferred_copy &= 0x01;
    }

So the first time through the loop it does the copy and clears all
except the low bit. The second time that is cleared too, opening the
door for the vertical scroll.

** The End (finally) **

So there it is. I make no claim to be the first to use this method, not
by a long way. It's pretty simple and has probably been used by umpteen
programmers for years. The only problem is that nobody bothered to
explain it in any kind of detail, or at least not where I could find
it.

If anybody can think of improvements - especially an elegant way to add
a split screen - I'll be more than happy to hear them.

It would be pushing my luck too far to post the full code for my little
demo, but if anyone would like it in e-mail, drop me a line.

-]Frank[-