Stuph

Single-Electron, Gigabyte-capacity memory

I recall reading at www.techweb.com that low-power-consumption, Gigabyte-capacity memory has been developed and will be available in the near future. However, its use is limited by the access time: several microseconds.

IBM Announces Massive Breakthrough in Chip Design

IBM has finally figured out how to use copper instead of aluminum wiring inside computer chips, and is making sure the secret doesn't leak out. The copper-based chips will be able to integrate six layers of metal on a chip and shrink chip circuitry down to 0.2 microns, the smallest in the industry. The new transistors will also feature a 0.12-micron effective channel length, which translates directly into improved performance. The CMOS 7S, as it has been dubbed, should show up in microprocessors in late 1998.

The Future...

A while ago I heard about Protein-based memory, which was going to be able to hold 4GB in the space of one cm³. They seemed to be able to manipulate synthetically-produced protein molecules to store data, though reading the data back was a bit of a problem. Plus, the access time was painfully slow - one microsecond or something. So, they were dreaming up ideas like using this RAM as a fast area to replace your hard drive swap file. Well, it likely doesn't matter.

Indeed, terms like "gigabyte" might become obsolete, and "terabyte" will be the primary unit of storage. The figure given for when this memory will be available is about the same as for the protein memory: about ten years. Oh, were it 2008... here's the article from boot:
"
The lab coats over at NEC have come up with a new transistor that will allow the company to produce 10-terabit (or ten trillion bits) DRAM. Current DRAM capacities are just reaching 64-megabit capacities; the new chips can hold nearly 164,000 times more data. In other words, replacing the 64Mb chips on one of today's high-end 128 MB DIMMs with their 10Tb counterpart would result in modules with 16 million megabyte capacities.

The two-gate construction of the STT (or Surface Tunnel Transistor) sports the first 14-nanometer gate, surpassing the former 14-nm plateau. The breakthrough was made possible when researchers stopped fighting the quantum effect, and discovered an entirely new way of building a transistor.

Unfortunately, the breakthrough technology doesn't work with standard lithographic processes, instead requiring an esoteric electron beam, and is at least a decade away from plugging into your PC.
"
I have to wonder if, when this technology becomes available, cheap, large-capacity RAM will replace permanent storage space and people will start grumbling about the cost of keeping their computer memory powered all the time... and blackouts will be more than a passing inconvenience.

MY SEMI-TECHNICAL PROPOSAL FOR NEW VIDEO GAME TECHNOLOGY circa 1998

Okay, I've been saying this for years: REAL Virtual Reality! What's with strapping some heavy box on your poor eyeballs when the Russians already have the technology to totally immerse the player in an alternate reality? Think of it. You stick on some neural interfaces and BOOM! Your brain has control transferred from your body to a virtual one!! Suddenly you are in this Arwing cockpit and you look down at your furry hands and whoosh! Suddenly there's this huge acceleration as you are thrust forward into an interstellar battle. Gives the Rumble Pak a run for its money, no? Anyways, the possibilities are endless. You could be the pieces in Tetris, the Ball in Mega 3D Arkanoid or the Chameleon Twist dude. The best part would be getting annihilated in M.K. V.

The only catch, of course, would be that you lose control of all bodily functions, so you'd have to be locked in a padded room with a drain in the floor. But hey, it could even be made portable! They could combine it with the technology from Honey, We Shrunk Ourselves and Aladdin's lamp so it'd be a compact system. (Gee, what would that be called? Hmm, how about "Virtual Game Boy Pocket"?)

Congress Adds 'All Your Base Are Belong To Us' Amendment To Bankruptcy Bill
WASHINGTON, DC-- Seeking to increase fiscal accountability among citizens who have no chance to survive make their time, the House of Representatives added an "All Your Base Are Belong To Us" amendment Monday to H.R. 333, the Bankruptcy Abuse Prevention and Consumer Protection Act of 2001. "What you say!!!" shouted the bill's sponsor, Rep. George Gekas (R-PA), following the amendment's approval. "This bill will not only make debt-ridden Americans more accountable, but it has the added benefit of taking off every 'zig' for great justice." Opponents of the amendment protested that it would potentially set up U.S. the bomb. Copyright 2001 Onion, Inc., All rights reserved

P299 - Quantum Ontological Hermeneutics: A Neo-Reconstructionist Approach

In our apparently jointly covariant four-dimensional manifold of space-time events, normalized metric fields for our specific worldline are shown to be inadequate for a self-consistent relativistic cosmology. In fact, such a cosmology can be shown to be intrinsically tied to subjective, positivist hermeneutics and not substantive invariants, meaning that we probably do not exist - not merely in spite of our subjectively perceived reality, but precisely because our reality is perceived subjectively, not objectively. Once this theoretical foundation has been established in the first week of class, students are dismissed for the remainder of the semester since none of this really matters anyway. Instead, coupons are provided for discounts on bowling and golf. Extra credit is given for attendance at Green Bay Packer games.

Articles/Opinions

Dynamic recompilation (written in 1997 or 1998)
Copied from the SNES Knowledge Base

I thought no one would ever ask, but then someone did. So I wrote up a small doc on the subject. What's dynamic recompilation? It is a method that tries to make an emulator faster by translating instructions from the machine being emulated to native machine code on the target processor. This article doesn't cover all the details; with the 65816 in particular (the processor in the SNES) there are many issues to deal with that are not covered here. I also left out an important discussion about memory management for recompiled code.

In a normal emulator, the logic works something like:

while (cycles_left > 0) {
    load several bytes of memory at PC;
    increase program counter according to size of instruction;
    decrease cycles_left according to number of cycles used by instruction;
    switch (opcode) { // Opcode
    case 0x00:
        emulate instruction opcode zero;
        /* This includes decoding the addressing mode and performing the operation. This may mean calling two functions, but in most emulators, two macros are used instead. There are, of course, exceptions where some opcodes have special functions. */
        break;
    case 0x01:
        emulate instruction opcode one;
        break;
    }
}

Notice the variable cycles_left: when this reaches 0 (or drops below it), the loop breaks so that the CPU can do other things, such as rendering a line of the graphics screen or doing operations in a coprocessor. It is important to realize that cycles_left is often not very large. For example, in a SNES emulator it is typically about 170, and each instruction takes from 2 to 7 cycles, so there are about 50 instructions before the loop breaks. This limits the length of the recompiled sequences you create to about, say, 10 instructions, to avoid running the cycles_left counter TOO far below zero. (A little bit is O.K.)
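To make the loop above concrete, here is a minimal C sketch of an interpretive core. Everything in it is an assumption made for illustration: the two-opcode toy instruction set (TOY_NOP, TOY_INC_A), the ToyCpu struct and the cycle costs are hypothetical and have nothing to do with real 65816 encodings. It only shows the fetch/decode/execute shape and how the cycle budget can end up slightly below zero.

```c
#include <stdint.h>

/* Hypothetical toy machine: two one-byte opcodes, not real 65816 encodings. */
enum { TOY_NOP = 0x00, TOY_INC_A = 0x01 };

typedef struct {
    uint16_t pc;          /* program counter */
    uint8_t  a;           /* accumulator */
    int      cycles_left; /* budget for this slice; may end slightly negative */
} ToyCpu;

/* Run until the cycle budget is used up; returns instructions executed. */
int toy_run(ToyCpu *cpu, const uint8_t *mem, int budget)
{
    int executed = 0;
    cpu->cycles_left = budget;
    while (cpu->cycles_left > 0) {
        uint8_t opcode = mem[cpu->pc];   /* fetch */
        switch (opcode) {                /* decode and execute */
        case TOY_INC_A:
            cpu->a++;
            cpu->pc += 1;
            cpu->cycles_left -= 2;
            break;
        case TOY_NOP:
        default:                         /* unknown opcodes are treated as NOP */
            cpu->pc += 1;
            cpu->cycles_left -= 3;
            break;
        }
        executed++;
    }
    return executed;
}
```

With a budget of 8 cycles and the program INC, INC, NOP, INC, the loop runs four instructions and leaves cycles_left at -1, which is the "a little bit below zero" case mentioned above.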

The task is then to convert whatever gets emulated to something approximating code executable on the target processor. This is rather tricky, but here's what a top-level loop MIGHT look like:

byte *recompiled_vectors[number of possible entry points];

while (cycles_left > 0) {
    // PC means program counter
    if (recompiled_vectors[PC / Entry_point_granularity] == NULL) {
        allocate memory for a new recompiled sequence;
        translate a sequence of code starting at PC into native assembly;
        store a pointer to the new sequence in recompiled_vectors[PC / Entry_point_granularity];
    }
    call recompiled_vectors[PC / Entry_point_granularity];
}

First notice that big array at the top. This is simply an array of pointers to recompiled sequences of code. If the system has 4 MB of memory, for example, and instructions may start on any byte boundary and be as small as one byte, then this array will have an entry for every byte. On 32-bit x86 machines, this array would therefore be 16MB (4MB * 4 bytes per pointer). Obviously this is a gigantic waste of memory, since about 98% of this array would end up unused.
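The size arithmetic can be written out as a small C sketch. The function names vector_table_bytes and vector_index are hypothetical, introduced only for this illustration; the pointer size is passed in explicitly so the 32-bit x86 figure can be reproduced on any host.

```c
#include <stdint.h>

/* Bytes needed for a flat table with one pointer per possible entry point.
   pointer_bytes is 4 on 32-bit x86. */
uint64_t vector_table_bytes(uint64_t emulated_bytes,
                            uint64_t entry_point_granularity,
                            uint64_t pointer_bytes)
{
    uint64_t entries = emulated_bytes / entry_point_granularity;
    return entries * pointer_bytes;
}

/* Index of the table slot that belongs to a given program counter. */
uint64_t vector_index(uint64_t pc, uint64_t entry_point_granularity)
{
    return pc / entry_point_granularity;
}
```

For 4 MB of emulated memory, a granularity of 1 and 4-byte pointers, vector_table_bytes returns 16 MB, matching the figure above. A granularity larger than 1 would shrink the table, but it only suits a CPU whose instructions are always aligned; on the 65816 an instruction can start at any byte.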

The problem is that you need a pointer to point to every sequence of recompiled code, and you need to be able to access this array as fast as possible. Since the CPU can evaluate an expression like recompiled_vectors[PC] very fast (except for possible problems with cache misses and page faults), using a giant array for this purpose is quite efficient. Other methods of storing the pointers require search algorithms etc. which I imagine can be very slow. In order to have the pointers readily available without using a gigantic array, really complex algorithms might be required.

One of the central ideas behind dynamic recompilation is that most programs execute the same sequences of code hundreds or sometimes millions of times. The first time the code is executed, then, it is okay if the process of recompiling is very CPU-intensive because the recompiling is done only once. Therefore, with dynamic recompilation, a sequence of code executes very slowly the first time it is emulated, and on successive runs it is lightning fast.
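The translate-once, run-many idea can be sketched in portable C. Since portable C cannot emit native machine code, this sketch substitutes a precomputed summary (the hypothetical Block struct) for a real recompiled sequence; the toy opcodes, the Cache struct and the function names are likewise assumptions for illustration. What it does show faithfully is the lookup: a NULL slot triggers translation, and every later visit reuses the stored result.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy guest: a run of TOY_INC_A bytes ended by any other byte. */
enum { TOY_END = 0x00, TOY_INC_A = 0x01, GUEST_MEM = 256 };

/* Stand-in for a recompiled sequence: a summary instead of native code. */
typedef struct {
    int a_delta;   /* net change to the accumulator */
    int length;    /* guest bytes covered */
    int cycles;    /* guest cycles consumed */
} Block;

typedef struct {
    Block *vectors[GUEST_MEM]; /* one slot per possible entry point */
    int translations;          /* how often the slow path ran */
    int executions;            /* how often a block was run */
} Cache;

/* The expensive step: scan the guest code once and summarize it. */
static Block *translate(const uint8_t *mem, int pc)
{
    Block *b = calloc(1, sizeof *b);
    if (b == NULL)
        return NULL;
    while (pc + b->length < GUEST_MEM && mem[pc + b->length] == TOY_INC_A) {
        b->a_delta++;
        b->length++;
        b->cycles += 2;
    }
    b->length++;        /* account for the terminating byte */
    b->cycles += 2;
    return b;
}

/* Run the block starting at pc; returns the next pc, or -1 on allocation failure. */
int run_block(Cache *c, const uint8_t *mem, int pc, int *a, int *cycles_left)
{
    if (c->vectors[pc] == NULL) {          /* first visit: translate */
        c->vectors[pc] = translate(mem, pc);
        if (c->vectors[pc] == NULL)
            return -1;
        c->translations++;
    }
    const Block *b = c->vectors[pc];       /* later visits: reuse */
    *a += b->a_delta;
    *cycles_left -= b->cycles;
    c->executions++;
    return pc + b->length;
}
```

Calling run_block five times for the same entry point translates once and executes five times; in a real recompiler the gap between those two costs is where the speedup comes from.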

If it is found that a sequence of code has not been recompiled before, it is compiled. How do you compile the code? I won't spell it out--the algorithm can be very large--but here's an example. The INC (add 1 to a memory location) instruction on the 65c816 looks like this:

INC <address or accumulator>

Let's focus on just one version of this instruction: Absolute addressing at $1234 when the processor is in 8-bit mode (M=1):

INC $1234

The process used to emulate this instruction using my current SNEqr CPU core is like this (don't complain about my code, I haven't fully optimized it yet):

loop_m1x1:
mov ebx, edx        ; Memory at PC
convert_snesptr ebx
mov ebx, [ebx]      ; Load bytes at PC:   EE 34 12 xx
movzx eax, bl
lea eax, [opcodetable_m1x1+eax*4]
call [eax]          ; Emu code for opcode EE
jc looptop_check
test esi, 80000000h
jz loop_m1x1

opEEm1x0:
opEEm1x1:
load_abs ; Set EBX to pointer to whatever is stored at $1234
docount 3, 6 ; add 3 to PC (EDX) and subtract 6 from cycle count
inc8
dobreak

With macros expanded:

opEEm1x0:
opEEm1x1:
; convert_abs_data macro        ; Converts OPDATA (EBX) to SNES 24bit address
; value returned in EBX   ; For absolute addressing modes
shr ebx, 8
and ebx, 0FFFFh
or ebx, dword [_reg.DBR_2less]
; convert_snesptr macro snesaddr <---ebx
; Takes snes address and converts it to a PC pointer. reg is 32-bit.
; EAX destroyed.
mov eax, ebx
shr eax, 11
and ebx, 01FFFh     ; offset
and al, 0FCh
mov eax, [_startaddr+eax] ; base pointer
add ebx, eax        ; add 'em
; docount macro addtopc, subfromcycles <---3, 6
add edx, 3
sub esi, 6
; inc8 macro
mov edi, ebx     ; remember address
; trapandread8 ebx ; read memory byte
; trapandread8 macro pcptr     EXPANDED BELOW
; Traps the read of a PC pointer pointing to a part of SNES address space
local notrap
cmp pcptr, offset _registers
jb notrap
cmp pcptr, offset _registers + 4000h
jae notrap
mov byte [_reg.P], cl
mov dword [_reg.PC], edx
mov dword [_scan_cycles], esi
push eax
push edx
push ecx
push dword 0
push pcptr
call _trapregread
pop eax ; arg 1
pop edx ; arg 2
pop ecx
pop edx
pop eax
notrap:
mov pcptr, [pcptr]
; End of expanded macro
movzx ebx, bl
and cl, 0FFh-FNEGATIVE-FZERO ; clear flags affected
inc bl
or cl, [nz_8bit + ebx]
; trapandwrite8 edi, bl
; trapandwrite8 macro pcptr, reg EXPANDED BELOW
; Takes data from reg and puts it in "snesaddr" and traps the write
local notrom, notrap, isrom
cmp pcptr, [_rom]
jb notrom
cmp pcptr, [_rom] + 300000h
jb isrom
notrom:
mov [pcptr], reg
cmp pcptr, offset _registers
jb notrap
cmp pcptr, offset _registers + 4000h
jae notrap
mov byte [_reg.P], cl
mov dword [_reg.PC], edx
mov dword [_scan_cycles], esi
push eax
push edx
push ecx
push dword 0
push pcptr
call _trapregwrite
pop eax
pop edx
pop ecx
pop edx
pop eax
notrap:
isrom:
; End of INC macro
; dobreak macro
clc
ret

Whew. That's all the code required to emulate INC! There were a lot of redundant checks and loads in the code above, and eliminating them is one reason why dynamic recompilation can be so fast. Now, let's take a look at what a compiled version could look like:

(Assume ESI is the cycle count, EDX is the PC and CL is the processor status flags P, as before.)

recompiled:

In the interpreted version above, the absolute address ($1234) was converted to a full 24-bit SNES address ($DB1234), where DB is the contents of the data bank register. The recompiled version of this step is only slightly shorter:

mov ebx, dword [_reg.DBR_2less]
or ebx, 00001234h ; Load $DB1234

Then it must be converted to an x86 pointer and loaded (no real change here):

mov eax, ebx
shr eax, 11
and ebx, 01FFFh ; offset
and al, 0FCh
mov eax, [_startaddr+eax] ; base pointer
add ebx, eax ; add 'em

EBX now contains pointer to data at $1234.
Then do the cycle and byte count:

add edx, 3
sub esi, 6

Now, in the interpretive emulator, a check was made to see whether the read was from an MMIO register. Because the address ($1234) is known at compile time, it is already known that it CANNOT access MMIO; therefore, the check is omitted and the memory is loaded directly.

mov edi, ebx ; remember address
; Macro for reading memory is now only one line:
mov ebx, [ebx]

movzx ebx, bl
and cl, 0FFh-FNEGATIVE-FZERO ; clear flags affected
inc bl
or cl, [nz_8bit + ebx]

Again, upon writing to the memory, the MMIO check is omitted, but this time another check is omitted as well: the ROM write check. $1234 cannot be a ROM address, so that check is redundant, and again the macro is only one line!
mov [edi], bl

ret

Whew. Some instructions and operands can be optimized more than others when compiling. Notice, in the previous code sequence, that I have left out some obvious optimizations such as instruction scheduling and maybe incrementing the memory directly. Remember that a dynamic recompiler is a COMPILER that you have to write yourself, and unless you're willing to go a few extra miles, such optimizations are not possible.
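The same idea, dropping checks whose outcome is already known at translation time, can be shown in C. This is a sketch under loud assumptions: the memory map (RAM below 0x2000, a 0x4000-byte MMIO window after it, ROM from 0x8000) is hypothetical and much simpler than the SNES's, the data bank register and the processor flags are ignored, and a function pointer stands in for emitted native code. All names here are made up for the example.

```c
#include <stdint.h>

/* Hypothetical 64 KB memory map, for illustration only. */
enum { RAM_END = 0x2000, MMIO_END = 0x6000, ROM_START = 0x8000, MEM_SIZE = 0x10000 };

typedef struct {
    uint8_t mem[MEM_SIZE];
    int checks;      /* how many range checks were executed at run time */
    int mmio_traps;  /* how many accesses hit the MMIO window */
} Machine;

typedef void (*IncHandler)(Machine *m, uint16_t addr);

static int is_mmio(uint16_t addr) { return addr >= RAM_END && addr < MMIO_END; }
static int is_rom(uint16_t addr)  { return addr >= ROM_START; }

/* What an interpreter must do: the address is only known at run time. */
static void inc8_generic(Machine *m, uint16_t addr)
{
    m->checks++;                          /* read trap check */
    if (is_mmio(addr))
        m->mmio_traps++;
    uint8_t value = (uint8_t)(m->mem[addr] + 1);
    m->checks++;                          /* ROM write check */
    if (is_rom(addr))
        return;                           /* writes to ROM are dropped */
    m->mem[addr] = value;
    m->checks++;                          /* write trap check */
    if (is_mmio(addr))
        m->mmio_traps++;
}

/* What a recompiler can emit when the operand is known to be plain RAM. */
static void inc8_plain_ram(Machine *m, uint16_t addr)
{
    m->mem[addr]++;
}

/* Translation-time decision, made once per recompiled instruction. */
IncHandler select_inc8(uint16_t addr)
{
    return (is_mmio(addr) || is_rom(addr)) ? inc8_generic : inc8_plain_ram;
}
```

For the address $1234, select_inc8 picks the plain-RAM version, so every later execution runs with zero range checks; an address inside the MMIO window still gets the fully checked handler.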

Now then, extra benefits can be had by dynamic recompilation when you go from emulating just one instruction in a sequence to many.

Take a 65816 code sequence like this:

Code_seq_start:
SEP #$30         ; 8-bit mode
LDX #$00         ; For X = 0...
Loop_start:
DEC $1234,X      ; (*((byte*)0x1234+X))--;
INX              ; X++
CPX #$10         ; X != $10 (16)?
BNE Loop_start   ; Loop again

Assume this is the first time executing this part of code, so the dynamic recompiler kicks in.

A recompilation algorithm would write a routine that emulates ALL of these instructions, up to and including the BNE. The following optimizations could be done in the process:

* Since these instructions are done one after the other, there is no call/return/loop overhead except at the very beginning and end of the sequence.
* This is a big one: Flags don't always have to be set. For example, the DEC instruction sets the 65816 ZERO and NEGATIVE flags, but so does the instruction after it, INX. Therefore, the flags don't have to be set when DEC is executed because their values will be overwritten anyway. Likewise, CPX also sets the ZERO and NEGATIVE flags, so the emulation code for INX doesn't have to set the flags. The only instruction that really has to set the flags is CPX.
* The PC and cycle count don't have to be updated after every instruction, but can be updated as a lump (there may be exceptions to this rule, but I won't go over them here.) For example, the first instruction is 2 bytes, the next is 2 bytes, then 3 bytes, 1 byte, 2 bytes and 2 bytes. In a normal interpretive emulator, this would mean doing 6 separate ADD instructions, but with recompilation we can do it with just one: ADD EDX, 12.
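The last two optimizations in the list can be worked out mechanically. The C sketch below walks the example sequence backwards to find which instructions really need to compute flags, and adds up the instruction sizes to get the single PC adjustment. The Insn struct, the plan_block function and the flag table are assumptions written for this illustration (only N, Z and C are tracked; SEP #$30 changes the M and X mode bits, which the sketch does not model), and it conservatively treats every flag as needed at the end of the sequence.

```c
#include <stdint.h>

enum { FLAG_N = 1, FLAG_Z = 2, FLAG_C = 4, FLAG_ALL = FLAG_N | FLAG_Z | FLAG_C };

typedef struct {
    const char *name;
    int size;        /* instruction length in bytes */
    uint8_t reads;   /* flags this instruction tests */
    uint8_t writes;  /* flags this instruction overwrites */
} Insn;

/* The example sequence from the text, with hand-entered flag effects. */
static const Insn example_block[] = {
    { "SEP #$30",    2, 0,      0 },
    { "LDX #$00",    2, 0,      FLAG_N | FLAG_Z },
    { "DEC $1234,X", 3, 0,      FLAG_N | FLAG_Z },
    { "INX",         1, 0,      FLAG_N | FLAG_Z },
    { "CPX #$10",    2, 0,      FLAG_N | FLAG_Z | FLAG_C },
    { "BNE",         2, FLAG_Z, 0 },
};

/* Backward pass: must_set[i] receives the flags instruction i really has
   to compute. Returns the total size, i.e. the one value to add to the PC. */
int plan_block(const Insn *code, int n, uint8_t *must_set)
{
    uint8_t live = FLAG_ALL;  /* assume every flag is needed after the block */
    int total_size = 0;
    for (int i = n - 1; i >= 0; i--) {
        must_set[i] = code[i].writes & live;
        live = (uint8_t)((live & ~code[i].writes) | code[i].reads);
        total_size += code[i].size;
    }
    return total_size;
}
```

Run on example_block, the pass reports that only CPX has to produce flags and that the whole sequence advances the PC by 12 bytes, the same ADD EDX, 12 given above.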

Now you may wonder why compilation stops at BNE. This is because BNE may branch to another part of code, and the dynamic recompiler doesn't know whether it will or not, so it just breaks off. If the branch is not taken, then recompilation will restart at the instruction after BNE. If the branch IS taken, note that there is no easy way to branch into the middle of the already-compiled sequence. If you'll remember, way back at the top of this document, there was an array of pointers--one pointer for each recompiled entry point. The only entry point that exists for this sequence of code is at Code_seq_start. The pointer for Loop_start will be NULL, so the recompiler must compile the four instructions at Loop_start *AGAIN*. This should not cause any significant code slowdown or other problems, though.

If there is a really long sequence of code without any branches, the compiler must break off sometime. 10 instructions seems to me like a reasonable maximum recompiled code sequence length.
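The two stopping rules (end at a branch, or give up after about 10 instructions) fit in a few lines of C. This is only a sketch: the DecodedInsn struct and block_insn_count are hypothetical names, and a real recompiler would fill in is_branch from its own opcode decoder.

```c
enum { MAX_BLOCK_INSNS = 10 };

typedef struct {
    int size;       /* instruction length in bytes */
    int is_branch;  /* nonzero if control may leave the straight-line path */
} DecodedInsn;

/* Number of instructions to place in one recompiled sequence: up to and
   including the first branch, capped at MAX_BLOCK_INSNS. */
int block_insn_count(const DecodedInsn *insns, int available)
{
    int count = 0;
    while (count < available && count < MAX_BLOCK_INSNS) {
        int ends_block = insns[count].is_branch;
        count++;
        if (ends_block)
            break;
    }
    return count;
}
```

For the six-instruction example, starting at Code_seq_start gives a sequence of 6 instructions, while starting at Loop_start gives a separate sequence of 4, which is the second compilation described above.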

So there you go. That's the basic idea of dynamic recompilation.

There are a few problems to keep in mind with dynamic recompilation. Firstly, some programs have self-modifying code, which can be a big problem! Secondly, as well as being hard to write, a dynamic recompiler can be very hard to debug. If you have used the zSNES debugger etc., you know that you can trace one instruction at a time. With dynamic recompilation, you lose that ability and must trace 10 or so instructions at once, without knowing the exact state of the flags between instructions!

This doc should scare off potential users of dynamic recompilation, I guess. Mail me if you can think of some improvements.

Nostalgia Pirates (written in 1997 or 1998)
Should people be able to pirate ROMs?

The standard response from any emu author, when asked his position on pirating ROMs is as follows: "No." This may or may not be followed by "Comment." Other classic statements include: "Thou shouldst not download ROMs which thou dost not own", "I am not responsible for [xxxx] and will not be held liable for [xxxx]", and "I do not support piracy in any way, shape, or form—never have; never will." Oh, Right....

I'd bet there isn't a single Emu author out there who hasn't downloaded or copied a ROM they didn't buy. (I'd also lose that bet, but not by a large margin.) I will even openly admit that I once had a NES game archive of 500 games. You know what? I have played hardly any of them. Most of them just sat there, worthless lumps of bits with an increasingly older access date stamp. Why? I dunno. Why do some people keep collections of dead butterflies? Why do tourists come back from foreign lands with thousands of useless trinkets? It's the same reason. I guess it's fun to boast, and compete for the biggest collection. Most of the high-bandwidth warez scene is like that, too.

That's one of the things that causes people in some statistical organizations to post "losses due to piracy" of millions (billions?) of dollars. They are under the misguided impression that a non-get is the same as a loss. It has to be based on a premise that, if someone had not pirated a piece of software, that same someone would have bought it. I bet this is false in at least half of cases. For instance, there are tons of people who already own Win95 OSR2 who are pirating prerelease copies of Windows 98. The stats people would count all those copies as losses, when they are probably not. Do you really think that someone who owns Win95 and MSIE 4.0 would bother to pay an extra hundred smackers just so they can have their title bars in two colors? Probably not. Do you think that, just because someone pirates a copy of Word, Microsoft is losing money? Well, I don't really care for Word, but--hypothetically speaking, of course--if I pirated a copy of it, that doesn't mean I would have paid for a copy. You see, although I would rather use this darned (free) Netscape Composer than pay for a copy of Word, if Word suddenly appeared on my hard drive, then that doesn't mean I wouldn't use it. So when the statistics dude comes along and sees me using Word, he would mark it as a loss--however, as I have gently implied, I would never buy Word no matter what.

I find the Game Boy fiasco Nintendo is dealing with rather amusing. Nintendo must think it's losing lots of money because of Game Boy emulators. Hmm. Do they think so? I don't. The Game Boy is a very portable game system, and "pirating" a ROM to play on your desktop PC can't cut into the portable market very well. There is another reason why it does not. For instance, I have a Game Gear. (As an aside: I'm looking for one single RPG for it! Argh!) I'm on a limited budget, and I had to choose between getting a Game Gear and a Game Boy. (The Sega system was a bad choice—but that's not the point.) Now that I have a Game Gear, I have decided not to get a second system. So if I did download a Game Boy ROM, could Nintendo be losing money? No, because I don't have a Game Boy anyway. Going back to the previous point, suppose that I did have a Game Boy. If I saw a copy of Link's Awakening for sale, might I buy it even though I've already pirated the ROM? The answer is yes, actually, because unlike playing on a genuine GB, I can't lug my PC into the car and play the pirated ROM while on the road.

That reminds me, by the way, of an unrelated point. Nintendo didn't lose any money if the ROM I 'pirated' was not available in stores. How can a company lose money to piracy on games that cannot be bought? Huh? I mention that because I've somehow never seen a copy of Link for sale... but it applies, more broadly, to a huge section of the pirate scene: pirating old games.

I was really irritated by IGN64.com's statement about the N64 version of M.A.M.E., which said using it was "all very illegal"*. (They are, in fact, correct, but "very" illegal?) Technically, for instance, downloading many old arcade ROMs for MAME and even games for systems such as the NES, TG16, and (dare I say it?) SNES is illegal. Why? Look, the systems are dead. The game manufacturers' assembly lines have long since devoted themselves to making N64 carts. People should be able to try out a few of those games they never got to try.

I don't, however, want to sound like I am totally in favor of piracy. Being a long-time programmer (I'm gonna be eligible to vote soon, in fact!) I understand the financial pains of some piracy: I paid $350 Cdn for Watcom C++, and made precisely zip trying to sell my home-brewed software. I can imagine what it is to go to massive efforts to create a video game or other software, and then go broke because your market is composed primarily of pirates. For that reason, I am generally against the piracy of (fairly new) software. It really pisses me off whenever somebody pirates something that I had to pay gobs of money for. Thus, I shall present my official statement to the world concerning piracy: No.

* I don't use MAME, but if I have a sudden urge to play Centipede, it's my God-given right to do so!

End of 1997 - State of the Emulation World (written in 1997)
From Novelty to Supercalifragilisticexpialidocious.

1997 has been a phenomenal year for emulation. At its outset, the typical system was a high-end 486 system, on which you could emulate approximately: nothing. Yeah, sure, the Game Boy emulator sort-of did it... but really, who can handle 3 fps?

Especially in the first half of 1997, emulator authors were hard at work optimizing their programs, so that even a Super Nintendo emulator could run well on a 486 DX4/100. At the same time, Pentium prices dropped and most of us were able to get Pentium 100, 133, or even 200MHz systems. Now, the window is opening for PlayStation emulator(s) to run on PCs, and on the horizon we even see an N64 emulator [some 2D emulation now works!]. Indeed, if you are new on the emulation scene, you can consider yourself very lucky.

Now, that same system that played such a jerky version of Game Boy can play the exquisite NESticle emulator at 15 fps, and there are many, many more of everyone's old favorites from the past available on the Internet. The aforementioned Game Boy emulator is now 20-30% faster and plays well (after tweaking) on any 486. Mode 7 and SPC700 (sound processor) support are standard on the SNES emulators; Genecyst and KGen are godsends for the Genesis. I shouldn't leave out the arcade emulators, though I don't care a whole lot for them: I do believe M.A.M.E. (Multi Arcade Machine Emulator) runs over 200 classic games, and Callus (from the author of NESticle and Genecyst) runs the Street Fighter series of arcade games and more.

I had previously been bashing the idiots who believed that an N64 emulator existed, but that was only because there were fake N64 emulators going around. Now, they actually seem plausible. My first estimate was that you'd need a 4,000 MHz processor to emulate the N64, which was based on an extrapolation of the fact that, to run SNES96 well, you ought to have a 166 MHz Pentium--and that was only to emulate a 2.6 MHz processor with a 2D graphics system! However, there are some things that make N64 emulation easier. Probably most important is that, by sacrificing "full" emulation capabilities, 3D video hardware can be used to display the graphics; new 3D cards are coming out that mostly surpass the N64's polygon-pushing power. Secondly, partial-compilation techniques may be used to reduce the amount of pure emulation that is done and figure out certain things before the emulation starts. Since the N64's CPU is a very RISC processor, 94 MHz (the N64's main CPU clock speed) may not be such a big obstacle after all. However, there are some coprocessors to worry about.... Anyhow, I revise my estimate. With a 3Dfx or higher-power 3D graphics card, I'd say that you'll be able to reasonably emulate the N64 with a Pentium II 300 MHz machine with 64 MB of RAM. When the emulator becomes playable, I don't expect it to be uncommon for people to have such a processor. I wonder how they'll go about emulating the 3D stick and the Rumble Pak?

August 2000: Oops, went from a too-high estimate to too low. Perhaps someday we'll have highly optimized emulators running on 300 MHz machines, but you probably need at least 350 MHz, even on high-quality hardware. This is so despite the innovation (first seen in UltraHLE) of "high-level emulation" which maximizes speed by "emulating as little as possible". It is interesting how I wrote this article, not three years ago, with the idea of 4000 MHz machines seeming outrageous, yet now 2 GHz seems to be on the horizon (the current fastest officially-marketed x86 is the 1.13 GHz Pentium III, with overclockers going up to about 1.3 GHz).
