stephendeken.net a blog of insignificant proportions

The GPU Is Not A Magic Wand

So I’m writing a game engine in C++, on top of which I’ll be building a single-player RPG.  I don’t have a lot to show yet, just black triangle stuff, but I wanted to share a particular event.

I’ve just gotten to the point where I’ve got the very basics of a rendering pipeline working, and I wanted to see how many polygons I could push, and at what framerate I could push them.  To do this, I wrote a very simple mesh class that makes a bunch of colored triangles using random coordinates and colors, bundles them all up into a single batch, and sends that batch over to the video card.  I enabled multisampling, because I don’t like looking at jagged edges, and I enabled blending, because I wanted the triangles to be translucent.
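
For the curious, here is a minimal sketch of what such a batch generator might look like.  The Vertex layout, the per-vertex colors, the normalized [-1, 1] coordinate range, and the name makeRandomTriangles are all assumptions made for illustration; the post doesn’t show the real mesh class, and the upload to the video card is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// One vertex of the batch: a 2D position plus an RGBA color.
// Hypothetical layout, not the engine's actual vertex format.
struct Vertex {
    float x, y;        // position, assumed to be in [-1, 1] on both axes
    float r, g, b, a;  // color; a < 1 makes the triangle translucent
};

// Builds triangleCount random triangles as one flat vertex list
// (three consecutive vertices per triangle), so the whole thing can be
// handed to the graphics API as a single batch.
std::vector<Vertex> makeRandomTriangles(std::size_t triangleCount, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> coord(-1.0f, 1.0f);
    std::uniform_real_distribution<float> channel(0.0f, 1.0f);

    std::vector<Vertex> batch;
    batch.reserve(triangleCount * 3);
    for (std::size_t i = 0; i < triangleCount * 3; ++i) {
        Vertex v;
        v.x = coord(rng);
        v.y = coord(rng);
        v.r = channel(rng);
        v.g = channel(rng);
        v.b = channel(rng);
        v.a = channel(rng);
        batch.push_back(v);
    }
    assert(batch.size() == triangleCount * 3);
    return batch;
}
```

Note that nothing here limits how big a triangle can get: three random points anywhere on the screen can span a large chunk of it.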

Then I fired it up using 200 triangles and it worked like a charm.  All the data is on the video card, the CPU should basically be sitting idle, and the video card can just do its thing as fast as it can.  And that’s fast, right?

Well, I wanted to see how much it could handle, right?  So that’s what I did.  I upped the count to 20,000 triangles (with random coordinates and colors, with multisampling set to a 4×4 grid, with blending enabled on every triangle), sent them over to my video card, and said “have fun!”

I guess the GPU had fun — it locked up the machine pretty hard, and I think it took me about five minutes until I was able to recover control.

What happened?  200 triangles worked fine.  At 2,000 it was a little laggy (I discovered this after the fact).  At 20,000 I couldn’t even interact with the desktop long enough to hit ‘stop’ in the debugger.  Something was wrong; I should be getting far more than 1 frame per second with 20,000 triangles.

Was I accidentally using the software renderer?  No, the renderer string says ‘ATI accelerated’.
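
A check like this can be automated.  The sketch below assumes the engine sits on OpenGL (the post never names the API), in which case the string would come from glGetString(GL_RENDERER).  The helper looksLikeSoftwareRenderer and its marker list are hypothetical and certainly incomplete; they only cover a few strings that software rasterizers are known to report.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Returns true if the renderer string contains a marker commonly reported
// by software rasterizers. The marker list is illustrative, not exhaustive.
bool looksLikeSoftwareRenderer(const std::string& renderer) {
    std::string lower(renderer);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const char* markers[] = {"gdi generic", "llvmpipe", "softpipe", "software"};
    for (const char* marker : markers) {
        if (lower.find(marker) != std::string::npos) {
            return true;
        }
    }
    return false;
}

// Example: the string quoted in the post does not trip the check.
inline void exampleRendererCheck() {
    assert(!looksLikeSoftwareRenderer("ATI accelerated"));
}
```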

Was it a problem with using the video card’s memory versus CPU memory?  No, it had similarly poor performance with vertex arrays in CPU memory, and even with immediate mode.

Was it a problem with buffering or video depth?  No, all that seems to be fine; changing things around had no measurable effect.

Was it the antialiasing or the blending?  No, disabling those improved it only slightly.

Was it maybe a problem with the driver or video card itself?  No, because tons of games play just fine, and they push way more than 20,000 polygons at once, so it’s got to be something I’m doing wrong.

So what’s the problem?  I chased my tail for a while, poking at various options, and then — on a lark — I turned off rendering, so that all of the calls I made didn’t actually do anything.  Boom — I went from 1 FPS to well over 1000 FPS.

That was my Houseian moment.  Performance shot up when I turned off rendering because the GPU went from doing too much to doing nothing, which means the GPU itself was the bottleneck.  But how was that possible?  I’m only pushing 20,000 triangles, and modern games push way (way) more than that and still get decent framerates.  Therefore, the problem is not that I’m using too many triangles; the problem is that the triangles I’m using are making the GPU do too much.  Most of them take up nearly half the screen, so the rasterizer has to run through half the screen’s worth of pixels nearly 20,000 times in order to draw a frame.

Yes, the GPU is fast, and it can do a lot at once, but it’s not a magic wand.  It still has to go through, pixel by pixel, reading and writing colors in the various buffers, and if you make it do that 20,000 times at 1024×768 (supersampled to 4096×3072), it really will have to do about a half billion operations or more.  It’s still silicon under there, you know, not an extradimensional pocket of pure graphics.

(I changed the triangles so they were much smaller (about 20×20 pixels each), and ran it again.  It was fast and smooth, pushing a framerate that was high enough I didn’t care to remember what it was.  I increased the number of triangles up to 200,000 and it was still pretty okay, not nearly 60fps, but still not bad.  Of course, this is all without any game logic, mesh deformation, event handling, etc.)
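
The back-of-the-envelope arithmetic behind both runs can be sketched as below.  The function name and the simple model (every covered pixel is touched once per sample, with no clipping, overlap tricks, or hardware shortcuts) are assumptions for illustration; real hardware does not treat multisampling as full supersampling, so read the results as rough upper bounds.

```cpp
#include <cassert>
#include <cstdint>

// Rough count of color samples the rasterizer touches in one frame.
//   triangles         - number of triangles drawn
//   pixelsPerTriangle - assumed average screen area of one triangle, in pixels
//   samplesPerPixel   - 1 with antialiasing off, 16 for a 4x4 sample grid
std::uint64_t samplesPerFrame(std::uint64_t triangles,
                              std::uint64_t pixelsPerTriangle,
                              std::uint64_t samplesPerPixel) {
    assert(samplesPerPixel >= 1);
    return triangles * pixelsPerTriangle * samplesPerPixel;
}
```

Taking “nearly half the screen” at face value, half of 1024×768 is 393,216 pixels, so samplesPerFrame(20000, 393216, 1) comes to 7,864,320,000 before antialiasing even enters the picture, and blending means each of those is a read as well as a write.  Treating a 20×20 triangle generously as 400 pixels, samplesPerFrame(20000, 400, 1) is 8,000,000, roughly a thousand times less work for the same triangle count.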

KeyCastr 0.8.0 Released

KeyCastr 0.8.0 has been released.  Source is available on GitHub.

Diary-X Composer

Back when Diary-X was still around, I was writing a Mac OS X application I called ‘Diary-X Composer.’  It was intended to be an offline version of Diary-X, allowing you to keep a journal on your local computer, and optionally sync it to the Diary-X servers.  I designed a glossy blue book icon for it, because I seem to always do the most useless parts of application development up front.

I never actually released a finished version — I posted one disk image, once, that was a very very early alpha version, basically just using the existing Diary-X code with a custom-built version of Perl.  I did, however, put my little blue book icon on it, because I thought it looked really cool, and I really seriously have a problem with doing the useless parts first.

Anyway, a few months back, Michelle and I consolidated our web hosting accounts to a neutral hosting provider (one that neither one of us had used before): HostGator.  The very first time I logged in to our account, my eye was drawn directly to their FAQ icon:

[Image: hostgator-faq]

That’s my little book icon!  It’s been doubled for some reason, apparently because there’s a lot of frequently asked questions.  Every time I log in to the admin panel it makes me smile.

You’d think I’d know this stuff by now.

I’ve been using C and C++ for about ten years now, both personally and professionally.  And yet, I’ve never encountered a bit of syntax that I really should already know about: bit fields.  You can specify the exact number of bits required for integer members of C structs, allowing the compiler to take care of the bit masking and marshalling for you:

typedef struct foo {
    unsigned int flag_0 : 1;       // one-bit variable (0..1)
    unsigned int small_enum : 4;   // four-bit variable (0..15)
    unsigned int other_index : 3;  // three-bit variable (0..7)
} foo;

The three fields add up to eight bits, so this struct can be packed into a single byte (actually probably a single int, since the members are declared unsigned int), which can potentially save quite a bit of space.  You could do the same thing manually by specifying one “flags” field and doing the bit-twiddling on your own, but the bit fields make the meaning and the code a lot cleaner.
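
To make the comparison concrete, here is a small sketch with both approaches side by side.  The mask and shift constants and the getSmallEnum / setSmallEnum helpers are made up for illustration, and the bit positions in the manual version are an arbitrary choice: how a compiler actually lays out the bit fields inside Foo is implementation-defined.

```cpp
#include <cassert>
#include <cstdint>

// Bit-field version: the compiler generates the masking and shifting.
struct Foo {
    unsigned int flag_0      : 1;  // 0..1
    unsigned int small_enum  : 4;  // 0..15
    unsigned int other_index : 3;  // 0..7
};

// Manual version: one "flags" byte plus hand-written bit-twiddling.
// Hypothetical layout: bit 0 = flag_0, bits 1..4 = small_enum,
// bits 5..7 = other_index.
constexpr std::uint8_t kSmallEnumMask  = 0x1E;  // bits 1..4
constexpr unsigned     kSmallEnumShift = 1;

// Extracts the four-bit small_enum value from the flags byte.
inline unsigned getSmallEnum(std::uint8_t flags) {
    return (flags & kSmallEnumMask) >> kSmallEnumShift;
}

// Returns a copy of flags with small_enum replaced by value (0..15),
// leaving the other bits untouched.
inline std::uint8_t setSmallEnum(std::uint8_t flags, unsigned value) {
    assert(value <= 15);
    return static_cast<std::uint8_t>(
        (flags & ~kSmallEnumMask) | ((value << kSmallEnumShift) & kSmallEnumMask));
}
```

With the bit-field struct, the equivalent of setSmallEnum is just f.small_enum = 9, and reading it back is f.small_enum, with no masks to get wrong.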

I’m embarrassed to admit that I didn’t know this even existed until yesterday, when I had to open up the NSWindow sources to look something up.

KeyCastr 0.8.0 Coming Soon

I’m very close to finishing KeyCastr version 0.8.0. This version is a complete rewrite from 0.7.x, with improvements across the board:

I’ve already promised a few people that it would be released two weekends ago, but I wanted to let everyone know that it’s really very close to being finished and will be released soonish.

Moving a WordPress Installation

If you’ve moved a WordPress installation from one domain to another (or just between directories on the same domain), you might find that you can’t log in, and that lots of the CSS and images are missing.

The problem is that WordPress stores the site’s base URL in the wp_options table, and that URL is used to build the address the login form submits to, among other things.  So, you can do one of a few things:

  1. Reinstall WordPress at the new location,
  2. Move the installation back to the original location and make the changes via the GUI, or
  3. Hand-edit the database to make the required changes.

If you choose option #3, there are two rows in the wp_options table that need to be changed.  The option names are ‘home’ and ‘siteurl’.  Changing those will at least allow you to log in, and any further changes can be made via the GUI.