NOTE: This version of the log appears to have been cut off about halfway through. I have not had the time to try to find the missing log entries and merge them into this page. (I've got better things to work on these days.)
As with any large project, work on T3D had to start somewhere. It could be said that work on it started early in my graphics programming days, when I became frustrated with inaccurate, slow, and just plain ugly renderings, and started researching everything I could on the subject of realistic rendering. On the other hand, perhaps that takes us too far back. Perhaps more appropriate was the beginning of T3D as such, when I sat down with my computer, a Perl interpreter, and a few Perl add-on modules downloaded from the net, and decided in the typical fashion of hackers everywhere to try to change the world.
So began a long journey, with many interruptions of months at a time in which no progress was made, because I was too busy with other things, or had reached the point where research turned to reading rather than coding. So far, the world hasn't changed, but some time ago, I realized that for now, I had too much to learn. The goal hasn't changed, but it has been postponed, while I lay the groundwork in my mind and on the screen. What follows is a history of the work, of the changing face of the engine, and of the pitfalls and treasures discovered along the way. It is roughly categorized by the step number (analogous to a build or revision number) at which I made each major change. Follow with me now through an archeological dig into my hard disk:
At first, I simply needed to learn how to use the tools at my disposal. I started with Perl and GD.pm, a module written by Lincoln D. Stein to wrap Thomas Boutell's gd library, which provides an easy-to-use interface for drawing on GIF images. I started with GIFs, both because GD was so easy to use, and because it would be easy to post images for others to view. The first image shows the output of the first test program, which created a few polygon-defining memory structures and then called a drawing routine. The drawing code flood-filled the GIF with the background, drew the polygons on top, printed the total polygon count and total drawing time in ticks (milliseconds) in the upper left-hand corner, and finally dumped the finished image to disk. The two polygons took about half a second to draw; this was later found to be caused almost entirely by the library's slow flood-fill routine.
Now that I knew how to draw polygons, I started working on the 3D math. The first challenge would be to output a simple pyramid. To keep things simple, I restricted the object transforms to translations, and the view transforms to simple movements along the Z axis, up which the view was forcibly directed. On the plus side, the code performed perspective transformation and had already begun to branch out into a modular approach, with each major object (in the OO sense) broken into a separate code base, one each for points, polygons, objects, scenes, view transforms, and world transforms. In addition, the point code already contained the first ideas for optimization--point transformations were cached and lazy-evaluated (of course, this wasn't to prove useful until animation was added in step 4). Still, the second image isn't much to look at. It was at this point that I discovered the slowdowns caused by the flood-fill routine and the diagnostic messages I had been spewing forth. I decided to simply use the default black background for the GIFs, and clear the screen before each timed run. The combined result was an improvement of over an order of magnitude in rendering speed. This was important not for its own sake (these were only early tests, after all), but because the slowness in the GIF-handling code was masking any performance effects of changes to the 3D calculation code.
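The viewing model at this step boils down to a perspective divide with the camera constrained to the Z axis. A minimal sketch of that projection, in Python rather than the original Perl (the function and parameter names here are illustrative, not T3D's):

```python
def project(point, camera_z, focal_length=256.0):
    """Perspective-project a 3D point onto the screen plane.

    The view is assumed to look up the Z axis from (0, 0, camera_z),
    matching this step's restriction: only Z movement of the viewpoint.
    Returns (x, y) screen coordinates.
    """
    x, y, z = point
    depth = z - camera_z          # distance from viewpoint to point
    scale = focal_length / depth  # the classic perspective divide
    return (x * scale, y * scale)

# A vertex twice as far away projects to half the screen offset.
near = project((1.0, 1.0, 10.0), camera_z=0.0)
far  = project((1.0, 1.0, 20.0), camera_z=0.0)
```

The halving of the projected offset with doubled depth is exactly the foreshortening that makes the pyramid read as three-dimensional.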
At this point, I added lazy-evaluation code to both the world and view transforms (though I really didn't use it yet), and allowed world transforms to include nonuniform scaling and both absolute and relative updates. After taking a code snapshot, I added arbitrary rotations and fixed a stupid bug in the lazy-evaluation code that led to a nice speed improvement; of course, this meant that GIF serialization alone was taking about half the total running time of the test script. In addition, the astute viewer will notice that a sample frame from this time period shows a bug in the poly counting code. :-)
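The lazy-evaluation idea mentioned above is a dirty flag plus a cached result: recompute the transformed point only when something has invalidated it. A sketch of the pattern in Python (class and method names are mine, not T3D's; the transform here is just scale-then-translate for brevity):

```python
class LazyPoint:
    """Cache a point's transformed coordinates, recomputing only
    when the cache has been invalidated."""
    def __init__(self, local):
        self.local = local
        self._cache = None
        self._dirty = True
        self.transform_count = 0   # only here to demonstrate the cache

    def invalidate(self):
        """Must be called whenever the governing transform changes."""
        self._dirty = True

    def world(self, translate, scale):
        if self._dirty:
            self.transform_count += 1
            self._cache = tuple(c * s + t for c, s, t
                                in zip(self.local, scale, translate))
            self._dirty = False
        return self._cache

p = LazyPoint((1.0, 2.0, 3.0))
a = p.world((10, 0, 0), (2, 2, 2))   # computes
b = p.world((10, 0, 0), (2, 2, 2))   # cache hit, no recompute
```

The catch, and the source of the bug class mentioned above, is that correctness depends entirely on calling `invalidate` at the right moments.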
Next I added an incredibly slow implementation of backface culling. Still, a sample frame from this step shows that it was indeed functional. A quick code snapshot and some fiddling later, I had a version that was not only faster (culling speed was improved by about a factor of three), but actually used filled polygons! I know, what a concept, right? I also switched to counting visible polygons, instead of all polygons.
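Backface culling itself is a small amount of vector math: build the face normal from two edges via a cross product, then test its direction against the line from the viewpoint to the face. A sketch in Python (T3D's actual implementation details aren't recorded here; winding order is an assumption):

```python
def is_backface(poly, viewpoint):
    """Cull a polygon whose normal faces away from the viewer.
    Vertices are assumed counter-clockwise when seen from the front."""
    a, b, c = poly[0], poly[1], poly[2]
    u = tuple(b[i] - a[i] for i in range(3))
    v = tuple(c[i] - a[i] for i in range(3))
    # Face normal via the cross product of two edges.
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    to_poly = tuple(a[i] - viewpoint[i] for i in range(3))
    # The face points away when its normal aligns with the view ray.
    return sum(n[i] * to_poly[i] for i in range(3)) >= 0
```

Done naively per frame with general-purpose matrix code, this is slow; done with a handful of inline multiplies and adds, it is nearly free, which is roughly the factor-of-three story above.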
I added the ability to move the VRP (View Reference Point) arbitrarily, though not yet the view orientation; I simply recoded the animation from the last steps to achieve the same visual effect by moving the viewpoint rather than scaling and translating the pyramid. When that worked, I added VUV (View Up Vector) and VPN (View Plane Normal) handling, and changed the animation to simply have the view point fly around the pyramid, pointing toward the world origin at all times.
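The VRP/VPN/VUV trio defines a camera: the VPN gives the forward direction, and crossing it with the VUV yields an orthonormal basis for the view. A sketch of that construction in Python (function names are illustrative):

```python
import math

def view_basis(vpn, vuv):
    """Build an orthonormal camera basis from a View Plane Normal and
    View Up Vector, as in the classic VRP/VPN/VUV viewing model.
    Returns (right, up, forward) unit vectors."""
    def norm(v):
        length = math.sqrt(sum(c * c for c in v))
        return tuple(c / length for c in v)
    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])
    n = norm(vpn)              # forward
    u = norm(cross(vuv, n))    # right = up x forward
    v = cross(n, u)            # true up, guaranteed orthogonal to both
    return u, v, n

right, up, forward = view_basis((0, 0, 1), (0, 1, 0))
```

Recomputing `n` as `cross(n, u)` rather than trusting the supplied VUV is what lets the fly-around animation hand in a merely approximate up vector every frame.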
The performance was so awful that I decided to improve the performance tracking code, to help in localizing the rampant speed problems; at the same time, I improved the performance in a few places that became obvious time sinks.
It was time, finally, to promote color to a first-class object; the simple intrinsic-color polygon attribute became a far richer (and more extensible) material object, which specified intrinsic, ambient, diffuse, and specular colors. In addition, each color definition carried its own colorspace information, though only RGB is really supported at this time; also, color components changed to floating point numbers on a clamped 0.0 to 1.0 scale. Still, the images were nearly the same; the test program wasn't really showing off the capabilities of the new code.
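The shape of the new material object can be sketched briefly in Python (class names and fields are my reconstruction from the description above, not T3D's actual interface):

```python
class Color:
    """A colorspace-tagged color with components clamped to 0.0-1.0."""
    def __init__(self, components, space="RGB"):
        self.space = space
        self.components = tuple(min(1.0, max(0.0, float(c)))
                                for c in components)

class Material:
    """Replaces the single intrinsic-color polygon attribute with
    separate intrinsic, ambient, diffuse, and specular colors."""
    def __init__(self, intrinsic, ambient, diffuse, specular):
        self.intrinsic = intrinsic
        self.ambient = ambient
        self.diffuse = diffuse
        self.specular = specular

red = Color((1.5, -0.2, 0.0))    # out-of-range input clamps to (1.0, 0.0, 0.0)
```

Clamping at construction time means the lighting code never has to defend against out-of-range components.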
While working on the new color handling code, I decided that the class hierarchy wasn't working well for all that I was now doing, and rearranged things a bit, adding higher level scene and view classes. Another stepping followed yet another class hierarchy change, this time to unite the color handling code that was duplicated in the material and light objects.
While debugging something odd-looking in the diffuse color code, I switched from the 5-face pyramid to a 288-face sphere approximation to get a better view of subtle color effects; at the same time, I improved the code for static object creation. Unfortunately, the huge increase in face count changed the frame rate measurements from frames per second to seconds per frame. Another stepping was made after cutting the time a bit, and normalizing the viewport size in preparation for eventually allowing multiple different resolution output devices to share the same cached point location information.
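One common way to build such a sphere approximation is a latitude/longitude grid of quads; a 12 x 24 grid is one way to arrive at the 288 faces mentioned above (whether T3D tessellated exactly this way isn't recorded, so treat this Python sketch as illustrative):

```python
import math

def sphere_faces(stacks, slices, radius=1.0):
    """Approximate a sphere with a latitude/longitude grid of quads,
    giving stacks * slices faces (polar quads are degenerate triangles)."""
    def vert(i, j):
        theta = math.pi * i / stacks        # latitude angle, pole to pole
        phi = 2.0 * math.pi * j / slices    # longitude angle
        return (radius * math.sin(theta) * math.cos(phi),
                radius * math.sin(theta) * math.sin(phi),
                radius * math.cos(theta))
    faces = []
    for i in range(stacks):
        for j in range(slices):
            faces.append((vert(i, j), vert(i, j + 1),
                          vert(i + 1, j + 1), vert(i + 1, j)))
    return faces
```

The same routine scales to the 450-face (15 x 30) and 1800-face (30 x 60) models used in later steps, which is exactly why face count explodes so easily.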
It seemed to be just about time to start playing around with multiple objects, so I added some cheesy Z-ordering code, pulled the viewpoint back a bit, and changed the test script to rotate around both objects.
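The "cheesy Z-ordering" is the painter's algorithm: sort faces by average depth and draw back to front, letting nearer faces overdraw farther ones. A Python sketch (assuming larger Z means farther from the viewer, as in the earlier Z-axis viewing setup):

```python
def depth_sort(polygons):
    """Painter's-algorithm ordering: sort polygons by average vertex
    depth, farthest first, so nearer polygons overdraw farther ones."""
    def mean_z(poly):
        return sum(v[2] for v in poly) / len(poly)
    return sorted(polygons, key=mean_z, reverse=True)

near = ((0, 0, 1.0), (1, 0, 1.0), (0, 1, 1.0))
far  = ((0, 0, 5.0), (1, 0, 5.0), (0, 1, 5.0))
ordered = depth_sort([near, far])   # far is drawn first
```

It earns the "cheesy" label because it fails on interpenetrating or cyclically overlapping faces, but it is entirely adequate for two convex objects that never touch.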
I spent the odd hours of a couple of days investigating why the math was so damned slow, and sped it up considerably by taking what I had learned about the slowness of the standard Math::MatrixReal module, and applying it to a new, specialized module of my own making. Of course, it was still nowhere near realtime, but there was much more to be done, and this made the next step slightly less painful. I also increased the number of faces on the sphere from 288 to 450, so that I could make better use of the 256 colors I had available (one for each face, less than half visible in any one frame).
With the higher-resolution sphere approximation, I was ready to actually add Phong style specular highlighting. The result looks a little strange, because the sphere is diffusely red and specularly green, but what the heck. I also switched from 20 frames per test to 30, so I could get a better feel for whether the math was right, and that led to finding a couple of bugs.
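The Phong specular term reflects the light direction about the surface normal and raises the alignment between that reflection and the viewer to a shininess power. A sketch in Python (the shininess exponent and function names here are illustrative, not T3D's values):

```python
def specular_intensity(normal, to_light, to_viewer, shininess=8.0):
    """Phong specular term. All direction vectors are assumed to be
    unit length and to point away from the surface."""
    n_dot_l = sum(n * l for n, l in zip(normal, to_light))
    # R = 2(N.L)N - L, the mirror reflection of the light direction.
    reflect = tuple(2.0 * n_dot_l * n - l
                    for n, l in zip(normal, to_light))
    r_dot_v = sum(r * v for r, v in zip(reflect, to_viewer))
    return max(0.0, r_dot_v) ** shininess

# A viewer sitting exactly on the reflection direction sees the peak.
peak = specular_intensity((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (0.0, 0.0, 1.0))
```

The exponent controls highlight tightness, which is why the higher-resolution sphere mattered: a tight highlight on a coarse mesh just flickers between faces.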
I had made various changes over the two months between this step and the last one, had forgotten what most of them were (yes, there is a big danger to not documenting your work while you are doing it), and decided to just step the project to sort of back up my work before I went crazy trying to find out why everything was still so damned slow.
I decided that part of my problem was the deep tree that represented a point in T3D. I flattened the structure, changed all of the places that referenced it, and tried it out, only to find the whole thing was 2% slower!
While speeding things up a few percent, I decided to start passing full 3D locations through to the output drawing routines. While not needed in the current engine, this cleaned things up a little bit, and provides support for future Z-buffer acceleration and 3D texturing. Step 25 somehow got lost in the shuffle, and I have no record of whatever it was. Probably I just lost count; winter is not exactly my time to shine mentally. :-/
I created a generic top-level module, T3D.pm, that provides various support routines, and has front-end functions that support unified object creation, somewhat like Tk.pm. At the same time, I moved the pyramid and sphere creation routines into a new shapes library, and went to a standard specular grey for the pyramid's color, instead of the weird mishmash that it had been for so long; I think this makes the pictures look a bit less ugly somehow.
Because GIFs have only a 256 color palette, I had for a long time been trying to allocate a color for each polygon I drew, falling back to a closest-match algorithm if there were no free palette slots. I decided to switch to an algorithm that started by trying to reuse existing palette entries, if they were identical to within the precision of 24-bit source colors; if unsuccessful, the new method simply falls back to the old one. This made it possible to correctly draw an 1800-face sphere, which not surprisingly took forever to render. To track the effectiveness of the new color choosing algorithm, I added color count to the list of statistics printed on each GIF.
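The layered palette strategy described above can be sketched as a single allocation routine in Python (a reconstruction of the described behavior, not T3D's code; the closest-match metric here is plain squared distance):

```python
def allocate_color(palette, want, max_slots=256):
    """Palette strategy from this step: first reuse an identical
    existing entry; then claim a free slot; finally fall back to the
    nearest existing entry (the old closest-match behavior)."""
    if want in palette:
        return palette.index(want)
    if len(palette) < max_slots:
        palette.append(want)
        return len(palette) - 1
    def distance(entry):
        return sum((a - b) ** 2 for a, b in zip(entry, want))
    return min(range(len(palette)), key=lambda i: distance(palette[i]))

pal = [(0, 0, 0), (255, 0, 0)]
i = allocate_color(pal, (255, 0, 0))               # exact reuse
j = allocate_color(pal, (0, 255, 0))               # fresh slot
k = allocate_color(pal, (250, 5, 5), max_slots=3)  # full: closest match
```

Exact-match reuse is what makes the 1800-face sphere drawable: many faces share identical shaded colors, so the palette no longer fills up with duplicates.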
While nice for eye candy, the code was not nearly fast enough to support the high-resolution sphere, so I went back to the 450-face model, and cleaned up various bits of code to make better use of the features in T3D.pm. I also added better debug logging to the objects, using T3D.pm's ability to throttle logging by verbosity level, preliminarily on a scale of 0 (errors only) to 4 (log every math operation). I then added the ability to hierarchically dump a view and its descendants (nearly everything), to provide a rough measure of data structure complexity. Finally, having learned that I had been using extremely bad Perl style in certain places in my code, I cleaned them up and was much happier with the result.
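Verbosity-throttled logging is a one-comparison gate in front of every log call. A minimal Python sketch of the idea (the class is illustrative; T3D.pm's actual interface isn't shown here, and messages are captured in a list rather than printed so the behavior is easy to inspect):

```python
class Logger:
    """Throttle log messages by verbosity level, on a 0 (errors only)
    to 4 (every math operation) scale."""
    def __init__(self, verbosity=0):
        self.verbosity = verbosity
        self.lines = []   # captured instead of printed, for inspection

    def log(self, level, message):
        # Keep the message only if it is important enough.
        if level <= self.verbosity:
            self.lines.append("[%d] %s" % (level, message))

log = Logger(verbosity=1)
log.log(0, "error: degenerate polygon")   # kept
log.log(4, "dot product = 0.5")           # throttled away
```

The per-call comparison is cheap, but note that the arguments are still built even for discarded messages, which matters at level-4 volumes in an inner math loop.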
After a nice winter break visiting family, I got back to code cleanup. I also enabled wireframes as an option again (I had been hardcoding filled polys for a while). Of course, backface culling was still active, which made the results a little strange, but whatever.
All of the little changes I had made in the last few steps were insignificant compared to the bugfix made in this step. World coordinates were being recalculated every frame even on static objects; since both objects in the test run are static (only the viewpoint moves), fixing this made the whole run five times faster. Well, I guess that proves that the caching is a performance win . . . .
Added support for the backend output renderer to report output resolution to the visible poly determination routines; though not used yet, this could be used to guide curved surface approximation routines. Also added a hook to make it easy to turn on and off specular highlights.
Finally! The first step toward truly interactive code, this step supports rendering either using the old GIF renderer, or the new Tk Canvas renderer. A whole bunch of random stuff was FUBAR in this step, and it became necessary to allow the user to turn off the sphere to actually see reasonable performance.
Finally gave T3D a sense of time, allowing the scene time to be a function of frame number, real time since last frame, or something even more arbitrary. This allows simulations to maintain a constant speed regardless of rendering rate, and at the same time allow high-quality renderings to take a long time to render without causing gaps in the virtual world time.
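Decoupling scene time from rendering rate amounts to making the time source a pluggable function. A Python sketch of the idea (names are mine; a real-time variant would pass a wall-clock-based callable instead):

```python
class Clock:
    """Scene time as a pluggable function of frame number: a fixed
    step per frame for offline rendering, or a wall-clock delta for
    interactive runs, or anything more arbitrary."""
    def __init__(self, source):
        self.source = source   # callable: frame number -> scene time
        self.frame = 0

    def tick(self):
        t = self.source(self.frame)
        self.frame += 1
        return t

# Offline rendering: exactly 1/30 s of virtual time per frame, no
# matter how long each frame actually takes to render.
offline = Clock(lambda frame: frame / 30.0)
t0, t1 = offline.tick(), offline.tick()
```

With a fixed-step source, an hours-long high-quality render and a real-time preview both traverse the same virtual timeline, which is exactly the property described above.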
Worked around what appears to be a Tk bug (unfilled polygons get filled with the background color anyway), did a whole bunch of miscellaneous code cleanup, and made the performance statistics more precise and useful.
I had been calling a set of functions to transform a point individually for every single point; making versions of the transform code that could accept lists of points to act on sped things up by avoiding a good portion of the call overhead. I applied this technique of cutting down on spurious extra calls to continue to squeeze a few percent here and there out of the code. Purists will at this point call me utterly crazy, because I could easily get a 100- or 1000-fold increase in rendering speed by just using a faster implementation language, like C or assembler. However, at this point, I was trying to determine the performance characteristics of Perl, so that in the future I could make informed decisions about what chunks of code would need to be transcoded into a lower-level language, and which could stay in the original Perl.
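The batching change is easy to show side by side: the loop moves from the caller into the transform routine, so the per-point call overhead is paid once per list instead of once per point. A Python sketch (scale-then-translate stands in for the full transform):

```python
def transform_point(point, scale, translate):
    """One call per point: the original, call-heavy style."""
    return tuple(c * scale + t for c, t in zip(point, translate))

def transform_points(points, scale, translate):
    """One call per point *list*: the loop runs inside a single call,
    avoiding most of the per-point call overhead."""
    return [tuple(c * scale + t for c, t in zip(p, translate))
            for p in points]

pts = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]
batched = transform_points(pts, 2.0, (10.0, 0.0, 0.0))
```

In an interpreted language, where every call carries argument packing and dispatch costs, this kind of batching is often worth a few percent per call site, which matches the "squeeze a few percent here and there" above.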
I decided that it would be instructive to determine just how far this whole line of optimization could go, and just created a "carnal" module which did all of the calculation normally spread all over a number of modules, by completely disregarding object encapsulation, and operating on the data structures directly. This halved the runtime of the visible surface determination code, but that's not nearly enough to justify the unholy mess that is the carnal code. I added a free-running mode to the Tk test program, so that much more data could be gathered, and returned the output resolution to 600x450, as I had originally intended, back when that seemed like a reasonable target resolution for a software renderer. For reference, I normally run my copy of Unreal at 512x384 on a PII-266.
At long last, actual interactivity--though it was as simple as allowing the user to hold the mouse down and drag the VRP around the X-Y plane, watching the results. Not spectacular, but nothing to sneeze at, either.
This step is mostly just a backup of little changes made since step 41, as after this came a very long break in T3D programming while I worked on other things (notably, broadwell.org). The last existing log file reports that a frame rate of 3.67 fps had been achieved. Clearly, it was time to drop the pure Perl stance and use something else to improve calculation speed. The next thing to try is PDL, the Perl Data Language, which makes N-dimensional arrays a first-class Perl data type, with C-coded calculation kernels for the standard array operations and a nice extension mechanism for adding other operations. It is reportedly much faster than raw Perl code (though not nearly as fast as, for instance, the assembler-optimized Intel Math Kernel Library), and considerably lighter on memory usage for large data structures. For the curious among you, yes, I did try the Perl compiler; if I remember correctly, it produced a mere doubling in speed, while producing a humongous C source file and executable. Not worth it, in light of the pain involved in using it.
During the long wait, I set up a Linux system (RedHat 5.1), and installed the latest development perl and all of the modules that seemed useful, including PDL. I then fixed a couple of minor things so that the code would run either on my NT box or the Linux box, and set to converting all of the code from pure Perl to PDL. For this step, I converted over several of the classes to use pdls instead of T3DMath objects or arrays. To try to take advantage of PDL's ability to thread (iterate) operations over multidimensional data, I converted T3DPoint to T3DPointList. This had the effect of making vertex transforms blazingly fast, but this could not make up for the slow performance of the code using individual pdls for each object instead of a list pdl. It turns out that PDL is so optimized toward handling large data structures well, that it is very slow in handling small structures. This simply means that at least T3DPolygon and T3DMaterial will need to be converted to use lists now.
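The T3DPoint-to-T3DPointList conversion changes the interface shape: one container holds every vertex, and a transform threads over the whole batch in a single operation. A pure-Python sketch of that shape (PDL does the actual iteration in C-coded kernels, which this sketch cannot reproduce; it only shows the batch-oriented interface):

```python
class PointList:
    """A T3DPointList-style batch of vertices: one container holding
    many points, so a transform applies to the whole batch at once
    instead of to one point object at a time."""
    def __init__(self, points):
        self.points = list(points)

    def transformed(self, matrix):
        # Apply a 3x3 matrix to every point in a single pass.
        return PointList([tuple(sum(m * c for m, c in zip(row, p))
                                for row in matrix)
                          for p in self.points])

double = ((2, 0, 0), (0, 2, 0), (0, 0, 2))
batch = PointList([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)])
scaled = batch.transformed(double)
```

The lesson reported above applies directly: the win comes from amortizing per-operation overhead over many elements, so creating thousands of tiny pdls (one per polygon or material) forfeits exactly the advantage the batch structure exists to provide.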