These are mostly rough thoughts on an architecture designed to make
heavy use of a multithreaded model while still allowing some niceties
like strictly ordered event logs.
- Multiple threads and/or processes (collectively "units") cooperating
- Threads are assumed to be truly system-local, but may run on the same or different processors; passing data between them may face NUMA delays as well as normal lock contention. Relatively high trust, with attendant efficiency.
- Processes are assumed to be networkable, and pass event messages over the network; low trust, higher overhead
- Master unit (AKA event server, AKA bottleneck :-) in control of event system and perhaps global simulation state but little else
- Other units converse with each other via the event server
- Hierarchy of event servers for scalability?
- Proxies/multiplexers of events, each acting as an event server?
- Event servers maintain the concept of ticks, corresponding roughly and abstractly to the passage of time
- Client units can be:
  - fully synchronous (handle requests immediately and return)
  - tick-synchronous (receive request asynchronously, but event server will wait for response before starting next tick)
  - asynchronous (receive request asynchronously, no response guarantee, may send response event later)
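
A rough sketch of how an event server might treat the three kinds of client unit differently. All names here (`SyncMode`, `ClientUnit`, `EventServer`, `dispatch`, `respond`, `can_start_next_tick`) are made up for illustration, and the handler is a placeholder; this is one possible reading of the list above, not a settled design.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class SyncMode(Enum):
    FULLY_SYNCHRONOUS = auto()  # handled inline; the result is returned at once
    TICK_SYNCHRONOUS = auto()   # queued; the server waits before the next tick
    ASYNCHRONOUS = auto()       # queued; any response arrives as a later event


@dataclass
class ClientUnit:
    name: str
    mode: SyncMode
    inbox: list = field(default_factory=list)

    def handle(self, event):
        # Placeholder handler (assumption); a real unit would do actual work.
        return f"{self.name} handled {event}"


@dataclass
class EventServer:
    # Names of tick-synchronous units that still owe a response this tick.
    awaiting: set = field(default_factory=set)

    def dispatch(self, unit, event):
        if unit.mode is SyncMode.FULLY_SYNCHRONOUS:
            return unit.handle(event)       # immediate call and return
        unit.inbox.append(event)            # delivered asynchronously
        if unit.mode is SyncMode.TICK_SYNCHRONOUS:
            self.awaiting.add(unit.name)    # next tick must wait for this unit
        return None

    def respond(self, unit):
        # Called when a tick-synchronous unit reports that it has finished.
        self.awaiting.discard(unit.name)

    def can_start_next_tick(self):
        return not self.awaiting
```

Only tick-synchronous units ever appear in `awaiting`, so an asynchronous unit that never answers cannot stall the tick.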
- Asynchronous units may choose to:
  - Conserve ticks, handling each before starting the next, with the possibility of getting behind
  - Drop ticks, only processing the most recent tick when they begin a processing loop
  - Combine ticks, doing all events from all queued ticks as if they all occurred in the most recent tick
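
The three choices above can be sketched as one function that decides what an asynchronous unit works on at the start of a processing loop. `TickPolicy`, `plan_work`, and the `(tick_number, events)` queue layout are assumptions made for this example only.

```python
from enum import Enum, auto


class TickPolicy(Enum):
    CONSERVE = auto()
    DROP = auto()
    COMBINE = auto()


def plan_work(queued, policy):
    """Return the (tick, events) batches a unit should process.

    `queued` is a list of (tick_number, [events]) pairs, oldest first.
    """
    if not queued:
        return []
    if policy is TickPolicy.CONSERVE:
        # Every tick is handled in order; the unit may fall behind.
        return [(tick, list(events)) for tick, events in queued]
    latest_tick, latest_events = queued[-1]
    if policy is TickPolicy.DROP:
        # Older ticks are discarded outright.
        return [(latest_tick, list(latest_events))]
    # COMBINE: all queued events are treated as part of the latest tick.
    merged = [event for _, events in queued for event in events]
    return [(latest_tick, merged)]
```

With three queued ticks, CONSERVE yields three batches, DROP yields only the last batch, and COMBINE yields one batch holding every event under the latest tick number.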
- Event servers can:
  - Run freely, producing ticks as fast as they can process each batch of events
  - Time ticks, producing ticks each time a timer fires, even if the current event batch has not finished processing
  - Wait for events, producing no ticks when idle
  - Combine methods, producing ticks only when batches complete, but with a minimum spacing time or similar
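
One way to compare the tick-production methods above is to reduce each to a rule for when the next tick starts. The sketch below assumes times are seconds on some monotonic clock; `TickMode`, `next_tick_time`, and its parameters are hypothetical names, and the "wait for events" rule (start when both the batch is done and an event is pending) is only one interpretation.

```python
from enum import Enum, auto


class TickMode(Enum):
    FREE_RUNNING = auto()
    TIMED = auto()
    WAIT_FOR_EVENTS = auto()
    COMBINED = auto()


def next_tick_time(mode, last_tick, batch_done, interval=0.0, next_event=None):
    """Return when the next tick should start, or None if the server idles.

    last_tick:  start time of the current tick
    batch_done: time at which the current event batch finishes processing
    interval:   timer period (TIMED) or minimum spacing (COMBINED)
    next_event: arrival time of the next pending event, if any
    """
    if mode is TickMode.FREE_RUNNING:
        return batch_done                      # as fast as batches complete
    if mode is TickMode.TIMED:
        return last_tick + interval            # fires even mid-batch
    if mode is TickMode.WAIT_FOR_EVENTS:
        if next_event is None:
            return None                        # idle: no tick produced
        return max(batch_done, next_event)
    # COMBINED: wait for the batch, but never tick faster than the spacing.
    return max(batch_done, last_tick + interval)
```

Note that only the timed mode can return a time earlier than `batch_done`, which is the case where a tick begins before the previous batch has finished.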
- Event servers
  - Local masters
  - Master of masters (root of event server tree)
  - Proxy (security, logging, recording, protocol conversion, etc.)
  - Multiplexer (reduce net bandwidth requirements)
- Players
  - Humans
  - NPCs/bots
  - Spectators
  - Recorders?
- Utility
  - Renderers
  - Sound/music systems
  - Network handlers
  - Physics engines
  - Global AI engines
  - World coherence engines
  - Game rules handlers
  - External event gatherers (to bring real world weather into game universe, for instance)
  - Server finders
- Parent init
  - Global init
  - Load all process shared data/conf
  - Determine types of children to create
  - Prepare reaper system and sig handlers
  - Spawn child processes
- Process init
  - Per-process init (according to type)
  - Load thread shared data/conf
  - Determine types of threads to create
  - Open listening ports
  - Spawn threads
- Thread init
  - Per-thread init (according to type)
  - Load unshared data/conf
  - Go to run mode
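
The init sequence above can be sketched as three nested stages. To keep the example small, child processes are stood in for by plain function calls (a real version would fork or spawn them and install the reaper and signal handlers for real), while the thread level uses actual threads. Every name here (`record`, `thread_main`, `process_main`, `parent_main`, the `layout` mapping) is hypothetical.

```python
import threading

log = []
log_lock = threading.Lock()


def record(step):
    # Steps are only logged here; each one stands in for real setup work.
    with log_lock:
        log.append(step)


def thread_main(proc_type, thread_type):
    record(f"{proc_type}/{thread_type}: per-thread init")
    record(f"{proc_type}/{thread_type}: load unshared data")
    record(f"{proc_type}/{thread_type}: run mode")


def process_main(proc_type, thread_types):
    record(f"{proc_type}: per-process init")
    record(f"{proc_type}: load thread-shared data")
    record(f"{proc_type}: open listening ports")
    threads = [
        threading.Thread(target=thread_main, args=(proc_type, thread_type))
        for thread_type in thread_types
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


def parent_main(layout):
    # `layout` maps each child process type to the thread types it creates.
    record("parent: global init")
    record("parent: load process-shared data")
    record("parent: prepare reaper and signal handlers")
    for proc_type, thread_types in layout.items():
        # Stand-in for spawning a child process (assumption, see lead-in).
        process_main(proc_type, thread_types)
```

Calling `parent_main({"client": ["events", "graphics"]})` logs the parent steps first, then the process steps, then the per-thread steps; the ordering between sibling threads is not fixed.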
- Goals
  - Lag between user input and rendering as short as possible
  - All processing units (CPUs, GPUs, PPUs, etc.) given as much time as possible to work
    - Minimize empty pipes
    - Minimize readback syncs
  - Scale well, from single CPU running graphics, physics, etc. in software, to multiple CPUs, multiple GPUs, perhaps even multiple PPUs and others
  - Possibility of multiple views, both in synchrony and totally asynchronous
- Proposed pipeline
  - Thread 0: process events
    - Update time
    - Gather timed and triggered events
    - Gather events from last frame (sync point for async local processing)
    - Gather network/external events (sync point for network processing)
    - Gather user input events
    - Process fast path for all events, post slow path events to other threads
    - Allow other threads to start
    - Block for synchronous threads to flag done (sync point for synchronous local processing)
  - Thread 1: graphics processing
    - Reshape window, if needed
    - Set basic state
    - Clear buffers
    - Process slow path graphics events
    - Begin CPU-based computations
  - Thread 2: physics processing
    - Clear buffers, if needed
    - Process slow path physics events
  - Thread 3: global AI processing
    - Clear buffers, if needed
    - Process slow path global AI events
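
A minimal sketch of one frame of the proposed pipeline, assuming each worker thread has its own inbox queue and a "done" flag. Thread 0 handles fast-path events itself, posts slow-path events to the workers, lets them start, and then blocks until all of them flag done. `Worker`, `run_frame`, `STOP`, and the `(target, payload, is_fast)` event layout are illustrative assumptions; the slow-path "work" is just recording the event.

```python
import queue
import threading

STOP = object()  # sentinel telling a worker to exit


class Worker(threading.Thread):
    def __init__(self, name):
        super().__init__(name=name, daemon=True)
        self.inbox = queue.Queue()
        self.done = threading.Event()
        self.processed = []

    def run(self):
        while True:
            batch = self.inbox.get()        # blocks until thread 0 posts work
            if batch is STOP:
                break
            for event in batch:
                self.processed.append(event)  # placeholder slow-path work
            self.done.set()                 # flag done for this frame


def run_frame(events, workers, fast_log):
    """Thread 0's part of one frame.

    `events` is a list of (target, payload, is_fast) tuples and `workers`
    maps a target name to its Worker.
    """
    slow = {name: [] for name in workers}
    for target, payload, is_fast in events:
        if is_fast:
            fast_log.append(payload)        # fast path: handled immediately
        else:
            slow[target].append(payload)    # slow path: posted to a worker
    for name, worker in workers.items():
        worker.done.clear()
        worker.inbox.put(slow[name])        # allow the worker to start
    for worker in workers.values():
        worker.done.wait()                  # sync point for synchronous threads
```

Here every worker is treated as synchronous. An asynchronous worker would simply be left out of the final wait loop, with its results gathered as events at the start of a later frame.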