Fault-Tolerant Game Server

Designed and created a fault-tolerant, self-scaling game server architecture that delivered better than 99.99% uptime as site membership grew from 500 to 500,000 users. The system could autonomously redistribute its workload from a single server across an entire server farm.

The architecture was self-monitoring and self-healing when faults occurred. Game state information, such as player names, points and tokens, and pending actions related to games or players, was stored redundantly on multiple machines. If the information became unavailable on one machine, all processes could switch dynamically to another copy.
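As an illustration of the redundant-storage idea, here is a minimal Python sketch, not the original system's code, of game state written to every reachable replica and read back from whichever copy is still available. The classes `Replica`, `ReplicatedStore` and `ReplicaUnavailable`, and the `min_writes` threshold, are hypothetical. The sketch also omits re-synchronizing a replica that missed writes while it was offline.

```python
class ReplicaUnavailable(Exception):
    """Raised when a replica (one machine's copy of the state) cannot be reached."""


class Replica:
    """Hypothetical in-memory stand-in for one machine holding a copy of game state."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.online = True  # set to False to simulate an unplugged machine

    def get(self, key):
        if not self.online:
            raise ReplicaUnavailable(self.name)
        return self.data[key]  # a live replica without the key raises KeyError

    def put(self, key, value):
        if not self.online:
            raise ReplicaUnavailable(self.name)
        self.data[key] = value


class ReplicatedStore:
    """Writes each update to all reachable replicas; reads fail over between them."""

    def __init__(self, replicas, min_writes=2):
        self.replicas = list(replicas)
        self.min_writes = min_writes  # assumed minimum number of copies per write

    def put(self, key, value):
        written = 0
        for replica in self.replicas:
            try:
                replica.put(key, value)
                written += 1
            except ReplicaUnavailable:
                continue  # skip the dead machine; the other copies still get the data
        if written < self.min_writes:
            raise RuntimeError(f"only {written} replica(s) accepted {key!r}")
        return written

    def get(self, key):
        for replica in self.replicas:
            try:
                return replica.get(key)
            except ReplicaUnavailable:
                continue  # switch dynamically to the next copy
        raise ReplicaUnavailable("no replica reachable")


if __name__ == "__main__":
    machines = [Replica("a"), Replica("b"), Replica("c")]
    store = ReplicatedStore(machines)
    store.put("player:42:points", 1200)
    machines[0].online = False  # simulate unplugging one machine
    print(store.get("player:42:points"))  # served by replica "b"
```

Reads try replicas in a fixed order, so a caller never needs to know which machine failed. It simply receives the first copy that answers.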

The overall game site also required a lean, efficient, fault-tolerant messaging architecture and communications protocol. Under Jon's direction, the team created custom message-oriented middleware robust enough to keep all games running without noticeable interruption, even if a server was unplugged or failed suddenly.
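Here is a minimal Python sketch of one way a message layer can keep games running when a server disappears. Each game has a fixed preference order of servers. Messages stay tracked until they are acknowledged, and unacknowledged messages are resent to the next live server. The names `Server`, `Dispatcher`, `send`, `ack` and `redeliver_from` are hypothetical, and the acknowledge-and-resend scheme is an assumption, not the protocol the team actually built.

```python
class Server:
    """Hypothetical game server endpoint; alive=False simulates a crash or unplug."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.inbox = []

    def deliver(self, message):
        if not self.alive:
            raise ConnectionError(self.name)
        self.inbox.append(message)


class Dispatcher:
    """Routes each game's messages to a preferred server, failing over on error."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.unacked = {}  # msg_id -> (game_id, payload, name of server it went to)
        self.next_id = 0

    def _candidates(self, game_id):
        # Fixed preference order per game: start at game_id % n, then rotate.
        # Only games whose preferred server fails are moved elsewhere.
        n = len(self.servers)
        start = game_id % n
        return [self.servers[(start + i) % n] for i in range(n)]

    def _route(self, msg_id, game_id, payload):
        for server in self._candidates(game_id):
            try:
                server.deliver((msg_id, game_id, payload))
            except ConnectionError:
                continue  # server is down; try the next one in the preference order
            self.unacked[msg_id] = (game_id, payload, server.name)
            return server.name
        raise RuntimeError(f"no live server for game {game_id}")

    def send(self, game_id, payload):
        msg_id = self.next_id
        self.next_id += 1
        return msg_id, self._route(msg_id, game_id, payload)

    def ack(self, msg_id):
        # The receiving server confirmed processing; stop tracking the message.
        self.unacked.pop(msg_id, None)

    def redeliver_from(self, failed_name):
        """Resend every unacknowledged message last sent to a failed server."""
        moved = []
        for msg_id, (game_id, payload, owner) in list(self.unacked.items()):
            if owner == failed_name:
                moved.append((msg_id, self._route(msg_id, game_id, payload)))
        return moved
```

Because resending gives at-least-once delivery, the game logic receiving these messages would need to tolerate duplicates. A server may have processed a message just before failing without acknowledging it.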


