Lt. Commander
Join Date: Dec 2007
Posts: 120
# 12
02-19-2011, 05:32 PM
Originally Posted by Rehpic
The number of people logged in at 11:30am PST (the time things started going bad the last two weeks) was about the same as the last two weeks. The difference is that we fixed several bugs.

The main problem was a bug that was causing mission map instances to stick around for a few minutes after the player(s) exited, when they should have closed immediately. That led to the shard getting overloaded (both CPU and RAM), which triggered a couple of other bugs. First, there was a problem with how the load balancer dealt with machines that were getting close to full, which caused already overloaded machines to get even more overloaded. Second, there was a problem with our error recovery when a machine started running out of memory: it required manual intervention by an ops person to clean up, rather than cleaning up automatically.

Those problems have all been fixed, and things are running very smoothly today.
With all the entering and exiting of maps during those first few weeks, that would've led to several lingering instances per person. Good catch on that bug!
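
For a rough sense of scale, here is a back-of-the-envelope sketch of how lingering instances pile up. It uses Little's law (average count = arrival rate x time each one sticks around). The function name and every number below are made up for illustration; the quoted post only says instances stayed open "for a few minutes".

```python
def lingering_instances(exits_per_minute: float, linger_minutes: float) -> float:
    """Average number of exited-but-still-open map instances.

    Little's law: count = rate at which instances are abandoned
    multiplied by how long each one lingers before closing.
    """
    if exits_per_minute < 0 or linger_minutes < 0:
        raise ValueError("rate and linger time must be non-negative")
    return exits_per_minute * linger_minutes


# Hypothetical figures, NOT from the post: one player leaving a map
# once a minute, with each instance lingering for three minutes.
with_bug = lingering_instances(exits_per_minute=1.0, linger_minutes=3.0)

# Same player once instances close immediately on exit.
after_fix = lingering_instances(exits_per_minute=1.0, linger_minutes=0.0)

print(with_bug, after_fix)
```

Under those assumed numbers each active player is holding a few extra instances open at any moment, so the load scales with how often people hop between maps rather than with how many are logged in.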