How did you get any reliable data from that short test?
As a programmer, if I were testing this, I wouldn't need actual users to test it out... I'd just create a bunch of simulated logins (actual logins as far as the server is concerned) during that window and analyze the results from those. Since real users would be affected during that time, just tell people the test is going on and run it. Some might get stuck in the meantime, but all I'd care about is monitoring the full queue.
20 minutes is plenty of time to run a test, if it's done right.
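To show what "just monitor the queue" could look like, here's a toy in-process sketch, not a real load-testing tool. It assumes a hypothetical login server that admits a fixed number of queued logins per tick (`capacity_per_tick`), while simulated clients arrive at a fixed rate (`logins_per_tick`). All names are made up for illustration. A real test would fire actual login requests at the server and sample its queue the same way.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class LoginQueue:
    """Hypothetical login server: admits at most `capacity_per_tick` queued logins per tick."""
    capacity_per_tick: int
    waiting: deque = field(default_factory=deque)
    admitted: int = 0

    def enqueue(self, client_id: int) -> None:
        self.waiting.append(client_id)

    def tick(self) -> None:
        # Admit up to capacity_per_tick clients, oldest first (FIFO).
        for _ in range(min(self.capacity_per_tick, len(self.waiting))):
            self.waiting.popleft()
            self.admitted += 1


def run_load_test(duration_ticks: int, logins_per_tick: int, capacity_per_tick: int):
    """Fire simulated logins every tick and record the queue length after each tick."""
    queue = LoginQueue(capacity_per_tick)
    next_id = 0
    samples = []
    for _ in range(duration_ticks):
        for _ in range(logins_per_tick):
            queue.enqueue(next_id)
            next_id += 1
        queue.tick()
        samples.append(len(queue.waiting))
    return queue, samples


if __name__ == "__main__":
    # Assumed numbers: 20 minutes of one-second ticks, 50 simulated logins/s
    # against a server that can admit 40/s.
    q, samples = run_load_test(duration_ticks=20 * 60, logins_per_tick=50, capacity_per_tick=40)
    print(f"admitted={q.admitted} still_waiting={len(q.waiting)} peak_queue={max(samples)}")
```

With those assumed numbers the queue grows by 10 every second, so after 20 minutes 48,000 logins have been admitted and 12,000 are still waiting. That kind of trend is exactly what you'd want to see from the queue samples, whether or not any real user was in line.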