Replaying traces, Part 2

When I left off, I was discussing using SQL Profiler to replay a trace taken on a production server on different test boxes with different configurations. The end result is to see if we can virtualize our production OLTP database servers.

Our next step in testing was to test replaying a trace on the same server it was captured on. We couldn’t do this on a production server so we used our testing server and assigned our developers to hit the database with our application for one hour. Afterwards we restored the databases to the same state they were in and replayed the trace using the same options we use on the other server tests; using multi-threading, not displaying results, and not replaying server SPIDs. The replay of the trace took only 45 minutes. We probably didn’t capture a large enough workload. But we did rule out the overhead of running the replay when comparing to performance of the production server when the workload was captured.

Back to testing. Since we can’t exactly duplicate the replay of the workload to match the production server, we decided that running the replay on the test server would be our new baseline since we could duplicate that for every test. Testing on the physical box showed the following:

  Production 16 CPU (new baseline) 4 CPU, HT ON 4 CPU, HT OFF
Time

1:07

2:37

3:40

2:44

Avg CPU

17%

24%

83%

77%

We expected the 4 CPU tests to take longer and use more CPU. We were surprised that the 4 CPU test with hyper-threading turned off performed better than with hyper-threading on. So we ran one more test, using all physical processors and turning ht off. This gave us 8 physical processors to work with. This test, though, was incomplete. We ran it twice and both time the replay appeared to hang after completing 99%. Both times we needed to kill the replay after 5 hours. The counters we captured weren’t valid.

I’ll post on testing on a virtual server in a future post.

Technorati Tags: ,
m4s0n501