TPC-H Pre-Sampling and Sampling phases: what are they doing?


We've tried kicking off TPC-H at scale factor 100 or 200, but the Pre-Sampling or Sampling step seems to take forever (over 24 hours), so we decided to cancel the run.

Does anyone know what the Pre-Sampling and Sampling steps do? From what we observed, it was performing a lot of reads.

How can we shorten the Sampling step to get to the actual stream queries?

Sorry for the delay in answering but I was out sick the last couple of days.

During the Pre-Sampling phase, BMF sends queries to the database, but any statistics captured during this phase are not stored in the repository. This phase lets the system reach a steady state before statistics are captured. During the Sampling phase, BMF continues to send queries to the database, and all captured statistics are stored in the repository after the iteration completes.
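To make the difference between the two phases concrete, here is a minimal Python sketch of the idea. It is not BMF code: the function name run_iteration, the run_query callback, and the use of query counts (rather than whatever BMF uses to bound each phase) are all assumptions made for illustration.

```python
from typing import Callable, List


def run_iteration(
    run_query: Callable[[], float],
    presampling_queries: int,
    sampling_queries: int,
) -> List[float]:
    """Run a warm-up phase whose timings are discarded, then a measured phase.

    run_query is a hypothetical callback that executes one query and
    returns its elapsed time in seconds.
    """
    # Pre-sampling: queries still hit the database, but results are thrown away.
    for _ in range(presampling_queries):
        run_query()

    # Sampling: queries are executed the same way, but timings are kept.
    stored: List[float] = []
    for _ in range(sampling_queries):
        stored.append(run_query())

    # Only these values would be written to a repository after the iteration.
    return stored


# Usage with canned timings standing in for real query executions.
timings = iter([5.0, 4.0, 1.0, 1.1, 0.9])
print(run_iteration(lambda: next(timings), 2, 3))  # [1.0, 1.1, 0.9]
```

The two slow warm-up timings (5.0 and 4.0) never reach the stored list, which is the whole point of the Pre-Sampling phase.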

Typically, TPC-H is run as a Replay test, meaning that the user scenarios (streams) are executed only once. The sampling period therefore cannot be set; instead, BMF executes the replay test with all queries in the Sampling phase. Because BMF runs each stream to completion, and thus every SQL query in the stream, it can stay in the Sampling phase for a long time if the queries are slow, which is expected with TPC-H, especially at such a high scale factor. If you want the TPC-H test to take less time, run it with a smaller scale factor.
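As a rough illustration of why a replay run cannot be cut short, the sketch below models a replay test in Python. It is an assumption-laden toy, not BMF's implementation: the function name replay_wall_clock is made up, the per-query durations are placeholders rather than measurements, and it assumes the streams run concurrently.

```python
from typing import Dict, List


def replay_wall_clock(streams: Dict[str, List[float]]) -> float:
    """Estimate the wall-clock time of a replay run.

    streams maps a stream name to the durations (in seconds) of its
    queries. Every query runs exactly once and all of them are sampled.
    """
    # Each stream executes its queries back to back, so its time is the sum.
    per_stream = {name: sum(durations) for name, durations in streams.items()}

    # Assumption: streams run concurrently, so the run ends with the slowest one.
    return max(per_stream.values(), default=0.0)


# Placeholder durations, purely illustrative.
example = {
    "stream_1": [10.0, 20.0],
    "stream_2": [5.0, 40.0],
}
print(replay_wall_clock(example))  # 45.0
```

In this model there is no sampling window to shorten: the run lasts as long as the slowest stream takes to finish all of its queries, so the only lever is making the queries themselves faster, for example by using a smaller scale factor.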

I hope this helps.