Let me start with the first set of questions:
If I keep the keying and thinking time at the default and run with 100 virtual users, I get the same performance with an HDD as I do with a 960GB SSD, which is 99.x. So how do I see the benefit of an SSD?
To see the benefit of the SSD vs. an HDD you will need to drive more physical IO, which is the opposite of what an RDBMS tries to do to improve performance, especially with an OLTP-type workload. As an extreme example to illustrate the point: if you run an OLTP workload, either TPC-C or TPC-E, against a 100GB database on a server with 256GB of memory, the entire database will fit in the memory cache, so there is no physical IO and thus no difference between SSD and HDD. Even when the dataset is larger than the database memory, if the database is able to cache 80% of the data it uses, the performance improvement will largely go unrealized. If you want to show the improvement of the SSD vs. the HDD you can do several things. One, force the database to perform more physical IO by reducing the database memory. Or, change the workload to something that requires more IO, such as TPC-H, which is a data warehouse/OLAP-type workload.
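A rough back-of-envelope model shows why a high buffer-cache hit ratio hides the disk difference. All the numbers below (CPU time per transaction, logical reads per transaction, device latencies) are assumed round figures for illustration, not measurements:

```python
# Assumed, illustrative numbers: a transaction does fixed CPU work plus
# one physical read per buffer-cache miss. As the hit ratio approaches
# 100%, disk latency stops mattering and HDD and SSD converge.
CPU_MS = 1.0          # assumed CPU work per transaction
PAGES_PER_TXN = 100   # assumed logical reads per transaction
SSD_MS = 0.1          # assumed SSD random-read latency
HDD_MS = 10.0         # assumed HDD random-read latency (seek + rotation)

def txn_time_ms(hit_ratio, disk_ms):
    """Average transaction time: CPU work plus physical-read time."""
    misses = PAGES_PER_TXN * (1 - hit_ratio)
    return CPU_MS + misses * disk_ms

for hit in (0.90, 0.99, 0.999, 1.0):
    ssd = txn_time_ms(hit, SSD_MS)
    hdd = txn_time_ms(hit, HDD_MS)
    print(f"hit={hit:7.1%}  SSD={ssd:8.2f} ms  HDD={hdd:8.2f} ms  "
          f"HDD/SSD = {hdd/ssd:5.1f}x")
```

At a 90% hit ratio the HDD transaction is roughly 50x slower in this model, but at 100% the two are identical, which is the "everything is in memory cache" case described above.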
So my question is: how does this Dell paper justify its logic of "Each user simulates the same TPC-E like workload, so one user performs the workload once, but running with 10 users would run the workload 10 times in parallel", thereafter aiming for 480 tpsE with 10 users running against 24,000 customer rows? Is the logic flawed?
It doesn’t appear that the logic is flawed. For TPC-E, each virtual user runs the full transaction mix. There is no direct relationship, per se, between a virtual user and the underlying TPC-E dataset. For the different queries, each virtual user randomly selects data based on what the TPC-E specification requires. The random number generators used by the virtual users are simply seeded uniquely, so each user generates a different number stream from the other virtual users.
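A minimal sketch of that seeding idea, assuming the 24,000-customer dataset from the question (the seeding scheme and function names here are illustrative, not BMF's actual implementation):

```python
import random

CUSTOMERS = 24_000  # customer rows from the Dell paper's configuration

def make_user_rng(user_id, base_seed=42):
    # Each virtual user gets its own generator with a unique seed,
    # so users draw different but repeatable streams. (base_seed is
    # an assumed value for illustration.)
    return random.Random(base_seed + user_id)

# 10 virtual users, each independently picking customer rows:
rngs = [make_user_rng(uid) for uid in range(10)]
picks = {uid: [rng.randint(1, CUSTOMERS) for _ in range(3)]
         for uid, rng in enumerate(rngs)}
print(picks)
```

Because the streams differ per user, 10 users effectively run the workload 10 times in parallel over the same shared dataset rather than partitioning it.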
Now with TPC-C, BMF does assign each virtual user a specific warehouse ID and district ID. In the TPC-C case each scale factor represents a warehouse, and each warehouse has 10 districts. When the scale factor does not allow each virtual user to be assigned a single warehouse ID/district ID pair, BMF gives the virtual user as many pairs as needed to cover the dataset. For example, with a TPC-C scale factor of 10 there are 10 warehouses x 10 districts/warehouse = 100 pairs; with only 10 virtual users, each virtual user will be given 10 warehouse/district pairs.
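The assignment described above can be sketched as follows (illustrative only, not BMF's actual code): build every warehouse/district pair, then deal them out round-robin so each virtual user covers an equal share of the dataset.

```python
# Deal warehouse/district pairs to virtual users round-robin.
# TPC-C: one warehouse per scale factor, 10 districts per warehouse.
def assign_pairs(warehouses, virtual_users, districts=10):
    pairs = [(w, d) for w in range(1, warehouses + 1)
                    for d in range(1, districts + 1)]
    assignment = {u: [] for u in range(virtual_users)}
    for i, pair in enumerate(pairs):
        assignment[i % virtual_users].append(pair)
    return assignment

# Scale factor 10 => 100 pairs; with 10 virtual users each gets 10 pairs.
a = assign_pairs(warehouses=10, virtual_users=10)
print(len(a[0]))  # → 10
```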
I hope this makes sense, but let me know if you have any further questions.