Hi,
I was wondering whether there is any guidance, or rules of thumb, for setting the parameters of benchmarks in MonetDB (database size, number of simultaneous users) in relation to system properties (RAM, CPU, disk, network)?
Specifically, I'm looking to run microbenchmarks with TPC-H, examining the performance of each query in isolation, so for my setup the RAM vs. database size relationship is probably the most important.
(This is to evaluate the impact of intra-query checkpointing capabilities I've added to a modified version of MonetDB as part of my PhD work.)
Thanks,
~ Daniel.
Hi Daniel,

IMHO, for benchmarking there are only three "rules of thumb" --- well, in fact, I consider them "real" rules:

1) Be aware of all parameters that (potentially) influence your metric.
2) Document & describe in detail which value (range)s you use for all these parameters.
3) Do not draw any conclusions that go beyond the parameter value (range)s you measured with.

Obviously, these hold in general, not only for MonetDB.

Having said that, for the parameters you mention there is IMHO no such thing as a "rule of thumb", neither for MonetDB nor for any other system. If you want to know how the system performs under different numbers of simultaneous users, you need to vary that number accordingly. If you want to know how the system performs with different database sizes, you need to vary that size accordingly. If you want to know how the system performs with certain ratios of database size vs. RAM, you need to vary that ratio accordingly. If you are not interested in the impact of these parameters, then you set them to a value of your choice and report that value (and your rationale for choosing it) in your benchmark description / documentation.

Of course, if you run only single-user scenarios, you cannot conclude from that about multi-user behavior, and vice versa. Likewise, if you only experiment with database sizes that fit well in RAM, you cannot conclude from that about the system's behavior with database sizes that significantly exceed RAM, and vice versa.

If you want to analyze the performance of each query in isolation, this sounds like a single-user scenario to me.

Best,
Stefan

ps: I'd also be curious to learn more about what kind of intra-query checkpointing capabilities you added to MonetDB, and how, and how these are related to (read-only) TPC-H queries ...

----- On Apr 12, 2017, at 1:13 PM, Playfair, Daniel daniel.playfair@sap.com wrote:
_______________________________________________ developers-list mailing list developers-list@monetdb.org https://www.monetdb.org/mailman/listinfo/developers-list
--
| Stefan.Manegold@CWI.nl | DB Architectures (DA)   |
| www.CWI.nl/~manegold/  | Science Park 123 (L321) |
| +31 (0)20 592-4212     | 1098 XG Amsterdam (NL)  |
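[Editor's note: Stefan's three rules amount to measuring over an explicit, documented parameter grid and concluding nothing outside it. A minimal sketch of that discipline in Python; the parameter names and values below are illustrative placeholders, not MonetDB settings.]

```python
from itertools import product

# Illustrative parameter space: every knob that could influence the metric
# is listed and given explicit values, including the ones held fixed.
param_space = {
    "scale_factor": [1, 10, 30],  # TPC-H database size (~GB of raw data)
    "num_users": [1],             # single-user microbenchmark scenario
    "ram_gb": [64],               # fixed system property, still documented
}

def benchmark_grid(space):
    """Yield one dict per parameter combination, in a stable order."""
    keys = sorted(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

runs = list(benchmark_grid(param_space))
# 3 scale factors x 1 user count x 1 RAM size -> 3 documented configurations
```

Publishing `param_space` alongside the results satisfies rules 1 and 2; restricting conclusions to these combinations is rule 3.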
Thanks for your response, Stefan!
It's helped me to clarify the scenario: indeed, I intend it to be single-user, with the database fitting within RAM.
For this phase of experiments I'm comparing three solutions that present the same user-observable failover capabilities,
so I need a justifiable, fixed set of values for the other benchmark parameters.
I think I shall select a scale factor that consumes 50% of RAM, to safely classify as fitting within limits.
This also keeps things simple by dealing with powers of two.
Hopefully this seems justifiable and not overly conservative for a single-user test?
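[Editor's note: a back-of-the-envelope sketch of that 50%-of-RAM rule. It assumes the common approximation that TPC-H scale factor N generates roughly N GB of raw data; the loaded size inside MonetDB will differ, so treat the result as a starting point, not a guarantee.]

```python
def tpch_scale_factor(ram_gb, target_fraction=0.5):
    """Largest power-of-two TPC-H scale factor whose raw data size
    (approx. SF gigabytes) stays within target_fraction of RAM.

    Assumes SF ~ raw data size in GB, which is only an approximation
    of the in-memory footprint once loaded into the DBMS.
    """
    budget_gb = ram_gb * target_fraction
    sf = 1
    while sf * 2 <= budget_gb:
        sf *= 2
    return sf

# e.g. a 64 GB machine with a 50% budget -> scale factor 32
```

Note that the official TPC-H result categories use fixed scale factors (1, 10, 30, 100, ...); intermediate powers of two are fine for internal microbenchmarks but not for published TPC results.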
Relating to my checkpointing: using TPC-H isn't ideal.
Any read-only benchmark will work, and one with longer queries would be preferable for building the case for my work.
But this should at least help to validate my models, get an idea of overheads, etc.
I was looking at TPC-DS as a possible alternative, but am stuck with TPC-H for now, as I need to get experiments running.
On general rules of thumb, I was wondering if there was anything equivalent to the sizing guidelines available for some databases (SAP has these for HANA, Oracle for MySQL, etc.).
These tend to specify ratios such as CPU per user, memory per user per database size, database size as a fraction of RAM, etc.
Since you seem interested, I'll create another thread to introduce my work, to try and keep this thread on topic about sizing.
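[Editor's note: an illustration of how such ratio-based guidelines reduce to simple arithmetic. All ratios below are made-up placeholders, not taken from any vendor document.]

```python
def required_ram_gb(db_size_gb, num_users,
                    db_ram_fraction=1.0, mem_per_user_gb=0.5):
    """Illustrative sizing formula: enough RAM to hold a fraction of
    the database plus a fixed per-user working-set allowance.
    Both ratios are hypothetical defaults, not vendor recommendations.
    """
    return db_size_gb * db_ram_fraction + num_users * mem_per_user_gb

# 32 GB database, 8 concurrent users -> 32*1.0 + 8*0.5 = 36 GB
```

As Stefan's rules imply, any such formula is only as good as the measured parameter ranges it was derived from.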
Thanks,
~ Daniel.
-----Original Message-----
From: developers-list [mailto:developers-list-bounces+daniel.playfair=sap.com@monetdb.org] On Behalf Of Stefan Manegold
Sent: 12 April 2017 13:04
To: Communication channel for developers of the MonetDB
participants (2)
- Playfair, Daniel
- Stefan Manegold