Roberto Cornacchia wrote:
I come back to this issue with some more interesting results.
I did mention in the past that 'COPY INTO ... FROM stdin ..' seems to be a lot slower than 'COPY INTO ... FROM <filename> ...'
Actually, I have just found something that may shift the problem away from the stdin itself.
Look at the following two methods of restoring a previously dumped (rather small) database.
RESTORE METHOD 1 (far too slow):
$ mclient < backup.sql [ 62 ] [ 70199 ] [ 288 ] [ 70322 ] [ 69 ] [ 423 ] [ 280596 ] [ 4302 ] [ 49038 ] [ 1 ] [ 10 ] [ 6943 ] [ 6943 ] [ 3424 ] [ 3424 ] [ 3424 ] [ 1 ] [ 6943 ] [ 3424 ] TIME elapsed: 68 seconds
RESTORE METHOD 2:
$ echo "\< backup.sql" | mclient [ 62 ] [ 70199 ] [ 288 ] [ 70322 ] [ 69 ] [ 423 ] [ 280596 ] [ 4302 ] [ 49038 ] [ 1 ] [ 10 ] [ 6943 ] [ 6943 ] [ 3424 ] [ 3424 ] [ 3424 ] [ 1 ] [ 6943 ] [ 3424 ] TIME elapsed: 1 seconds
Can anyone explain this difference?
When mclient reads from standard input, it reads a line at a time. When it reads from file (either using the \< construct or as a command line argument (i.e. without <)) it reads in large chunks.
One further consideration: Method 2 works fine, but cannot be used to load gzipped dumps. Maybe this can be a feature request: '\<' should accept gzipped input.
Patches are gratefully accepted. ;-)
Roberto
On Mon, 2009-05-18 at 13:03 +0200, Stefan Manegold wrote:
On Mon, May 18, 2009 at 11:57:57AM +0200, Stefan de Konink wrote:
On Mon, 18 May 2009, Stefan Manegold wrote:
In case you came to your conclusion from experimental/emperical study, I would be very interested to know about the experiemntes performed and the results measured. In that case, I would be verythankful if you could share the informative details with us, i.e. detailed descriptions of - experimental setup (HW, SW, datasets, workload/queries) - measured results (what was measured how, and what are the results?) - (your) discussion/analysis/interpretation of the results I thought the MADAM meeting was for this. Ok. fine.
... we should then revive the ideas of keeping note of MADAMs (e.g., on a Wiki or alike) as reference for those who cannot be present ...
Could you please elaborate on whether you consider you setup/schema "representaive" and/or whether/to which extend you think that your results for one specific case could be generalized to other cases? Technically the only thing I have to prove to get my code reverted is that there are cases where late contraints hurt performance. IMHO, our task should be the following:
- analyze whether there is indeed a significat performance difference between early and late constraint checking; (5% do not justify any further time to be spent on this) - and if so, analyze under which circumstances which altertative performs (significatly) better or worse than the other; - and (only) then try to find out where the (significat) performance difference comes from - and (only) then + either try to eliminate the difference; + or * make dump/restore use the most suitable alternative (as far as possible with reasonable effort); * and document our experiences to give adivce for users
Stefan
Stefan
-- Sjoerd Mullender