a few performance improvement questions
hi, i've been running some pretty big queries on a few tables (total of 900 million records) on a 64GB windows machine and i'm curious if anyone might provide advice about whether any of these strategies are computationally expensive? i am maxing out my available RAM in my queries a lot, and at the point it hits that ceiling, the computation starts dragging. here's my pseudo-sql that displays the kinds of commands i've been running. thanks!

============

(a) i am conducting a left join where not all records match. the non-matching records need to be zeroes.

in general, is this the least computationally demanding way to get an average amount per person, including the zeroes?

SELECT AVG( sum_amount ) AS avg_amt_per_person
FROM
( SELECT
      person_id ,
      SUM( CASE WHEN amount IS NULL THEN 0 ELSE amount END ) AS sum_amount
  FROM
      ( ( SELECT person_id FROM person_table ) AS a
        LEFT JOIN
        ( SELECT person_id , amount FROM amount_table ) AS b
        ON a.person_id = b.person_id ) AS temp_table
  GROUP BY person_id ) ;

============

(b) i am on windows (so no strptime) and i have dates in a mm/dd/yy format, but some january thru september dates are m/dd/yy (i.e. the month can be either one or two digits). i am just calculating the month as part of my query on-the-fly. is it foolish of me not to add a new, permanent column to the data set? i don't have a good sense of whether this segment slows down my query by a lot. it is one of my join/merge variables.

CAST( CASE WHEN SUBSTRING( some_date , 2 , 1 ) = '/' THEN SUBSTRING( some_date , 1 , 1 ) ELSE SUBSTRING( some_date , 1 , 2 ) END AS DOUBLE ) AS some_month

============

(c) i have three tables with the same structure that get LEFT JOINed onto my main table. so right now i am doing something like --

main_table LEFT JOIN second_table LEFT JOIN third_table LEFT JOIN fourth_table

is it smarter for me to do this instead? --

CREATE TABLE stacked_table FROM second_table UNION ALL third_table UNION ALL fourth_table

main_table LEFT JOIN stacked_table
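(On (a): the NULL-to-zero step can also be written with COALESCE, which is standard SQL and does the same thing as the CASE expression. A minimal sketch, joining the two base tables directly and reusing the placeholder names from the post above; whether it is actually cheaper depends on the engine, it mainly drops the two wrapping subselects.)

SELECT AVG( sum_amount ) AS avg_amt_per_person
FROM ( SELECT a.person_id ,
              SUM( COALESCE( b.amount , 0 ) ) AS sum_amount
       FROM person_table AS a
       LEFT JOIN amount_table AS b
         ON a.person_id = b.person_id
       GROUP BY a.person_id ) AS temp_table ;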
here are the CASE / CAST / SUBSTRING performance costs on a data set with about 200 million records, on a windows system with 64gb of ram. this might help others as well.
# no conversion commands
system.time( dbGetQuery( db , "SELECT COUNT(*) , some_date AS some_month FROM some_table GROUP BY some_month" ) )
   user  system elapsed
   0.03    0.00    7.94

# SUBSTRING only
system.time( dbGetQuery( db , "SELECT COUNT(*) , SUBSTRING( some_date , 1 , 1 ) AS some_month FROM some_table GROUP BY some_month" ) )
   user  system elapsed
   0.00    0.00   16.39

# CAST only
system.time( dbGetQuery( db , "SELECT COUNT(*) , CAST( some_date AS VARCHAR(255) ) AS some_month FROM some_table GROUP BY some_month" ) )
   user  system elapsed
   0.02    0.00   35.58

# CASE only
system.time( dbGetQuery( db , "SELECT COUNT(*) , ( CASE WHEN ( some_date IS NULL ) THEN 'hello' ELSE some_date END ) AS some_month FROM some_table GROUP BY some_month" ) )
   user  system elapsed
   0.02    0.00    8.73

# CAST + SUBSTRING
system.time( dbGetQuery( db , "SELECT COUNT(*) , CAST( SUBSTRING( some_date , 1 , 1 ) AS DOUBLE ) AS some_month FROM some_table GROUP BY some_month" ) )
   user  system elapsed
    0.0     0.0    34.5

# CASE + SUBSTRING
system.time( dbGetQuery( db , "SELECT COUNT(*) , ( CASE WHEN SUBSTRING( some_date , 2 , 1 ) = '/' THEN SUBSTRING( some_date , 1 , 1 ) ELSE SUBSTRING( some_date , 1 , 2 ) END ) AS some_month FROM some_table GROUP BY some_month" ) )
   user  system elapsed
   0.00    0.00   51.49

# CASE + CAST + SUBSTRING
system.time( dbGetQuery( db , "SELECT COUNT(*) , CAST( CASE WHEN SUBSTRING( some_date , 2 , 1 ) = '/' THEN SUBSTRING( some_date , 1 , 1 ) ELSE SUBSTRING( some_date , 1 , 2 ) END AS DOUBLE ) AS some_month FROM some_table GROUP BY some_month" ) )
   user  system elapsed
   0.00    0.00   73.03
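(Going by these timings, deriving some_month on the fly adds roughly 65 seconds to an otherwise ~8-second aggregation, so paying that cost once by materializing the month as a permanent column is one option. A minimal sketch, assuming the table really is named some_table and that a one-byte TINYINT is enough to hold a month number; neither assumption comes from this thread.)

-- materialize the derived month once, then GROUP BY / JOIN on the plain column
ALTER TABLE some_table ADD COLUMN some_month TINYINT;

UPDATE some_table
   SET some_month =
       CAST( CASE WHEN SUBSTRING( some_date , 2 , 1 ) = '/'
                  THEN SUBSTRING( some_date , 1 , 1 )
                  ELSE SUBSTRING( some_date , 1 , 2 )
             END AS TINYINT ) ;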
On Wed, May 27, 2015 at 5:52 AM, Anthony Damico wrote:
hi, i've been running some pretty big queries on a few tables (total of 900 million records) on a 64GB windows machine and i'm curious if anyone might provide advice about whether any of these strategies are computationally expensive? i am maxing out my available RAM in my queries a lot, and at the point it hits that ceiling, the computation starts dragging. here's my pseudo-sql that displays the kinds of commands i've been running. thanks!
============
(a) i am conducting a left join where not all records match. the non-matching records need to be zeroes
in general, is this the least computationally demanding way to get an average amount per person, including the zeroes?
SELECT AVG( sum_amount ) AS avg_amt_per_person
FROM
( SELECT
      person_id ,
      SUM( CASE WHEN amount IS NULL THEN 0 ELSE amount END ) AS sum_amount
  FROM
      ( ( SELECT person_id FROM person_table ) AS a
        LEFT JOIN
        ( SELECT person_id , amount FROM amount_table ) AS b
        ON a.person_id = b.person_id ) AS temp_table
  GROUP BY person_id ) ;
============
(b) i am on windows (so no strptime) and i have dates in either a mm/dd/yy format but some january thru september are m/dd/yy (i.e. the month can either be one or two digits). i am just calculating the month as part of my query on-the-fly. is it foolish of me not to add a new, permanent column to the data set? i don't have a good sense of whether this segment slows down my query by a lot. it is one of my join/merge variables.
CAST( CASE WHEN SUBSTRING( some_date , 2 , 1 ) = '/' THEN SUBSTRING( some_date , 1 , 1 ) ELSE SUBSTRING( some_date , 1 , 2 ) END AS DOUBLE ) AS some_month
============
(c) i have three tables with the same structure that get LEFT JOINed onto my main table. so right now i am doing something like --
main_table LEFT JOIN second_table LEFT JOIN third_table LEFT JOIN fourth_table
is it smarter for me to do this instead? --
CREATE TABLE stacked_table FROM second_table UNION ALL third_table UNION ALL fourth_table
main_table LEFT JOIN stacked_table
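(For reference, the stacking idea in (c) written out as concrete SQL might look like the sketch below. The WITH DATA clause, the source_table tag and the person_id/amount column list are all assumptions, since the thread only gives pseudo-sql. Note also that one stacked LEFT JOIN returns the three tables' rows stacked vertically rather than as three separate sets of columns, so it only substitutes for the chained joins when that shape is acceptable.)

-- stack the three look-alike tables once, tagging each row with its source (tag values are made up)
CREATE TABLE stacked_table AS
    SELECT person_id , amount , 2 AS source_table FROM second_table
    UNION ALL
    SELECT person_id , amount , 3 AS source_table FROM third_table
    UNION ALL
    SELECT person_id , amount , 4 AS source_table FROM fourth_table
WITH DATA ;

-- then a single left join instead of three
SELECT *
  FROM main_table
  LEFT JOIN stacked_table
    ON main_table.person_id = stacked_table.person_id ;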
On 27-05-15 11:52, Anthony Damico wrote:
hi, i've been running some pretty big queries on a few tables (total of 900 million records) on a 64GB windows machine and i'm curious if anyone might provide advice about whether any of these strategies are computationally expensive? i am maxing out my available RAM in my queries a lot, and at the point it hits that ceiling, the computation starts dragging. here's my pseudo-sql that displays the kinds of commands i've been running. thanks!
============
(a) i am conducting a left join where not all records match. the non-matching records need to be zeroes
in general, is this the least computationally demanding way to get an average amount per person, including the zeroes?
SELECT AVG( sum_amount ) AS avg_amt_per_person
FROM
( SELECT
      person_id ,
      SUM( CASE WHEN amount IS NULL THEN 0 ELSE amount END ) AS sum_amount
  FROM
      ( ( SELECT person_id FROM person_table ) AS a
        LEFT JOIN
        ( SELECT person_id , amount FROM amount_table ) AS b
        ON a.person_id = b.person_id ) AS temp_table
  GROUP BY person_id ) ;
Assuming person_id in person_table has unique values (so is a key) and all three columns (person_table.person_id, amount_table.person_id, amount_table.amount) are defined as not null, you may use the following variations:

SELECT AVG(sum_amount) AS avg_amt_per_person
FROM ( SELECT person_id,
              (CASE WHEN (SELECT SUM(amount) FROM amount_table WHERE person_table.person_id = amount_table.person_id) IS NULL
                    THEN 0
                    ELSE (SELECT SUM(amount) FROM amount_table WHERE person_table.person_id = amount_table.person_id) END) as sum_amount
       FROM person_table ) T;

or

SELECT AVG(sum_amount) AS avg_amt_per_person
FROM ( SELECT person_table.person_id,
              (CASE WHEN SUM(amount_table.amount) IS NULL THEN 0 ELSE SUM(amount_table.amount) END) as sum_amount
       FROM person_table
       LEFT JOIN amount_table on (person_table.person_id = amount_table.person_id)
       GROUP BY person_table.person_id ) T;

or

SELECT AVG(sum_amount) AS avg_amt_per_person
FROM ( SELECT person_id, SUM(amount) as sum_amount
       FROM amount_table
       GROUP BY person_id
       UNION ALL
       SELECT person_id, 0 as sum_amount
       FROM person_table
       WHERE person_id NOT IN (SELECT person_id FROM amount_table) ) T;

or

CREATE VIEW person_view AS
SELECT person_id,
       (SELECT SUM(amount) FROM amount_table WHERE amount IS NOT NULL AND person_table.person_id = amount_table.person_id) as sum_amount
FROM person_table;

SELECT AVG(sum_amount) AS avg_amt_per_person
FROM ( SELECT person_id,
              (CASE WHEN sum_amount IS NULL THEN 0 ELSE sum_amount END) as sum_amount
       FROM person_view ) T;

or

ALTER TABLE person_table ADD COLUMN sum_amount double not null default 0;
UPDATE person_table
   SET sum_amount = (SELECT SUM(amount) FROM amount_table WHERE amount IS NOT NULL AND person_table.person_id = amount_table.person_id)
 WHERE person_id IN (SELECT person_id FROM amount_table);

SELECT AVG(sum_amount) AS avg_amt_per_person FROM person_table;

This last one (adding a column to person_table and populating it) only works correctly when the source data (in this case the amount_table data) is static while the queries run, or is re-updated before a set of queries is run. It is an approach often used in data warehouses, where the source data is refreshed at known times (e.g. during the night) and the derivable redundant data can be re-derived after the nightly refresh.
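(One way to pick among these variations without guessing is to time each candidate directly, or to prefix it with MonetDB's TRACE (per-step timings) or EXPLAIN (plan only) statement modifiers. A sketch, not from the thread, applying TRACE to the LEFT JOIN variation, with COALESCE standing in for the equivalent CASE expression.)

TRACE
SELECT AVG( sum_amount ) AS avg_amt_per_person
FROM ( SELECT person_table.person_id ,
              COALESCE( SUM( amount_table.amount ) , 0 ) AS sum_amount
       FROM person_table
       LEFT JOIN amount_table ON ( person_table.person_id = amount_table.person_id )
       GROUP BY person_table.person_id ) T ;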
============
(b) i am on windows (so no strptime) and i have dates in either a mm/dd/yy format but some january thru september are m/dd/yy (i.e. the month can either be one or two digits). i am just calculating the month as part of my query on-the-fly. is it foolish of me not to add a new, permanent column to the data set? i don't have a good sense of whether this segment slows down my query by a lot. it is one of my join/merge variables.
CAST( CASE WHEN SUBSTRING( some_date , 2 , 1 ) = '/' THEN SUBSTRING( some_date , 1 , 1 ) ELSE SUBSTRING( some_date , 1 , 2 ) END AS DOUBLE ) AS some_month
As the month number is expected to be an integer, I would cast it to SMALLINT (or TINYINT) instead of DOUBLE. Instead of replicating this expression in every query where you need the some_month value, create a view which extends the table with this some_month column, so the extraction expression lives in one place, and then use the view in the queries that need some_month. When the performance of the view is not good enough and the data is reasonably static, consider adding additional column(s) to the table, populated using a one-time UPDATE statement as presented above with the ALTER TABLE and UPDATE. With triggers (on insert and on update) you could automate the update of this derivable column data when new records are added or specific columns are updated.
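(A minimal sketch of the view approach described above, with the month cast to SMALLINT. The view name some_table_ext is made up, and some_table / some_date are the placeholder names carried over from the benchmark post, not objects defined anywhere in this thread.)

CREATE VIEW some_table_ext AS
SELECT t.* ,
       CAST( CASE WHEN SUBSTRING( some_date , 2 , 1 ) = '/'
                  THEN SUBSTRING( some_date , 1 , 1 )
                  ELSE SUBSTRING( some_date , 1 , 2 )
             END AS SMALLINT ) AS some_month
  FROM some_table AS t ;

-- queries can then filter, join or group on some_table_ext.some_month
-- instead of repeating the extraction expression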
============
(c) i have three tables with the same structure that get LEFT JOINed onto my main table. so right now i am doing something like --
main_table LEFT JOIN second_table LEFT JOIN third_table LEFT JOIN fourth_table
is it smarter for me to do this instead? --
CREATE TABLE stacked_table FROM second_table UNION ALL third_table UNION ALL fourth_table
main_table LEFT JOIN stacked_table
wow, martin, your response to (b) is *exactly* what i was looking for. thank you!!

for (a), i'm overwhelmed! ;) i could certainly speed test all of your alternatives, but i'm curious - in general - why certain strategies might be better/worse than others? when improving performance, do you have a sense of which runs faster? looks like the options are: nested CASE statements, JOINs, UNION ALL+WHERE, VIEWs, UPDATE+WHERE. are some of these options better at conserving RAM - and therefore avoiding ultra-slow disk paging?

it'd be nice to know about any "best practices" but it'd be fair of you to tell me that i just need to try for myself and see. thanks again for your time. -anthony

On Thu, May 28, 2015 at 12:36 PM, martin van dinther <martin.van.dinther@monetdbsolutions.com> wrote:
On 27-05-15 11:52, Anthony Damico wrote:
hi, i've been running some pretty big queries on a few tables (total of 900 million records) on a 64GB windows machine and i'm curious if anyone might provide advice about whether any of these strategies are computationally expensive? i am maxing out my available RAM in my queries a lot, and at the point it hits that ceiling, the computation starts dragging. here's my pseudo-sql that displays the kinds of commands i've been running. thanks!
============
(a) i am conducting a left join where not all records match. the non-matching records need to be zeroes
in general, is this the least computationally demanding way to get an average amount per person, including the zeroes?
SELECT AVG( sum_amount ) AS avg_amt_per_person
FROM
( SELECT
      person_id ,
      SUM( CASE WHEN amount IS NULL THEN 0 ELSE amount END ) AS sum_amount
  FROM
      ( ( SELECT person_id FROM person_table ) AS a
        LEFT JOIN
        ( SELECT person_id , amount FROM amount_table ) AS b
        ON a.person_id = b.person_id ) AS temp_table
  GROUP BY person_id ) ;
Assuming person_id in person_table has unique values (so is a key) and all three columns (person_table.person_id, amount_table.person_id, amount_table.amount) are defined as not null, you may use the following variations:
SELECT AVG(sum_amount) AS avg_amt_per_person FROM ( SELECT person_id, (CASE WHEN (SELECT SUM(amount) FROM amount_table WHERE person_table.person_id = amount_table.person_id) IS NULL THEN 0 ELSE (SELECT SUM(amount) FROM amount_table WHERE person_table.person_id = amount_table.person_id) END) as sum_amount FROM person_table ) T;
or
SELECT AVG(sum_amount) AS avg_amt_per_person FROM ( SELECT person_table.person_id, (CASE WHEN SUM(amount_table.amount) IS NULL THEN 0 ELSE SUM(amount_table.amount) END)as sum_amount FROM person_table LEFT JOIN amount_table on (person_table.person_id = amount_table.person_id) GROUP BY person_table.person_id ) T;
or
SELECT AVG(sum_amount) AS avg_amt_per_person FROM ( SELECT person_id, SUM(amount) as sum_amount FROM amount_table GROUP BY person_id UNION ALL SELECT person_id, 0 as sum_amount FROM person_table WHERE person_id NOT IN (SELECT person_id FROM amount_table) ) T;
or
CREATE VIEW person_view AS SELECT person_id, (SELECT SUM(amount) FROM amount_table WHERE amount IS NOT NULL AND person_table.person_id = amount_table.person_id) as sum_amount FROM person_table;
SELECT AVG(sum_amount) AS avg_amt_per_person FROM ( SELECT person_id, (CASE WHEN sum_amount IS NULL THEN 0 ELSE sum_amount END) as sum_amount FROM person_view ) T;
or
ALTER TABLE person_table ADD COLUMN sum_amount double not null default 0; UPDATE person_table SET sum_amount = (SELECT SUM(amount) FROM amount_table WHERE amount IS NOT NULL AND person_table.person_id = amount_table.person_id) WHERE person_id IN (SELECT person_id FROM amount_table);
SELECT AVG(sum_amount) AS avg_amt_per_person FROM person_table;
This last one (adding a column to person_table and populating it) only works correctly when the source data (in this case the amount_table data) is static while the queries run, or is re-updated before a set of queries is run. It is an approach often used in data warehouses, where the source data is refreshed at known times (e.g. during the night) and the derivable redundant data can be re-derived after the nightly refresh.
============
(b) i am on windows (so no strptime) and i have dates in either a mm/dd/yy format but some january thru september are m/dd/yy (i.e. the month can either be one or two digits). i am just calculating the month as part of my query on-the-fly. is it foolish of me not to add a new, permanent column to the data set? i don't have a good sense of whether this segment slows down my query by a lot. it is one of my join/merge variables.
CAST( CASE WHEN SUBSTRING( some_date , 2 , 1 ) = '/' THEN SUBSTRING( some_date , 1 , 1 ) ELSE SUBSTRING( some_date , 1 , 2 ) END AS DOUBLE ) AS some_month
As the month number is expected to be an integer, I would cast it to SMALLINT (or TINYINT) instead of DOUBLE. Instead of replicating this expression in every query where you need the some_month value, create a view which extends the table with this some_month column, so the extraction expression lives in one place, and then use the view in the queries that need some_month. When the performance of the view is not good enough and the data is reasonably static, consider adding additional column(s) to the table, populated using a one-time UPDATE statement as presented above with the ALTER TABLE and UPDATE. With triggers (on insert and on update) you could automate the update of this derivable column data when new records are added or specific columns are updated.
============
(c) i have three tables with the same structure that get LEFT JOINed onto my main table. so right now i am doing something like --
main_table LEFT JOIN second_table LEFT JOIN third_table LEFT JOIN fourth_table
is it smarter for me to do this instead? --
CREATE TABLE stacked_table FROM second_table UNION ALL third_table UNION ALL fourth_table
main_table LEFT JOIN stacked_table