question: efficient use of partitioned tables
Hi, This might be less of MonetDB-specific question about features than about RDBMS (or just SQL) in general, but I would still appreciate your advice in these matters. I have data with a few "cross-sectional" tables (e.g. the parents of an individual) and many longitudinal/panel tables in annual chunks (e.g. tax records for each year separately). I wonder if it is a good idea to merge/union the annual tables into long panels. The benefits seems to be a much cleaner structure for the consistent columns, correspondingly easier code to write, fewer loops, tables names, lines of code. Also, fast joins on individual IDs and years across the tables without intermediary steps. However, the downsides are that not all columns are consistent across years, and there are simply some that appear only in certain years. Storing missing values for other years for these variables is a waste of disk space and memory. The merged tables could be too big anyway — if it matters in MonetDB (think of tables ~70 GB in some cases, while my server has 128 GB RAM), while many use cases would focus on a subset of years anyway. Do you have any thoughts about this? Maybe more concrete questions would be: — How cumbersome would it be (for a novice!) to code joining data from the annual tables separately, for multiple series of tables? (E.g. union all tax returns for 1997-2008, then join all residential data for these 12 years year-by-year, then employment records year-by-year etc.) — How much slower would this *run* than simply selecting years from the longitudinal tables? — Shall I be worried about the longitudinal tables having the size on the same order of magnitude as my physical memory, or MonetDB handles it equally efficiently (it would need to access many of the small annual tables anyway, that also takes memory). Thanks, Laszlo
participants (1)
-
László Sándor