Performance issue related to dropping (empty) tables
I am seeing a performance issue related to schema upgrades. This is from a C++ program that uses the MAPI library. We try and keep 30 days worth of data, imported at a fairly high data rate with lots of columns. We use multiple threads to ingest data (bulk insert), and the number of threads is stored in a configuration table. For 20 days worth of data, this works out to around 9000 tables (plus merge tables). We have roughly 6 sets of tables per hour, each of which has 4 tables and a merge table connecting them. Because this is a schema change, we wrap the entire upgrade in a transaction. For the purpose of the test, we had no data in the tables, but only had empty tables. If we change our setting from 4 parallel tables to 2, we end up dropping 9000 tables. About half of the "drop table" commands take 0 seconds. The rest take anywhere between 4 and 12 seconds each. Here is a plot of the time per delete, y access is seconds, x is the statement. The X access is NOT time based: [cid:image001.png@01D5568E.A5389030] Any idea on how we can reduce the time? The total time to delete 9000 tables was 480 minutes. Thanks, Dave
Hai, The long exec. time is probably caused by fdatasync. How many columns do the tables contain? What type of disk are you using? Regards, Jennie
On 19 Aug 2019, at 13:04, Gotwisner, Dave
wrote: I am seeing a performance issue related to schema upgrades. This is from a C++ program that uses the MAPI library.
We try and keep 30 days worth of data, imported at a fairly high data rate with lots of columns. We use multiple threads to ingest data (bulk insert), and the number of threads is stored in a configuration table.
For 20 days worth of data, this works out to around 9000 tables (plus merge tables). We have roughly 6 sets of tables per hour, each of which has 4 tables and a merge table connecting them.
Because this is a schema change, we wrap the entire upgrade in a transaction.
For the purpose of the test, we had no data in the tables, but only had empty tables.
If we change our setting from 4 parallel tables to 2, we end up dropping 9000 tables. About half of the “drop table” commands take 0 seconds. The rest take anywhere between 4 and 12 seconds each.
Here is a plot of the time per delete, y access is seconds, x is the statement. The X access is NOT time based:
Any idea on how we can reduce the time? The total time to delete 9000 tables was 480 minutes.
Thanks,
Dave _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Across all the tables, we have about 200 columns. This development machine is running on a machine with a raid of SSDs.
________________________________
From: users-list
On 19 Aug 2019, at 13:04, Gotwisner, Dave
wrote: I am seeing a performance issue related to schema upgrades. This is from a C++ program that uses the MAPI library.
We try and keep 30 days worth of data, imported at a fairly high data rate with lots of columns. We use multiple threads to ingest data (bulk insert), and the number of threads is stored in a configuration table.
For 20 days worth of data, this works out to around 9000 tables (plus merge tables). We have roughly 6 sets of tables per hour, each of which has 4 tables and a merge table connecting them.
Because this is a schema change, we wrap the entire upgrade in a transaction.
For the purpose of the test, we had no data in the tables, but only had empty tables.
If we change our setting from 4 parallel tables to 2, we end up dropping 9000 tables. About half of the “drop table” commands take 0 seconds. The rest take anywhere between 4 and 12 seconds each.
Here is a plot of the time per delete, y access is seconds, x is the statement. The X access is NOT time based:
Any idea on how we can reduce the time? The total time to delete 9000 tables was 480 minutes.
Thanks,
Dave _______________________________________________ users-list mailing list users-list@monetdb.org https://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_mailman_listinfo_users-2Dlist&d=DwIGaQ&c=Hlvprqonr5LuCN9TN65xNw&r=u41iOHYIJhkr4P2f2CDLh8D6LDUMXy2xJPhXyGKDzns&m=e7G5gan3lzaPk0qdWfiNJdCuX8Gz3-MHk_oV8a598vk&s=70MX_Qu6Ek5cVX9MGdU5WmJqsFGBxxlGtvESrARHyho&e=
_______________________________________________ users-list mailing list users-list@monetdb.org https://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_mailman_listinfo_users-2Dlist&d=DwIGaQ&c=Hlvprqonr5LuCN9TN65xNw&r=u41iOHYIJhkr4P2f2CDLh8D6LDUMXy2xJPhXyGKDzns&m=e7G5gan3lzaPk0qdWfiNJdCuX8Gz3-MHk_oV8a598vk&s=70MX_Qu6Ek5cVX9MGdU5WmJqsFGBxxlGtvESrARHyho&e=
That’s 1.8m files, even with SSDs, it’ll take some time, when the WAL log is processed once in a while. Maybe you can drop the tables in smaller batches and drop them more often (i.e. don’t wait until you’ve gathered tables of 20 days to drop at once)? Jennie
On 23 Aug 2019, at 16:28, Gotwisner, Dave
wrote: Across all the tables, we have about 200 columns. This development machine is running on a machine with a raid of SSDs. From: users-list
on behalf of Ying Zhang Sent: Friday, August 23, 2019 4:24 PM To: Communication channel for MonetDB users Subject: Re: Performance issue related to dropping (empty) tables This message originated outside of NETSCOUT. Do not click links or open attachments unless you recognize the sender and know the content is safe.
Hai,
The long exec. time is probably caused by fdatasync.
How many columns do the tables contain?
What type of disk are you using?
Regards, Jennie
On 19 Aug 2019, at 13:04, Gotwisner, Dave
wrote: I am seeing a performance issue related to schema upgrades. This is from a C++ program that uses the MAPI library.
We try and keep 30 days worth of data, imported at a fairly high data rate with lots of columns. We use multiple threads to ingest data (bulk insert), and the number of threads is stored in a configuration table.
For 20 days worth of data, this works out to around 9000 tables (plus merge tables). We have roughly 6 sets of tables per hour, each of which has 4 tables and a merge table connecting them.
Because this is a schema change, we wrap the entire upgrade in a transaction.
For the purpose of the test, we had no data in the tables, but only had empty tables.
If we change our setting from 4 parallel tables to 2, we end up dropping 9000 tables. About half of the “drop table” commands take 0 seconds. The rest take anywhere between 4 and 12 seconds each.
Here is a plot of the time per delete, y access is seconds, x is the statement. The X access is NOT time based:
Any idea on how we can reduce the time? The total time to delete 9000 tables was 480 minutes.
Thanks,
Dave _______________________________________________ users-list mailing list users-list@monetdb.org https://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_mailman_listinfo_users-2Dlist&d=DwIGaQ&c=Hlvprqonr5LuCN9TN65xNw&r=u41iOHYIJhkr4P2f2CDLh8D6LDUMXy2xJPhXyGKDzns&m=e7G5gan3lzaPk0qdWfiNJdCuX8Gz3-MHk_oV8a598vk&s=70MX_Qu6Ek5cVX9MGdU5WmJqsFGBxxlGtvESrARHyho&e=
_______________________________________________ users-list mailing list users-list@monetdb.org https://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_mailman_listinfo_users-2Dlist&d=DwIGaQ&c=Hlvprqonr5LuCN9TN65xNw&r=u41iOHYIJhkr4P2f2CDLh8D6LDUMXy2xJPhXyGKDzns&m=e7G5gan3lzaPk0qdWfiNJdCuX8Gz3-MHk_oV8a598vk&s=70MX_Qu6Ek5cVX9MGdU5WmJqsFGBxxlGtvESrARHyho&e= _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Jennie,
This is a special use case when someone decides to change the number of parallel threads we use to ingest. We will document to "don't do this". In theory, we can reduce by deleting from the merge tables, and then deleting the physical tables over time in the background. Since our use case doesn't have an "off peak" time period, it's better to have the "maintenance window" take longer than to risk dropping data when doing this work on a live system.
Thanks for the answer though, it clarifies some other behavior I've seen.
Dave
-----Original Message-----
From: users-list
On 23 Aug 2019, at 16:28, Gotwisner, Dave
wrote: Across all the tables, we have about 200 columns. This development machine is running on a machine with a raid of SSDs. From: users-list
on behalf of Ying Zhang Sent: Friday, August 23, 2019 4:24 PM To: Communication channel for MonetDB users Subject: Re: Performance issue related to dropping (empty) tables This message originated outside of NETSCOUT. Do not click links or open attachments unless you recognize the sender and know the content is safe.
Hai,
The long exec. time is probably caused by fdatasync.
How many columns do the tables contain?
What type of disk are you using?
Regards, Jennie
On 19 Aug 2019, at 13:04, Gotwisner, Dave
wrote: I am seeing a performance issue related to schema upgrades. This is from a C++ program that uses the MAPI library.
We try and keep 30 days worth of data, imported at a fairly high data rate with lots of columns. We use multiple threads to ingest data (bulk insert), and the number of threads is stored in a configuration table.
For 20 days worth of data, this works out to around 9000 tables (plus merge tables). We have roughly 6 sets of tables per hour, each of which has 4 tables and a merge table connecting them.
Because this is a schema change, we wrap the entire upgrade in a transaction.
For the purpose of the test, we had no data in the tables, but only had empty tables.
If we change our setting from 4 parallel tables to 2, we end up dropping 9000 tables. About half of the “drop table” commands take 0 seconds. The rest take anywhere between 4 and 12 seconds each.
Here is a plot of the time per delete, y access is seconds, x is the statement. The X access is NOT time based:
Any idea on how we can reduce the time? The total time to delete 9000 tables was 480 minutes.
Thanks,
Dave _______________________________________________ users-list mailing list users-list@monetdb.org https://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_mailman_listinfo_users-2Dlist&d=DwIGaQ&c=Hlvprqonr5LuCN9TN65xNw&r=u41iOHYIJhkr4P2f2CDLh8D6LDUMXy2xJPhXyGKDzns&m=e7G5gan3lzaPk0qdWfiNJdCuX8Gz3-MHk_oV8a598vk&s=70MX_Qu6Ek5cVX9MGdU5WmJqsFGBxxlGtvESrARHyho&e=
_______________________________________________ users-list mailing list users-list@monetdb.org https://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_mailman_listinfo_users-2Dlist&d=DwIGaQ&c=Hlvprqonr5LuCN9TN65xNw&r=u41iOHYIJhkr4P2f2CDLh8D6LDUMXy2xJPhXyGKDzns&m=e7G5gan3lzaPk0qdWfiNJdCuX8Gz3-MHk_oV8a598vk&s=70MX_Qu6Ek5cVX9MGdU5WmJqsFGBxxlGtvESrARHyho&e= _______________________________________________ users-list mailing list users-list@monetdb.org https://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_mailman_listinfo_users-2Dlist&d=DwIGaQ&c=Hlvprqonr5LuCN9TN65xNw&r=u41iOHYIJhkr4P2f2CDLh8D6LDUMXy2xJPhXyGKDzns&m=dtAKxtJx2doNBsk_6-pSEQ9D44SnDVC7rvhGqwDqFLA&s=dQmhsqcIns6fZz7jY4Nh57CSHH-sU0PJYjc3bKgLOsE&e=
_______________________________________________ users-list mailing list users-list@monetdb.org https://urldefense.proofpoint.com/v2/url?u=https-3A__www.monetdb.org_mailman_listinfo_users-2Dlist&d=DwIGaQ&c=Hlvprqonr5LuCN9TN65xNw&r=u41iOHYIJhkr4P2f2CDLh8D6LDUMXy2xJPhXyGKDzns&m=dtAKxtJx2doNBsk_6-pSEQ9D44SnDVC7rvhGqwDqFLA&s=dQmhsqcIns6fZz7jY4Nh57CSHH-sU0PJYjc3bKgLOsE&e=
participants (2)
-
Gotwisner, Dave
-
Ying Zhang