commit failures with dbfarm on iSCSI LUN
Hi there,
Do you have any experience with running a dbfarm over iSCSI?
We have tried to use the NAS in our 1Gbit LAN for our largish daily experiments with MonetDB. It's a very handy setup and seems better suited than NFS.
It seems to achieve reasonable performance, but we quite regularly (though not predictably, for now) get commit failures during a rather long ETL. We do not get such commit failures when the same db and ETL are run on a local disk.
Excerpt from merovingian.log (Jul2015-SP1):
2015-11-24 11:52:37 ERR trec01[19110]: !ERROR: bm_subcommit: commit failed
2015-11-24 11:52:37 ERR trec01[19110]: !ERROR: log_tend: write failed
2015-11-24 11:52:37 ERR trec01[19110]: !FATAL: 40000!COMMIT: transation commit failed (perhaps your disk is full?) exiting (kernel error: !ERROR: GDKsave: error on: name=07/717, ext=theap, mode=1
2015-11-24 11:52:37 ERR trec01[19110]: !OS: Input/output error
2015-11-24 11:52:37 ERR trec01[19110]: )
The disk is most definitely not full: 1.5 TB is available (and the same works on a local disk with less space available). It looks like iSCSI is the problem (it works perfectly apart from these random failures).
Can you think of any reason why iSCSI could fail where a real local block device would not?
iSCSI client (where MonetDB runs): libiscsi 1.11.0
iSCSI storage (where the dbfarm is stored): iscsid 2.0-871
The iSCSI LUN is created as a regular file with thin provisioning (a file that grows dynamically on the NAS). We haven't yet tried a fixed-size block-level LUN (we are trying this today anyway).
Hoping someone already has an idea.
Roberto
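When such failures are irregular, it can help to pull the OS-level cause out of merovingian.log automatically rather than reading the MonetDB-level hint. The sketch below is only an illustration: the line layout it assumes ("date time level dbname[pid]: message") is inferred from the excerpt above, and the names LINE_RE, parse_line and os_errors are hypothetical helpers, not part of MonetDB.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed layout, inferred from the excerpt only:
#   <date> <time> <level> <dbname>[<pid>]: <message>
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<db>[^\[\s]+)\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)


class LogLine(NamedTuple):
    timestamp: str
    level: str
    db: str
    pid: int
    msg: str


def parse_line(line: str) -> Optional[LogLine]:
    """Parse one log line; return None if it does not match the assumed layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return LogLine(
        timestamp=f"{m.group('date')} {m.group('time')}",
        level=m.group("level"),
        db=m.group("db"),
        pid=int(m.group("pid")),
        msg=m.group("msg"),
    )


def os_errors(text: str) -> List[str]:
    """Return the text of every '!OS:' message, i.e. the error reported by the OS."""
    found = []
    for raw in text.splitlines():
        parsed = parse_line(raw)
        if parsed is not None and parsed.msg.startswith("!OS:"):
            found.append(parsed.msg[len("!OS:"):].strip())
    return found


# The excerpt from this thread, one entry per line.
SAMPLE = """\
2015-11-24 11:52:37 ERR trec01[19110]: !ERROR: bm_subcommit: commit failed
2015-11-24 11:52:37 ERR trec01[19110]: !ERROR: log_tend: write failed
2015-11-24 11:52:37 ERR trec01[19110]: !FATAL: 40000!COMMIT: transation commit failed (perhaps your disk is full?) exiting (kernel error: !ERROR: GDKsave: error on: name=07/717, ext=theap, mode=1
2015-11-24 11:52:37 ERR trec01[19110]: !OS: Input/output error
2015-11-24 11:52:37 ERR trec01[19110]: )
"""

if __name__ == "__main__":
    for cause in os_errors(SAMPLE):
        print(cause)
```

Run against the excerpt, this singles out the "Input/output error" line, which is the part of the log that points at the storage path rather than at free space.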
Hi Roberto,
The obvious explanation is network "issues". The error message is likely wrong, because the assumption is that MonetDB runs on top of a local filesystem. If your write takes longer because of some network issue, it will time out at some point. But timeout settings on a network connection are different from timeouts on writes to hard disks. This might lead to different errors that MonetDB does not handle properly. I am sure we do not test this setup at the moment.
Another thing that might be important is how well memory mapping works with iSCSI.
Arjen de Rijke
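To make the point about differing errors concrete: on a POSIX system a failed write or fsync is reported through errno, and "disk full" (ENOSPC) is a different code from the "Input/output error" (EIO) seen in the log. The sketch below is not MonetDB code; classify_os_error and durable_write are hypothetical helpers that only illustrate how an application could tell the two apart after a write-and-sync.

```python
import errno
import os


def classify_os_error(exc: OSError) -> str:
    """Map an OSError from write()/fsync() to a coarse cause."""
    if exc.errno == errno.ENOSPC:
        return "disk full"
    if exc.errno == errno.EIO:
        # Reported when the device (or the transport underneath it) fails.
        return "I/O error"
    if exc.errno == errno.ETIMEDOUT:
        return "timeout"
    return "other ({})".format(errno.errorcode.get(exc.errno, "unknown"))


def durable_write(path: str, data: bytes) -> None:
    """Write data to path and force it to stable storage, raising OSError on failure."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:
            written = os.write(fd, view)  # may write fewer bytes than requested
            view = view[written:]
        os.fsync(fd)  # errors from delayed writes can surface here
    finally:
        os.close(fd)


def try_commit(path: str, data: bytes) -> str:
    """Attempt a durable write and return 'ok' or the classified failure."""
    try:
        durable_write(path, data)
    except OSError as exc:
        return classify_os_error(exc)
    return "ok"
```

With a check like this, a failure during a long ETL could be logged as "I/O error" instead of being folded into a generic disk-full hint.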
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Thanks Arjen,
I perfectly understand that MonetDB cannot see / handle the problem. I was
mostly interested in whether you guys already had experiences to share
about it.
In the meantime, we have observed that the same ETL fails consistently on a
file-based LUN *with* thin provisioning, but never fails on a file-based
LUN *without* thin provisioning (all space is pre-allocated). We didn't try
with a block-based LUN, but I suspect it would work.
The exact reason why thin provisioning makes it fail is still unknown, but
perhaps this may help others hitting the same issues.
Roberto
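For readers unfamiliar with the distinction, the difference between a thin and a pre-allocated file-based LUN can be illustrated on a local Linux filesystem with a sparse file versus a file whose blocks are reserved up front. This is only an analogy under stated assumptions: how the NAS implements its LUN files is not known from this thread, and the helper names below are hypothetical. os.posix_fallocate is available on Unix systems that provide posix_fallocate (not macOS).

```python
import os
import tempfile


def make_thin(path: str, size: int) -> None:
    """Create a sparse file: the size is set, but blocks are allocated only when written."""
    with open(path, "wb") as f:
        f.truncate(size)


def make_preallocated(path: str, size: int) -> None:
    """Create a file whose space is reserved up front."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.posix_fallocate(fd, 0, size)
    finally:
        os.close(fd)


def usage(path: str):
    """Return (apparent size, bytes actually allocated) for path."""
    st = os.stat(path)
    # On Linux, st_blocks counts 512-byte units.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    size = 1024 * 1024  # 1 MiB, small enough for a quick demonstration
    with tempfile.TemporaryDirectory() as d:
        thin = os.path.join(d, "thin.img")
        thick = os.path.join(d, "thick.img")
        make_thin(thin, size)
        make_preallocated(thick, size)
        print("thin :", usage(thin))
        print("thick:", usage(thick))
```

Both files report the same apparent size, but the thin one typically has few or no blocks allocated, so the storage has to allocate space at write time. Whether that extra allocation step is what makes the ETL fail on this NAS is not established here.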