[MonetDB-users] Redundant data
Hello All,

I am very new to MonetDB and I am very pleased with its performance at our university.

Basically, I have data which is stored on our Unix filesystem:

/research_data/life/YYYY/MM/DD/species (there are over 100 species files)

I then created a very large CSV file by traversing this data. The file size is about 2GB.

The table I created looks like this:

t1 (
  t as timestamp
  species as varchar(10)
  weight as float
  color as varchar(10)
)

The file species.csv looks like this:

weigh, color
3,red
7,green
4,blue

I have a script that traverses the filesystem and creates a big CSV file, for example:

2008-04-01, cat,3,red
2008-04-01, cat,7,green
2008-04-01, cat,4,blue

This works fine, but if I run the copy() operation again it loads the same rows a second time. Is there any way to avoid this?
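For readers following along, the export step described above could look roughly like the sketch below. It is not the poster's actual script: the function names `iter_rows` and `write_csv` are made up for illustration, and it assumes that each species file sits directly under a YYYY/MM/DD directory, that the file name is the species name, and that each file starts with a one-line header.

```python
import csv
import os


def iter_rows(root):
    """Yield (date, species, weight, color) for every species file under root.

    Assumes the layout root/YYYY/MM/DD/<species> and a one-line header in
    each species file (both are assumptions based on the post).
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit directories in a deterministic order
        parts = os.path.relpath(dirpath, root).split(os.sep)
        if len(parts) != 3:
            continue  # not a YYYY/MM/DD leaf directory
        date = "-".join(parts)
        for species in sorted(filenames):
            path = os.path.join(dirpath, species)
            with open(path, newline="") as fh:
                reader = csv.reader(fh, skipinitialspace=True)
                next(reader, None)  # skip the header line
                for row in reader:
                    if len(row) == 2:  # ignore blank or malformed lines
                        yield (date, species, row[0], row[1])


def write_csv(root, out_path):
    """Write all rows found under root to out_path; return the row count."""
    count = 0
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for row in iter_rows(root):
            writer.writerow(row)
            count += 1
    return count
```

Calling `write_csv("/research_data/life", "big.csv")` would then produce lines such as `2008-04-01,cat,3,red`. Note that this sketch writes no space after the first comma, unlike the sample in the post.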
Mag Gam wrote:
> Hello All,

Hello Mag,

What version of MonetDB are you using, and on what kind of machine/OS?

> This works fine, but if I run the copy() operation again it will put redundant data. Is there anyway to avoid this?

Use primary keys in your table; this will trap duplicate insertion.
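
To illustrate the suggestion, here is a minimal sketch of a key constraint rejecting a repeated load. It uses Python's built-in sqlite3 module purely as a runnable stand-in and is not MonetDB itself; how MonetDB's bulk load reacts to a key violation, and what it costs, should be checked against the MonetDB documentation. The composite key over all four columns is an assumption: the sample data has several rows for the same timestamp and species, so no smaller key would be unique. The helper `load` is hypothetical.

```python
import sqlite3

# In-memory SQLite database, used only so that the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE t1 (
        t       TIMESTAMP,
        species VARCHAR(10),
        weight  FLOAT,
        color   VARCHAR(10),
        PRIMARY KEY (t, species, weight, color)  -- assumed composite key
    )
    """
)

rows = [
    ("2008-04-01", "cat", 3.0, "red"),
    ("2008-04-01", "cat", 7.0, "green"),
    ("2008-04-01", "cat", 4.0, "blue"),
]


def load(conn, rows):
    """Insert rows in one transaction; return False on a key violation."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False


first = load(conn, rows)   # the initial load succeeds
second = load(conn, rows)  # the repeated load is rejected by the key
count = conn.execute("SELECT COUNT(*) FROM t1").fetchone()[0]
print(first, second, count)
```

In this sketch the second load is rolled back as a whole, so the table still holds the three original rows.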
_______________________________________________
MonetDB-users mailing list
MonetDB-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-users
Thanks for the quick response.
I am using Version 5 (Feb 2009 release) on 64-bit Linux.

If I use a primary key, would the insert take longer? I am talking about close to a billion records.

On Sat, May 9, 2009 at 9:29 AM, Martin Kersten wrote:
> use primary keys in your table.... this will trap duplicate insertion.
participants (2)
- Mag Gam
- Martin Kersten