Good morning,
At the moment there is no functional compatibility between PostgreSQL and
MonetDB with respect to the generate_series function. The range in
PostgreSQL is inclusive, in MonetDB exclusive. More importantly,
generate_series can't be used with arguments joined in from a different
table: MonetDB only seems to support a subquery argument, so joining the
result back without a primary key is impossible.
PostgreSQL:
select generate_series(a, b), a from (select 1 as a, 2 as b) as a;
 generate_series | a
-----------------+---
               1 | 1
               2 | 1
(2 rows)
MonetDB:
select * from generate_series((select 1 as a, 2 as b));
+-------+
| value |
+=======+
| 1 |
+-------+
Here is what I would like to achieve functionally:
I have a list that states the speeds over partial segments, identified by
linear referencing. From 10 meters to 12 meters, the distance was 2 meters
and the speed was 2s:
10 2 2
I would like to expand these values to segments of one meter, hence end up
with a table that would state:
10 2 2
11 2 2
12 2 2
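For reference, here is a hedged sketch of how this expansion can be written
in PostgreSQL, where generate_series accepts per-row arguments and the range
is inclusive; the table and column names segments(pos, dist, speed) are made
up purely for illustration:
-- PostgreSQL sketch; segments(pos, dist, speed) is a hypothetical table
-- holding rows like the one above. Expands each segment into one row per meter.
select g.meter, s.dist, s.speed
from segments as s,
     lateral generate_series(s.pos, s.pos + s.dist) as g(meter);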
Now I am aware of the Python bindings, and I could achieve what I want with
the function below, but I would still find generate_series very practical.
DROP FUNCTION expand_per_meter;
CREATE FUNCTION expand_per_meter(trip_hash BIGINT, userstopcode STRING,
distancesincelastuserstop INTEGER, s INTEGER, t INTEGER)
RETURNS TABLE(trip_hash BIGINT, userstopcode STRING,
distancesincelastuserstop INTEGER, s INTEGER, t INTEGER)
LANGUAGE PYTHON {
# One output column per result attribute.
result = dict()
result['trip_hash'] = []
result['userstopcode'] = []
result['distancesincelastuserstop'] = []
result['s'] = []
result['t'] = []
# Expand every input row into one output row per meter.
for i in range(0, len(trip_hash)):
    # Positions from the segment start up to and including start + s.
    expanded = range(distancesincelastuserstop[i],
                     distancesincelastuserstop[i] + s[i] + 1)
    expand = len(expanded)
    result['trip_hash'] += [trip_hash[i]] * expand
    result['userstopcode'] += [userstopcode[i]] * expand
    result['distancesincelastuserstop'] += expanded
    result['s'] += [s[i]] * expand
    result['t'] += [t[i]] * expand
return result
};
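A hedged usage sketch, following the subquery-style call shown earlier; the
source table measurements and its column names are assumptions for
illustration only:
-- Hypothetical source table "measurements" with the five input columns.
select *
from expand_per_meter((
    select trip_hash, userstopcode, distancesincelastuserstop, s, t
    from measurements
));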
--
Stefan
Stefan,
This is a good question. For the specific example, you are quite correct.
However, the motivation for the question came from using the MonetDB-rmath extension and embedded R: often it is more efficient to calculate a number of values (e.g. the confidence intervals and p-values) together and then return all of the values as a table. This probably also reflects that I am a statistician and an old R user.
The recent article by Raasveldt suggested that analysis objects from Python could be stored natively rather than serialised as BLOBs. This could be further extended to storing R objects natively. Members of the objects could then be extracted as they are needed. Admittedly, this would lose many relational advantages. My suggestion is halfway, where R and Python can be used for analytics, with simple relational storage of the results.
Actually, is there any way to optimise a lateral call using BATs?
-- Mark
On 13 Jun 2018 17:43, Stefan Manegold <Stefan.Manegold(a)cwi.nl> wrote:
Hi Mark,
just out of curiosity,
(in the given case/example) what's wrong with a simple
select *, mean-1.96*sd as lci, mean+1.96*sd as uci from test;
or
create or replace function ci(se double)
returns double
begin
return 1.96*se;
end;
select *, mean-ci(sd) as lci, mean+ci(sd) as uci from test;
in case you really want/need to avoid the "redundant" occurrence of "1.96"
...
Thanks!
Best,
Stefan
----- On Jun 13, 2018, at 5:30 PM, Mark Clements mark.clements(a)ki.se wrote:
> Jennie,
>
> Thank you for replying to this inquiry - it's appreciated.
>
> Your solution gives the right result - but it requires changing the
> function call and will not generalise easily to other column types or
> other numbers of covariates. If we had another table such as:
>
> drop table test;
> create table test as select 1 as id, 2 as sex, cast(3 as double) as
> mean, cast(4.0 as double) as sd;
> -- and repeating the function
> create or replace function cis(mu double, se double) returns table(lci
> double, uci double)
> begin return select mu-1.96*se, mu+1.96*se;
> end;
>
> we can get all of the test table combined with the confidence interval by:
>
> -- row_number()
> select * from
> (select *, row_number() over() as rid from test) as t1
> natural join
> (select *, row_number() over() as rid from cis((select mean,sd from
> test))) as t2;
> -- or using lateral
> select * from test, lateral cis(test.mean,test.sd);
>
> The row_number solution is fast, and the lateral solution is simple. I
> was hoping for:
>
> select *, cis(mean,sd).* from test;
>
> Again, thank you for your help.
>
> -- Mark
>
> On 06/08/2018 01:53 PM, Ying Zhang wrote:
>> Hi Mark,
>>
>> I’m just wondering if this gives you what you want:
>>
>> create function cis(mu double, se double)
>> returns table(mu_out double, lci double, uci double)
>> begin
>> return select mu, mu-1.96*se, mu+1.96*se;
>> end;
>>
>> Regards,
>> Jennie
>
--
| Stefan.Manegold(a)CWI.nl | DB Architectures (DA) |
| www.CWI.nl/~manegold/ | Science Park 123 (L321) |
| +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
For using the MonetDB-rmath extension, I was interested in using
table-returning functions to return multiple values. Is there a fast and
elegant way to combine the results with the source table in SQL? To make
this concrete, consider some test data and a function to calculate a 95%
confidence interval from a mean and standard error:
drop table test;
create table test as select cast(value as double) as value from
generate_series(0,5);
drop function cis(double,double);
create function cis(mu double, se double)
returns table(lci double, uci double)
begin return select mu-1.96*se, mu+1.96*se;
end;
We can then call the table-returning function:
select * from cis((select value, 1.0 from test));
If we want to re-join the results with the source table, we could use
row_number() or use lateral:
-- Using row_number:
select * from
(select *, row_number() over() as rid from test) as t1
natural join
(select *, row_number() over() as rid from cis((select value,1.0 from
test))) as t2;
-- Using lateral:
select * from test, lateral cis(test.value, cast(1.0 as double)) as t2;
I believe that the latter will lead to a slow loop, even when BAT
functions are available. The table-returning function can be used in the
select list only as a scalar expression:
select cis(0,1).*; -- ok
select cis(0,1).* from test; -- Error: no such operator 'cis'
select cis(value,1).* from test; -- Error: identifier 'value' unknown
Again, is there a fast and elegant way to combine the source table with
a table-returning function in SQL?
Kindly, Mark.
Hello all:
I am trying to find the most efficient way to filter results for those matching today only.
My events table has 550 million rows and I want a view as follows:
CREATE VIEW events_today AS
SELECT * FROM events
WHERE str_to_date(timestamp_to_str(event_timestamp, '%Y-%m-%d'), '%Y-%m-%d') = CURRENT_DATE;
This should give me all events with today's date.
However, the query just hangs without ever returning; running the SELECT
by itself also never returns.
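For comparison, a hedged alternative formulation that avoids the per-row
string round trip, assuming MonetDB accepts casting a timestamp directly to
a date (an untested sketch, not a confirmed fix):
-- Sketch: compare on a date cast instead of formatting and re-parsing strings.
CREATE VIEW events_today AS
SELECT * FROM events
WHERE CAST(event_timestamp AS DATE) = CURRENT_DATE;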
I'm running 11.29.3 (Mar2018) on CentOS 7 with 128GB and 16 cores.
Adam
Hi all:
I ran into an issue yesterday with a simple group-count query.
The query took 7 minutes, 13 seconds to run.
It was as follows -
SELECT zone, count(*)
FROM chronicle.events
WHERE event_type = 'mb'
GROUP BY zone
This table, which is a merge table, currently has 546 million events.
The merge table is made up of two merge tables, each representing a year (events_2018 and events_2017).
Those merge tables are each made up of four tables, events_year_zone_(1-4).
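For context, a hedged sketch of roughly how such a merge-table hierarchy is
declared in MonetDB; the column list below is abbreviated and assumed, not
the actual schema:
-- Assumed, abbreviated column list; the real tables have more columns.
CREATE MERGE TABLE chronicle.events_2018 (event_timestamp TIMESTAMP, event_type VARCHAR(16), zone INT);
ALTER TABLE chronicle.events_2018 ADD TABLE chronicle.events_2018_zone_1;
-- ... repeated for zones 2-4, and likewise for events_2017 ...
CREATE MERGE TABLE chronicle.events (event_timestamp TIMESTAMP, event_type VARCHAR(16), zone INT);
ALTER TABLE chronicle.events ADD TABLE chronicle.events_2018;
ALTER TABLE chronicle.events ADD TABLE chronicle.events_2017;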
I have a scheduled task which inserts events into the relevant tables every 5 minutes from 5 Microsoft SQL Servers.
Roughly 12,000 events broken across 5 inserts (one for each server).
The import process is handled with a C# Windows service that utilizes ODBC to talk to MonetDB.
The server is currently MonetDB v11.27.11 (July2017-SP3) running on VMware 6.5 in a CentOS 7 VM with 64GB of RAM, and 16 CPU cores.
When I ran into this issue yesterday, the mserver5 process was utilizing 90% of available RAM.
This was making my insertion task take longer and longer each time, and simple queries like the one above took minutes.
When I logged into the server via SSH, the input lag caused by the near-total resource usage prompted me to reboot the server.
I restarted the VM, and the same query took 20 seconds.
I'd like to sort out where this issue might be coming from, but I'm not sure where to look.
This server is currently part of a proof-of-concept project that will soon be going to production.
Any guidance is greatly appreciated.
Adam
Hi All:
I'm having some issues using ODBC with C#.
I am trying to insert the contents of a DataTable into Monet using a foreach loop with MonetDB ODBC client under Windows.
This is the error I receive...
System.Data.Odbc.OdbcException (0x80131937): ERROR [42000] [MonetDB][ODBC Driver 11.27.11]syntax error, unexpected ',' in: "execute 5 (timestamp '2018-06-04 21:22:25',8651,'EVENT','','SYSTEM','1810_INFRA_"
This is my loop which is wrapped in a transaction -
foreach (DataRow row in events.Rows)
{
cmd.CommandText = String.Format("INSERT INTO chronicle.events_{0}_zone_{1} VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)", year, zone);
cmd.Parameters.AddWithValue("@date_time", row["Date_Time"]);
cmd.Parameters.AddWithValue("@fracsec", row["FracSec"]);
cmd.Parameters.AddWithValue("@event_type", row["Event_Type"]);
Truncated for brevity.
I'm not sure where the error in my insert statement is because the error message itself appears to be truncated.
Very frustrating as it hangs on the first row.
Adam