The DataCell Architecture
The DataCell approach is most easily understood using the example introduced earlier. The sensor
program simulates a real-world sensor that emits events at a regular interval,
e.g., temperature, humidity, or noise readings. The actuator
is a device simulator that is controlled by the events it receives, e.g., a fire alarm.
Sensors and actuators work independently.
They are typically proprietary devices that communicate with a controlling station over a wired network.
The only requirement imposed by the DataCell is that devices can deliver events using the UDP protocol (the default)
in CSV, the most efficient event message format. Alternative message formats can readily be supported by
extending the formats recognized by the adapters, or by placing a simple filter between the device and the DataCell.
Baskets

The basket is the key data structure of the streaming engine. Its role is to hold a portion of an event stream, also denoted as an event window. It is represented as a temporary main-memory table. Unlike other stream systems, there is no a priori order or fixed window size. The basket is simply a (multi-)set of event records received from an adapter, or of events ready to be shipped to an actuator. There is no persistency and no transaction management over the baskets. If a basket should survive session boundaries, its content should be inserted into a normal table. Baskets can be queried with SQL like any other table, but concurrent actions may leave you with a mostly empty table to look at.
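For illustration, baskets are declared like ordinary tables in the datacell schema. The sketch below assumes a small sensor event layout; the names bsktin and bsktout and their columns are purely illustrative.

```sql
-- A minimal sketch, assuming a simple sensor event of (id, timestamp, payload).
-- Baskets are declared as ordinary tables in the datacell schema.
CREATE SCHEMA datacell;
CREATE TABLE datacell.bsktin  (id integer, tag timestamp, payload integer);
CREATE TABLE datacell.bsktout (id integer, tag timestamp, payload integer);
```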
Adapters
The receptor and emitter adapters are the interface units in the DataCell that interact with sensors and actuators.
Both communicate with their environment through a channel; the default channel is a UDP connection, chosen for speed.
By default, the receptor is a passive thread that opens a channel and awaits events to arrive.
In contrast, the emitter is an active thread that immediately pushes events onto the identified channel.
Hooks have been created to reverse these roles, e.g., a receptor that polls a device, or an emitter that waits for polling actuators.
Events that cannot be parsed are added to the corresponding basket as errors. All errors collected can be
inspected using the table-producing function datacell.errors().
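A receptor and an emitter can then be attached to these baskets. The calls below follow the pattern of the DataCell examples; the host name and port numbers are arbitrary assumptions and should be adapted to the actual deployment.

```sql
-- A hedged sketch: bind a receptor to the input basket and an emitter to the output basket.
-- 'localhost' and the ports 50500/50600 are illustrative choices.
CALL datacell.receptor('datacell.bsktin', 'localhost', 50500);
CALL datacell.emitter('datacell.bsktout', 'localhost', 50600);
```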
Continuous queries

Continuous queries are expressed as ordinary SQL queries, in which previously declared basket tables are recognized by the DataCell optimizer. Access to these tables is replaced, and interaction with the adapters is regulated with a locking scheme. Mixing basket tables and persistent tables is allowed. For convenience, multiple SQL statements can be encapsulated in an SQL procedure, so that events derived from one basket can be delivered to multiple destination baskets.
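A simple pass-through query that forwards every event from the input basket to the output basket could be registered as sketched below. The query name datacell.pass is illustrative, and the datacell.query and datacell.resume calls should be checked against the installed release.

```sql
-- A minimal sketch: register a continuous query and activate the DataCell scheduler.
CALL datacell.query('datacell.pass',
    'INSERT INTO datacell.bsktout SELECT * FROM datacell.bsktin;');
CALL datacell.resume();   -- start receptors, emitters, and continuous queries
```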
Continuous queries often rely on control over the minimum/maximum number of events to consider when the query is
executed. This information is expressed as an ordinary predicate in the WHERE clause.
The pre-defined predicates listed below are supported. They inform the DataCell scheduler when the next action should
be taken; they do not affect the current query, which allows for dynamic behavior.
The window slide size can be calculated with a query. This also means that a startup query is needed to inform the
scheduler the first time, or the properties must be set explicitly using datacell.basket() and datacell.beat() calls.
| Setting | Description |
|---|---|
| datacell.threshold(B,N) | The query is only executed when basket B holds at least N events |
| datacell.window(B,M,S) | Extract a window of at most M events and slide by S events afterwards |
| datacell.window(B,T,Ts) | Extract a window based on a temporal interval of size T, followed by a stride of size Ts |
| datacell.beat(B,T) | The next query execution starts after a delay of T milliseconds (excluding query execution time) |
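As an illustration, such a predicate is woven into the continuous query itself. The sketch below reuses the illustrative basket names from earlier and only fires the query once enough events have accumulated; embedded single quotes in the query string are doubled following standard SQL conventions.

```sql
-- A hedged sketch: execute the query only when at least 10 events are waiting in bsktin.
CALL datacell.query('datacell.batch',
    'INSERT INTO datacell.bsktout
     SELECT * FROM datacell.bsktin
     WHERE datacell.threshold(''datacell.bsktin'', 10);');
```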
The sliding-window constraints are mutually exclusive: the window slides either by the number of events consumed or by a time interval. For time-based slicing, the first timestamp column in the basket is used as the frame of reference, leaving all other temporal columns as ordinary attributes.
This functionality is temporarily suspended.