StreamQL: A Query Language for Efficient Data Stream Processing

Recent technological advances, such as the Internet of Things (IoT), are causing an enormous proliferation of streaming data, i.e., data that is generated in real-time and at high rates. StreamQL is a query language which simplifies the task of specifying complex computations over streaming data.

The basic object in StreamQL is the stream transformation that describes how an input stream is transformed into an output stream. StreamQL provides a novel integration of several useful programming abstractions for stream processing: (1) relational constructs (such as filtering, mapping, aggregating, key-based partitioning, and windowing), (2) dataflow constructs (such as streaming/serial and parallel composition), and (3) temporal constructs that are inspired from Temporal Logic and regular expressions. StreamQL allows the programmer to specify a streaming analysis in a modular fashion, as its language constructs compose freely.

We provide a formal denotational semantics for StreamQL using a class of monotone functions over streams. We have implemented StreamQL as a lightweight Java library, which we use to experimentally evaluate our approach. The experiments show that the throughput of our implementation is competitive compared to state-of-the-art streaming engines including RxJava, Reactor, Siddhi, Rx.NET, and Trill.

The following diagram shows the performance of StreamQL, RxJava, Rx.NET, Reactor, Trill and Siddhi on the micro benchmark. The vertical axis shows the throughput in number of tuples per second.

The following diagram shows the performance of StreamQL, RxJava, Reactor and Siddhi concerning realistic workloads -- detecting stock patterns (S1a-S4c), monitoring an online auction system (N1-N8), and analyzing a high-frequency trading market (T1-10). The vertical axis shows the throughput in number of tuples per second.

The experiments were run in Ubuntu 16.04 LTS on a desktop computer equipped with an Intel Xeon(R) E3-1241 v3 CPU (4 cores) with 16 GB of memory (DDR3 at 1600 MHz).

Publications

Lingkun Kong, Konstantinos Mamouras. "StreamQL: A Query Language for Processing Streaming Time Series". OOPSLA, 2020 [link].

Lingkun Kong, Konstantinos Mamouras. "StreamQL: A Query Language for Efficient Data Stream Processing". OGHPC, 2020 (Poster).

The StreamQL Engine

The execution engine of StreamQL is given as a light-weight Java library. This library requires a Java JDK with version >= 1.8.

Programming with StreamQL

We introduce a simple example of the StreamQL program in Java. Given a stream of integers, the query sum computes the sum of the integers. The method eval , which stands for "be evaluated", is used to obtain an object that encapsulates the algorithm for the query.

import streamql.QL;
import streamql.algo.*;
import streamql.query.*;

import java.util.*;

public class HelloWorld {
  public static void main(String[] args) {
    ArrayList<Integer> source = new ArrayList<>(Arrays.asList(1,2,3,4,5));
    // get the input stream from a data source
    Iterator<Integer> stream = source.iterator(); 

    // sink that prints data items
    Sink<Integer> sink = new Sink<Integer>() {
      @Override
      public void next(Integer item) {
        System.out.println("Received: " + item);
      }
      @Override
      public void end() {
        System.out.println("Job Done");
      }
    };

    // running sum of the integers
    Q<Integer, Integer> sum = QL.aggr(0, (s, x) -> s + x);

    // evaluation of the logical query plan
    Algo<Integer, Integer> exe = sum.eval();

    // connect the output of query to sink
    exe.connect(sink);

    // execution loop
    exe.init();
    while (stream.hasNext()) {
      Integer item = stream.next();
      exe.next(item);
    }
    exe.end();
    
    // Console Output:
    // --------------
    // Received: 1
    // Received: 3
    // Received: 6
    // Received: 10
    // Received: 15
    // Job Done
  }
}

On this evaluated object, the methods init and next are used to initialize the memory and consume data items, and after the input stream terminates, the method end is invoked to respond to the termination of the input stream.

The API documentation is given here, which briefly introduces core interfaces of StreamQL and the usage of each StreamQL operator.

We also provide a tutorial that introduces how to program streaming computations using StreamQL, which can be found here.