User-Defined Aggregate Functions

User-Defined Aggregate Functions (UDAFs) are a feature of Lenses SQL that lets you define new functions and extend the SQL vocabulary for data aggregations. A typical built-in aggregate function, such as COUNT, is used like this:

SELECT name, COUNT(name) FROM topicA GROUP BY name

If you need a function that is not available out of the box, Lenses provides a simple API for implementing it. One such scenario is wrapping a pre-trained machine learning model in a custom function and applying it to a stream.

Implementing your first UDAF

To write a user-defined aggregate function for Lenses, provide an implementation of the following Java interface:

public interface UserDefinedAggregateFunction {
    void aggregate(Object value);
    Object result();
}

The implementing class must be placed in the io.lenses.sql.udf package. The class name becomes the function name that is called from Lenses SQL.

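package io.lenses.sql.udf;

// The class name doubles as the function name used in Lenses SQL.
public class DOUBLE_COUNT implements UserDefinedAggregateFunction {

    private long count = 0;

    // Adds 2 for every aggregated value; the value itself is ignored.
    @Override
    public void aggregate(Object value) {
        count = count + 2;
    }

    // Returns the accumulated count.
    @Override
    public Object result() {
        return count;
    }
}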
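DOUBLE_COUNT ignores the value it receives. As a minimal sketch of a function that does read its input, the hypothetical SUM_LONG below adds up numeric values and skips everything else. The interface is redeclared locally only so that the sketch compiles on its own; in a real project it would come from the Lenses API, and the class would live in the io.lenses.sql.udf package. Which runtime types Lenses actually passes to aggregate is an assumption here, hence the defensive type check.

```java
// Local stand-in for the Lenses interface shown earlier, so this sketch is self-contained.
// When deploying, use the interface shipped with Lenses and declare
// `package io.lenses.sql.udf;` at the top of the file.
interface UserDefinedAggregateFunction {
    void aggregate(Object value);
    Object result();
}

// Hypothetical UDAF (not part of Lenses): sums numeric inputs.
class SUM_LONG implements UserDefinedAggregateFunction {

    private long sum = 0;

    @Override
    public void aggregate(Object value) {
        // Values arrive as Object, so check the runtime type before using them.
        // Nulls and non-numeric values are skipped rather than failing the query.
        if (value instanceof Number) {
            sum += ((Number) value).longValue();
        }
    }

    @Override
    public Object result() {
        return sum; // autoboxed to java.lang.Long
    }
}
```

Skipping unexpected values is one possible design choice; throwing an exception instead would surface bad data earlier, at the cost of failing the whole aggregation.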

Once the compiled jar file is available, drop it into the udf folder in the Lenses installation directory. The library is loaded automatically and the UDAF becomes available to the SQL engine, so a user can run a query similar to this one:

SELECT name, DOUBLE_COUNT(name) FROM topicA GROUP BY name
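To check a UDAF before packaging it, the grouping behaviour of such a query can be approximated locally. The sketch below is not how Lenses itself is implemented: it assumes the engine keeps one function instance per GROUP BY key, calls aggregate once per record and reads result at the end. UdafHarness and groupAndAggregate are hypothetical names, and the interface and DOUBLE_COUNT are redeclared so the example runs on its own.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Local stand-in for the Lenses interface, so this sketch is self-contained.
interface UserDefinedAggregateFunction {
    void aggregate(Object value);
    Object result();
}

// Same logic as the DOUBLE_COUNT example: adds 2 per aggregated value.
class DOUBLE_COUNT implements UserDefinedAggregateFunction {
    private long count = 0;

    @Override
    public void aggregate(Object value) {
        count = count + 2;
    }

    @Override
    public Object result() {
        return count;
    }
}

// Hypothetical test harness; not part of Lenses.
class UdafHarness {

    // Mimics GROUP BY name: one fresh UDAF instance per distinct key
    // (an assumption about the engine, made for local testing only).
    static Map<String, Object> groupAndAggregate(
            String[] names, Supplier<UserDefinedAggregateFunction> factory) {
        Map<String, UserDefinedAggregateFunction> states = new LinkedHashMap<>();
        for (String name : names) {
            states.computeIfAbsent(name, k -> factory.get()).aggregate(name);
        }
        Map<String, Object> results = new LinkedHashMap<>();
        states.forEach((key, udaf) -> results.put(key, udaf.result()));
        return results;
    }

    public static void main(String[] args) {
        String[] names = {"alice", "bob", "alice"};
        // prints {alice=4, bob=2}
        System.out.println(groupAndAggregate(names, DOUBLE_COUNT::new));
    }
}
```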