User defined functions


When the pre-defined set of functions is not enough, Lenses allows users to define their own functions.

To write such a function, first add the following JVM dependency to your project, then implement the UserDefinedFunction interface:

io.lenses:lenses-sql-udf:4.0.0

The resulting artifact should be added to the Lenses deployment location, where Lenses will load it automatically.

Implementing a UDF 

To implement a UDF, the first step is to decide how many arguments the function takes.

Lenses currently supports defining functions that take 1, 2, 3, or a variable number of arguments.

For each variant, Lenses provides a corresponding interface:

  • Functions taking 1 argument: UserDefinedFunction1
  • Functions taking 2 arguments: UserDefinedFunction2
  • Functions taking 3 arguments: UserDefinedFunction3
  • Functions taking a variable number of arguments: UserDefinedFunctionVarArg

Package 

Please make sure that the UDF implementation class belongs to one of the packages specified by the lenses.sql.udf.packages configuration option.

Interface 

This section covers only the variable-argument variant. The remaining variants work in a very similar way; the only difference is that their typer and evaluate methods take a fixed number of arguments instead of a list.

public interface UserDefinedFunctionVarArg extends UserDefinedFunction {

    /**
     * The name by which this UDF is referenced in a query.
     *
     * @return The function name.
     */
    String name();

    /**
     * Defines a mapping from input to output types.
     *
     * @param argTypes List of argument data types.
     * @return The resulting data type.
     * @throws UdfException if one or more of the given data types are not supported by this UDF.
     */
    DataType typer(List<DataType> argTypes) throws UdfException;

    /**
     * Evaluates this UDF for the given argument.
     *
     * @param args List of arguments.
     * @return The result of the evaluation.
     * @throws UdfException if the evaluation failed for some reason.
     */
    Value evaluate(List<Value> args) throws UdfException;
}

String name() 

When a query specifies a function name:

SELECT STREAM foo(bar1,bar2,bar3) FROM source;

Lenses will first check the list of pre-defined functions for a matching name. If no match is found, Lenses will then check whether a user defined function (UDF/UDAF) exists whose name() method returns the same name (foo in the example above).
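
The lookup order described above can be pictured with a small sketch. It is not how Lenses is implemented internally: FunctionResolver, NamedFunction, and the exact-match comparison are hypothetical stand-ins, used only to show that pre-defined names take precedence and that a UDF is matched by the value returned from name().

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for the part of a UDF that matters during lookup.
interface NamedFunction {
    String name();
}

// Illustrative resolver: pre-defined names win, then UDFs are matched by name().
final class FunctionResolver {
    private final Set<String> builtIns;
    private final List<NamedFunction> udfs;

    FunctionResolver(Set<String> builtIns, List<NamedFunction> udfs) {
        this.builtIns = builtIns;
        this.udfs = udfs;
    }

    // Returns "builtin", "udf", or empty when no function has that name.
    // Exact, case-sensitive matching is an assumption of this sketch.
    Optional<String> resolve(String functionName) {
        if (builtIns.contains(functionName)) {
            return Optional.of("builtin");
        }
        for (NamedFunction udf : udfs) {
            if (udf.name().equals(functionName)) {
                return Optional.of("udf");
            }
        }
        return Optional.empty();
    }
}
```

In this sketch, a UDF that reuses the name of a pre-defined function would never be selected, which is why a UDF name should not collide with an existing function name.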

DataType typer(List<DataType> argTypes) throws UdfException 

To avoid having to define one function per accepted argument type, Lenses lets a function definition derive its result type from the types of its arguments.

The typer for a function that transforms its arguments from Strings into Integers could be defined as follows:

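public class Foo implements UserDefinedFunctionVarArg {
    // name() and evaluate(...) omitted for brevity.

    @Override
    public DataType typer(List<DataType> argTypes) throws UdfException {
        for (DataType dt : argTypes) {
            // Reject any argument that is not a String.
            if (!dt.isString()) {
                throw new UdfException("Invalid argument. Function foo only accepts String types.");
            }
        }
        // Every argument is a String, so the result is typed as an integer.
        return new LTInt();
    }
}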

Value evaluate(List<Value> args) throws UdfException 

Finally, the evaluate method specifies the function's behavior.

The following example converts each argument to an integer and then returns their sum.

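public class Foo implements UserDefinedFunctionVarArg {
    // name() and typer(...) omitted for brevity.

    @Override
    public Value evaluate(List<Value> args) throws UdfException {
        int result = 0;
        for (Value value : args) {
            try {
                result += Integer.parseInt(value.get());
            } catch (NumberFormatException e) {
                throw new UdfException("Function foo could not parse an argument as an integer.");
            }
        }
        // evaluate must return a Value, not a raw int. IntValue is assumed here to be
        // the library's integer wrapper; check the lenses-sql-udf API for the exact type.
        return new IntValue(result);
    }
}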
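
To see how name, typer, and evaluate fit together, the following is a minimal end-to-end sketch that compiles without the Lenses jar. All supporting types in it are simplified, hypothetical stand-ins: DataType.isString(), LTInt, Value.get(), and UdfException mirror names used on this page, while LTString, StringValue, IntValue, and the return type of get() are assumptions made for illustration. With the real dependency, the library's own types should be used instead.

```java
import java.util.List;

// --- Hypothetical stand-ins so the sketch compiles on its own. ---
class UdfException extends Exception {
    UdfException(String message) { super(message); }
}

interface DataType {
    boolean isString();
}

final class LTString implements DataType {
    @Override public boolean isString() { return true; }
}

final class LTInt implements DataType {
    @Override public boolean isString() { return false; }
}

interface Value {
    Object get();
}

final class StringValue implements Value {
    private final String value;
    StringValue(String value) { this.value = value; }
    @Override public Object get() { return value; }
}

final class IntValue implements Value {
    private final int value;
    IntValue(int value) { this.value = value; }
    @Override public Object get() { return value; }
}

interface UserDefinedFunctionVarArg {
    String name();
    DataType typer(List<DataType> argTypes) throws UdfException;
    Value evaluate(List<Value> args) throws UdfException;
}

// --- The UDF: foo(a, b, ...) parses each String argument and sums the results. ---
final class Foo implements UserDefinedFunctionVarArg {
    @Override
    public String name() {
        return "foo";
    }

    @Override
    public DataType typer(List<DataType> argTypes) throws UdfException {
        for (DataType dt : argTypes) {
            if (!dt.isString()) {
                throw new UdfException("Invalid argument. Function foo only accepts String types.");
            }
        }
        return new LTInt();
    }

    @Override
    public Value evaluate(List<Value> args) throws UdfException {
        int result = 0;
        for (Value value : args) {
            try {
                result += Integer.parseInt(String.valueOf(value.get()));
            } catch (NumberFormatException e) {
                throw new UdfException("Function foo could not parse an argument as an integer.");
            }
        }
        return new IntValue(result);
    }
}
```

With these stand-ins, evaluating foo on the String values "1", "2", and "39" yields an IntValue holding 42, and calling typer with a non-String type raises a UdfException before any evaluation happens.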

Special Considerations 

State 

User defined functions are meant to be stateless. Lenses therefore makes no guarantees about the behavior of instance variables, and their use is strongly discouraged.
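
The difference can be illustrated with a deliberately simplified sketch. SumFunction, StatefulSum, and StatelessSum are hypothetical and do not use the Lenses interfaces; they only contrast a function whose result depends on earlier calls with one that depends solely on its arguments.

```java
import java.util.List;

// Hypothetical, simplified function shape (not the Lenses interface).
interface SumFunction {
    int evaluate(List<Integer> args);
}

// Discouraged: the result depends on how often this instance was called before.
final class StatefulSum implements SumFunction {
    private int runningTotal = 0; // instance state that leaks between calls

    @Override
    public int evaluate(List<Integer> args) {
        for (int arg : args) {
            runningTotal += arg;
        }
        return runningTotal;
    }
}

// Preferred: everything is derived from the arguments of the current call.
final class StatelessSum implements SumFunction {
    @Override
    public int evaluate(List<Integer> args) {
        int total = 0;
        for (int arg : args) {
            total += arg;
        }
        return total;
    }
}
```

Calling StatefulSum twice with the arguments 1 and 2 returns 3 and then 6, whereas StatelessSum returns 3 both times. Because nothing is promised about how function instances are created or reused, only the stateless form gives predictable results.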

Nullability 

Nullability in Lenses is a type level concern. As such, if a function can return null values, the typing information must reflect that.

When a function can return a NullValue, its typer should return an LTOptional<T>.
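
As a sketch of this rule, the hypothetical function below returns its first non-empty String argument, or a null value when there is none, and declares an optional result type accordingly. Every type here is a simplified stand-in defined in the block itself: the names LTOptional and NullValue come from this page, but their constructors, LTString, StringValue, and the FirstNonEmpty class are assumptions and may not match the real lenses-sql-udf API.

```java
import java.util.List;

// Hypothetical stand-ins; the real lenses-sql-udf types may differ.
interface DataType {}

final class LTString implements DataType {}

final class LTOptional<T extends DataType> implements DataType {
    final T inner;
    LTOptional(T inner) { this.inner = inner; }
}

interface Value {}

final class StringValue implements Value {
    final String value;
    StringValue(String value) { this.value = value; }
}

final class NullValue implements Value {}

// Hypothetical UDF: first non-empty String argument, or null if there is none.
final class FirstNonEmpty {
    String name() {
        return "first_non_empty";
    }

    // The result may be null, so the declared type is optional.
    DataType typer(List<DataType> argTypes) {
        return new LTOptional<>(new LTString());
    }

    Value evaluate(List<Value> args) {
        for (Value arg : args) {
            if (arg instanceof StringValue && !((StringValue) arg).value.isEmpty()) {
                return arg;
            }
        }
        // No usable argument: this is the case the optional type accounts for.
        return new NullValue();
    }
}
```

If the typer returned a plain LTString here, the type information would claim a value is always present even though evaluate can return a NullValue.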

Testing 

Testing is an important part of any development. To test your UDF, we recommend following the example tests published in the Lenses UDF Example Repository.

--
Last modified: September 26, 2024