Scaling Stream Processing

There are three execution modes to run Lenses continuous SQL.

A continuous SQL query will be materialized at runtime into a Kafka Streams application.

../../_images/execution_modes.png

IN_PROC is the default execution mode and the processors are executed locally within the Lenses application. This can have scalability issues and poses a risk to the running application and can affect stability. IN_PROC is recommended only for testing.

CONNECT is the execution mode that solves these limitations and provides availability guarantees and scalability. Lenses can deploy the Lenses SQL processors in Kafka Connect. Kafka Connect provides a distributed, fault-tolerant and scalable framework as part of the core Apache Kafka distribution. We advise allocating a Kafka Connect cluster for the purpose of running the stream processing with SQL.

KUBERNETES is an execution mode that provides scalability by deploying Lenses SQL runners into Kubernetes clusters. Lenses can deploy and monitor SQL runner deployments created through Lenses or existing tools such as Helm or kubectl.

Connect

Kafka Connect provides scalable, fault tolerant, distributed processing by forming a cluster of workers. The cluster provides endpoints to which Lenses will submit the processor configuration. From this point, Kafka Connect will persist configurations and distribute work to the cluster workers. When restart happens, Lenses will recover the status, configuration and metrics of any Lenses SQL Connectors that are found in the configured clusters – this ensures that if Lenses is offline, the processing of your data in your topologies continues. Lenses will also identify any connectors created outside of Lenses at runtime and start tracking them to provide you visibility.

To scale in or out the number of processor applications, we can simply instruct Kafka Connect to decrease or increase the number of tasks across the cluster. The Lenses UI provides a simple way to deploy Lenses SQL processors and scale them, simply:

  1. Creating a new processor and selecting the cluster to deploy to
  2. Composing your SQL statement
  3. Setting the parallelization, that is how many tasks/application instances to run
  4. Giving the processor a name
  5. Deploying

Lenses will check the validity of the SQL statement and, if valid, create the Connector instance and start to monitor its behavior.

Lenses supports the following Connector functionality:

  • CREATE - Register and create a new connector
  • PAUSE - Pause the connector and tasks
  • START - Start a paused connector
  • DELETE - Remove a connector

Note

Updating an existing connector is not directly supported. The KStream app cannot be updated and the update is more than likely going to break the Schema compatibility of the target insert table but it can be scaled.

When Landoop’s FAST DATA CSD is used, the Cloudera parcel lenses-SQL-connect can install and provision the connector in a few seconds. More information and step-by-step instructions on how to install the parcel can be found at FAST DATA docs

Kubernetes

Kubernetes, a container orchestration engine, provides the perfect platform for running streaming microservices. It has the ability to ensure that a configured number of application instances or pods are running and to scale them up or down accordingly. We provide a Docker image –via Helm charts– for the SQL runner to deploy via CI/CD in a repeatable and audited manner.

Lenses can deploy SQL runners and recover the runners currently deployed as well as track and identify deployments created outside of Lenses.

The Lenses SQL Runner image accepts the same configuration options and environment variables as the Kafka Connect runner with an additional sql.port to expose a rest endpoint.

Lenses deploys SQL runners as Kubernetes deployment resources. These deployments are labeled so Lenses can identify and track any changes via either Lenses or outside such as Helm or kubectl from a CI/CD pipeline.

The following labels are attached to deployments.

Label Value Description UserDefined Resource
lenses.io/app.type lenses-processor Identifier of Lenses No Deployment
lenses.io/app
This is the name of
the processor, it must be unique.
If created via the Lenses API it will handle
this. It must also conform to
Kubernetes naming conventions,
[a-z0-9]([-a-z0-9]*[a-z0-9])?.
Yes Deployment
lenses.io/lenses-user
The username who created the deployment Yes Deployment

Warning

Altering pod and deployment labels of existing SQL runner deployments might cause issues related to the tracking of the deployments.

The following rest endpoints are exposed by the containers. You can, for example, create a Kubernetes services with pod selectors to access them and access the state store for interactive queries.

# get health
curl pod_id:8083/health

# get metrics
curl pod_id:8083/metrics

# get the metadata, the internal state stored information for interactive queries
curl pod_id:8083/metadata

# get the metadata for the specified name, the internal state store for interactive queries
curl pod_id:8083/metadata/(name)

# get the stream information, sql, runners, consumer group id, username
curl pod_id:8083/stream

# stop the stream
curl -X POST pod_id:8083/stop

# start the stream
curl -X POST pod_id:8083/start

Annotations

Lenses SQL Processors expose metrics internally via Rest but also via the Prometheus JMX exporter. The following annotations are added to allow Prometheus to scrape the pods.

prometheus.io/path: /metrics
prometheus.io/port: "8083"
prometheus.io/scrape: "true"

Secrets

If Lenses is configured for SSL and or SASL as secret called lenses-sasl-secret will be created in each namespace for use by the SQL Processors. These secrets are labelled lenses.io/app.type = lenses-secret.

Processors can be configured with SSL/SASL setting different to Lenses, .i.e. different key and truststore and different keytab files. Visit here for information on how to configure the SQL Processors for SASL/SSL.

Deployment Recovery

Lenses can recover existing SQL deployments at startup that have been previously created by Lenses. Additionally, it can also track deployments created outside of Lenses. For example, you may be using Helm to control and manage deployments in your production environment out of a CI/CD pipeline.