Building your application topology

Topologies are represented as a series of nodes, where the series of nodes will begin with one or more Kafka topics, and optionally end with an output topic. In between, there will be a series of nodes, such as count, group, which represent the operations that your streaming library is performing. Each node has a name which uniquely identifies it, and a type which informs Lenses about the purpose of that node - a topic, a transformation step, a filter, and so on. In the case of topic nodes, the name of the node should be the name of the topic.

Nodes also need to declare any parent nodes, so that the Lenses UI knows how to render the relationships between the nodes. For example, a node a that reads from a topic b would need to declare the topic node as its parent.

For example, if we consider the classic word count application that was the original poster child application for Hadoop, and rework it for a streaming platform, then we would be reading lines of data, splitting up each line into words, grouping common words together, counting each group of words, and finally publishing the (word, count) tuples to an output topic.

Here is the full Java code of this five node topology “word count app”, with an input topic, a split operation, a groupby operation, the count operation, and finally the output topic. The builder entry method is start and we complete with build.

Topology topology = 
    TopologyBuilder.start(AppType.KafkaStreams, "spark-streaming-wordcount")
        .withTopic(inputTopic)
            .withDescription("Raw lines of text")
            .withRepresentation(Representation.TABLE)
        .endNode()
        .withNode("split", NodeType.SPLIT)
            .withDescription("Split lines into words")
            .withRepresentation(Representation.TABLE)
            .withParent(inputTopic)
        .endNode()
            .withNode("groupby", NodeType.GROUPBY)
            .withDescription("Group by word")
            .withRepresentation(Representation.TABLE)
            .withParent("split")
        .endNode()
            .withNode("count", NodeType.COUNT)
            .withDescription("Count instances of each word")
            .withRepresentation(Representation.TABLE)
            .withParent("groupby")
        .endNode()
            .withTopic(outputTopic)
            .withParent("count")
            .withDescription("Write output to the output topic")
            .withRepresentation(Representation.TABLE)
        .endNode()
        .build();

Notice that when we first create the builder, we supply the type and the name of the application we are building. These are displayed in the UI and are useful when there are multiple topologies to be rendered.

Also, look at the first node. This does not declare a parent, which is because it is an input topic and therefore has no upstream dependencies. The rest of the nodes declare the previous node as a parent.