Overview

Our goal is to provide self-service SQL-based operational capabilities (SQL-Ops) to organizations and teams that want to leverage existing skills and do not want to worry about the complexity of Apache Kafka. When using Kafka, people need a way to easily visualize the data; Lenses SQL addresses the data blindness Kafka comes with. Enterprise-grade streaming requires more than data visualization: it requires joining, filtering and aggregating data streams in order to provide real-time insights and reports. Until now, achieving this with Apache Kafka required developer skill sets (Java, Scala, Kotlin, or any other JVM-based language) and a steep learning curve to master the Kafka Streams API.

Lenses goes a step further and takes care of the deployment and monitoring of real-time stream processing with SQL, while meeting enterprise requirements for security, auditing and data governance. Integration with Kubernetes, the de facto container orchestration framework, provides the scalability and fault tolerance that mission-critical applications require.

Lenses contains all the DataOps building blocks for operating real-time data in production, freeing your team to focus on business requirements rather than platform infrastructure and learning new technology. It simplifies real-time analytics by encapsulating the underlying complexity of data management operations, providing significant cost savings and faster time to implementation.

The SQL engine for Apache Kafka that enterprises need

Using SQL in an enterprise environment requires adherence to i) access control, ii) compliance and iii) regulatory policies. This goes beyond securing access to the process executing the code: in a corporate environment, people have access to some topics and are restricted from others. Secure by design, Lenses makes user restriction rules easy to manage and apply.

Traceability is another requirement. It is paramount to know who has accessed the data and when. Therefore all queries, whether table-based or streaming, are tracked and audited.

The Lenses platform provides all the necessary protections, fine-grained access control and security, as well as the data governance capabilities your organization needs.

DataOps with Lenses SQL

Why limit visibility and innovation only to specialists? Lenses provides state-of-the-art SQL visibility into your data flows, engaging all areas of your business as you accelerate towards data-driven platforms. The Lenses SQL engine sits above your middleware (Apache Kafka), allowing you to focus on discovering and innovating with the data you know best. Lenses SQL supports both bounded and unbounded streaming queries to join, filter and aggregate any data format. Integrate the Lenses SQL engine into your applications with our Python, Redux or Golang clients, or even into your Spark or Flink jobs with the JDBC driver.

No Application Code

Lenses enables real-time data processing using SQL. The execution engine handles all the complexity of running real-time computation on data streams while providing live metrics for your queries.
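
For illustration, a continuous query can be as small as a filter that writes matching records to a new topic. Below is a minimal sketch, assuming hypothetical payments and high_value_payments topics; the exact streaming syntax may vary across Lenses versions:

-- Sketch: a SQL processor that filters a stream and writes
-- the matching records to another topic. Names are hypothetical.
INSERT INTO high_value_payments
SELECT STREAM *
FROM   payments
WHERE  amount > 10000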

Data enrichment

You can enrich your data in motion, enhancing or refining it in order to make it correct and insightful to work with.
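
Enrichment typically means joining a stream against reference data. A minimal sketch, assuming hypothetical orders and customers topics and field names:

-- Sketch: enrich a stream of orders with customer reference data.
-- Topic and field names are hypothetical.
INSERT INTO orders_enriched
SELECT STREAM o.orderId
     , o.amount
     , c.name
     , c.country
FROM   orders AS o
JOIN   customers AS c ON o.customerId = c.customerId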

Streaming Ingestion with SQL capabilities

As part of Lenses SQL, our Kafka Connector configurations are enhanced with an idiomatic SQL syntax that instructs the target systems, expressing connector-specific options in friendly SQL code.
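
For example, a sink connector can be told what to read and where to write it with a single statement. A minimal sketch in the style of the connector SQL, with a hypothetical source topic and target table:

-- Sketch: instruct a sink to write selected fields from a topic
-- into a target table. Names are hypothetical.
INSERT INTO target_table
SELECT sensorId, temperature, ts
FROM   sensors_topic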

Streaming Analytics

Joins, filters and aggregations allow real-time streaming analytics to be performed; a sketch follows the list below. You can use them to build:

  • Anomaly and threat detection
  • Recommendations
  • Predictive analytics
  • Sentiment analysis

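As an illustration of streaming analytics, the sketch below computes an average speed per vehicle over a time window, reusing the car_IoT table that appears later in this page. The plate field and the windowing syntax are indicative only; consult the streaming syntax reference for the exact form in your version:

-- Sketch: a windowed aggregation for real-time analytics.
-- Windowing syntax is indicative; the plate field is hypothetical.
INSERT INTO speed_stats
SELECT STREAM plate
     , AVG(speed) AS avg_speed
FROM   car_IoT
WINDOW BY TUMBLE 1m
GROUP BY plate
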
Machine Learning

By continuously filtering and transforming the data, you can ensure it is ready for scoring. Thus, you can:

  • Continuously train models with real-time data
  • Integrate with tools data scientists love, such as Jupyter notebooks
  • Visualize data and topologies

Data Wrangling

Don't spend time cleaning and preparing data by hand. Lenses SQL makes it easy to spin up continuously running queries over massive amounts of data and transform it into a format that is easy to work with (see the sketch after this list):

  • Perform pre-analysis to correct, filter, route, clean and enhance messages in real time
  • Build streaming SQL queries (sums, averages, aggregates, anonymize functions, etc.)
  • Preview live data as it flows through, so that you can see the effect of filters and transformations at every stage of the pipeline
  • Python native library, Jupyter integration

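As an example of wrangling in flight, the sketch below cleans and anonymizes a stream in a single continuous query. The ANONYMIZE and TRIM function names are assumptions for illustration; check the functions reference for the exact names available:

-- Sketch: clean, filter and anonymize records in real time.
-- Function, topic and field names are hypothetical.
INSERT INTO users_clean
SELECT STREAM userId
     , ANONYMIZE(email) AS email
     , TRIM(country)    AS country
FROM   users
WHERE  userId IS NOT NULL
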
Streaming query types

Bounded Queries

You can quickly investigate your data and try out joins, which can easily be promoted to a continuous query afterwards. There are many reasons to investigate your data, ranging from looking for data anomalies to checking the value of a specific column (or field; the two notions are interchangeable) in a given record.
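
A bounded investigation is typically capped so it returns quickly. A minimal sketch, reusing the car_IoT table from the Data Formats section:

-- Sketch: a bounded query over the records already in a topic.
SELECT *
FROM   car_IoT
WHERE  speed > 200
LIMIT  10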

Continuous Queries

Continuous queries run indefinitely, processing new records as they arrive. Each continuous query is deployed as a SQL processor, materialized at runtime into a Kafka Streams application, which joins, filters and aggregates data streams and writes the results back to Kafka while exposing live performance metrics.
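
Taking the bounded investigation above to a continuous query is then mostly a matter of writing the results somewhere. A minimal sketch, with a hypothetical target topic (streaming syntax may vary across Lenses versions):

-- Sketch: the same filter, now running continuously and
-- writing matches to a target topic. The target name is hypothetical.
INSERT INTO speeding_cars
SELECT STREAM *
FROM   car_IoT
WHERE  speed > 200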

Connect Queries

Connect queries drive streaming ingestion: as described under Streaming Ingestion above, Kafka Connect source and sink configurations embed an idiomatic SQL syntax that instructs the target system which data to read or write and with which options.

Editor with impressive intellisense

Every aspect of Lenses is specifically designed to maximize user productivity. Lenses provides intellisense, not just autocompletion. This brings context-aware suggestions that make exploring your data simpler and more intuitive for every user.

Context-aware intellisense provides appropriate suggestions based on the position in the statement; for example, projections and WHERE clauses are limited to the tables referenced in the query.

Data Formats

A SQL engine for any data format and any structure

A Kafka record is constructed of a key-value pair and metadata (offset, topic, timestamp, headers). When stored in a topic (or table; the terms are used here interchangeably) it can take various formats: AVRO, JSON, Protobuf or a custom format.

The SQL engine handles AVRO, JSON, XML, CSV, STRING, INT and LONG out of the box, and, via extensions, Google’s Protobuf or any other custom format one may use.

Irrespective of the data storage format, a user can select nested fields and array elements, all using an ANSI SQL syntax. The query below has the same outcome regardless of the storage format, be it AVRO, JSON, XML, Google’s Protobuf or any other.

SELECT speed
     , longitude AS long
     , latitude AS lat
     , current_timestamp AS ts
FROM   car_IoT
WHERE  speed > 0

A rich set of functions is provided out of the box to support string, date and number manipulation. With User Defined Functions (UDFs), users can provide their own functions to enrich the possibilities of Lenses SQL.
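
As an illustration, the query below combines string and number functions over the same car_IoT table; the CONCAT and ROUND names, and the make and model fields, are assumptions for this sketch:

-- Sketch: string and number manipulation with built-in functions.
-- Function and field names are assumed for illustration.
SELECT CONCAT(make, '-', model) AS car
     , ROUND(speed) AS speed_rounded
FROM   car_IoT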

Auto-discovery of topic record formats

The Lenses platform does its best to understand the table storage format, and it keeps a table schema regardless of the actual format. It can work out whether a topic contains AVRO, JSON, XML, INT or binary data; however, distinguishing between STRING, BYTES, INT and LONG is not possible without the risk of mis-identifying the payload. For tables using these formats, Google’s Protobuf or a custom format, user input is required. The Lenses user interface makes it easy to set the formats, an operation required only once. When a table is empty (i.e. there are no records present), Lenses does not have enough context to determine the payload type, so the user must set both the key and value payload formats.

Monitoring

It is important to be able to monitor any executing SQL (be it table-based or streaming-based) so that your team has the leverage to ensure they meet their SLAs. SQL streaming is a first-class citizen: it receives special treatment and gets out-of-the-box real-time performance metrics and its own topology graph (to see how it manipulates the data), and each continuous SQL query shows up in the powerful Lenses data pipeline topology graph.

Scalability

There are three execution modes to run Lenses continuous SQL.

A continuous SQL query will be materialized at runtime into a Kafka Streams application.

IN_PROC is the default execution mode: the processors are executed locally within the Lenses application. This can have scalability issues, poses a risk to the running application and can affect stability. IN_PROC is recommended only for testing.

CONNECT is the execution mode that solves these limitations and provides availability guarantees and scalability. Lenses can deploy the Lenses SQL processors in Kafka Connect, which provides a distributed, fault-tolerant and scalable framework as part of the core Apache Kafka distribution. We advise allocating a dedicated Kafka Connect cluster for running stream processing with SQL.

KUBERNETES is an execution mode that provides scalability by deploying Lenses SQL runners into Kubernetes clusters. Lenses can deploy and monitor SQL runner deployments created through Lenses or through existing tools such as Helm or kubectl.

Integrations

Lenses provides a wide ecosystem of clients for accessing and enhancing data in motion: Python, JavaScript and Golang clients are available, as well as a JDBC driver. Integrating via the Lenses clients helps you run queries and bring data into your own tools, backed by the security controls and policies in Lenses. This way you get out-of-the-box lineage, data redaction features and a central, secure gateway to your data.

Protection

Authentication & Authorisation

Lenses makes data access simple and a gateway for innovation for your teams, but at the same time we make sure that you have enterprise-ready integrations. Lenses provides Role Based Access Control with basic authentication, but also LDAP and Kerberos integrations. Multi-tenancy is simplified, with whitelisting and blacklisting that also applies to any application trying to access the data.

Field Policies & Compliance & GDPR

Do you know where your data is going? Which systems in your data platform contain PII data, and which topics and applications are using them? Lenses applies policies to the data flowing through your platform, enabling you to answer these questions and track the flow of sensitive data through your organisation.

Traceability

It is paramount to know who has accessed the data and when. Therefore all queries, whether table-based or streaming, are tracked and audited.

Videos

28 Nov 2018
Querying streaming data with Lenses SQL for Kafka records /1

28 Nov 2018
Querying streaming data with Lenses SQL for Kafka records /2

28 Nov 2018
Querying streaming data with Lenses SQL Insert Kafka records /3

27 Nov 2018
Working with streaming data with Lenses SQL

Documentation

Lenses SQL Quick start
Bounded Queries with the Lenses table-based engine: Syntax (Functions, UDFs, Examples)
Continuous Queries to create SQL processors with the Lenses Streaming Engine: Syntax (Functions, Transformations, Aggregations, Joins), Scaling, Examples
Connectors SQL Syntax
JDBC driver