During the last decade, applications dealing with high data throughput were limited to near-real-time operation. Take Network Intrusion Detection Systems (NIDS), for example: a crucial tool in network security, whatever your definition of security is. Until a few years ago, such systems required expensive proprietary hardware, tied to the hardware vendor’s (often poor) software tools. With the advent of cheap, powerful hardware and open-source networking solutions, NIDS is now within the grasp of organizations of all sizes. Route your traffic through a Linux-powered router, use a tool such as MantisNet’s Programmable Packet Engine (PPE) to capture network traffic, and finally send the data to a high-performance streaming platform such as Kafka. There, you can use Lenses and its highly scalable SQL to analyze threats and react to them in real time.
August just got hotter: with great pleasure we, team Landoop, announce the immediate release of Lenses 2.1. Following an ambitious road map, this version focuses on strengthening our SQL engine’s capabilities, the all-new global streaming topology graph and an improved user experience.
Lenses SQL now supports ALL data formats
The Lenses SQL streaming engine for Apache Kafka can now handle any serialization format, including the much-requested Google Protobuf.
So you gave our Lenses Box a spin and were sold immediately; now you are ready to run Lenses against your own Kafka cluster. We get this a lot! As it happens, your cluster is on Azure, maybe even on Azure HDInsight; after all, Microsoft announced the general availability of Kafka 1.0 for HDInsight just a few days ago. As always, we’ve got you covered.
In this article we’ll walk through a simple demonstration of how to set up Lenses on Azure and connect it to an HDInsight Kafka cluster, and maybe even throw in a Schema Registry instance for good measure. Note that I am an Azure beginner myself: so many clouds to learn, so little time. Luckily, Lenses is designed to work effortlessly with any Kafka installation. We love, support and learn from all vendors.
In our previous blog post we introduced the exciting new open-source JDBC driver for Apache Kafka via Lenses. In this article we’ll delve deeper and show how to use the driver in conjunction with Apache Spark. For those new to Spark: Apache Spark is an in-memory distributed processing engine that supports both a programmatic and a SQL API. Spark splits a dataset into partitions and distributes these partitions across a cluster.
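The partitioning idea in the paragraph above can be illustrated without Spark at all. The toy sketch below (plain Python, not Spark’s API; the function names are hypothetical) hash-partitions keyed records so that all records with the same key land in the same partition. That property is what lets each partition be processed independently, as a Spark task would do on an executor, and the per-partition results then be merged.

```python
import zlib
from collections import defaultdict


def partition_by_key(records, num_partitions):
    """Assign each (key, value) record to a partition by hashing its key.

    zlib.crc32 is used instead of the built-in hash(), because str hashes
    are randomized per process and partition assignment must be stable.
    """
    partitions = defaultdict(list)
    for key, value in records:
        idx = zlib.crc32(key.encode("utf-8")) % num_partitions
        partitions[idx].append((key, value))
    return [partitions[i] for i in range(num_partitions)]


def count_per_key(partition):
    """Work done independently on a single partition (one 'task')."""
    counts = defaultdict(int)
    for key, _ in partition:
        counts[key] += 1
    return dict(counts)


records = [("sensor-1", 20.5), ("sensor-2", 21.0), ("sensor-1", 20.7), ("sensor-3", 19.9)]
parts = partition_by_key(records, num_partitions=2)

# Keys never span partitions, so per-partition results merge without conflicts.
merged = {}
for p in parts:
    merged.update(count_per_key(p))
print(merged)
```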
Last December we announced our commitment to provide the capabilities that data streaming systems need to let data-driven businesses achieve GDPR compliance before the regulation’s effective date (May 25, 2018). This post explains how Lenses delivers on that commitment by providing Data Governance capabilities and GDPR compliance by design. The immutable nature of modern high-performance distributed systems gives a competitive advantage to industries that need to load streams of events quickly, run low-latency queries and process data in motion at scale.
One of the best-supported interfaces the JDK has introduced is JDBC (Java Database Connectivity), for accessing relational databases. In the more than two decades since it debuted, the list of supported databases has grown to include some that are not relational and, in some cases, not even databases. Now, through the recently released Lenses JDBC driver, Apache Kafka joins the list of supported technologies.
Today, we are very pleased to announce the release of Lenses v2.0.
Lenses is the streaming management platform for Apache Kafka. This release focuses on improvements based on the feedback we’ve received and introduces a ton of exciting new features.
Here’s a quick overview:
Mike Barlotta, Agile Data Engineer at WalmartLabs, introduces how Kafka Connect and Stream Reactor can be leveraged to bring data from Cassandra into Apache Kafka.
In the first part of this series (see Getting started with the Kafka Connect Cassandra Source) we looked at how to set up Kafka Connect with the Cassandra Source connector from Landoop. We also looked at some design considerations for the Cassandra tables. In this post we will examine some of the options available for tuning the Cassandra Source connector.
This post looks at how to set up and tune the Cassandra Source connector available from Landoop. The Cassandra Source connector reads data from a Cassandra table and writes the contents into a Kafka topic, driven only by a configuration file. This makes it easy to turn data that has already been saved into an event stream.
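Turning saved rows into an event stream usually means polling the table repeatedly and publishing only the rows that are new since the last poll. The sketch below illustrates that idea with a timestamp watermark. It is a hypothetical, in-memory Python model (a list stands in for the Cassandra table, another for the Kafka topic), not the connector’s code or configuration. A production implementation would also have to handle rows that share the last published timestamp but arrive later, which this sketch ignores.

```python
from dataclasses import dataclass, field


@dataclass
class IncrementalTableSource:
    """Hypothetical sketch: poll a table incrementally via a timestamp column.

    Only rows whose timestamp is greater than the last stored offset are
    published, so each poll turns newly saved rows into new events.
    """
    rows: list                                  # stand-in for a Cassandra table
    ts_column: str = "created"
    offset: int = -1                            # last published timestamp (assumes non-negative timestamps)
    topic: list = field(default_factory=list)   # stand-in for a Kafka topic

    def poll(self, now):
        # Upper-bound the scan by `now`; later rows are picked up by the next poll.
        batch = sorted(
            (r for r in self.rows if self.offset < r[self.ts_column] <= now),
            key=lambda r: r[self.ts_column],
        )
        self.topic.extend(batch)
        if batch:
            self.offset = batch[-1][self.ts_column]
        return len(batch)


source = IncrementalTableSource(rows=[{"id": 1, "created": 100}, {"id": 2, "created": 200}])
first = source.poll(now=150)                    # publishes row 1 only
source.rows.append({"id": 3, "created": 250})
second = source.poll(now=300)                   # publishes rows 2 and 3
```

Storing the offset (rather than re-reading the whole table) is what keeps repeated polls cheap and avoids publishing duplicates.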
Angelos Petheriotis, Senior Data Engineer at Centrica (Hive Home/British Gas), shares parts of their data journey, building real-time IoT data pipelines with Stream-Reactor, Kafka and Kubernetes.
Driving billions of messages per day through multiple processing pipelines requires a significant number of processing and persistence jobs. We designed our pipelines with a real-time, durable and stable continuous flow of data in mind. To achieve this goal, we made our services and our infrastructure as decoupled as possible.
IoT with Kafka via Lenses
The rapidly growing number of interconnected devices confirms that the Internet of Things (IoT) is a fast-maturing technology. The digital economy has its own currency, and that currency is data. Like conventional currencies, data is valuable only if you can use it. The IoT is a driver for becoming data-rich; however, having the data is not enough: you need to be able to analyze it and take the appropriate action.
Stream Reactor, the largest open-source collection of Apache Kafka connectors, today shipped a release with many new features, bug fixes and new connectors for Apache Pulsar!
In a previous post we showed how to scale out Lenses SQL processors with Kafka Connect. Connect is one of three execution modes for LSQL processors in Lenses; the other two are in-process, mainly for developers, and Kubernetes, the subject of this post.
We are super excited to announce the new Lenses release, v1.1! Lenses is a streaming platform for Apache Kafka that supports the core elements of Kafka, vital enterprise features and a rich web interface to simplify your Kafka development and operations. Lenses also ships with a free single-broker development environment that provides a preconfigured Kafka setup with connectors and examples for local development. Since November’s release, Lenses has been widely adopted, and we would like to thank you all for your valuable feedback, which we have taken into account in this release.
As mentioned in a previous post, Lenses SQL leverages Kafka Streams to process data and currently provides three execution modes for running Lenses SQL processors. In this video we demonstrate how to scale out using CONNECT mode, and how to manage Lenses SQL processors via the Lenses web interface or the CLI tool.
General Data Protection Regulation
Read the updated May 2018 article: GDPR - Data Governance with Apache Kafka and Lenses 2.0.
GDPR is an important piece of legislation designed to strengthen and unify data protection laws for all individuals within the European Union. The regulation becomes effective and enforceable on 25 May 2018. Our commitment is to provide the necessary capabilities in data streaming systems to allow your data-driven business to achieve compliance with GDPR before the regulation’s effective date.
In this post we will see how you can leverage Lenses and Lenses SQL (LSQL), our own SQL layer for Apache Kafka, to create, execute and monitor Kafka Streams applications defined with SQL. If you’ve worked with data before, you know that a lot of time goes into extracting and massaging data from various sources and transforming it into the required format. The Lenses SQL Engine for Apache Kafka turns these ETL challenges into a quick and integrated experience.
In this post we will see how Lenses can help you explore data in Kafka. Lenses comes with a powerful user interface for exploring historical or in-motion Kafka data, over which you can run Lenses SQL Engine queries. This gives quick access to data for debugging, analysis or reporting, without requiring you to be a developer. In addition, Lenses provides a set of REST and WebSocket endpoints that make integration with your Kafka data simple.
In this brief entry we will discuss how count aggregations can be coded faster with Lenses SQL. Count aggregations are a very common scenario in stream processing. Common use cases include time-based reports of transaction counts for a payment provider, product views on an e-commerce site, how many customers are viewing a hotel, and many more. In this article we will see how Lenses lets you run these aggregations with the Lenses SQL engine.
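In Lenses these aggregations are expressed as SQL queries; we don’t reproduce LSQL syntax here. As a language-neutral illustration of what a windowed count computes, the Python sketch below counts events per key in tumbling (fixed-size, non-overlapping) time windows, e.g. hotel views per minute. The function name and sample data are hypothetical.

```python
from collections import Counter


def tumbling_window_counts(events, window_ms):
    """Count events per (window start, key), like GROUP BY key over a tumbling window.

    events: iterable of (timestamp_ms, key) pairs.
    """
    counts = Counter()
    for ts, key in events:
        window_start = ts - (ts % window_ms)  # align the timestamp to its window boundary
        counts[(window_start, key)] += 1
    return dict(counts)


# Hotel views with millisecond timestamps, counted per one-minute window.
views = [(1_000, "hotel-a"), (15_000, "hotel-a"), (30_000, "hotel-b"), (61_000, "hotel-a")]
result = tumbling_window_counts(views, window_ms=60_000)
```

A streaming engine computes the same thing incrementally, emitting updated counts as events arrive instead of processing a finished list.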
Streaming Topologies out of the box!
Lenses SQL in action
Lenses SQL for Analyze, Process, Connect
Lenses SQL supports the three major layers of your data streaming pipeline:
Analyze: run ad-hoc queries over Kafka topics, in real time or over historical data. Browsing your Kafka topics has never been easier or more powerful.
Process: run comprehensive, production-quality streaming analytics on top of the Kafka Streams API.
Connect: we build all our connectors with SQL capability in the ingestion process.
We were excited to attend the Athens Big Data meetup in mid-September to present our open-source contributions to streaming technologies, in particular around Apache Kafka®.
Fast Data 3.2 is officially out! This release cycle took longer than usual but it brings many changes that will provide you with a more streamlined Kafka experience and let us build, test and enhance future releases quicker and with more confidence. The release has been available for our clients since last month.
Fast Data is our solution for installing and managing a modern Kafka stack through Cloudera Manager. Check here for an overview and request a trial today!
If you already use our CSD, read our documentation for instructions on how to upgrade without downtime. We are always available to help and can arrange for an engineer to walk you through the process.
When it comes to security, Apache Kafka, like every distributed system, provides mechanisms to transfer data securely between the components involved. Depending on your setup, this might involve different services such as Kerberos, multiple TLS certificates and an advanced ACL setup in brokers and Zookeeper. In many cases, enabling encryption also comes with a performance penalty.
The new version implements the REST Proxy v2 API, so make sure you upgrade to the right version of the REST Proxy.
This presentation is by Angelos Petheriotis, senior engineer at Hive Home (British Gas), at an Apache Kafka meetup. Angelos presented the team’s journey to 50K msg/s from IoT devices, featuring our DM Stream-Reactor connectors, and how they use the Kafka Connect Query Language and Landoop Web Tools. Enjoy!
Streaming 4 billion Messages per day. Lessons Learned. (slides by Angelos Petheriotis)
This article presents how the Avro library writes to files and how we can achieve significant performance improvements by parallelizing the write. A JVM library has been implemented and is available on GitHub: fast-avro-write. We built it for a project that required writing many millions of Avro messages from Kafka into a star-schema data warehouse (DW) in Hive on HDFS. You might have heard about (or even dealt with) the challenges of working with HDFS.
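An Avro container file stores records in blocks, and each block is encoded (and compressed) on its own, which is what makes a parallel write possible: encode blocks concurrently, then append them to the file in their original order. The Python sketch below shows only that structure. It is not the fast-avro-write API; JSON plus zlib stands in for Avro binary encoding and the file’s codec, and the block framing (record count and byte length as two 8-byte integers) is a hypothetical simplification of Avro’s real block layout.

```python
import io
import json
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor


def encode_block(records):
    """Stand-in for encoding one Avro data block: serialize, then compress."""
    payload = "\n".join(json.dumps(r, sort_keys=True) for r in records).encode("utf-8")
    return len(records), zlib.compress(payload)


def chunk(records, size):
    for i in range(0, len(records), size):
        yield records[i:i + size]


def parallel_write(records, out, block_size=1000, workers=4):
    """Encode blocks concurrently, but write them sequentially in input order."""
    blocks = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in submission order, so block order is preserved.
        for count, block in pool.map(encode_block, chunk(records, block_size)):
            out.write(struct.pack(">QQ", count, len(block)))  # hypothetical framing
            out.write(block)
            blocks += 1
    return blocks


def read_back(data):
    """Decode the framed blocks to check that content and order survived."""
    buf = io.BytesIO(data)
    records = []
    while header := buf.read(16):
        count, length = struct.unpack(">QQ", header)
        lines = zlib.decompress(buf.read(length)).decode("utf-8").split("\n")
        if len(lines) != count:
            raise ValueError("block record count mismatch")
        records.extend(json.loads(line) for line in lines)
    return records


recs = [{"id": i, "value": i * 2} for i in range(2500)]
out = io.BytesIO()
n_blocks = parallel_write(recs, out, block_size=1000)
```

In CPython the GIL limits how much the JSON encoding in these threads can overlap, so a ProcessPoolExecutor would be the usual choice for CPU-bound work; threads keep the sketch simple. On the JVM, which fast-avro-write targets, encoding threads can run fully in parallel.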
In this mini tutorial we will explore how to create a Kafka Connect pipeline using the Kafka Development Environment (fast-data-dev) to move real-time telemetry data into Elasticsearch and finally visualize the positions on a Kibana Tile Map, writing zero code…!
An FTP server with a pair of credentials is a common way for data providers to expose data as a service. In this article we are going to implement custom file transformers to efficiently load files over FTP and use Kafka Connect to convert them into meaningful events in Avro format. Depending on data subscriptions, we might get access to FTP locations with files updated daily, weekly or monthly.
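A file transformer’s job is to take the raw bytes of one downloaded file and turn them into individual events. The Python sketch below assumes a daily CSV file with a header row; the function name, column names and path are hypothetical, the actual transformer interface of the Kafka Connect FTP source is not reproduced, and the final Avro serialization step is omitted (events are plain dicts).

```python
import csv
import io


def csv_file_to_events(raw, source_path, converters):
    """Hypothetical file transformer: turn one downloaded file into events.

    Assumes a UTF-8 CSV file with a header row. `converters` maps each column
    to a function that parses its string value; every event also records the
    file and row it came from, so consumers can trace where it originated.
    """
    reader = csv.DictReader(io.StringIO(raw.decode("utf-8")))
    for row_index, row in enumerate(reader):
        event = {name: convert(row[name]) for name, convert in converters.items()}
        event["_source"] = source_path
        event["_row"] = row_index
        yield event


daily = b"date,region,units\n2018-01-01,UK,12\n2018-01-01,GR,7\n"
events = list(csv_file_to_events(daily, "/daily/sales.csv",
                                 {"date": str, "region": str, "units": int}))
```

Parsing values into typed fields at this stage is what makes the events "meaningful": downstream consumers receive an integer `units` rather than an opaque line of text.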
MQTT stands for MQ Telemetry Transport. It is a lightweight messaging protocol designed for embedded hardware, low-power or limited-network applications, and microcontrollers with limited RAM and/or CPU. It is a protocol driving the expansion of the IoT. Meanwhile, large numbers of small devices producing frequent readings lead to big data and the need for analysis in both the time and space domains (spatio-temporal analysis). Kafka can be the highway that connects your IoT devices with your backend analytics and persistence.
How to simplify your ETL process using Kafka Connect for (E) and (L).
Introducing KCQL, the Kafka Connect Query Language for fast-data pipelines.
Using KCQL to set up Kafka connectors for popular in-memory and analytical systems such as Hazelcast, Redis and InfluxDB (live demos).
Using fast-data-dev Docker for your Kafka development environment.
Enhancing your existing Cloudera (Hadoop) clusters with fast-data capabilities.
Demos: http://schema-registry-ui.landoop.com http://kafka-topics-ui.landoop.com http://kafka-connect-ui.
Time-series datastores are of particular interest these days, and InfluxDB is a popular open-source distributed time-series database. In this tutorial we will integrate Kafka with InfluxDB using Kafka Connect and implement a Scala Avro message producer to test the setup. The steps we are going to follow are:
1. Set up a Docker development environment
2. Run an InfluxDB Sink Kafka Connector
3. Create a Kafka Avro producer in Scala (using the Schema Registry)
4. Generate some messages in Kafka
Finally, we will verify the data in InfluxDB and visualize it in Chronograf.
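The sink’s end product is points stored in InfluxDB, whose native write format is its line protocol: `measurement,tag=value field=value timestamp`. As a rough illustration of what one Kafka record becomes on the InfluxDB side, the Python sketch below formats a single point. It covers only a subset of the protocol (escaping is minimal, no NaN/infinity handling), it is not the connector’s code, and which record fields become tags versus fields is an assumption made for the example.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Format one point in InfluxDB line protocol (subset; minimal escaping)."""

    def esc(s):
        # Tag keys/values and field keys: escape commas, equals signs and spaces.
        return str(s).replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")

    def fmt_value(v):
        if isinstance(v, bool):      # check bool first: bool is a subclass of int
            return "true" if v else "false"
        if isinstance(v, int):
            return f"{v}i"           # integer field values carry an 'i' suffix
        if isinstance(v, float):
            return repr(v)
        escaped = str(v).replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'        # string field values are double-quoted

    head = measurement.replace(",", r"\,").replace(" ", r"\ ")
    if tags:
        head += "," + ",".join(f"{esc(k)}={esc(v)}" for k, v in sorted(tags.items()))
    body = ",".join(f"{esc(k)}={fmt_value(v)}" for k, v in fields.items())
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"   # nanosecond precision by default
    return line


line = to_line_protocol(
    "temperature",
    {"device": "sensor-1", "site": "london"},
    {"value": 21.5, "count": 3},
    1_500_000_000_000_000_000,
)
```

Tags are indexed in InfluxDB while fields are not, so low-cardinality identifiers such as a device or site name are the usual tag candidates.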
A few days ago we open-sourced Coyote, a tool we created to automate testing of our Landoop Boxes, which feature a large range of environments for Big Data and Fast Data (see Kafka). Coyote does one simple thing: it takes a .yml file with a list of commands to set up and run, runs them, and checks their exit codes and/or output. It has some other functionality too, but that is its essence.
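The core loop described above (run each command, then check its exit code and, optionally, its output) fits in a few lines. The Python sketch below is not Coyote’s implementation. To stay stdlib-only, it takes the test list as Python dicts instead of parsing a .yml file, and the keys it uses (`name`, `command`, `exit_code`, `stdout_has`) are hypothetical, not Coyote’s actual schema.

```python
import re
import subprocess
import sys


def run_checks(tests):
    """Run each command and check its exit code and, optionally, its stdout.

    The keys used here ("name", "command", "exit_code", "stdout_has") are
    hypothetical; they mirror the idea of a Coyote .yml entry, not its format.
    """
    results = []
    for t in tests:
        proc = subprocess.run(t["command"], capture_output=True, text=True)
        ok = proc.returncode == t.get("exit_code", 0)  # expect success unless told otherwise
        pattern = t.get("stdout_has")
        if ok and pattern is not None:
            ok = re.search(pattern, proc.stdout) is not None
        results.append((t["name"], ok))
    return results


tests = [
    {"name": "python prints hello",
     "command": [sys.executable, "-c", "print('hello')"], "stdout_has": r"hello"},
    {"name": "non-zero exit expected",
     "command": [sys.executable, "-c", "import sys; sys.exit(3)"], "exit_code": 3},
    {"name": "wrong output fails",
     "command": [sys.executable, "-c", "print('bye')"], "stdout_has": r"hello"},
]
results = run_checks(tests)
```

Passing commands as argument lists (rather than shell strings) avoids shell quoting issues; a runner that reads commands from a file might reasonably choose shell strings instead for convenience.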
Today we release our first beta CSD for Confluent Platform 3.0.0. It is robust enough for production use, but we want to add some small touches before the final release, which we expect to be fully compatible with the beta (a drop-in replacement and upgrade).
Kafka is now the de facto platform for streaming architectures, and its ecosystem is maturing, but it is not quite as enterprise-ready as many people in Big and Fast Data would like it to be. Landoop is a London-based start-up that wants to drive Kafka faster into the future, and thus...
We are announcing kafka-topics-ui, a user interface that allows browsing data from Kafka topics, and a lot more.
If you are looking for a safe way to interchange messages while using a fast streaming architecture such as Kafka, look no further than Confluent’s schema-registry. This simple, stateless microservice uses the _schemas topic to hold schema versions, can run in a single-master, multiple-slave architecture and supports multi-datacenter deployments.
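The schema registry can be stateless because its durable state lives in the _schemas topic: any instance can rebuild its view by replaying that log. The Python toy below (class name hypothetical) illustrates just that idea, with a shared list standing in for the topic. It mirrors the registry’s behavior of returning the existing ID when an identical schema is registered again under the same subject, but it does not model compatibility checks, the real storage format, ID allocation, or the single-master rule that only one instance writes.

```python
import json


class ToySchemaRegistry:
    """Hypothetical sketch: a registry whose only durable state is an append-only log.

    The log stands in for the _schemas Kafka topic; any instance can rebuild its
    in-memory view by replaying it, which keeps the service itself stateless.
    """

    def __init__(self, log):
        self.log = log        # shared, append-only list of records (the "topic")
        self.versions = {}    # subject -> list of canonical schema strings, version 1 first
        self.ids = {}         # canonical schema string -> global schema id
        for record in log:    # replay the log to rebuild state
            self._apply(record)

    def _apply(self, record):
        self.versions.setdefault(record["subject"], []).append(record["schema"])
        self.ids.setdefault(record["schema"], record["id"])

    def register(self, subject, schema):
        canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
        existing = self.versions.get(subject, [])
        if canonical in existing:  # identical schema already registered: reuse its id
            return self.ids[canonical]
        schema_id = self.ids.get(canonical, len(self.ids) + 1)  # simplified id allocation
        record = {"subject": subject, "version": len(existing) + 1,
                  "id": schema_id, "schema": canonical}
        self.log.append(record)
        self._apply(record)
        return schema_id


log = []
primary = ToySchemaRegistry(log)
v1 = {"type": "record", "name": "User", "fields": [{"name": "id", "type": "long"}]}
v2 = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "email", "type": ["null", "string"], "default": None},
]}
id1 = primary.register("users-value", v1)
id2 = primary.register("users-value", v2)
again = primary.register("users-value", v1)
replica = ToySchemaRegistry(log)  # rebuilt purely by replaying the log
```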
We are happy to announce schema-registry-ui, a fully featured UI for your underlying schema registry that allows visualization and exploration of registered schemas, and a lot more…
We want to thank @Argos, the third-largest retailer in the UK, for inviting Landoop, and @Accenture for hosting our presentation in one of the most beautiful theaters in the world: the IMAX theater at the Science Museum, London.
Important: our Confluent CSD is deprecated and replaced by our most complete solution yet for a managed Kafka stack through Cloudera Manager, including monitoring, alerts and our exclusive UIs. See it here and request a trial today!
We are happy to announce the first version of our Confluent CSD.
With Landoop’s Confluent CSD, you can create a Kafka cluster with support services such as REST Proxy, Schema Registry and Kafka Connect in a few clicks.
Automatic SSL certificate issuance and renewal with Ansible and Let’s Encrypt
Here at Landoop we prototype fast, and new (sub)domains are frequently added to complement our back-end and front-end services. From the beginning, our specifications included “SSL everywhere”. The journey to fully secure, encrypted services is a long one, hence our adventure in SSL land.
The tools, the needs
We use Ansible to manage our servers.