In this post we are going to see how you can leverage Lenses and Lenses SQL (LSQL, our own SQL layer for Apache Kafka) to create, execute and monitor Kafka Streams applications defined with SQL. If you’ve worked with data before, you know that a lot of time goes into extracting and massaging data from various sources and shaping it into the required format. The Lenses SQL Engine for Apache Kafka makes your ETL challenges a quick and integrated experience.
In this post we are going to see how Lenses can help you explore data in Kafka. Lenses comes with a powerful user interface for Kafka to explore historical or in-motion data, against which you can run Lenses SQL Engine queries. This helps you quickly access data for debugging, analyzing or reporting, without needing to be a developer. In addition, Lenses comes with a set of REST and WebSocket endpoints that make integration with your Kafka data simple.
In this brief entry we will discuss how count aggregations can be coded faster with Lenses SQL. Count aggregations are a very common scenario in stream processing. Some common use cases include aggregated time reports of transaction counts for a payment provider, views of products for an e-commerce site, how many customers are viewing a hotel, and many more. In this article we will see how Lenses allows you to run these aggregations leveraging the Lenses SQL engine.
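To make the idea of a time-based count aggregation concrete, here is a minimal Python sketch of what such an aggregation computes. It is not Lenses SQL and does not use Kafka; the event tuples, the product names and the `tumbling_window_counts` helper are hypothetical and exist only for illustration.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) using fixed, non-overlapping windows."""
    counts = Counter()
    for timestamp, key in events:
        # Align each event timestamp to the start of the window it falls in.
        window_start = timestamp - (timestamp % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical product-view events: (epoch seconds, product id).
views = [(0, "tv"), (12, "tv"), (59, "phone"), (60, "tv"), (61, "phone"), (119, "phone")]

per_minute = tumbling_window_counts(views, window_seconds=60)
for (window_start, product), count in sorted(per_minute.items()):
    print(window_start, product, count)
```

A streaming engine does the same grouping continuously as records arrive, rather than over a finished list.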
Streaming Topologies out of the box! Lenses SQL in action. Lenses SQL for Analyze, Process, Connect. Lenses SQL supports the 3 major layers of your data streaming pipeline. Analyze: run ad-hoc queries over Kafka topics, in real time or over history. Browsing your Kafka topics has never been easier or more powerful. Process: build on top of the Kafka Streams API to run comprehensive, production-quality streaming analytics. Connect: we build all our connectors to bring SQL capability to the ingestion process.
We were excited to attend the Athens Big Data meetup in mid-September and present our open source contributions to streaming technologies, in particular around Apache Kafka™.
Fast Data 3.2 is officially out! This release cycle took longer than usual but it brings many changes that will provide you with a more streamlined Kafka experience and let us build, test and enhance future releases quicker and with more confidence. The release has been available for our clients since last month.
Fast Data is our solution for installing and managing a modern Kafka stack through Cloudera Manager. Check here for an overview and request a trial today!
If you already use our CSD, read our documentation for instructions on how to upgrade without downtime. We are always available to help and can arrange for an engineer to walk you through.
When it comes to security, Apache Kafka, like every distributed system, provides mechanisms to transfer data securely across the components involved. Depending on your setup, this might involve different services such as Kerberos, multiple TLS certificates and advanced ACL setup in brokers and ZooKeeper. In many cases, enabling encryption features also comes with a performance penalty.
The new version implements the REST Proxy v2 API, so make sure you upgrade to the right version of REST Proxy.
This presentation was given by Angelos Petheriotis, senior engineer at HiveHome British Gas, at the Apache Kafka meetup. Angelos presented the team’s journey to 50K msg/s from IoT devices, featuring our DM Stream-reactor connectors and how they use Kafka Connect Query Language and Landoop Web Tools. Enjoy! Slides: “Streaming 4 billion Messages per day. Lessons Learned.” by Angelos Petheriotis.
This article presents how the Avro library writes to files and how we can achieve significant performance improvements by parallelizing the write. A (JVM) library has been implemented and is available on GitHub: fast-avro-write. The reason we proceeded with this implementation was a project that required writing many millions of Avro messages from Kafka onto a star-schema DW (data warehouse) in Hive (HDFS). You might have heard about (or even dealt with) the challenges of working with HDFS.
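As a rough illustration of the general idea of parallelizing a write, the Python sketch below encodes blocks of records concurrently and appends them to the output in order. It is not the fast-avro-write API and does not produce Avro files: JSON plus zlib stands in for the real encoding, and the function names, block size and length-prefixed layout are all assumptions made for this example.

```python
import io
import json
import zlib
from concurrent.futures import ThreadPoolExecutor


def encode_block(records):
    """Serialize and compress one block of records (stand-in for Avro encoding)."""
    payload = "\n".join(json.dumps(r, sort_keys=True) for r in records).encode("utf-8")
    return zlib.compress(payload)


def write_parallel(records, out, block_size=1000, workers=4):
    """Encode blocks on a thread pool, then append them to `out` in input order."""
    blocks = [records[i:i + block_size] for i in range(0, len(records), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in submission order, so the file stays ordered
        # even though blocks are encoded concurrently.
        for encoded in pool.map(encode_block, blocks):
            out.write(len(encoded).to_bytes(4, "big"))  # length prefix
            out.write(encoded)
    return len(blocks)


def read_blocks(data):
    """Read back the length-prefixed blocks written by write_parallel."""
    records, offset = [], 0
    while offset < len(data):
        size = int.from_bytes(data[offset:offset + 4], "big")
        offset += 4
        payload = zlib.decompress(data[offset:offset + size]).decode("utf-8")
        records.extend(json.loads(line) for line in payload.split("\n"))
        offset += size
    return records


records = [{"id": i, "value": i * i} for i in range(2500)]
buffer = io.BytesIO()
block_count = write_parallel(records, buffer, block_size=1000)
restored = read_blocks(buffer.getvalue())
```

The design choice here is to keep the append itself sequential and move only the CPU-heavy encoding onto worker threads, which keeps record order intact.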
In this mini tutorial we will explore how to create a Kafka Connect Pipeline using the Kafka Development Environment (fast-data-dev) in order to move real time telemetry data into Elasticsearch and finally visualize the positions in a Kibana Tile Map by writing zero code…!
An FTP server together with a pair of credentials is a common way for data providers to expose data as a service. In this article we are going to implement custom file transformers to efficiently load files over FTP and, using Kafka Connect, convert them to meaningful events in Avro format. Depending on data subscriptions, we might get access to FTP locations with files updated daily, weekly or monthly. File structures might be positional, CSV, JSON, XML or even binary.
MQTT stands for MQ Telemetry Transport. It is a lightweight messaging protocol, designed for embedded hardware, low-power or limited-network applications and microcontrollers with limited RAM and/or CPU. It is a protocol that drives the IoT expansion. On the other hand, large numbers of small devices that produce frequent readings lead to big data and the need for analysis in both the time and space domains (spatio-temporal analysis). Kafka can be the highway that connects your IoT with your backend analytics and persistence.
How to simplify your ETL process using Kafka Connect for (E) and (L). Introducing KCQL, the Kafka Connect Query Language for fast-data pipelines. Using KCQL to set up Kafka Connectors for popular in-memory and analytical systems (live demos) such as Hazelcast, Redis and InfluxDB. Use the fast-data-dev Docker image for your Kafka development environment. Enhancing your existing Cloudera (Hadoop) clusters with fast-data capabilities. Demos: http://schema-registry-ui.landoop.com http://kafka-topics-ui.landoop.com http://kafka-connect-ui.landoop.com https://fast-data-dev.demo.landoop.com/ Code https://github.com/landoop/ Connectors
Time-series datastores are of particular interest these days and InfluxDB is a popular open source distributed time-series database. In this tutorial we will integrate Kafka with InfluxDB using Kafka Connect and implement a Scala Avro message producer to test the setup. The steps we are going to follow are: set up a Docker development environment; run an InfluxDB Sink Kafka Connector; create a Kafka Avro producer in Scala (using the schema registry); and generate some messages in Kafka. Finally, we will verify the data in InfluxDB and visualise them in Chronograf.
A few days ago we open-sourced Coyote, a tool we created in order to automate testing of our Landoop Boxes, which feature a large range of environments for Big Data and Fast Data (see Kafka). Coyote does one simple thing: it takes a .yml file with a list of commands, sets them up, runs them and checks their exit code and/or output. It has some other functionality too, but its essence is this.
Today we release our first beta CSD for Confluent Platform 3.0.0. It is robust enough to use in production, but we want to add at least some small touches before the final release, which we expect to be fully compatible with the beta as a drop-in replacement and upgrade.
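The core loop of such a tool can be sketched in a few lines of Python. This is not Coyote's code or its .yml format: the `checks` list, its field names (`name`, `command`, `expect_exit`, `expect_stdout`) and the `run_checks` helper are hypothetical, chosen only to show the run-then-verify idea.

```python
import subprocess
import sys


def run_checks(checks):
    """Run each command and report whether its exit code and output match expectations."""
    results = []
    for check in checks:
        proc = subprocess.run(check["command"], capture_output=True, text=True)
        # A check passes when the exit code matches (default 0)...
        ok = proc.returncode == check.get("expect_exit", 0)
        # ...and, if requested, the expected text appears in stdout.
        expected_text = check.get("expect_stdout")
        if expected_text is not None:
            ok = ok and expected_text in proc.stdout
        results.append((check["name"], ok))
    return results


# Hypothetical checks; the current Python interpreter is used so the example is portable.
checks = [
    {"name": "prints greeting",
     "command": [sys.executable, "-c", "print('hello')"],
     "expect_stdout": "hello"},
    {"name": "fails as expected",
     "command": [sys.executable, "-c", "import sys; sys.exit(3)"],
     "expect_exit": 3},
    {"name": "wrong output",
     "command": [sys.executable, "-c", "print('hello')"],
     "expect_stdout": "goodbye"},
]

results = run_checks(checks)
for name, ok in results:
    print("PASS" if ok else "FAIL", name)
```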
Kafka is now the de facto platform for streaming architectures, and its ecosystem is maturing, but it is not quite as
enterprise-ready as many people in Big | Fast Data would like it to be. Landoop is a London-based start-up
that wants to drive Kafka faster to the future, and thus...
We are announcing kafka-topics-ui, a user interface that allows browsing data from Kafka topics and a lot more.
If you are looking for a safe way to interchange messages while using a fast streaming architecture such as Kafka, you need look no further than Confluent’s schema-registry. This simple and stateless microservice uses the _schemas topic to hold schema versions, can run as a single-master multiple-slave architecture and supports multi-data-center deployments.
We are happy to announce schema-registry-ui, a fully featured UI for your underlying schema registry that allows visualization and exploration of registered schemas and a lot more…
We want to thank @Argos, the third largest retailer in the UK, for inviting Landoop, and @Accenture for hosting our presentation in one of the most beautiful theaters in the world, the IMAX theater in the Science Museum, London.
Important. Our Confluent CSD is deprecated and replaced by our most complete solution yet for a managed Kafka stack through Cloudera Manager, including monitoring, alerts and our exclusive UIs. See it here and request a trial today!
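As a mental model for "holding schema versions", the toy Python class below keeps an ordered list of schemas per subject and hands back an id for each registered schema. It is an in-memory illustration only, not the schema-registry implementation or its REST API; the class name, the subject name and the schema strings are hypothetical.

```python
class ToySchemaRegistry:
    """In-memory toy that tracks schema ids and per-subject versions."""

    def __init__(self):
        self._ids = {}       # schema text -> id
        self._subjects = {}  # subject -> list of schema ids (index + 1 = version)

    def register(self, subject, schema):
        # Reuse the id if this exact schema text was seen before.
        schema_id = self._ids.setdefault(schema, len(self._ids) + 1)
        versions = self._subjects.setdefault(subject, [])
        # Only a schema that is new to the subject creates a new version.
        if schema_id not in versions:
            versions.append(schema_id)
        return schema_id

    def versions(self, subject):
        return list(range(1, len(self._subjects.get(subject, [])) + 1))


registry = ToySchemaRegistry()
v1 = '{"type": "record", "name": "Payment", "fields": [{"name": "id", "type": "string"}]}'
v2 = ('{"type": "record", "name": "Payment", "fields": [{"name": "id", "type": "string"}, '
      '{"name": "amount", "type": "double"}]}')

first_id = registry.register("payments-value", v1)
second_id = registry.register("payments-value", v2)
repeat_id = registry.register("payments-value", v1)  # re-registering changes nothing

print(first_id, second_id, repeat_id, registry.versions("payments-value"))
```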
We are happy to announce the first version of our Confluent CSD.
Utilizing Landoop’s Confluent CSD you can create a Kafka Cluster with support services such as REST Proxy, Schema Registry and Kafka Connect in a few clicks.
Automatic SSL certificate issuance and renewal with Ansible and Let’s Encrypt. Here at Landoop we prototype fast, and new (sub)domains are frequently added to complement our back-end and front-end services. Since the beginning, our specifications have included “SSL everywhere”. The journey into providing fully secure and encrypted services is a long one; hence we need an adventure in SSL land. The tools, the needs: we use Ansible to manage our servers.