
The DataOps Blog

Where Change Is Welcome

Redis Pipeline: How to Publish and Subscribe Data from Redis to Your Destination

By Wilson Shamim, posted in Engineering on May 13, 2022

Looking to build a Redis pipeline? Before we dig in and build our pipeline, let’s discuss what Redis is, its benefits, and common Redis pipeline use cases.

What is Redis?

Redis, which stands for Remote Dictionary Server, is an open source (BSD licensed), in-memory data structure store used as a database, cache, message broker, and streaming engine. Redis delivers sub-millisecond response times, enabling millions of requests per second for real-time applications in industries like ad-tech, financial services, gaming, healthcare, and IoT.

Redis is one of the most popular open source engines today. Because of its speed, it is a popular choice for caching, session management, gaming, leaderboards, real-time analytics, geospatial, ride-hailing, chat/messaging, media streaming, and pub/sub apps.

Benefits of Redis

Performance

All Redis data resides in memory, which enables low-latency, high-throughput data access. Unlike traditional disk-based databases, in-memory data stores don’t require a trip to disk, which reduces engine latency to microseconds. As a result, in-memory data stores can support an order of magnitude more operations, with faster response times. The result is blazing-fast performance: average read and write operations take less than a millisecond, with support for millions of operations per second.

Flexible Data Structures

Redis supports a wide variety of data structures:

  • Strings – text or binary data up to 512MB in size
  • Lists – a collection of Strings in the order they were added
  • Sets – an unordered collection of strings with the ability to intersect, union, and diff other Set types
  • Sorted Sets – Sets whose members are ordered by a score
  • Hashes – a data structure for storing field-value pairs
  • Bitmaps – a data type that offers bit-level operations
  • HyperLogLogs – a probabilistic data structure to estimate the number of unique items in a data set
  • Streams – an append-only log data structure, usable as a message queue
  • Geospatial indexes – entries indexed by longitude and latitude, for maps and “nearby” queries
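Sorted sets are the structure behind gaming leaderboards, one of the use cases listed later in this post. To illustrate their ordering rule, here is a minimal pure-Python model (not Redis itself and not a client library): members are ordered by ascending score, and members with equal scores are ordered lexicographically, which is how Redis orders sorted sets. `ToySortedSet`, `zadd`, and `zrange` are hypothetical names that only mirror the ZADD and ZRANGE commands.

```python
from __future__ import annotations


class ToySortedSet:
    """In-memory model of a Redis sorted set (illustration only, not a client)."""

    def __init__(self) -> None:
        self._scores: dict[str, float] = {}

    def zadd(self, score: float, member: str) -> int:
        """Set a member's score; return 1 if the member is new, else 0 (like ZADD)."""
        is_new = member not in self._scores
        self._scores[member] = score
        return int(is_new)

    def zrange(self, start: int, stop: int) -> list[str]:
        """Members by ascending score, ties broken by member name.

        As with ZRANGE, `stop` is inclusive and negative indexes count from the end.
        """
        ordered = sorted(self._scores, key=lambda m: (self._scores[m], m))
        n = len(ordered)
        if stop < 0:
            stop += n
        if start < 0:
            start = max(start + n, 0)
        if stop < start:  # also covers a negative stop that is out of range
            return []
        return ordered[start:stop + 1]


board = ToySortedSet()
board.zadd(120, "alice")
board.zadd(95, "bob")
board.zadd(120, "carol")
print(board.zrange(0, -1))  # ['bob', 'alice', 'carol']
```

Updating a score simply reorders the member, which is why a leaderboard can be kept current with one write per score change.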

Simplicity and Ease-of-use

Redis lets you implement traditionally complex logic with fewer, simpler lines of code for storing, accessing, and using data in your applications. Over a hundred open source clients are available for Redis developers. Supported languages include Java, Python, PHP, C, C++, C#, JavaScript, Node.js, Ruby, R, Go, and many others.

Replication and Persistence

Redis employs a primary-replica architecture and supports asynchronous replication where data can be replicated to multiple replica servers. This provides improved read performance (as requests can be split among the servers) and faster recovery when the primary server experiences an outage. For persistence, Redis supports point-in-time backups (copying the Redis data set to disk).

High Availability and Scalability

Redis supports a primary-replica architecture in either a single-node primary or a clustered topology. This lets you build highly available solutions with consistent performance and reliability. When you need to adjust your cluster size, you can scale up, or scale in and out, so your cluster grows with your demands.

Open Source

Redis is an open source project supported by a vibrant community.

Popular Redis Use Cases

  • Caching
  • Chat, messaging, and queues
  • Gaming leaderboards
  • Session store
  • Rich media streaming
  • Geospatial
  • Machine Learning
  • Real-time analytics

Redis Pub/Sub

Redis Pub/Sub implements a messaging system in which senders (publishers, in Redis terminology) send messages and receivers (subscribers) receive them. The link over which messages are transferred is called a channel.

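A Redis client can subscribe to any number of channels. Messages are not stored: Redis delivers each message only to the clients subscribed to the channel at the moment it is published, and PUBLISH returns the number of clients that received it.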
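To make these delivery semantics concrete, here is a minimal in-process model of a pub/sub broker. It is a hypothetical sketch (the `ToyBroker` class and its methods are not part of Redis or any client, and real subscribers receive messages over network connections): one client's inbox can be registered on several channels, `publish` fans a message out to every current subscriber and returns how many received it, and nothing is kept for channels with no subscribers, mirroring Redis Pub/Sub's fire-and-forget behavior.

```python
from __future__ import annotations

from collections import defaultdict, deque


class ToyBroker:
    """In-process model of Redis-style pub/sub (illustration only, no networking)."""

    def __init__(self) -> None:
        # channel name -> inboxes of the clients currently subscribed to it
        self._subscribers: defaultdict[str, list[deque]] = defaultdict(list)

    def subscribe(self, inbox: deque, *channels: str) -> None:
        """Register one client's inbox on any number of channels."""
        for channel in channels:
            self._subscribers[channel].append(inbox)

    def publish(self, channel: str, message: str) -> int:
        """Fan the message out to current subscribers; return how many got it."""
        receivers = self._subscribers.get(channel, [])
        for inbox in receivers:
            inbox.append((channel, message))
        # Nothing is queued for channels without subscribers: the message is dropped.
        return len(receivers)


broker = ToyBroker()
inbox = deque()
broker.subscribe(inbox, "ch1", "ch2")
print(broker.publish("ch2", '{"a":2077941584}'))  # 1
print(broker.publish("ch3", "nobody is listening"))  # 0
print(list(inbox))  # [('ch2', '{"a":2077941584}')]
```

Because undelivered messages are dropped, a pipeline that must not lose data needs its subscriber running before publishers start.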

Connect to Redis and publish a message on a channel:

redis-cli -u redis://localhost:6379/0

localhost:6379> publish ch2 "{\"a\":2077941584}"

(integer) 1
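Under the hood, redis-cli sends each command to the server as a RESP (Redis Serialization Protocol) array of bulk strings. The sketch below encodes the PUBLISH command above that way; `encode_command` is a hypothetical helper written for illustration, not part of redis-cli or any client library. Bulk-string lengths are byte counts, so each argument is UTF-8 encoded before it is measured.

```python
def encode_command(*args: str) -> bytes:
    """Encode a Redis command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]  # array header: number of elements
    for arg in args:
        data = arg.encode("utf-8")
        # Each bulk string is "$<byte length>\r\n<bytes>\r\n".
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


wire = encode_command("PUBLISH", "ch2", '{"a":2077941584}')
print(wire)
```

The server answers with a RESP integer reply, `:1\r\n`, which redis-cli displays as `(integer) 1`: the number of subscribers that received the message.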

Connect to Redis and consume messages from a channel:

redis-cli -u redis://localhost:6379/0

localhost:6379> subscribe ch2

Reading messages... (press Ctrl-C to quit)

1) "subscribe"
2) "ch2"
3) (integer) 1
1) "message"
2) "ch2"
3) "{\"a\":2988789}"
1) "message"
2) "ch2"
3) "{\"a\":2077941584}"
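Each item redis-cli prints while subscribed is a three-element reply: a kind (`subscribe` confirms a subscription and carries the current subscription count, while `message` carries a payload), the channel, and the data. A consumer, such as a pipeline origin, has to tell these apart and decode the JSON payload. The sketch below does that for replies that have already been read off the connection; `handle_reply` is a hypothetical helper, and the replies are hard-coded from the session above rather than read from a live server.

```python
from __future__ import annotations

import json


def handle_reply(reply: list) -> tuple[str, dict] | None:
    """Return (channel, decoded JSON) for a 'message' reply; None otherwise."""
    kind, channel, data = reply
    if kind != "message":
        # 'subscribe' confirmations carry the subscription count, not a payload.
        return None
    return channel, json.loads(data)


replies = [
    ["subscribe", "ch2", 1],
    ["message", "ch2", '{"a":2988789}'],
    ["message", "ch2", '{"a":2077941584}'],
]
records = [r for r in map(handle_reply, replies) if r is not None]
print(records)  # [('ch2', {'a': 2988789}), ('ch2', {'a': 2077941584})]
```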

Publishing and Subscribing to Messages: Creating a Redis Pipeline in StreamSets

Publish to Redis:

Configuration:

Redis tab:

  • URI: redis://localhost:6379/0
  • Mode: Publish
  • Channel: ch2
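The URI setting packs the host, port, and database number (the path segment, here database 0) into one string. As a sketch of how it breaks down, Python's standard `urllib.parse` is enough. `parse_redis_uri` is a hypothetical helper, not part of StreamSets or a Redis client, and the fallbacks it applies when parts are missing (port 6379, database 0) are assumptions based on Redis's usual defaults.

```python
from __future__ import annotations

from urllib.parse import urlparse


def parse_redis_uri(uri: str) -> tuple[str, int, int]:
    """Split redis://host:port/db into (host, port, db), assuming Redis defaults."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("redis", "rediss"):  # rediss:// is the TLS variant
        raise ValueError(f"not a Redis URI: {uri!r}")
    host = parsed.hostname or "localhost"
    port = parsed.port or 6379
    db_path = parsed.path.lstrip("/")
    db = int(db_path) if db_path else 0
    return host, port, db


print(parse_redis_uri("redis://localhost:6379/0"))  # ('localhost', 6379, 0)
```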

Data Format:

  • Data Format: JSON

Publishing to Redis Step 1

Publishing to Redis Step 2

Consumer:

Configuration:

Redis tab:

  • URI: redis://localhost:6379/0
  • Channel: ch2

Data Format:

  • Data Format: JSON

Redis Configuration JSON Data Format

Redis Consumer Record Count and Throughput

You can also view the data in redis-cli:

localhost:6379> subscribe ch2
Reading messages... (press Ctrl-C to quit)
1) "subscribe"
2) "ch2"
3) (integer) 1
1) "message"
2) "ch2"
3) "{\"city\":\"NY\",\"latitude_longitude\":{\"latitude\":\"37.7749\",\"longitude\":\"-122.4194\"},\"lst\":[\"one\",\"two\"]}"

Redis Destination (Batch Mode)

Suppose you have a JSON file with the following content:

{
  "city": "city", "city_name": "San Francisco",
  "State": "state", "state_name": "CA",
  "Country": "country", "country_name": "USA",
  "other": "other",
  "latitude_longitude": [{"latitude": "37.7749", "longitude": "-122.4194"}]
}

In the Redis configuration, set the mode to “Batch” and enter the key/value pairs with the String data type, as shown below:

Redis Destination Batch Mode

Preview the Redis pipeline:

Preview Redis Pipeline

Then view the data in redis-cli:

redis-cli -h localhost -p 6379
localhost:6379> keys *
1) "NY"
localhost:6379> keys *
1) "country"
2) "state"
3) "NY"
4) "city"
localhost:6379> 
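The screenshot with the exact key/value mapping is not reproduced here, but the keys that appear in Redis (`city`, `state`, `country`) are the values of the `city`, `State`, and `Country` fields in the sample file. That suggests each of those fields was configured as a key, with the matching `*_name` field as its value. The sketch below expresses that assumed mapping as plain data and turns the record into the SET commands it would produce. The `FIELD_MAPPING` list, the slash-style field paths, and both helpers are hypothetical illustrations, not StreamSets configuration or APIs.

```python
from __future__ import annotations

import json

# Hypothetical key -> value field mapping, inferred from the keys that appear in Redis.
FIELD_MAPPING = [
    ("/city", "/city_name"),
    ("/State", "/state_name"),
    ("/Country", "/country_name"),
]


def get_field(record: dict, path: str):
    """Resolve a '/a/b'-style field path against nested dicts."""
    value = record
    for part in path.strip("/").split("/"):
        value = value[part]
    return value


def to_set_commands(record: dict) -> list[list[str]]:
    """Build one SET key value command per mapped pair."""
    return [["SET", get_field(record, k), get_field(record, v)] for k, v in FIELD_MAPPING]


record = json.loads("""{"city": "city", "city_name": "San Francisco",
    "State": "state", "state_name": "CA",
    "Country": "country", "country_name": "USA", "other": "other"}""")
for command in to_set_commands(record):
    print(command)
```

If the mapping is as assumed, GET city in redis-cli should return "San Francisco".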

Create a Map (Hash) object in batch mode:

Sample JSON file:

{"city":  "NY",
"latitude_longitude":
{"latitude": "37.7749","longitude": "-122.4194"}
}

In the Redis configuration, enter the key and value fields and set the data type to Hash, as shown below:

Redis Configuration Hash

A key named “NY” is created in Redis:

localhost:6379> keys *

1) "NY"
2) "country"
3) "state"
4) "city"

localhost:6379> HGETALL NY
1) "latitude"
2) "37.7749"
3) "longitude"
4) "-122.4194"
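With the Hash type, the value of the key field (NY) becomes the Redis key, and the map in latitude_longitude is flattened into hash field/value pairs, which is what HGETALL returns. Assuming the key and value paths are `/city` and `/latitude_longitude` (hypothetical, since the configuration screenshot is not reproduced), the record corresponds to a single HSET command with several field/value pairs. The `to_hset_command` helper is an illustration only and resolves top-level paths only.

```python
from __future__ import annotations

import json


def to_hset_command(record: dict, key_path: str, value_path: str) -> list[str]:
    """Flatten a map field into one 'HSET key field value [field value ...]' command."""
    key = record[key_path.strip("/")]
    mapping = record[value_path.strip("/")]
    if not isinstance(mapping, dict):
        raise TypeError("the Hash type needs a map value")
    command = ["HSET", key]
    for field, value in mapping.items():
        command += [field, str(value)]
    return command


record = json.loads('{"city": "NY", "latitude_longitude": '
                    '{"latitude": "37.7749", "longitude": "-122.4194"}}')
print(to_hset_command(record, "/city", "/latitude_longitude"))
# ['HSET', 'NY', 'latitude', '37.7749', 'longitude', '-122.4194']
```

HSET accepts multiple field/value pairs since Redis 4.0; older servers used HMSET for this.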

Create a List object from JSON:

{
  "city": "NY",
  "latitude_longitude": {"latitude": "37.7749", "longitude": "-122.4194"},
  "lst": ["one", "two"]
}

Redis List Object JSON

Now the city key holds the list values:

Redis City Object List Values

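Because NY still holds the hash from the previous example, and a Redis key can hold only one data type, delete it before rerunning the pipeline:

localhost:6379> DEL NY
(integer) 1

After the pipeline runs again, NY is recreated as a list:

localhost:6379> keys *
1) "NY"
2) "country"
3) "state"
4) "city"

localhost:6379> TYPE NY
list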
localhost:6379> lrange NY 0 1
1) "two"
2) "one"
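The toy keyspace below reproduces the behavior in this transcript in plain Python. It assumes, as the LRANGE output suggests, that the list values are pushed onto the head of the list in array order (LPUSH semantics), which is why "two" comes back before "one"; whether the destination actually uses LPUSH is not confirmed here. It also enforces the one-type-per-key rule, so writing a list to a key that holds a hash fails with Redis's WRONGTYPE error. `ToyKeyspace`, `WrongTypeError`, and their methods are hypothetical names; only the commands they echo (HSET, LPUSH, LRANGE, DEL, TYPE) are real.

```python
from __future__ import annotations

WRONGTYPE = "WRONGTYPE Operation against a key holding the wrong kind of value"


class WrongTypeError(Exception):
    """Raised when a command targets a key that holds a different data type."""


class ToyKeyspace:
    """Plain-Python model of a few Redis commands (illustration only)."""

    def __init__(self) -> None:
        self._data: dict[str, dict | list] = {}

    def _writable(self, key: str, kind: type):
        # Create the container on first write; afterwards the key's type is fixed.
        value = self._data.setdefault(key, kind())
        if not isinstance(value, kind):
            raise WrongTypeError(WRONGTYPE)
        return value

    def hset(self, key: str, mapping: dict[str, str]) -> None:
        self._writable(key, dict).update(mapping)

    def lpush(self, key: str, *values: str) -> int:
        items = self._writable(key, list)
        for value in values:
            items.insert(0, value)  # each value goes to the head of the list
        return len(items)

    def lrange(self, key: str, start: int, stop: int) -> list[str]:
        # A missing key reads as an empty list; assumes non-negative indexes.
        value = self._data.get(key, [])
        if not isinstance(value, list):
            raise WrongTypeError(WRONGTYPE)
        return value[start:stop + 1]  # stop is inclusive, as in LRANGE

    def delete(self, key: str) -> int:
        return 0 if self._data.pop(key, None) is None else 1

    def key_type(self, key: str) -> str:
        value = self._data.get(key)
        if value is None:
            return "none"
        return "hash" if isinstance(value, dict) else "list"


ks = ToyKeyspace()
ks.hset("NY", {"latitude": "37.7749", "longitude": "-122.4194"})
try:
    ks.lpush("NY", "one", "two")
except WrongTypeError as exc:
    print(exc)  # the list write is rejected while NY still holds a hash
print(ks.delete("NY"))  # 1
ks.lpush("NY", "one", "two")
print(ks.key_type("NY"))  # list
print(ks.lrange("NY", 0, 1))  # ['two', 'one']
```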

This Redis pipeline example shows how smart data pipelines can move data into and out of Redis without hand coding. StreamSets aims to bridge the gap between the ultimate control of hand coding and the ease and repeatability of a graphical interface.

With StreamSets you can:

  • Quickly build, deploy, and scale streaming, batch, CDC, ETL and ML pipelines
  • Handle data drift automatically, keeping jobs running even when schemas and structures change
  • Deploy, monitor, and manage all your data pipelines – across hybrid and multi-cloud – from a single dashboard 

Try smart data pipelines out yourself with StreamSets, a fully cloud-based, all-in-one DataOps platform. Sign up now and start building pipelines for free.
