Looking to build a Redis pipeline? Before we dig in and build our pipeline, let’s discuss what Redis is, its benefits, and common Redis pipeline use cases.
- What is Redis?
- Benefits of Redis
- Popular Redis Use Cases
- Publish and Subscribe Message: Redis Pipeline Creation in StreamSets
What is Redis?
Redis, which stands for Remote Dictionary Server, is an open source (BSD licensed), in-memory data structure store used as a database, cache, message broker, and streaming engine. Redis delivers sub-millisecond response times, enabling millions of requests per second for real-time applications in industries like ad-tech, financial services, gaming, healthcare, and IoT.
Redis is one of the most popular open source engines today. Because of its fast performance, Redis is a popular choice for caching, session management, gaming, leaderboards, real-time analytics, geospatial, ride-hailing, chat/messaging, media streaming, and pub/sub apps.
Benefits of Redis
Performance
All Redis data resides in memory, which enables low-latency, high-throughput data access. Unlike traditional databases, in-memory data stores don’t require a trip to disk, reducing engine latency to microseconds. Because of this, in-memory data stores can support an order of magnitude more operations and faster response times. The result is average read and write operations that take less than a millisecond, with support for millions of operations per second.
Flexible Data Structures
Redis offers a wide variety of data structures:
- Strings – text or binary data up to 512MB in size
- Lists – a collection of Strings in the order they were added
- Sets – an unordered collection of strings with the ability to intersect, union, and diff other Set types
- Sorted Sets – Sets ordered by a value
- Hashes – a data structure for storing a list of fields and values
- Bitmaps – a data type that offers bit level operations
- HyperLogLogs – a probabilistic data structure to estimate the unique items in a data set
- Streams – a log data structure, useful as a message queue
- Geospatial – longitude/latitude-based entries, useful for maps and “nearby” queries
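To make the semantics of a few of these types concrete, the following sketch imitates them with plain Python built-ins. It is only an analogy: the variable names and sample data are invented for illustration, and no Redis server or client library is involved.

```python
# Analogy only: plain Python stand-ins for a few Redis data types.
# All names and sample values below are hypothetical.

# Set: unordered, unique members; supports intersect, union, and diff.
online = {"alice", "bob", "carol"}
premium = {"bob", "dave"}
both = online & premium          # comparable to SINTER
either = online | premium        # comparable to SUNION
only_online = online - premium   # comparable to SDIFF

# Hash: a single key holding a map of fields to values.
user = {"name": "alice", "city": "NY"}

# Sorted set: unique members, each with a numeric score that defines the order.
scores = {"alice": 120.0, "bob": 95.0, "carol": 150.0}


def top(scores, n):
    """Return the n highest-scoring members, best first."""
    return sorted(scores, key=lambda member: scores[member], reverse=True)[:n]


leaders = top(scores, 2)
```

The sorted-set stand-in is the shape behind the leaderboard use case: members are unique, and ranking is a matter of reading them back in score order.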
Simplicity and Ease-of-Use
Redis enables you to write traditionally complex code with fewer and simpler lines. With Redis, you write fewer lines of code to store, access, and use data in your applications. Over a hundred open source clients are available for Redis developers. Supported languages include Java, Python, PHP, C, C++, C#, JavaScript, Node.js, Ruby, R, Go, and many others.
Replication and Persistence
Redis employs a primary-replica architecture and supports asynchronous replication where data can be replicated to multiple replica servers. This provides improved read performance (as requests can be split among the servers) and faster recovery when the primary server experiences an outage. For persistence, Redis supports point-in-time backups (copying the Redis data set to disk).
High Availability and Scalability
Redis offers a primary-replica architecture in a single node primary or a clustered topology. This allows you to build highly available solutions providing consistent performance and reliability. When you need to adjust your cluster size, various options to scale up and scale in or out are also available. This allows your cluster to grow with your demands.
Open Source
Redis is an open source project supported by a vibrant community.
Popular Redis Use Cases
- Caching
- Chat, messaging, and queues
- Gaming leaderboards
- Session store
- Rich media streaming
- Geospatial
- Machine Learning
- Real-time analytics
Redis Pub/Sub implements a messaging system in which senders (in Redis terminology, publishers) send messages and receivers (subscribers) receive them. The link over which the messages are transferred is called a channel.
In Redis, a client can subscribe to any number of channels.
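The publisher, subscriber, and channel roles can be illustrated with a toy in-memory model. This is a sketch for intuition only, not how Redis is implemented: the `Broker` class and its method names are hypothetical, and the return value of `publish` mimics the receiver count that Redis reports for a PUBLISH command.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory stand-in for Redis Pub/Sub (hypothetical, illustration only)."""

    def __init__(self):
        # Maps a channel name to the list of subscriber callbacks.
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        """Register a callback that receives every message published on channel."""
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        """Deliver message to the current subscribers; return how many received it."""
        receivers = self._subscribers.get(channel, [])
        for callback in receivers:
            callback(channel, message)
        return len(receivers)


broker = Broker()
inbox = []
broker.subscribe("ch2", lambda channel, message: inbox.append((channel, message)))

delivered = broker.publish("ch2", '{"a":2077941584}')  # one subscriber on ch2
missed = broker.publish("ch1", "nobody is listening")  # no subscribers on ch1
```

A message published to a channel with no subscribers is simply dropped, which is why a subscriber must be connected before the publisher sends.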
Connect to Redis and publish a message on a channel:
redis-cli -u redis://localhost:6379/0
localhost:6379> publish ch2 "{\"a\":2077941584}"
(integer) 1
Connect to Redis and consume messages from a channel:
redis-cli -u redis://localhost:6379/0
localhost:6379> subscribe ch2
Reading messages... (press Ctrl-C to quit)
1) "subscribe"
2) "ch2"
3) (integer) 1
1) "message"
2) "ch2"
3) "{\"a\":2988789}"
1) "message"
2) "ch2"
3) "{\"a\":2077941584}"
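The same publish and subscribe steps can also be driven from code. The sketch below assumes the redis-py client library, which this article does not otherwise use; the helper names `publish_json` and `read_json_messages` are hypothetical. The helpers only rely on a client's `publish` method and a pubsub object's `listen` iterator, so they can be exercised without a live server.

```python
import json


def publish_json(client, channel, payload):
    """Serialize payload as JSON and publish it; returns the receiver count."""
    return client.publish(channel, json.dumps(payload))


def read_json_messages(pubsub, limit):
    """Collect up to limit decoded JSON payloads from a subscribed pubsub object."""
    received = []
    for event in pubsub.listen():
        # Subscription confirmations arrive with type "subscribe"; skip them.
        if event["type"] != "message":
            continue
        received.append(json.loads(event["data"]))
        if len(received) >= limit:
            break
    return received


# Usage against a live server (assumes `pip install redis`; not executed here):
#   import redis
#   client = redis.Redis.from_url("redis://localhost:6379/0", decode_responses=True)
#   pubsub = client.pubsub()
#   pubsub.subscribe("ch2")
#   publish_json(client, "ch2", {"a": 2077941584})
#   print(read_json_messages(pubsub, limit=1))
```

Filtering on the event type mirrors the CLI transcript, where the first reply is the "subscribe" confirmation and only later replies are of type "message".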
Publish and Subscribe Message: Redis Pipeline Creation in StreamSets
Publish to Redis:
Configuration:
Redis tab:
- URI: redis://localhost:6379/0
- Mode: Publish
- Channel: ch2
Data Format:
- Data Format: JSON
Consumer:
Configuration:
Redis tab:
- URI: redis://localhost:6379/0
- Mode: Publish
- Channel: ch2
Data Format:
- Data Format: JSON
You can also view the data in the CLI:
localhost:6379> subscribe ch2
Reading messages... (press Ctrl-C to quit)
1) "subscribe"
2) "ch2"
3) (integer) 1
1) "message"
2) "ch2"
3) "{\"city\":\"NY\",\"latitude_longitude\":{\"latitude\":\"37.7749\",\"longitude\":\"-122.4194\"},\"lst\":[\"one\",\"two\"]}"
Redis Destination (Batch Mode)
Suppose we have a JSON file with the following content:
{
  "city": "city", "city_name": "San Francisco",
  "State": "state", "state_name": "CA",
  "Country": "country", "country_name": "USA",
  "other": "other",
  "latitude_longitude": [{"latitude": "37.7749", "longitude": "-122.4194"}]
}
In the Redis configuration, set the mode to “Batch” and define the key and value mappings with type String.
Preview the Redis pipeline:
Then view the data in the Redis CLI:
redis-cli -h localhost -p 6379
localhost:6379> keys *
1) "NY"
localhost:6379> keys *
1) "country"
2) "state"
3) "NY"
4) "city"
localhost:6379>
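The keys "city", "state", and "country" in the listing above are consistent with a mapping in which one field of the record supplies the Redis key and another supplies the string value. The sketch below illustrates that idea. The field pairs in `MAPPINGS` are an assumption, since the exact configuration is not reproduced in this article, and `to_set_commands` is a hypothetical helper rather than part of StreamSets.

```python
import json

# Hypothetical (key field, value field) pairs; the actual pipeline
# configuration is not shown in the article, so this is an assumption.
MAPPINGS = [
    ("city", "city_name"),
    ("State", "state_name"),
    ("Country", "country_name"),
]


def to_set_commands(record, mappings):
    """Return one (key, value) pair per mapping, as a string SET would receive them."""
    return [(record[key_field], record[value_field]) for key_field, value_field in mappings]


record = json.loads(
    '{"city": "city", "city_name": "San Francisco", "State": "state", '
    '"state_name": "CA", "Country": "country", "country_name": "USA"}'
)
commands = to_set_commands(record, MAPPINGS)
```

Under this assumed mapping, each pair corresponds to one string key written in the batch, for example the key "city" holding "San Francisco".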
Create Map object for batch:
Sample JSON file:
{"city": "NY", "latitude_longitude": {"latitude": "37.7749","longitude": "-122.4194"} }
In the Redis configuration, enter the key and value, and set the type to Hash.
A key named “NY” is created in Redis:
localhost:6379> keys *
1) "NY"
2) "country"
3) "state"
4) "city"
localhost:6379> HGETALL NY
1) "latitude"
2) "37.7749"
3) "longitude"
4) "-122.4194"
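The hash written here can be pictured as the nested map from the record stored under the key taken from the city field. The following sketch shows that transformation and the alternating field/value layout that HGETALL prints. The helper names are hypothetical, and the choice of "city" as the key field and "latitude_longitude" as the value field is inferred from the output above rather than taken from the configuration.

```python
def to_hash_write(record, key_field, value_field):
    """Return (key, field-to-value mapping) for a hash write."""
    return record[key_field], dict(record[value_field])


def as_hgetall_reply(mapping):
    """Flatten a mapping into the alternating field, value list HGETALL returns."""
    reply = []
    for field, value in mapping.items():
        reply.extend([field, value])
    return reply


record = {
    "city": "NY",
    "latitude_longitude": {"latitude": "37.7749", "longitude": "-122.4194"},
}
key, mapping = to_hash_write(record, "city", "latitude_longitude")
reply = as_hgetall_reply(mapping)

# With redis-py (an assumption, not used in the article), the write could be:
#   client.hset(key, mapping=mapping)
```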
Create List object in JSON
{"city": "NY", "latitude_longitude": {"latitude": "37.7749","longitude": "-122.4194"}, "lst": ["one","two"] }
Now the city key will hold the list values:
localhost:6379> DEL NY
(integer) 1
localhost:6379> keys *
1) "NY"
2) "country"
3) "state"
4) "city"
localhost:6379> TYPE NY
List
localhost:6379> lrange NY 0 1
1) "two"
2) "one"
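Note that the list comes back as "two", "one" even though the JSON array was ["one", "two"]. That ordering is consistent with each element being pushed onto the head of the list, as the Redis LPUSH command does; whether the destination uses LPUSH internally is an assumption here. The sketch below simulates head insertion with a Python deque, using hypothetical `lpush` and `lrange` helpers.

```python
from collections import deque


def lpush(lst, *values):
    """Prepend each value in turn (head insertion); return the new length."""
    for value in values:
        lst.appendleft(value)
    return len(lst)


def lrange(lst, start, stop):
    """Return items from start to stop inclusive (non-negative indexes only)."""
    return list(lst)[start:stop + 1]


ny = deque()
length = lpush(ny, "one", "two")  # "one" goes in first, then "two" in front of it
items = lrange(ny, 0, 1)
```

If the original array order matters to downstream readers, reading the list back in reverse or pushing at the tail instead would preserve it.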
You can see from this Redis pipeline example how publish/subscribe messaging and batch writes of strings, hashes, and lists can be configured in your smart data pipelines. StreamSets aims to bridge the gap between the ultimate control of hand coding and the ease and repeatability of a graphical interface.
With StreamSets you can:
- Quickly build, deploy, and scale streaming, batch, CDC, ETL and ML pipelines
- Handle data drift automatically, keeping jobs running even when schemas and structures change
- Deploy, monitor, and manage all your data pipelines – across hybrid and multi-cloud – from a single dashboard
Try smart data pipelines out yourself with StreamSets, a fully cloud-based, all-in-one DataOps platform. Sign up now and start building pipelines for free!