Caching in Plain English
Will using a cache lead to better performance?
Objectives
- Introduction to Caching
- Application vs Database Caching
- In-Memory Caches
- Content Distribution Network (CDN)
- Cache Invalidation
- Cache Eviction Policies
Caching
- Load balancing helps you scale horizontally across an ever-increasing number of servers. Caching, on the other hand, enables you to make vastly better use of the resources you already have.
- Caching consists of: pre-calculating results (e.g. the number of visits from each referring domain for the previous day), pre-generating expensive indexes (e.g. suggested stories based on a user’s click history), and storing copies of frequently accessed data in a faster backend (e.g. Memcached instead of PostgreSQL).
- Caches take advantage of the locality of reference principle: recently requested data is likely to be requested again. They are used in almost every layer of computing: hardware, operating systems, web browsers, web applications, and more. A cache is like short-term memory: it has a limited amount of space, but it is typically faster than the original data source and contains the most recently accessed items. Caches can exist at all levels of an architecture, but they are often found at the level nearest the front end, where they can return data quickly without taxing downstream layers.
Application vs Database Caching
There are two primary approaches to caching: application caching and database caching (most systems rely on both).
Application Caching
Application caching requires explicit integration in the application code itself. Typically, the application checks whether a value is in the cache; if not, it retrieves the value from the database and then writes that value into the cache (see the sketch below).
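To make this concrete, here is a minimal sketch of that check-miss-populate flow in Python. A plain dict stands in for a real cache like Memcached or Redis, and `fetch_from_database` is a hypothetical placeholder for an actual query:

```python
cache = {}

def fetch_from_database(user_id):
    # Hypothetical stand-in for an expensive SQL query.
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    # 1. Check the cache first.
    if user_id in cache:
        return cache[user_id]          # cache hit: no database round trip
    # 2. On a miss, fall back to the database...
    user = fetch_from_database(user_id)
    # 3. ...and populate the cache for subsequent requests.
    cache[user_id] = user
    return user

print(get_user(42))  # miss: reads the database, then caches the result
print(get_user(42))  # hit: served straight from the cache
```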
Database Caching
Databases ship with a default configuration that provides some degree of caching and performance out of the box. Those initial settings are optimized for a generic use case; by tuning them to your system’s access patterns you can generally squeeze out a great deal of additional performance.
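As a small illustration (using SQLite only because it ships with Python; the same idea applies to tuning, say, a server database’s buffer settings), here is how you might enlarge the database’s own page cache:

```python
import sqlite3

# SQLite keeps a per-connection page cache; the default is modest.
conn = sqlite3.connect(":memory:")

# A negative value sets the cache size in KiB, so this asks for a
# roughly 64 MB page cache instead of the default.
conn.execute("PRAGMA cache_size = -65536")

# Inspect the setting we just changed.
print(conn.execute("PRAGMA cache_size").fetchone())  # (-65536,)
```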
In-Memory Caches
The most potent caches, in terms of raw performance, are those that store their entire data set in memory. Memcached and Redis are both examples of in-memory caches (Redis can also be configured to persist some data to disk). They are this fast because accesses to RAM are orders of magnitude faster than accesses to disk.
On the other hand, you will generally have far less RAM available than disk space, so you need a strategy for keeping only the hot subset of your data in the in-memory cache. The most straightforward strategy is least recently used (LRU), which is what Memcached employs. LRU works by evicting less recently used data in favor of more recently used data, and it is almost always an appropriate caching strategy (see the sketch below).
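Python’s standard library ships an LRU cache you can try directly. The sketch below caps the cache at two entries so the eviction is easy to observe; the function body is just a stand-in for an expensive lookup:

```python
from functools import lru_cache

@lru_cache(maxsize=2)  # keep only the 2 most recently used results
def expensive_lookup(key):
    print(f"computing {key}...")  # visible only on a cache miss
    return key.upper()

expensive_lookup("a")  # miss: computed
expensive_lookup("b")  # miss: computed
expensive_lookup("a")  # hit: served from cache
expensive_lookup("c")  # miss: evicts "b", the least recently used
expensive_lookup("b")  # miss again: it was evicted
print(expensive_lookup.cache_info())
```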
Content Distribution Network (CDN)
CDNs are a kind of cache that comes into play for sites serving large amounts of static media. In a typical CDN setup, a request will first ask the CDN for a piece of static media; the CDN will serve that content if it has it locally available. If it isn’t available, the CDN will query the back-end servers for the file, cache it locally, and serve it to the requesting user.
If the system we are building isn’t yet large enough to have its own CDN, we can ease a future transition by serving the static media off a separate subdomain (e.g. mycdncontent.com) using a lightweight HTTP server like Nginx, and then cutting the DNS over from our servers to a CDN later.
Cache Invalidation
While caching is fantastic, it does require you to maintain consistency between your caches and the source of truth (i.e. your database); otherwise you risk truly bizarre application behavior.
Solving this problem is known as cache invalidation; there are three main schemes that are used:
1. Write-through cache: Data is written into the cache and the corresponding database at the same time. The cached data allows for fast retrieval, while the same data is persisted in permanent storage, giving complete consistency between the cache and storage (a code sketch contrasting write-through and write-back follows this list).
Pros:
- Ensures that nothing gets lost in case of a crash, power failure, or other system disruption, minimizing the risk of data loss.
Cons:
- Higher latency for write operations, since every write must go to both the cache and permanent storage.
2. Write-around cache: This technique is similar to write-through, but data is written directly to permanent storage, bypassing the cache.
Pros:
- Reduces the chance of the cache being flooded with write operations that will not subsequently be re-read.
Cons:
- A read request for recently written data will result in a “cache miss” and must be served from slower back-end storage, incurring higher latency.
3. Write-back cache: Data is written to the cache alone, and completion is immediately confirmed to the client. The write to permanent storage happens later, after specified intervals or under certain conditions.
Pros:
- Low latency and high throughput for write-intensive applications.
Cons:
- Risk of data loss, since until the flush the only copy of the written data is in the cache.
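The sketch below contrasts write-through and write-back using plain dicts as stand-ins for the cache and the database; the class names and the `flush()` method are illustrative assumptions, not a standard API:

```python
class WriteThroughCache:
    def __init__(self):
        self.cache, self.db = {}, {}

    def write(self, key, value):
        # Write to cache AND database before acknowledging:
        # fully consistent, but the client pays for both writes.
        self.cache[key] = value
        self.db[key] = value

class WriteBackCache:
    def __init__(self):
        self.cache, self.db = {}, {}
        self.dirty = set()  # keys not yet persisted

    def write(self, key, value):
        # Acknowledge after the cache write alone: low latency,
        # but the data is at risk until flush() runs.
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        # Persist dirty entries later (e.g. on a timer).
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()

wb = WriteBackCache()
wb.write("a", 1)
print(wb.db)   # {} -- not yet persisted; a crash here loses "a"
wb.flush()
print(wb.db)   # {'a': 1}
```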
Cache Eviction Policies
Following are some of the most common cache eviction policies:
- First In First Out (FIFO): The cache evicts the block that was added first, without regard to how often or how many times it was accessed before.
- Last In First Out (LIFO): The cache evicts the block that was added most recently, without regard to how often or how many times it was accessed before.
- Least Recently Used (LRU): Discards the least recently used items first (see the sketch after this list).
- Most Recently Used (MRU): In contrast to LRU, discards the most recently used items first.
- Least Frequently Used (LFU): Counts how often an item is needed. Those that are used least often are discarded first.
- Random Replacement (RR): Randomly selects a candidate item and discards it to make space when necessary.
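For concreteness, here is a minimal LRU cache built on Python’s OrderedDict (a from-scratch sketch for illustration, not a production implementation):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest entry first

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the LRU entry

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" becomes most recently used
cache.put("c", 3)      # evicts "b"
print(cache.get("b"))  # None -- evicted
print(cache.get("a"))  # 1
```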
Conclusion
In this article we learned what caching is and the role it plays, the types of caching, and how to invalidate a cache.
Thanks for all your support, and feel free to share and comment. I am happy to learn from anyone; that’s what drives me towards my passion for learning new things.