Caching is the storing of data or responses in a temporary storage area so they can be retrieved efficiently without repeating database hits or computations over and over again. A well-implemented caching system can significantly boost the performance of a web app or API.
We can cache data at two levels: the server level and the browser level. Here I would like to share some ideas based on my experience with server-side caching.
You may notice that your site is a bit slow to respond, takes an unusually long time to load content, or that overall performance is noticeably low. This is when you need to devise methods to boost performance, and caching is one of the best ways to do it.
It is important to identify where we need to implement a caching mechanism, i.e., the places in the site that tend to drag it down and make it slow. We can cache an entire response, or we can cache certain db query results.
Let me explain with an example from a recent project I worked on. I started by documenting the places to implement caching and the type of caching required (response caching or db query caching).
First I checked the places that deliver heavy responses, by which I mean responses that usually do not change between requests. In such a scenario we can cache the entire response. That response should be invalidated whenever anything changes in the db, or whenever any content that affects the cached response changes.
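The pattern above can be sketched as a tiny cache-aside helper. This is a minimal illustration, not a full implementation: the in-memory dict stands in for a real cache backend (such as Redis, discussed below), and the key name and `TTL_SECONDS` value are hypothetical.

```python
import time

_response_cache = {}   # key -> (response, stored_at); stand-in for a real cache backend
TTL_SECONDS = 300      # hypothetical time-to-live as a safety net

def cached_response(key, compute_response):
    """Return the cached response for `key`, computing and storing it on a miss."""
    entry = _response_cache.get(key)
    if entry is not None and time.time() - entry[1] < TTL_SECONDS:
        return entry[0]                      # cache hit: skip the expensive work
    response = compute_response()            # cache miss: build the heavy response
    _response_cache[key] = (response, time.time())
    return response

def invalidate_response(key):
    """Drop the cached entry when content affecting the response changes."""
    _response_cache.pop(key, None)
```

On a write that affects the page, the write path calls `invalidate_response`, so the next request rebuilds and re-caches the response.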
Then I checked for the statements that extract large amounts of data from the database, the ones that hit the db often. Database hits eat up time; in fact, developers keep refining their queries to reduce the time spent at the SQL level. This is where we cache our queried data: when a request is received we serve it from the cache, avoiding the need to run the queries and pull data from the database again. This eliminates unwanted db hits and saves time.
The system will behave erroneously if proper cache invalidation is not implemented: it will look as if your changes are not reflected in the response or query results.
To illustrate this, take the case of cached db results. If anything related to the cached data changes in the db, the server will still serve the outdated data from the cache, and that will create bugs. So we have to invalidate cached data whenever the underlying data is modified or deleted.
Suppose we have a cached API that computes data from two tables and returns a response. If we change any entries or add data to those tables, i.e., when we have to carry out invalidation, we have to run the queries again and store the new result in the cache.
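A minimal sketch of that write-path invalidation, with two lists standing in for the two tables and the key name invented for illustration:

```python
_cache = {}  # stand-in for the cache backend

def total_sales(orders, discounts):
    """Cached computation over two 'tables' (lists standing in for db tables)."""
    key = "report:total_sales"           # hypothetical cache key
    if key not in _cache:
        _cache[key] = sum(orders) - sum(discounts)  # the expensive "query"
    return _cache[key]

def add_order(orders, amount):
    """A write to the underlying data: store it, then invalidate the cached
    result so the next read recomputes and re-caches the fresh value."""
    orders.append(amount)
    _cache.pop("report:total_sales", None)
```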
It is better to store the cached data on a separate server to reduce memory consumption on the application server. A good way to do this is to use a Redis cache backend. Amazon provides a managed Redis service in the cloud (ElastiCache), which is fast and efficient to use.
Redis stores cached data as key-value pairs. Write a common service for creating the key, so the same logic can be used to regenerate the key when we have to invalidate it. The key can be built from certain strings or any data, and any algorithm can be used; what matters is that you can recreate the same key at both invalidation time and retrieval time.
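One way to sketch such a common key service: serialize the parameters deterministically and hash them. The function name, namespace convention, and digest length here are assumptions for illustration; the essential property is that identical inputs always produce the identical key.

```python
import hashlib
import json

def make_cache_key(namespace, params):
    """Build a deterministic cache key from a namespace and request parameters.

    The same inputs always yield the same key, so the retrieval code and
    the invalidation code can both call this one function instead of
    duplicating key-building logic."""
    payload = json.dumps(params, sort_keys=True)   # stable serialization
    digest = hashlib.sha256(payload.encode()).hexdigest()[:16]
    return f"{namespace}:{digest}"
```

Because `sort_keys=True` fixes the serialization order, `{"page": 1, "tag": "python"}` and `{"tag": "python", "page": 1}` map to the same key.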
If we do not write a common service for caching and invalidation, we will have to scatter caching/invalidation mechanisms throughout the code, and that will look horrible. :)
Some responses depend on the logged-in user and some do not. If the response depends on the logged-in user, the key-generation algorithm should incorporate that user's unique identity when creating the key.
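A sketch of a user-aware key builder, assuming a numeric user id is available from the session; the function name and key layout are hypothetical:

```python
import hashlib
import json

def make_user_aware_key(namespace, params, user_id=None):
    """Build a cache key, scoped per user when the response depends on them."""
    payload = json.dumps(params, sort_keys=True)   # stable serialization
    digest = hashlib.sha256(payload.encode()).hexdigest()[:16]
    if user_id is not None:
        return f"{namespace}:u{user_id}:{digest}"  # per-user cache entry
    return f"{namespace}:{digest}"                 # entry shared across users
```

This way two users never read each other's cached responses, while user-independent responses still share a single cache entry.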