Ten Caching Mistakes that Break Your App
Introduction
Caching frequently used objects that are expensive to fetch from their source makes an application perform faster under high load and helps it scale under concurrent requests. But some hard-to-notice mistakes can make the application suffer under high load instead of performing better, especially when you are using distributed caching, where a separate cache server or cache process stores the items. Moreover, code that works fine with an in-memory cache can fail once the cache is moved out-of-process. Here I will show you some common distributed caching mistakes that will help you make better decisions about when to cache and when not to cache.
Here are the top 10 mistakes I have seen:
- Relying on .NET’s default serializer
- Storing large objects in a single cache item
- Using cache to share objects between threads
- Assuming items will be in cache immediately after storing them
- Storing entire collection with nested objects
- Storing parent-child objects together and also separately
- Caching Configuration settings
- Caching Live Objects that have open handle to stream, file, registry, or network
- Storing same item using multiple keys
- Not updating or deleting items in cache after updating or deleting them on persistent storage
Let’s see what they are and how to avoid them.
I am assuming you have been using the ASP.NET Cache or Enterprise Library Cache for a while and are satisfied, but now you need more scalability and have therefore moved to an out-of-process or distributed cache like Velocity or memcached. After that, things have started to fall apart, and the common mistakes listed below apply to you.
Relying on .NET’s Default Serializer
When you use an out-of-process caching solution like Velocity or memcached, items in the cache are stored in a separate process from the one where your application runs. Every time you add an item to the cache, the client library serializes the item into a byte array and sends that byte array to the cache server to store. Similarly, when you get an item from the cache, the cache server sends the byte array back to your application and the client library deserializes it into the target object. .NET's default serializer is not optimal, because it relies on Reflection, which is CPU intensive. As a result, storing items in and getting items from the cache adds high serialization and deserialization overhead that results in high CPU usage, especially if you are caching complex types. This high CPU usage happens in your application, not on the cache server. So you should always use one of the better approaches shown in this article to minimize the CPU consumed by serialization and deserialization. I personally prefer the approach where you serialize and deserialize the properties yourself by implementing the ISerializable interface and the deserialization constructor.
[Serializable]
public class Customer : ISerializable
{
    public string FirstName;
    public string LastName;
    public int Salary;
    public DateTime DateOfBirth;

    public Customer()
    {
    }

    public Customer(SerializationInfo info, StreamingContext context)
    {
        FirstName = info.GetString("FirstName");
        LastName = info.GetString("LastName");
        Salary = info.GetInt32("Salary");
        DateOfBirth = info.GetDateTime("DateOfBirth");
    }

    #region ISerializable Members

    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("FirstName", FirstName);
        info.AddValue("LastName", LastName);
        info.AddValue("Salary", Salary);
        info.AddValue("DateOfBirth", DateOfBirth);
    }

    #endregion
}
This prevents the formatter from using Reflection. The performance improvement from this approach can sometimes be 100 times over the default implementation when you have large objects. So I strongly recommend that, at least for the objects you cache, you always implement your own serialization and deserialization code and do not let .NET use Reflection to figure out what to serialize.
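As a quick sanity check, here is a round-trip sketch, assuming the Customer class above. With ISerializable implemented, BinaryFormatter's Serialize call goes through our GetObjectData and Deserialize goes through the deserialization constructor, rather than reflecting over fields:

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Round-trip sketch: Serialize invokes our GetObjectData, and
// Deserialize invokes the (SerializationInfo, StreamingContext) constructor.
var customer = new Customer
{
    FirstName = "John",
    LastName = "Doe",
    Salary = 50000,
    DateOfBirth = new DateTime(1980, 1, 1)
};

using (var stream = new MemoryStream())
{
    var formatter = new BinaryFormatter();
    formatter.Serialize(stream, customer);     // our GetObjectData runs here

    stream.Position = 0;
    var copy = (Customer)formatter.Deserialize(stream); // our constructor runs here
}
```

This is the same serialization path a binary-protocol cache client would exercise on every Add and Get.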
Storing Large Objects in a Single Cache Item
Sometimes we think large objects should be cached because they are too expensive to fetch from the source. For example, you might think caching a 1 MB object graph would give you better performance than loading that object graph from a file or database. You would be surprised how poorly that scales. It will certainly work a lot faster than loading the same thing from the database when you have only one request at a time. But under concurrent load, frequent access to that large object graph will max out the server's CPU, because caching has high serialization and deserialization overhead. Every time you get a 1 MB object graph from an out-of-process cache, it consumes significant CPU to rebuild that object graph in memory.
var largeObjectGraph = myCache.Get("LargeObjectGraph");
var anItem =
    largeObjectGraph.FirstLevel.SecondLevel.ThirdLevel.FourthLevel.TheItemWeNeed;
The solution is not to cache the large object graph as a single item under a single key. Instead, break that large object graph into smaller items and cache those smaller items individually. Then retrieve from the cache only the smallest item you need.
// store smaller parts in cache as individual items
var largeObjectGraph = new VeryLargeObjectGraph();
myCache.Add("LargeObjectGraph.FirstLevel.SecondLevel.ThirdLevel",
    largeObjectGraph.FirstLevel.SecondLevel.ThirdLevel);
...
...
// get the smaller parts from cache
var thirdLevel = myCache.Get("LargeObjectGraph.FirstLevel.SecondLevel.ThirdLevel");
var anItem = thirdLevel.FourthLevel.TheItemWeNeed;
The idea is to look at the items you need most frequently from the large object (say, the connection strings from a configuration object graph) and store those items separately in the cache. Always keep the items you retrieve from the cache small, say 8 KB at most.
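Applied to the configuration example, the pattern looks like the sketch below. The graph, the keys, and the cache API are illustrative, not any specific product's:

```csharp
// Illustrative sketch: cache the hot leaves of a big configuration graph
// under their own keys instead of the whole graph.
var config = LoadConfigurationGraph();   // assume: the expensive 1 MB graph

// store only the small, frequently read pieces individually
myCache.Add("Config.ConnectionString", config.Database.ConnectionString);
myCache.Add("Config.SmtpServer", config.Mail.SmtpServer);

// readers now deserialize a few hundred bytes, not the whole graph
var connectionString = (string)myCache.Get("Config.ConnectionString");
```

Each read now pays serialization cost proportional to the piece it needs, not to the full graph.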
Using Cache to Share Objects Between Multiple Threads
Since you can access the cache from multiple threads, it is sometimes tempting to use it to pass data between threads. But the cache, like a static variable, can suffer from race conditions. Race conditions are even more common when the cache is distributed, since storing and reading an item requires out-of-process communication and your threads get far more chance to overlap than with an in-memory cache. The following example shows how an in-memory cache rarely demonstrates the race condition but an out-of-process cache almost always does:
myCache["SomeItem"] = 0;

var thread1 = new Thread(new ThreadStart(() =>
{
    var item = (int)myCache["SomeItem"]; // Most likely 0
    item++;
    myCache["SomeItem"] = item;
}));
var thread2 = new Thread(new ThreadStart(() =>
{
    var item = (int)myCache["SomeItem"]; // Most likely 1
    item++;
    myCache["SomeItem"] = item;
}));
var thread3 = new Thread(new ThreadStart(() =>
{
    var item = (int)myCache["SomeItem"]; // Most likely 2
    item++;
    myCache["SomeItem"] = item;
}));

thread1.Start();
thread2.Start();
thread3.Start();
...
Most of the time, the above code demonstrates the most likely behavior when you are using an in-memory cache. But when you go out-of-process or distributed, it will almost always fail to demonstrate the most likely behavior. You need to implement some kind of locking here. Some caching providers allow you to lock an item: Velocity has a locking feature, for example, but memcached does not. In Velocity, you can lock an item like this:
// get an item and lock it
DataCacheLockHandle handle;
SomeClass someItem = _defaultCache.GetAndLock("SomeItem",
    TimeSpan.FromSeconds(1), out handle, true) as SomeClass;

// update the item
someItem.FirstName = "Version2";

// put it back and get the new version
DataCacheItemVersion version2 = _defaultCache.PutAndUnlock("SomeItem",
    someItem, handle);
You can use locking to reliably read and write to cache items that get changed by multiple threads.
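When the cache offers no locking, as with memcached, the protocol's check-and-set operation (the `gets`/`cas` commands) supports an optimistic retry loop instead. In the sketch below, `GetWithCas` and `TryCas` are placeholders for whatever your client library exposes; the exact method names vary by client:

```csharp
// Optimistic concurrency sketch for a cache without locks.
// GetWithCas/TryCas stand in for your client library's cas-token API
// (memcached's protocol calls the underlying commands "gets" and "cas").
int retries = 5;
while (retries-- > 0)
{
    ulong casToken;
    int counter = (int)cache.GetWithCas("SomeItem", out casToken);
    counter++;

    // The write succeeds only if no other thread or process changed the
    // item since we read it; otherwise we re-read and try again.
    if (cache.TryCas("SomeItem", counter, casToken))
        break;
}
```

The retry loop trades a pessimistic lock for a cheap re-read, which usually behaves better under contention than holding a lock across a network hop.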
Assuming Items Will Be in Cache Immediately After Storing Them
Sometimes you store an item in the cache on a submit button click and assume that upon the page postback, the item can be read from the cache because it was just stored there. You are wrong.
private void SomeButton_Clicked(object sender, EventArgs e)
{
    myCache["SomeItem"] = someItem;
}

private void OnPreRender()
{
    var someItem = myCache["SomeItem"]; // It's gone, dude!
    Render(someItem);
}
You can never assume an item will be in the cache for sure, even if you stored it in the click handler and are reading it back in OnPreRender moments later. When your application is under pressure and physical memory is scarce, the cache will evict items that aren't frequently used. So by the time OnPreRender runs, the item may already have been flushed out. Never assume you can always get an item back from the cache. Always do a null check and fall back to the persistent storage:
var someItem = myCache["SomeItem"] as SomeClass ?? GetFromSource();
You should always use this pattern when reading an item from the cache.
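To make sure the null check is never forgotten, the pattern can be wrapped in a small extension-method sketch. `ICache` here is a stand-in for whatever cache client you use; only `Get` and `Add` are assumed:

```csharp
// Cache-aside helper: always null-check, always fall back to the source.
// ICache is a placeholder interface for your cache client.
public static class CacheExtensions
{
    public static T GetOrLoad<T>(this ICache cache, string key, Func<T> loadFromSource)
        where T : class
    {
        var item = cache.Get(key) as T;
        if (item == null)
        {
            item = loadFromSource();   // cache miss, or the item was evicted
            if (item != null)
                cache.Add(key, item);  // repopulate for the next reader
        }
        return item;
    }
}

// usage:
// var someItem = myCache.GetOrLoad("SomeItem", () => GetFromSource());
```

Routing every read through one helper means an eviction costs you a reload, never a NullReferenceException.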
Storing Entire Collection with Nested Objects
Sometimes you store an entire collection as a single cache item because you need to access the collection's items frequently. As a result, every time you need one item from the collection, you have to load the whole collection first and then read that particular item. Something like this:
var products = myCache.Get("Products");
var product = products[1];
This is inefficient. You are unnecessarily loading an entire collection just to read one item. This is no problem at all when the cache is in-memory, since the cache just stores a reference to the collection. But in a distributed cache, where the entire collection is deserialized on every access, it results in poor performance. Instead of caching the whole collection, you should cache the individual items separately.
// store individual items in the cache
foreach (Product product in products)
    myCache.Add("Product." + product.Index, product);
...
...
// read an individual item from the cache
var product = myCache.Get("Product.0");
The idea is simple: store each item in the collection individually, using a key that can be constructed easily, for example by appending the item's index to a common prefix.
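If you still need to enumerate the collection occasionally, one option is to cache the item count (or the key list) under its own key and rebuild on demand. A sketch, with the keys and cache API illustrative as before:

```csharp
// Store the count alongside the individually keyed items so the
// collection remains enumerable without caching it whole.
myCache.Add("Products.Count", products.Count);
for (int i = 0; i < products.Count; i++)
    myCache.Add("Product." + i, products[i]);

// later: walk the items one at a time, deserializing only what you touch
int count = (int)myCache.Get("Products.Count");
for (int i = 0; i < count; i++)
{
    var product = (Product)myCache.Get("Product." + i);
    // ... process product ...
}
```

The per-item gets cost more round trips than one big get, but each one is cheap to deserialize, and readers that need only one item pay for only one.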
Storing Parent-child Objects Together and Also Separately
Sometimes you store an object in the cache that has a child object, which you also store separately as another cache item. For example, say you have a customer object that has an order collection. When you cache the customer, the order collection gets cached with it. But then you separately cache the individual orders. Now, when an individual order is updated in the cache, the copy of that same order inside the customer object's order collection is not updated, giving you inconsistent results. Again, this works fine with an in-memory cache but fails when the cache is made out-of-process or distributed.
var customer = SomeCustomer();
var recentOrders = SomeOrders();
customer.Orders = GetCustomerOrders();
myCache.Add("RecentOrders", recentOrders);
myCache.Add("Customer", customer);
...
...
var recentOrders = myCache.Get("RecentOrders");
var order = recentOrders["ORDER10001"];
order.Status = CANCELLED;
...
...
...
var customer = myCache.Get("Customer");
var order = customer.Orders["ORDER10001"];
order.Status = PROCESSING; // Inconsistent: the order has already been cancelled
This is a hard problem to solve. It requires careful design so that you never end up storing the same object twice in the cache. One common approach is not to store child objects inside their parents at all; instead, store only the keys of the child objects so that each child can be retrieved from the cache individually. In the above scenario, you would not store the customer's order collection in the cache. Instead, you would store an OrderID collection with the Customer, and when you need to see a customer's orders, you load the individual order objects by their OrderIDs.
var recentOrders = SomeOrders();
foreach (Order order in recentOrders)
    myCache.Add("Order." + order.ID, order);
...
var customer = SomeCustomer();
customer.OrderKeys = GetCustomerOrderKeys(); // Store keys only
myCache.Add("Customer", customer);
...
...
var order = myCache.Get("Order.10001");
order.Status = CANCELLED;
...
...
...
var customer = myCache.Get("Customer");
var customerOrders = customer.OrderKeys.ConvertAll<Order>(
    key => (Order)myCache.Get("Order." + key));
var order = customerOrders.Find(o => o.ID == "10001"); // Correct object from cache
This approach ensures that a certain instance of an entity is stored in the cache only once, no matter how many times it appears in collections or parent objects.
Caching Configuration Settings
Sometimes you cache configuration settings and use some cache-expiration logic to ensure the configuration is refreshed periodically, or refreshed when the configuration file or database table changes. Since configuration settings are accessed very frequently, reading them from the cache like this adds significant CPU overhead:
var connectionString = myCache.Get("Configuration.ConnectionString");
You should not follow this approach. Getting an item from the cache is not cheap. It may not be as expensive as reading from a file or the registry, but it is not very cheap either, especially if the item is a custom class that adds serialization overhead. Instead, store the configuration settings in static variables. But you might ask: how do we refresh the configuration without restarting the appdomain when it's stored in a static variable? You can use a file listener to reload the configuration when the configuration file changes, or poll the database periodically to check for updates.
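Here is a sketch of the static-variable approach with file-change reload. FileSystemWatcher is the real .NET class for this; `ConfigSettings`, `LoadFromFile`, and the file name are illustrative stand-ins for your own configuration type:

```csharp
using System;
using System.IO;

// Static holder for configuration: reads are plain field access,
// with no cache call and no deserialization on the hot path.
public static class AppConfig
{
    private static volatile ConfigSettings _current =
        ConfigSettings.LoadFromFile("app.config.xml");

    private static readonly FileSystemWatcher _watcher;

    static AppConfig()
    {
        _watcher = new FileSystemWatcher(
            AppDomain.CurrentDomain.BaseDirectory, "app.config.xml");

        // Reload and swap the reference when the file changes; readers
        // see either the old or the new snapshot, never a torn one.
        _watcher.Changed += (s, e) =>
            _current = ConfigSettings.LoadFromFile(e.FullPath);
        _watcher.EnableRaisingEvents = true;
    }

    public static ConfigSettings Current
    {
        get { return _current; }
    }
}
```

Swapping a whole immutable snapshot object, rather than mutating the live one, is what makes the lock-free reads safe.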
Caching Live Objects that have Open File, Registry or Network Handle
I have seen developers cache instances of classes that hold open connections to files, the registry, or external network resources. This is dangerous. When items are removed from the cache, they aren't disposed automatically. Every time such an instance is removed from the cache, due to expiration or any other reason, without being disposed, it leaks the system resources it was holding.
You should never cache objects that hold open streams, file handles, registry handles, or network connections just to save the cost of opening the resource every time you need it. Instead, use a static variable, or use an in-memory cache that is guaranteed to give you an expiration callback so that you can dispose of them properly. Out-of-process caches and session stores do not give you expiration callbacks consistently, so never store live objects there.
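If you do keep a disposable object in an in-process cache, the ASP.NET Cache's Insert overload accepts a CacheItemRemovedCallback, which is exactly the hook most out-of-process caches cannot guarantee. A sketch, with `reportStream` as an assumed open Stream:

```csharp
using System;
using System.Web;
using System.Web.Caching;

// In-process only: dispose the resource when the cache evicts it.
HttpRuntime.Cache.Insert(
    "ReportStream",
    reportStream,                    // assume: some open Stream we own
    null,                            // no dependency
    Cache.NoAbsoluteExpiration,
    TimeSpan.FromMinutes(10),        // sliding expiration
    CacheItemPriority.Normal,
    (key, value, reason) =>
    {
        // Runs on expiration, scavenging, or explicit removal.
        var disposable = value as IDisposable;
        if (disposable != null)
            disposable.Dispose();    // release the handle on eviction
    });
```

A distributed cache stores a serialized copy and never calls back into your process reliably, which is why this pattern only works in-memory.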
Storing Same Item using Multiple Keys
Sometimes you store an object in the cache both under a key and under an index-based key, because you need to retrieve items by key and also iterate through items by index. For example:
var someItem = new SomeClass();
myCache["SomeKey"] = someItem;
...
myCache["SomeItem." + index] = someItem;
...
If you are using an in-memory cache, the following code will work fine:
var someItem = myCache["SomeKey"];
someItem.SomeProperty = "Hello";
...
var someItem = myCache["SomeItem." + index];
var hello = someItem.SomeProperty; // Returns "Hello" with an in-memory cache,
                                   // but fails with an out-of-process cache
The above code works when you have an in-memory cache, because both cache entries refer to the same instance of the object; no matter which key you use, you always get back the same instance. But in an out-of-process cache, especially a distributed cache, items are serialized before they are stored. Items aren't stored by reference: you store copies of items, never the item itself. So when you retrieve an item by key, you get a freshly made copy, deserialized and constructed anew on every get. As a result, changes made to the object are never reflected back into the cache unless you overwrite the cached item after making the changes. So in a distributed cache, you have to do the following:
var someItem = myCache["SomeKey"];
someItem.SomeProperty = "Hello";
myCache["SomeKey"] = someItem;            // Update the cache
myCache["SomeItem." + index] = someItem;  // Update all other entries
...
var someItem = myCache["SomeItem." + index];
var hello = someItem.SomeProperty; // Now it works with an out-of-process cache
Once you overwrite the cache entries with the modified item, it works, because every entry now holds a copy of the updated item.
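One way to avoid forgetting the second key is to route every write through a helper that updates all the aliases at once. A sketch; the `ICache` indexer is assumed to behave like the snippets above:

```csharp
// Write-through helper: store the modified item under every key it is
// known by, since a distributed cache keeps an independent copy per key.
// ICache is a placeholder for your cache client.
static void PutUnderAllKeys(ICache myCache, object item, params string[] keys)
{
    foreach (var key in keys)
        myCache[key] = item;
}

// usage, after modifying the object:
// someItem.SomeProperty = "Hello";
// PutUnderAllKeys(myCache, someItem, "SomeKey", "SomeItem." + index);
```

Better still is to store the item under one key only and keep a list of keys (as in the parent-child section above), so there is nothing to keep in sync.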
Not Updating or Deleting Objects from Cache when Items are Updated or Deleted from Data Source
This again works with an in-memory cache but fails when you move to an out-of-process or distributed cache. Here's an example:
var someItem = myCache["SomeItem"];
someItem.SomeProperty = "Hello Changed";
database.Update(someItem);
...
var someItem = myCache["SomeItem"];
Console.WriteLine(someItem.SomeProperty); // "Hello Changed"? Nope.
This works fine with an in-memory cache but fails with an out-of-process or distributed cache, because you changed the object but never updated the cache with the latest version. Items in the cache are stored as copies, not as the original instance.
Another mistake is not deleting items from cache when the item is deleted from the database.
var someItem = myCache["SomeItem"];
database.Delete(someItem);
...
var someItem = myCache["SomeItem"];
Console.WriteLine(someItem.SomeProperty); // Still works. Oops!
When you delete an item from the database, a file, or any other persistent store, don't forget to delete it from the cache as well, under every key it may have been stored with.
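A delete-through sketch tying this together with the earlier sections; the entity, the key names, and the `Remove` call are illustrative:

```csharp
// Delete from the persistent store, then invalidate every cache key the
// item (or a collection containing it) may live under. Names illustrative.
void DeleteOrder(Order order)
{
    database.Delete(order);

    myCache.Remove("Order." + order.ID);             // the item's own key
    myCache.Remove("RecentOrders");                  // cached collections holding it
    myCache.Remove("Customer." + order.CustomerID);  // parents caching it indirectly
}
```

Removing the stale entries and letting the cache-aside pattern repopulate them on the next read is usually simpler and safer than trying to update them in place.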
Conclusion
Caching requires careful planning and a clear understanding of the data being cached. Otherwise, when the cache is made distributed, the application not only performs worse but can also break outright. Keeping these common mistakes in mind while caching will help you cash out from your code.

License
This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)