Caching large objects, duplicate objects, caching collections, live objects, thread unsafe caching and other common mistakes break your app instead of making it fly. Learn ten common caching mistakes devs make.

Introduction

Caching frequently used objects, that are expensive to fetch from the source, makes application perform faster under high load. It helps scale an application under concurrent requests. But some hard to notice mistakes can lead the application to suffer under high load, let alone making it perform better, especially when you are using distributed caching where there’s separate cache server or cache application that stores the items. Moreover, code that works fine using in-memory cache can fail when the cache is made out-of-process. Here I will show you some common distributed caching mistakes that will help you make better decisions when to cache and when not to cache.

Here are the top 10 mistakes I have seen:

  1. Relying on .NET’s default serializer
  2. Storing large objects in a single cache item
  3. Using cache to share objects between threads
  4. Assuming items will be in cache immediately after storing them
  5. Storing entire collection with nested objects
  6. Storing parent-child objects together and also separately
  7. Caching Configuration settings
  8. Caching Live Objects that have open handle to stream, file, registry, or network
  9. Storing same item using multiple keys
  10. Not updating or deleting items in cache after updating or deleting them on persistent storage

Let’s see what they are and how to avoid them.

I am assuming you have been using ASP.NET Cache or Enterprise Library Cache for a while, you are satisfied, now you need more scalability and have thus moved to an out-of-process or distributed cache like Velocity or Memcache. After that, things have started to fall apart and thus the common mistakes listed below apply to you.

Relying on .NET’s Default Serializer

When you use an out-of-process caching solution like Velocity or memcached, where items in cache are stored in a separate process than where your application runs; every time you add an item to the cache, it serializes the item into byte array and then sends the byte array to the cache server to store it. Similarly, when you get an item from the cache, the cache server sends back the byte array to your application and then the client library deserializes the byte array into the target object. Now .NET’s default serializer is not optimal since it relies on Reflection which is CPU intensive. As a result, storing items in cache and getting items from cache add high serialization and deserialization overhead that results in high CPU, especially if you are caching complex types. This high CPU usage happens on your application, not on the cache server. So, you should always use one of the better approaches shown in this article so that the CPU consumption in serialization and deserialization is minimized. I personally prefer the approach where you serialize and deserialize the properties all by yourself by implementing ISerializable interface and then implementing the deserialization constructor.

 Collapse | Copy Code
[Serializable]
public class Customer : ISerializable
{
public string FirstName;
public string LastName;
public int Salary;
public DateTime DateOfBirth; public Customer()
{
} public Customer(SerializationInfo info, StreamingContext context)
{
FirstName = info.GetString("FirstName");
LastName = info.GetString("LastName");
Salary = info.GetInt32("Salary");
DateOfBirth = info.GetDateTime("DateOfBirth");
} #region ISerializable Members public void GetObjectData(SerializationInfo info, StreamingContext context)
{
info.AddValue("FirstName", FirstName);
info.AddValue("LastName", LastName);
info.AddValue("Salary", Salary);
info.AddValue("DateOfBirth", DateOfBirth);
} #endregion
}

This prevents the formatter from using reflection. The performance improvement you get using this approach is sometimes 100 times better than the default implementation when you have large objects. So, I strongly recommend that at least for the objects that are cached, you should always implement your own serialization and deserialization code and not let .NET use Reflection to figure out what to serialize.

Storing Large Objects in a Single Cache Item

Sometimes we think large objects should be cached because they are too expensive to fetch from the source. For example, you might think caching an object graph of 1 MB might give you better performance than loading that object graph from file or database. You would be surprised how non scalable that is. It will certainly work a lot faster than loading the same thing from database when you have only one request at a time. But under concurrent load, frequent access to that large object graph will blow up server’s CPU. This is because Caching has high serialization and deserialization overhead. Every time you will try to get an 1 MB object graph from an out of process cache, it will consume significant CPU to build that object graph in memory.

 Collapse | Copy Code
var largeObjectGraph = myCache.Get("LargeObjectGraph");
var anItem =
largeObjectGraph.FirstLevel.SecondLevel.ThirdLevel.FourthLevel.TheItemWeNeed;

Solution is not to cache the large object graph as a single item in the cache using a single key. Instead you should break that large object graph into smaller items and then cache those smaller items individually. You should only retrieve from cache the smallest item you need.

 Collapse | Copy Code
// store smaller parts in cache as individual item
var largeObjectGraph = new VeryLargeObjectGraph();
myCache.Add("LargeObjectGraph.FirstLevel.SecondLevel.ThirdLevel",
largeObjectGraph.FirstLevel.SecondLevel.ThirdLevel);
...
...
// get the smaller parts from cache
var thirdLevel = myCache.Get("LargeObjectGraph.FirstLevel.SecondLevel.ThirdLevel");
var anItem = thirdLevel.FourthLevel.TheItemWeNeed;

The idea is to look at the items that you need most frequently from the large object (say the connection strings from a configuration object graph) and store those items separately in the cache. Always keep in mind that the item that you retrieve from cache is always small, say max 8 KB.

Using Cache to Share Objects Between Multiple Threads

Since you can access cache from multiple threads, sometimes you use it to conveniently pass data between multiple threads. But cache, like static variables, can suffer from race conditions. It’s even more common when the cache is distributed since storing and reading an item requires out-of-process communication and your threads get more chance to overlap on each other than in-memory cache. The following example shows how in-memory cache rarely demonstrates the race condition but an out-of-process cache almost always shows it:

 Collapse | Copy Code
myCache["SomeItem"] = 0;

var thread1 = new Thread(new ThreadStart(() =>
{
var item = myCache["SomeItem"]; // Most likely 0
item ++;
myCache["SomeItem"] = item;
});
var thread2 = new Thread(new ThreadStart(() =>
{
var item = myCache["SomeItem"]; // Most likely 1
item ++;
myCache["SomeItem"] = item;
});
var thread3 = new Thread(new ThreadStart(() =>
{
var item = myCache["SomeItem"]; // Most likely 2
item ++;
myCache["SomeItem"] = item;
}); thread1.Start();
thread2.Start();
thread3.Start();
.
.
.

The above code most of the time demonstrates the most likely behavior when you are using in-memory cache. But when you go out-of-process or distributed, it will always fail to demonstrate the most-likely behavior. You need to implement some kind of locking here. Some caching provider allows you to lock an item. For example, Velocity has locking feature, but memcache does not. In Velocity, you can lock an item:

 Collapse | Copy Code
// get an item and lock it
DataCacheLockHandle handle;
SomeClass someItem = _defaultCache.GetAndLock("SomeItem",
TimeSpan.FromSeconds(1), out handle, true) as SomeClass;
// update an item
someItem.FirstName = "Version2";
// put it back and get the new version
DataCacheItemVersion version2 = _defaultCache.PutAndUnlock("SomeItem",
someItem, handle);

You can use locking to reliably read and write to cache items that get changed by multiple threads.

Assuming Items will be in Cache Immediately After Storing It

Sometimes you store an item in cache on a submit button click and assume that upon the page postback, the item can be read from cache because it was just stored in cache. You are wrong.

 Collapse | Copy Code
private void SomeButton_Clicked(object sender, EventArgs e)
{
myCache["SomeItem"] = someItem;
} private void OnPreRender()
{
var someItem = myCache["SomeItem"]; // It's gone dude!
Render(someItem);
}

You can never assume an item will be in cache for sure. Even if you are storing the item in Line 1 and reading it from Line 3. When your application is under pressure and there’s a scarcity of physical memory, cache will flush out items that aren’t frequently used. So, by the time code reaches Line 3, cache could be flushed out. Never assume you can always get an item back from cache. Always have a null check and retrieve from persistent storage.

 Collapse | Copy Code
var someItem = myCache["SomeItem"] as SomeClass ?? GetFromSource();

You should always use this format when reading an item from cache.

Storing Entire Collection with Nested Objects

Sometimes you store an entire collection in a single cache item because you need to access the items in the collection frequently. Thus every time you try to read an item from the collection, you have to load the collection first and then read that particular item. Something like this:

 Collapse | Copy Code
var products = myCache.Get("Products");
var product = products[1];

This is inefficient. You are unnecessarily loading an entire collection just to read a certain item. You will have absolutely no problem when the cache is in-memory, as the cache will just store a reference to the collection. But in a distributed cache, where the entire collection is deserialized every time you access it, it will result in poor performance. Instead of caching a whole collection, you should cache individual items separately.

 Collapse | Copy Code
// store individual items in cache
foreach (Product product in products)
myCache.Add("Product." + product.Index, product);
...
...
// read the individual item from cache
var product = myCache.Get("Product.0");

The idea is simple, you store each item in the collection individually using a key that can be guessed easily, for example using the index as a padding.

Storing Parent-child Objects Together and Also Separately

Sometimes you store an object in cache that has a child object, which you also separately store in another cache item. For example, say you have a customer object that has an order collection. So, when you cache customer, the order collection gets cached as well. But then you separately cache the individual orders. So, when an individual order is updated in cache, the orders collection containing the same order inside the customer object is not updated and thus gives you inconsistent result. Again this works fine when you have in-memory cache but fails when your cache is made out-of-process or distributed.

 Collapse | Copy Code
var customer = SomeCustomer();
var recentOrders = SomeOrders();
customer.Orders = GetCustomerOrders();
myCache.Add("RecentOrders", recentOrders);
myCache.Add("Customer", customer);
...
...
var recentOrders = myCahce.Get("RecentOrders");
var order = recentOrders["ORDER10001"];
order.Status = CANCELLED;
...
...
...
var customer = myCache.Get("Customer");
var order = customer.Orders["ORDER10001"];
order.Status = PROCESSING; // Inconsistent. The order has already been cancelled

This is a hard problem to solve. It requires clever design so that you never end up having the same object stored twice in the cache. One common approach is not to store child objects in cache, instead store keys of child object so that they can be retrieved from cache individually. So, in the above scenario, you would not store thecustomer’s order collection in cache. Instead you will store the OrderID collection with Customer and then when you need to see the orders of a customer, you try to load the individual order object using the OrderID.

 Collapse | Copy Code
var recentOrders = SomeOrders();
foreach (Order order in recentOrders)
myCache.Add("Order." + order.ID, order);
...
var customer = SomeCustomer();
customer.OrderKeys = GetCustomerOrders(); // Store keys only
myCache.Add("Customer", customer);
...
...
var order = myCache.Get["Order.10001"];
order.Status = CANCELLED;
...
...
...
var customer = myCache.Get("Customer");
var customerOrders = customer.OrderKeys.ConvertAll<string, Order>
(key => myCache.Get("Order." + key));
var order = customerOrders["10001"]; // Correct object from cache

This approach ensures that a certain instance of an entity is stored in the cache only once, no matter how many times it appears in collections or parent objects.

Caching Configuration Settings

Sometimes you cache configuration settings. You use some cache expiration logic to ensure the configuration is refreshed periodically or refreshed when the configuration file or database table changes. Since configuration settings are access very frequently, reading them from cache adds significant CPU overhead. Instead you should just use static variables to store configurations.

 Collapse | Copy Code
var connectionString = myCache.Get("Configuration.ConnectionString");

You should not follow such an approach. Getting an item from cache is not cheap. It may not be as expensive as reading from a file or registry. But it’s not very cheap either, especially if the item is a custom class that adds some serialization overhead. So, you should instead store the configuration settings in static variables. But you might ask, how do we refresh configuration without restarting appdomain when it’s stored in static variable? You can use some expiration logic like file listener to reload the configuration when configuration file changes or use some database polling to check for database update.

Caching Live Objects that have Open File, Registry or Network Handle

I have seen developers cache instance of classes which hold open connection to file, registry or external network connection. This is dangerous. When items are removed from cache, they aren’t disposed automatically. Unless you dispose such class, you leak system resource. Every time such a class instance is removed from cache due to expiration or some other reason without being disposed, it leaks the resources it was holding onto.

You should never cache such objects that hold open streams, file handles, registry handles or network connections just because you want to save opening the resource every time you need them. Instead you should use some static variable or use some in-memory cache that is guaranteed to give you expiration callback so that you can dispose them properly. Out of process caches or session stores do not give you expiration callback consistently. So, never store live objects there.

Storing Same Item using Multiple Keys

Sometimes you store objects in cache using the key and also by index because you not only need to retrieve items by key but also need to iterate through items using index. For example,

 Collapse | Copy Code
var someItem = new SomeClass();
myCache["SomeKey"] = someItem;
.
.
myCache["SomeItem." + index] = someItem;
.
.

If you are using in-memory cache, the following code will work fine:

 Collapse | Copy Code
var someItem = myCache["SomeKey"];
someItem.SomeProperty = "Hello";
.
.
.
var someItem = myCache["SomeItem." + index];
var hello = someItem.SomeProperty; // Returns Hello, fine, when In-memory cache
/* But fails when out of process cache */

The above code works when you have in-memory cache. Both of the items in the cache are referring to the same instance of the object. So, no matter how you get the item from cache, it always returns the same instance of the object. But in an out-of-process cache, especially in a distributed cache, items are stored after serializing them. Items aren’t stored by reference. Thus you store copies of items in cache, you never store the item itself. So, if you retrieve an item using a key, you are getting a freshly made copy of that item as the item is deserialized and created fresh every time you get it from cache. As a result, changes made to the object never reflects back to the cache unless you overwrite the item in the cache after making the changes. So, in a distributed cache, you will have to do the following:

 Collapse | Copy Code
var someItem = myCache["SomeKey"];
someItem.SomeProperty = "Hello";
myCache["SomeKey"] = someItem; // Update cache
myCache["SomeItem." + index] = someItem; // Update all other entries
.
.
.
var someItem = myCache["SomeItem." + index];
var hello = someItem.SomeProperty; // Now it works in out-of-process cache

Once you update the cache entry using the modified item, it works as the items in the cache receive a new copy of the item.

Not Updating or Deleting Objects from Cache when Items are Updated or Deleted from Data Source

This again works in in-memory cache, but fails when you go to out-of-process/distributed cache. Here’s an example:

 Collapse | Copy Code
var someItem = myCache["SomeItem"];
someItem.SomeProperty = "Hello Changed";
database.Update(someItem);
.
.
.
var someItem = myCache["SomeItem"];
Console.WriteLine(someItem.SomeProperty); // "Hello Changed"? Nope.

This works fine in an in-memory cache, but fails when it’s out-of-process or distributed cache. The reason is you changed the object but never updated the cache with the latest object. Items in cache are stored as a copy, not the original instance.

Another mistake is not deleting items from cache when the item is deleted from the database.

 Collapse | Copy Code
var someItem = myCache["SomeItem"];
database.Delete(someItem);
.
.
.
var someItem = myCache["SomeItem"];
Console.WriteLine(someItem.SomeProperty); // Works fine. Oops!

Don’t forget to delete items from cache, all possible ways it has been stored in cache, when you delete an item from database, file or some persistent store.

Conclusion

Caching requires careful planning and clear understanding of the data being cached. Otherwise when cache is made distributed, it not only performs worse but can also fail the code. Keeping these common mistakes in mind while caching will help you cash out from your code.

 

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

[Forward]Ten Caching Mistakes that Break your App的更多相关文章

  1. UWP/Win10新特性系列—App Service

    Win10中,新增了一个很实用的新特性叫做App Service,App Service允许App不在前台运行的情况下提供出一个或多个对外服务供其他App使用,这看起来就好像Web开发中的Web Ap ...

  2. hook框架-frida使用-APP在模拟器无法打开,用钩子去除限制

    app拿soul为例子 一.环境配置 #模拟器的frida服务为86 #frida-server-12.9.8-android-x86 adb push frida-server-12.9.8-and ...

  3. Ionic + AngularJS

    Ionic Framework Ionic framework is the youngest in our top 5 stack, as the alpha was released in lat ...

  4. 使用Core Data应避免的十个错误

    原文:Avoiding Ten Big Mistakes iOS Developers Make with Core Data   http://www.cocoachina.com/applenew ...

  5. 《深入理解Android2》读书笔记(五)

    接上篇<深入理解Android2>读书笔记(四) startActivity Am void run() throws RemoteException { try { printMessa ...

  6. comp.lang.javascript FAQ [zz]

    comp.lang.javascript FAQ Version 32.2, Updated 2010-10-08, by Garrett Smith FAQ Notes 1 Meta-FAQ met ...

  7. NetCoreApi框架搭建三、JWT授权验证)

    1.首先还是粘贴大神的链接 虽然说大神的博客已经讲得很详细了,但是此处还是自己动手好点. 首先配置Startup Swagger的验证 2.新建一个项目存放tokenmodel和生成token并且存入 ...

  8. Flutter-动画-实践篇

    一.了解AnimatedWidget 通常我们给一个Widget添加动画的时候都需要监听Animation的addListener()方法,并在这个方法里面不停的调用setState()方法通知Wei ...

  9. I18N

    App.config <?xml version="1.0" encoding="utf-8" ?> <configuration> & ...

随机推荐

  1. Atitit 架构的原则attilax总结

    Atitit 架构的原则attilax总结 1.1. Rule of three称为"三次原则",指的是当某个功能第三次出现时,才进行"抽象化".是DRY原则和 ...

  2. [nginx]站点目录及文件访问控制

    nginx.conf配置文件 http ->多个server -> 多个location ->可限制目录和文件访问(根据i扩展名限制或者rewrite.) 根据目录或扩展名,禁止用户 ...

  3. 【Android】2.0 第2章 初识Android App

    分类:C#.Android.VS2015:  创建日期:2016-02-04 一.认识Android操作系统 Android最早由安迪•罗宾(Andy Rubin)创办,2007年被Google公司收 ...

  4. fpm制做mysql-5.6.33 rpm包

    增加用户: # groupadd -r mysql # useradd -g mysql -r -s /sbin/nologin -M -d /data/my_db mysql 源码安装mysql-5 ...

  5. 菜鸟教程之工具使用(一)——Git的基本使用

    Git是进来比较火的版本控制工具,大有取代svn的趋势.关于两种孰好孰坏我就不多费口舌了,网上关于二者的对比文章比比皆是.作为一个IT人员关注行业的发展动态是必须的,所以抽空研究了一下Git的使用.跟 ...

  6. Linux 服务管理两种方式service和systemctl

    Linux 服务管理两种方式service和systemctl 1.service命令 service命令其实是去/etc/init.d目录下,去执行相关程序 # service命令启动redis脚本 ...

  7. 每日英语:Got 5 Minutes? 'Flash Fiction' Catches On

    Chinese author Lao Ma has a simple approach to his short stories: In the face of life, everything is ...

  8. python(57):私有变量,代码块

    转载:http://blog.csdn.net/zhu_liangwei/article/details/7667745 引子 我热情地邀请大家猜测下面这段程序的输出: class A(object) ...

  9. 【ARM】ARM程序规范

    1.函数名单词之间用_隔开,每一个字母大写      Uart_Printf()    //这个由三星的TEST风格延续下来,因此没有参数时,必须加void,否则ADS会编译报警    void Te ...

  10. C#如何删除数组中的一个元素

    C#如何删除数组中的一个元素,剩余的元素组成新数组,数组名不变double[] arr = new double[n];需要删除的是第m+1个数据arr[m]求新数组arr.(新数组arr包含n-1个 ...