The Story:

Recently, my team and I have been working as deployment engineers at a customer site. The system under deployment consists of a cluster service, a NoSQL database, and several client applications that interact with the cluster and the database through a Dynamic Link Library (DLL) provided by the vendor.

We were asked to deploy a new module that collects database usage and health statistics such as running time, average read/write rates, and database response delays. Let’s call it the Statistics Service for short. We were also asked to upgrade the DLL for every client program. I didn’t think too much about it, even though a big question mark hung in my mind: “Why do we also need to upgrade the clients’ DLL for the Statistics Service?”

Everything went well at first, but after deployment we found that some statistics items were still zero. We re-checked our upgrade process, including the configuration files, the files’ md5sums, and the DLL version numbers. Nothing seemed to be wrong. We asked the developer in another city for help, but still got no useful information. Then we did the deployment again, still with no luck.

After several days the problems were still there. We started to analyze the Statistics Service log line by line, and happened to notice a pattern: if a client program creates its connection to the database before the Statistics Service is ready, the Statistics Service never starts collecting statistics for that client. The question popped up in my mind again: “Why do we also need to upgrade the clients’ DLL for the Statistics Service?”
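One common way to produce exactly this symptom is a registration step that runs only once, at connect time, and silently gives up if the other side is not listening yet. The sketch below is purely illustrative: `StatisticsService`, `ClientConnection`, `register`, and the `retry` flag are hypothetical names, not the vendor's code, and the "network" is simulated in memory. It contrasts a one-shot registration with one that re-attempts on later operations.

```python
class StatisticsService:
    """Hypothetical in-memory stand-in for the Statistics Service."""

    def __init__(self):
        self.ready = False
        self.registered = set()  # client ids whose statistics are being collected

    def register(self, client_id):
        if not self.ready:
            raise ConnectionRefusedError("statistics service not ready")
        self.registered.add(client_id)


class ClientConnection:
    """Hypothetical client connection that also registers with the service."""

    def __init__(self, service, client_id, retry=False):
        self.service = service
        self.client_id = client_id
        self.retry = retry
        # One attempt at connect time; a failure is swallowed with no log entry.
        self.registered = self._try_register()

    def _try_register(self):
        try:
            self.service.register(self.client_id)
            return True
        except ConnectionRefusedError:
            return False

    def query(self):
        # With retry enabled, each later operation re-attempts a missed registration.
        if self.retry and not self.registered:
            self.registered = self._try_register()
        return "ok"


svc = StatisticsService()
early = ClientConnection(svc, "early")                          # connects before the service is up
early_retry = ClientConnection(svc, "early-retry", retry=True)  # same, but retries later
svc.ready = True                                                # Statistics Service comes up
late = ClientConnection(svc, "late")                            # connects after it is up
early.query()
early_retry.query()
print(sorted(svc.registered))  # ['early-retry', 'late']
```

The one-shot client stays invisible for its whole lifetime, which matches counters that never leave zero; the retrying variant recovers as soon as the service is available.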

I called the developer to ask why the system behaved so strangely. To my surprise, he said that the clients quietly notify the Statistics Service through the DLL’s method calls: the DLL’s cluster connection method opens a TCP connection to the Statistics Service in the background on the client’s behalf. I asked why. He said: “I’m a newcomer, and I failed to persuade the developers of the cluster service to monitor the statistics data and notify the Statistics Service. So we added the statistics collection code to the client DLL, which was the easiest and fastest way.” Later I asked another question: “Why are some statistics items still zero? We already verified that the DLL version is the latest one.” Again he managed to surprise me: “We just found that we may have released different DLLs with the same version number before.” Later that day we received a new DLL, and after upgrading, everything was OK!
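A version number only identifies a build if it is never reused. A content hash, such as the value `md5sum` prints, identifies the actual bytes, so a release history keyed on both can reveal two different binaries sharing one version. The sketch below assumes a hypothetical list of `(version, file_bytes)` records; `find_version_collisions` and the sample data are illustrative, not taken from the vendor's process.

```python
import hashlib
from collections import defaultdict


def find_version_collisions(releases):
    """Return {version: set_of_md5_hex} for versions shipped with more than one distinct binary."""
    hashes = defaultdict(set)
    for version, payload in releases:
        # hexdigest() gives the same value md5sum would print for these bytes
        hashes[version].add(hashlib.md5(payload).hexdigest())
    return {v: h for v, h in hashes.items() if len(h) > 1}


# Hypothetical release history: the byte strings stand in for DLL files.
releases = [
    ("2.1.0", b"build A"),
    ("2.1.0", b"build B"),  # different binary, same version number
    ("2.2.0", b"build C"),
]
collisions = find_version_collisions(releases)
print(sorted(collisions))  # ['2.1.0']
```

Running a check like this before publishing a release would reject a new binary whose version number is already taken.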

No, it’s still far from OK!!!

The Questions:

Why did such a version control pitfall happen?

Why does the DLL care about statistics at all, and why does it do extra work behind the clients’ backs?

Why did the vendor choose the easiest and fastest design without carefully thinking about the whole system and the consequences?

Further Reading:

What is a DLL? http://zh.wikipedia.org/wiki/%E5%8A%A8%E6%80%81%E9%93%BE%E6%8E%A5%E5%BA%93
