Data mining is the process of finding patterns in a given data set. These patterns can often provide meaningful insights to whoever is interested in the data. Data mining is used today in a wide variety of contexts: in fraud detection, as an aid in marketing campaigns, and even in supermarkets, which use it to study their customers.

Data warehousing is the process of centralizing, or aggregating, data from multiple sources into one common repository.

Examples of data mining

If you’ve ever used a credit card, you may know that credit card companies will alert you when they suspect that someone other than you is using your card. This is a perfect example of data mining: credit card companies keep a history of your past purchases and know where, geographically, those purchases were made. If purchases suddenly start appearing in a city far from where you live, the company is alerted to possible fraud, because its data mining shows that you don’t normally make purchases in that city. The company can then block that transaction or simply flag your card for suspicious activity.
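The geographic check described above can be sketched in a few lines. This is a minimal illustration, assuming each past purchase is stored as a (latitude, longitude) pair; the haversine distance, the 500 km threshold, and the helper names `haversine_km` and `is_suspicious` are hypothetical choices, not how any particular card company actually scores transactions.

```python
import math
from typing import List, Tuple

# A purchase location as (latitude, longitude) in degrees.
Location = Tuple[float, float]

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: Location, b: Location) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def is_suspicious(history: List[Location], new_purchase: Location,
                  max_km: float = 500.0) -> bool:
    """Flag a purchase made farther than max_km from every past purchase.

    Hypothetical rule: the 500 km default is an illustrative assumption.
    """
    if not history:
        return False  # no purchase history yet, so nothing to compare against
    nearest = min(haversine_km(past, new_purchase) for past in history)
    return nearest > max_km


if __name__ == "__main__":
    # Made-up purchase history clustered around Chicago.
    history = [(41.88, -87.63), (41.90, -87.65), (41.85, -87.62)]
    print(is_suspicious(history, (41.89, -87.64)))  # False: purchase nearby
    print(is_suspicious(history, (25.76, -80.19)))  # True: purchase in Miami
```

Real fraud systems combine many more signals than location; the single fixed threshold here only keeps the example short.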

Another interesting example of data mining is how one grocery store in the USA used the data it collected on its shoppers to find patterns in their shopping habits. It found that when men bought diapers on Thursdays and Saturdays, they also had a strong tendency to buy beer.

The grocery store could have used this valuable information to increase its profits. One thing it could have done, odd as it sounds, is move the beer display closer to the diapers. Alternatively, it could simply have made sure not to discount beer on Thursdays and Saturdays. This is data mining in action: extracting meaningful patterns from a huge data set.
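The article doesn't say how the store found this pattern. One standard technique for this kind of question is association rule mining, which scores a rule such as {diapers} → {beer} by its support, confidence, and lift. The sketch below computes those three measures over a handful of made-up baskets; it ignores the day-of-week and shopper details, and the function names are illustrative.

```python
from typing import List, Set

Basket = Set[str]


def support(baskets: List[Basket], items: Set[str]) -> float:
    """Fraction of baskets that contain every item in `items`."""
    if not baskets:
        return 0.0
    hits = sum(1 for b in baskets if items <= b)
    return hits / len(baskets)


def confidence(baskets: List[Basket], lhs: Set[str], rhs: Set[str]) -> float:
    """Estimated probability that a basket containing lhs also contains rhs."""
    base = support(baskets, lhs)
    if base == 0:
        return 0.0  # lhs never occurs, so the rule says nothing
    return support(baskets, lhs | rhs) / base


def lift(baskets: List[Basket], lhs: Set[str], rhs: Set[str]) -> float:
    """Confidence relative to how often rhs appears anyway (> 1 means positive association)."""
    base = support(baskets, rhs)
    if base == 0:
        return 0.0
    return confidence(baskets, lhs, rhs) / base


if __name__ == "__main__":
    # Made-up baskets, purely for illustration.
    baskets = [
        {"diapers", "beer", "chips"},
        {"diapers", "beer"},
        {"diapers", "milk"},
        {"bread", "milk"},
        {"beer", "chips"},
        {"bread", "eggs"},
    ]
    lhs, rhs = {"diapers"}, {"beer"}
    print(f"support    = {support(baskets, lhs | rhs):.2f}")   # 0.33
    print(f"confidence = {confidence(baskets, lhs, rhs):.2f}")  # 0.67
    print(f"lift       = {lift(baskets, lhs, rhs):.2f}")        # 1.33
```

A lift above 1 means beer shows up in diaper baskets more often than in baskets overall. Algorithms such as Apriori use these same measures while searching over many item combinations instead of one hand-picked rule.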


Example of data warehousing – Facebook

A great example of data warehousing that everyone can relate to is what Facebook does. Facebook gathers all of your data (your friends, your likes, who you stalk, and so on) and stores it in one central repository. Even though Facebook most likely stores your friends, your likes, and so on in separate databases, it wants to take the most relevant and important information and put it into one central, aggregated database. Why would it want to do this? For many reasons: to make sure that you see the ads you’re most likely to click on, to make sure that the friends it suggests are the most relevant to you, and so on. Keep in mind that acting on this information belongs to the data mining phase, in which meaningful data and patterns are extracted from the aggregated data. Underlying all these motives, though, is the main one: to make more money. After all, Facebook is a business.

We can say that data warehousing is a process in which data from multiple sources or databases is combined into one comprehensive, easily accessible database. This data is then readily available to business professionals, managers, and anyone else who needs it to create forecasts, in other words, to anyone who uses the data for data mining.
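The consolidation step can be sketched with Python’s built-in `sqlite3` module. In this minimal example, assuming two hypothetical source databases (a `friendships` table and a `likes` table, names invented for illustration), data is extracted from each source and loaded into a single `user_summary` table in a separate warehouse database. Real warehouses use dedicated systems and far richer schemas; this only shows the shape of the process.

```python
import sqlite3
from typing import List, Tuple


def make_source(schema: str, rows: List[Tuple], insert_sql: str) -> sqlite3.Connection:
    """Create an in-memory database standing in for one hypothetical source system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema)
    conn.executemany(insert_sql, rows)
    conn.commit()
    return conn


def build_warehouse(friends_db: sqlite3.Connection,
                    likes_db: sqlite3.Connection) -> sqlite3.Connection:
    """Extract per-user counts from each source and load them into one table."""
    wh = sqlite3.connect(":memory:")
    wh.execute("CREATE TABLE user_summary ("
               "user_id INTEGER PRIMARY KEY, friend_count INTEGER, like_count INTEGER)")
    # Extract: each query yields (user_id, count) pairs from one source.
    friend_counts = dict(friends_db.execute(
        "SELECT user_id, COUNT(*) FROM friendships GROUP BY user_id"))
    like_counts = dict(likes_db.execute(
        "SELECT user_id, COUNT(*) FROM likes GROUP BY user_id"))
    # Load: one aggregated row per user seen in any source; missing counts become 0.
    users = sorted(set(friend_counts) | set(like_counts))
    wh.executemany("INSERT INTO user_summary VALUES (?, ?, ?)",
                   [(u, friend_counts.get(u, 0), like_counts.get(u, 0)) for u in users])
    wh.commit()
    return wh


if __name__ == "__main__":
    friends_db = make_source(
        "CREATE TABLE friendships (user_id INTEGER, friend_id INTEGER)",
        [(1, 2), (1, 3), (2, 1)],
        "INSERT INTO friendships VALUES (?, ?)")
    likes_db = make_source(
        "CREATE TABLE likes (user_id INTEGER, page TEXT)",
        [(1, "cooking"), (2, "cooking"), (2, "hiking"), (3, "hiking")],
        "INSERT INTO likes VALUES (?, ?)")
    wh = build_warehouse(friends_db, likes_db)
    # Prints (1, 2, 1), then (2, 1, 2), then (3, 0, 1).
    for row in wh.execute("SELECT * FROM user_summary ORDER BY user_id"):
        print(row)
```

Once the summary table exists, analysts can query one place instead of every source system, which is what makes the data readily available for data mining.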

Data warehousing vs data mining

Remember that data warehousing is a process that must occur before any data mining can take place. In other words, data warehousing is the process of compiling and organizing data into one common database, and data mining is the process of extracting meaningful data from that database. The data mining process relies on the data compiled in the data warehousing phase to detect meaningful patterns.

In the Facebook example, data mining will typically be done by business users who are not engineers but who will most likely get help from engineers when manipulating their data. The data warehousing phase is strictly an engineering phase in which no business users are involved. This gives us another way of distinguishing the two terms: data mining is typically done by business users with the assistance of engineers, while data warehousing is typically done exclusively by engineers.
