What’s the difference between data mining and data warehousing?
Data mining is the process of finding patterns in a given data set. These patterns can often provide meaningful insight to whoever is interested in that data. Data mining is used today in a wide variety of contexts: in fraud detection, as an aid in marketing campaigns, and even by supermarkets studying their consumers.
Data warehousing is the process of centralizing or aggregating data from multiple sources into one common repository.
Example of data mining
If you’ve ever used a credit card, then you may know that credit card companies will alert you when they think that your credit card is being fraudulently used by someone other than you. This is a perfect example of data mining – credit card companies have a history of your purchases from the past and know geographically where those purchases have been made. If all of a sudden some purchases are made in a city far from where you live, the credit card companies are put on alert to a possible fraud since their data mining shows that you don’t normally make purchases in that city. Then, the credit card company can disable your card for that transaction or just put a flag on your card for suspicious activity.
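The location check described above can be sketched in a few lines of Python. This is only a toy illustration: the function name is_suspicious, the min_share threshold, and the sample cities are all assumptions made for the example, and real fraud detection systems weigh many more signals than the purchase city.

```python
from collections import Counter


def is_suspicious(history_cities, new_city, min_share=0.05):
    """Flag a purchase made in a city that is rare in the cardholder's history.

    history_cities: cities of past purchases (hypothetical data).
    min_share: assumed threshold; a city seen in fewer than this
               fraction of past purchases is treated as unusual.
    """
    total = len(history_cities)
    if total == 0:
        return False  # no history, so nothing to compare against
    counts = Counter(history_cities)
    share = counts[new_city] / total  # Counter returns 0 for unseen cities
    return share < min_share


# Hypothetical purchase history: mostly Boston, a few in Cambridge.
history = ["Boston"] * 18 + ["Cambridge"] * 2

print(is_suspicious(history, "Boston"))     # a familiar city
print(is_suspicious(history, "Las Vegas"))  # a city never seen before
```

A purchase in a familiar city passes, while one in a city that never appears in the history is flagged, which is the point at which the card company might block the transaction or mark the card.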
Another interesting example of data mining is how one grocery store in the USA used the data it collected on its shoppers to find patterns in their shopping habits.
They found that when men bought diapers on Thursdays and Saturdays, they also had a strong tendency to buy beer.
The grocery store could have used this valuable information to increase their profits. One thing they could have done – odd as it sounds – is move the beer display closer to the diapers. Or, they could have simply made sure not to give any discounts on beer on Thursdays and Saturdays. This is data mining in action – extracting meaningful data from a huge data set.
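A pattern like "diapers go with beer" is usually expressed as an association rule and scored with support, confidence and lift. The sketch below computes those three numbers over a handful of invented shopping baskets. The baskets and the rule_metrics helper are assumptions for illustration only, and the example ignores the day of the week and who the shopper was.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(baskets)
    with_a = sum(1 for b in baskets if antecedent in b)
    with_c = sum(1 for b in baskets if consequent in b)
    with_both = sum(1 for b in baskets if antecedent in b and consequent in b)

    support = with_both / n                               # how often both appear together
    confidence = with_both / with_a if with_a else 0.0    # P(consequent | antecedent)
    lift = confidence / (with_c / n) if with_c else 0.0   # > 1 means a positive association
    return support, confidence, lift


# Invented baskets, one set of items per checkout.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "wipes"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"diapers", "beer", "milk"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]

support, confidence, lift = rule_metrics(baskets, "diapers", "beer")
print(f"support={support:.3f} confidence={confidence:.2f} lift={lift:.2f}")
```

In this made-up data, three of the four diaper baskets also contain beer, while beer appears in only half of all baskets, so the lift comes out above 1. That is the kind of signal that would justify moving the beer display.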
Example of data warehousing – Facebook
A great example of data warehousing that everyone can relate to is what Facebook does. Facebook gathers all of your data (your friends, your likes, who you stalk, etc.) and stores it in one central repository. Even though Facebook most likely keeps your friends, your likes, etc. in separate databases, it still wants to take the most relevant and important information and put it into one central aggregated database. Why? For many reasons: to make sure that you see the ads you’re most likely to click on, to make sure that the friends it suggests are the most relevant to you, and so on. Keep in mind that choosing those ads and suggestions is the data mining phase, in which meaningful data and patterns are extracted from the aggregated data. But underlying all these motives is the main one: to make more money. After all, Facebook is a business.
We can say that data warehousing is a process in which data from multiple sources or databases is combined into one comprehensive and easily accessible database. This data is then readily available to business professionals, managers, etc. who need it to create forecasts, and who essentially use it for data mining.
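As a rough sketch of that idea, the snippet below loads two separate "source systems" into a single in-memory SQLite database and then runs one query across both. Everything here is hypothetical: the table names, the column names, the sample rows and the build_warehouse helper are invented for the example and say nothing about how Facebook actually stores its data.

```python
import sqlite3

# Hypothetical source systems, each holding one kind of data.
friends_source = [("alice", "bob"), ("alice", "carol"), ("bob", "carol")]
likes_source = [("alice", "hiking"), ("bob", "hiking"), ("carol", "cooking")]


def build_warehouse(friends, likes):
    """Copy rows from the separate sources into one central database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE friendships (member TEXT, friend TEXT)")
    conn.execute("CREATE TABLE likes (member TEXT, topic TEXT)")
    conn.executemany("INSERT INTO friendships VALUES (?, ?)", friends)
    conn.executemany("INSERT INTO likes VALUES (?, ?)", likes)
    conn.commit()
    return conn


warehouse = build_warehouse(friends_source, likes_source)

# With everything in one place, a single query can combine both sources:
# find pairs of friends who like the same topic.
rows = warehouse.execute(
    """
    SELECT f.member, f.friend, l1.topic
    FROM friendships AS f
    JOIN likes AS l1 ON l1.member = f.member
    JOIN likes AS l2 ON l2.member = f.friend AND l2.topic = l1.topic
    ORDER BY f.member, f.friend
    """
).fetchall()
print(rows)
```

The loading step is the warehousing part. The final query, which looks for a pattern across the combined data, is already a small step into data mining.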
Data warehousing vs data mining
Remember that data warehousing comes before data mining: the data has to be compiled before it can be mined. In other words, data warehousing is the process of compiling and organizing data into one common database, and data mining is the process of extracting meaningful data from that database. The data mining process relies on the data compiled in the data warehousing phase in order to detect meaningful patterns.
In the Facebook example that we gave, the data mining will typically be done by business users who are not engineers, but who will most likely receive assistance from engineers when they are trying to manipulate their data. The data warehousing phase is an engineering phase in which business users are typically not involved. This gives us another way of defining the two terms: data mining is typically done by business users with the assistance of engineers, and data warehousing is typically done by engineers alone.