Chapter 1. Data mining is knowledge discovery from data. The knowledge discovery process is an iterative sequence of 7 steps: data cleaning, to remove noise and inconsistent data; data integration, where multiple data sources may be combined (step 1 and…
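As a rough illustration of the first two steps named above, the sketch below cleans one source and then integrates it with a second. It is only a minimal example under stated assumptions: the record layout, the `id` and `age` fields, the plausible-age range, and the function names `clean` and `integrate` are all hypothetical and do not come from the notes.

```python
def clean(records, field="age", low=0, high=120):
    """Data cleaning: drop records whose value is missing or out of range.

    The field name and the [low, high] range are assumptions for this sketch.
    """
    cleaned = []
    for rec in records:
        value = rec.get(field)
        if value is None or not (low <= value <= high):
            continue  # treat missing or implausible values as noise
        cleaned.append(rec)
    return cleaned


def integrate(*sources):
    """Data integration: merge records from several sources by shared 'id'."""
    merged = {}
    for source in sources:
        for rec in source:
            merged.setdefault(rec["id"], {}).update(rec)
    return list(merged.values())


# Hypothetical sources: one with a missing age and one implausible age.
crm = [{"id": 1, "age": 34}, {"id": 2, "age": None}, {"id": 3, "age": 250}]
web = [{"id": 1, "city": "Oslo"}, {"id": 3, "city": "Lima"}]

result = integrate(clean(crm), web)
print(result)
```

Here cleaning runs before integration, matching the order in which the steps are listed; in practice the two are often applied together as one preprocessing pass.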
data → knowledge. Are all patterns interesting? No. Only a small fraction of the patterns potentially generated would actually be of interest to a given user. What makes a pattern interesting? It is: easily understood by humans; valid; potentially useful…
Why: real-world data are typically noisy, enormous in volume, and may originate from a hodgepodge of heterogeneous sources. Mean; median; mode (most common value); distribution. Knowing such basic statistics regarding each attribute makes it easier to…
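The basic statistics listed above can be computed per attribute with Python's standard `statistics` module. The following is a minimal sketch; the `summarize` helper and the sample `ages` values are hypothetical and chosen only for illustration.

```python
from statistics import mean, median, multimode


def summarize(values):
    """Return basic descriptive statistics for one numeric attribute."""
    return {
        "mean": mean(values),
        "median": median(values),
        "modes": multimode(values),  # every most-common value, so ties are kept
        "min": min(values),
        "max": max(values),
    }


# Hypothetical attribute values; 95 is far from the rest of the data.
ages = [23, 25, 25, 31, 40, 95]
summary = summarize(ages)
print(summary)
```

In this sample the mean is well above the median, which is the kind of gap that hints at a skewed distribution or an extreme value worth inspecting.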
Course textbooks. Text 1: M. T. Özsu and P. Valduriez, Principles of Distributed Database Systems, 2nd ed., Prentice-Hall, 1999 (errata). Text 2: J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2000 (errata). Lecture Schedule Th…
Problems: Classification; Clustering; Regression; Anomaly detection; Association rules; Reinforcement learning; Structured prediction; Feature engineering; Feature learning; Online learning; Semi-supervised learning; Unsupervised learning; Learning to rank…
Abstract The content of the web has increasingly become a focus for academic research. Computer programs are needed in order to conduct any large-scale processing of web pages, requiring the use of a web crawler at some stage in order to fetch the pa…
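Learning Resources. Books: Journals: Industry pioneers: broaden your horizons and keep up with the latest industry developments. Tools: Data mining is a synthesis of many disciplines: Whatever name it goes by, in the end it is all data mining: Comprehensive Learning: Learning != Listening. Data. What is Big Data? Big Data: Data Mining, Data Integration & Analysis. The Process of Data Mining. DM Techniques -- Cla…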
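$\textbf{Trajectory Data Mining: An Overview}$ An excellent overview that lays out its framework clearly and covers a broad range of material. Worth a careful read. Unfinished; to be supplemented. $\textbf{Trajectory Data}$: divided into four main categories: $\texttt{Mobility of people}$ $\texttt{Mobility of transportation}$ $\texttt{Mobility of animals}$ $\texttt{Mobility of natural…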
New book in hand: Transaction Processing: Concepts and Techniques, a work by the great Jim Gray.…
What is the most common software for data mining? 1. Orange? 2. Weka? 3. Apache Mahout? 4. RapidMiner? 5. R? And which one? If you have any explanation about the topic, I would appreciate it.…