前言:

众所周知,关联规则挖掘是数据挖掘中重要的一部分,如著名的啤酒和尿布的问题。今天要学习的是经典的关联规则挖掘算法——Apriori算法

一、算法的基本原理

由k项频繁集去导出k+1项频繁集。

二、算法流程

1.扫描事务数据库,找出1项集,并根据最小支持度计数,剪枝得出频繁1项集。k=1.

2.由频繁k项集进行连接步操作,形成候选的k+1项集,并扫描数据库,得出每一项的支持度计数,并根据最小支持度计数,剪枝得到频繁k+1项集。

迭代的进行第2步直到频繁k项集是空的。

3.由频繁项集构造关联规则。先列出所有可能的关联规则,然后计算相应的置信度,最终筛选出满足最小置信度的强关联规则。

三、算法实现

测试数据集:data.txt

1, 7, 15, 44, 49
2, 1, 19
3, 1, 19
4, 3, 4, 15, 18, 35, 44
5, 2, 4, 7, 9, 23
6, 14, 21, 44
7, 4, 12, 31, 36, 44, 48
8, 15, 27, 28
9, 2, 28
10, 3, 18, 35
11, 23, 24, 40, 41, 43
12, 20, 43, 48
13, 49
14, 1, 19, 26
15, 5, 22, 39
16, 16, 32, 45
17, 4, 6, 9, 10, 16, 22
18, 1, 19, 23
19, 7, 11, 37, 45
20, 3, 18, 32, 35
21, 1, 8, 19, 47
22, 34, 39, 44
23, 13, 19
24, 4, 9, 38
25, 7, 22, 48
26, 7, 11, 14
27, 23, 24, 40, 41, 43
28, 9, 14
29, 0, 2, 42
30, 13, 35
31, 23
32, 8, 21, 25, 38
33, 4, 46
34, 23, 24, 40, 41, 43
35, 4, 17, 29, 47
36, 12, 31, 36
37, 14, 22, 26, 37, 44
38, 0, 16, 30, 32, 45, 47
39, 1, 11, 19, 25, 27, 29, 46
40, 15, 16, 18, 21, 26
41, 4, 10, 14
42, 3, 36
43, 23, 27, 28
44, 15, 21, 40
45, 10, 19, 25, 32
46, 11, 22, 44
47, 8
48, 0, 2, 46
49, 33, 42
50, 28, 39
51, 7, 17, 28
52, 1, 19
53, 32, 34
54, 0, 2, 46
55, 15, 30, 45
56, 39, 49
57, 46
58, 4, 9, 19
59, 0, 2, 16, 19, 46
60, 17, 21, 40
61, 2, 4, 6, 9, 39
62, 23, 24, 40, 41, 43
63, 13, 27, 28
64, 40
65, 12, 14, 44
66, 27, 28
67, 1, 36
68, 12, 31, 36, 48
69, 3, 18, 35
70, 5, 12, 16
71, 30, 49
72, 7, 29, 30
73, 17, 19, 30
74, 1, 13, 36
75, 19, 33, 49
76, 1, 5, 14, 38
77, 31
78, 1, 5, 8, 14, 22, 26
79, 7, 27, 30, 36, 37, 43
80, 13, 35, 42
81, 5, 22, 32
82, 7, 11, 20, 28, 37, 45
83, 23, 24, 40, 41
84, 3, 5, 14, 22, 36
85, 12, 31, 36, 48
86, 45
87, 12, 31, 36, 48
88, 14, 44
89, 5, 19, 22, 34, 43
90, 0, 2, 45, 46
91, 16, 32, 45, 47, 48
92, 1, 48
93, 1, 3, 18, 35
94, 7, 11, 37, 45
95, 16, 26, 32, 41, 45
96, 1, 29, 30
97, 1, 19
98, 1
99, 14, 22, 37
100, 22, 34, 40
101, 31
102, 16, 17, 43
103, 8, 19, 20, 48
104, 1, 12, 15, 27, 31, 34, 36, 48
105, 3, 40, 45
106, 13, 18, 21, 41, 47, 48
107, 3, 18, 22, 27
108, 6, 16, 17, 31, 33, 42
109, 5, 22, 27, 35
110, 7, 15, 49
111, 6, 12, 26, 29, 31, 32, 36, 48
112, 11, 13, 14, 44
113, 23, 33, 35
114, 5, 8, 18, 29, 46
115, 41
116, 1, 3, 8, 27, 28
117, 1, 11, 19, 33
118, 23, 24, 40, 41
119, 16, 18, 23, 27, 29, 30, 40, 49
120, 14, 18, 44
121, 0, 2, 46
122, 3, 18, 35
123, 12, 31, 36, 48
124, 7, 11, 20, 29, 37, 45
125, 2, 24, 27
126, 5, 13, 22
127, 22, 42, 46, 47
128, 39, 41
129, 13, 23
130, 1, 24, 40, 42
131, 14, 20, 26, 44
132, 12, 31, 36, 48
133, 7, 21, 27, 28, 33, 36, 47
134, 17, 29, 47
135, 12, 31, 36
136, 4, 9, 11
137, 3, 11, 14, 18, 29, 35
138, 20, 33, 42
139, 22, 44, 46
140, 5, 6, 16, 22, 33, 38, 49
141, 0, 2, 12, 24, 26, 31, 46
142, 0, 6, 25
143, 5, 19, 22
144, 11, 13, 14, 35, 48
145, 17, 29, 35, 44, 47
146, 7, 15
147, 5, 22
148, 0, 1, 2, 46
149, 4, 9, 18, 41
150, 1, 12, 18, 19, 39
151, 0, 2, 19, 22, 46
152, 33, 37, 42
153, 0, 2, 37, 46
154, 11, 18, 35, 45
155, 4, 9, 15
156, 19, 24, 28, 35, 49
157, 5, 22, 37
158, 0, 2, 5, 19, 33, 35, 39, 46
159, 3, 4, 5, 48
160, 4, 6, 18, 28, 35
161, 3, 18, 31, 35
162, 6, 17, 25, 49
163, 17
164, 3, 8, 9, 20, 22, 23, 42
165, 4, 15, 17, 21, 26, 36, 48
166, 14, 41, 44
167, 19, 28, 42
168, 4, 9
169, 13, 31, 33, 41, 42
170, 5, 22, 28
171, 0, 16, 32
172, 5, 28, 43
173, 24, 36, 37, 42
174, 31
175, 40, 42
176, 11, 34, 48
177, 14, 28, 40, 43
178, 0, 13, 26
179, 16, 32, 45
180, 10, 14, 44, 46
181, 4, 7, 9, 19, 36
182, 23, 24, 40, 41, 43
183, 4, 9, 37
184, 5, 22, 31
185, 21, 24, 29
186, 0, 6, 22, 46
187, 3, 18, 21, 35, 39, 46
188, 12, 31, 36, 38, 48
189, 28
190, 23, 24, 40, 41, 43
191, 12, 24
192, 6, 27, 28
193, 23, 24, 40, 41, 43
194, 27, 28
195, 5, 14, 16, 22
196, 5, 22
197, 27, 28, 44, 47
198, 5, 22
199, 0, 2, 46
200, 0, 10, 30
201, 15, 33, 42
202, 4, 9, 40
203, 1, 36
204, 23, 24, 40, 41
205, 25, 27, 28, 32
206, 7, 11, 37, 45
207, 20, 48
208, 7, 11, 37, 45
209, 9, 27, 28
210, 7, 11, 37
211, 1, 2, 9, 22, 48
212, 7, 27, 39, 45
213, 1, 6, 8, 15, 18, 19, 21, 42
214, 23, 24, 40, 41
215, 4, 14, 20, 32, 36, 44
216, 16, 32, 45
217, 18, 26, 42, 45
218, 2, 11, 23
219, 11, 15, 35
220, 31, 39, 43, 45, 46
221, 0, 2, 28, 40, 46
222, 13, 24, 26, 35, 39, 47
223, 7, 21, 23, 30, 36
224, 23, 40, 47
225, 5, 22
226, 1, 5, 19, 30
227, 7, 15, 49
228, 7, 11, 37, 45
229, 23, 31, 34, 45
230, 7, 29, 48
231, 27, 28
232, 24, 44
233, 4, 5, 9, 39
234, 5, 22, 25, 28, 34, 46
235, 23, 32, 48
236, 16, 42
237, 5, 22
238, 4, 9, 15, 25
239, 1, 12, 16, 17, 19, 26, 48
240, 1, 3, 16, 19, 35
241, 11, 12
242, 4, 9, 10, 12, 45
243, 12, 31, 36, 46, 48
244, 17, 28, 47
245, 3, 18, 35
246, 4, 9
247, 7, 11, 37, 45
248, 5, 12, 22
249, 6, 14, 38, 48
250, 9, 10, 34, 36, 42
251, 3, 18, 35
252, 3
253, 14, 27
254, 0, 1, 19
255, 3, 18, 24, 35
256, 12, 31, 36, 48
257, 19, 31, 41
258, 1, 11, 14, 32, 34, 35
259, 5, 14, 28, 39, 49
260, 9, 14, 21, 46
261, 6
262, 5, 22, 31
263, 11, 40
264, 0, 2, 46
265, 23, 24, 40, 41, 43
266, 3, 14, 23, 44
267, 18, 22, 42, 49
268, 3, 24
269, 12, 31, 36, 48
270, 20, 25, 27
271, 15, 30, 39
272, 7, 15, 49
273, 12, 31, 36, 48
274, 16, 18, 32, 42
275, 27, 31, 33, 42
276, 5, 9, 22, 43
277, 2, 4, 36, 40, 41, 48
278, 19, 28, 30, 32
279, 2, 31, 39, 44
280, 1, 2, 19, 21, 23, 28, 45
281, 7, 10, 14, 16, 23, 32, 45
282, 4, 9, 14, 36, 44
283, 21, 29
284, 9, 13, 17, 46
285, 38, 39
286, 1, 8, 19
287, 0, 33, 42
288, 2, 33, 36, 47
289, 12, 31, 36
290, 28
291, 16, 27, 28
292, 7, 15, 49
293, 8, 21, 24, 33, 34, 40, 42, 44
294, 1, 14, 26, 32, 41
295, 23, 33, 42
296, 14, 21, 23, 24
297, 0, 2, 46
298, 11, 29
299, 1, 23, 33, 42, 47
300, 16, 32, 45
301, 3, 4, 9, 33, 37, 43
302, 19, 25, 30, 43, 46
303, 0, 2, 46
304, 16, 17, 34
305, 5, 21, 22, 25, 27, 31, 42
306, 5, 12, 16, 31, 36
307, 8, 22, 42
308, 3, 27, 36
309, 16, 32, 45
310, 12, 31, 36, 48
311, 5, 22
312, 7, 11, 37, 45
313, 9, 13, 20, 21, 24, 39, 41, 45
314, 1, 9, 15, 24, 37
315, 4, 9
316, 18, 35, 45
317, 0, 2, 46
318, 5, 10, 14, 47
319, 1, 4, 10, 14, 25, 36, 49
320, 12, 34
321, 7, 15, 49
322, 23, 28
323, 1, 19
324, 0, 2, 46
325, 1, 2, 19
326, 7, 11, 37, 45
327, 29, 33, 38, 45, 48
328, 4, 9, 24
329, 1, 19, 23, 30, 44
330, 5, 13
331, 12, 31, 36, 48
332, 0, 2, 46
333, 5, 14, 20, 24, 28, 31, 39, 46
334, 4, 9, 13, 33, 43, 46
335, 5, 14, 22, 29
336, 2, 8, 10, 19, 35, 45
337, 14, 24, 34, 44
338, 5, 22, 28, 30, 47
339, 0, 28, 47
340, 7, 15, 49
341, 5, 9, 28, 41
342, 1, 7, 11, 37
343, 26
344, 44
345, 0, 2, 46
346, 5, 10, 26, 30
347, 8, 12, 14, 33, 44, 47
348, 3, 4, 27, 42
349, 9, 11, 40, 45
350, 3, 4, 9, 11, 12, 47
351, 5, 22, 35
352, 26, 29, 45
353, 4, 9, 31
354, 0, 13, 27, 28, 31, 36
355, 17, 32, 47
356, 1, 31, 41
357, 14, 43, 44
358, 18, 35
359, 5, 10, 16
360, 33, 37
361, 4, 30, 31
362, 9, 14, 25
363, 14, 44
364, 23, 27, 28
365, 9, 18, 22
366, 5, 8, 22
367, 10, 29
368, 3, 15, 16, 20, 33, 45
369, 4, 8, 12, 25, 34
370, 9, 26, 28
371, 7, 9, 15, 49
372, 3, 48, 49
373, 5, 21, 30, 31, 43
374, 4, 9
375, 1, 19, 28
376, 4, 9, 26, 33, 47
377, 1, 13, 19, 41
378, 15, 18, 41
379, 14, 28, 48
380, 5, 22, 47
381, 49
382, 7, 15, 49
383, 8, 28, 47
384, 8, 25, 29
385, 0, 4, 10, 21
386, 41
387, 5, 23, 26, 44
388, 25
389, 4, 9
390, 12, 17, 26, 29, 31, 47
391, 3, 18, 35
392, 7, 11, 34, 37, 45
393, 9, 13, 18, 23, 25, 33, 40
394, 15
395, 5, 17, 22, 40, 48
396, 23, 33, 46
397, 7, 15, 49
398, 16, 32, 45
399, 7, 15, 49
400, 1, 48
401, 2
402, 36, 39, 49
403, 3, 4, 10
404, 34, 36
405, 7, 17, 34, 35, 46
406, 5, 22
407, 32, 34, 36, 42
408, 14, 46
409, 3, 18, 29, 35, 37, 48
410, 27, 33, 42
411, 27, 28
412, 27, 28
413, 4, 9, 13, 21
414, 1, 5, 7, 10, 21, 22, 30, 31
415, 7, 15, 49
416, 4, 5, 14, 23, 42
417, 17, 29, 47
418, 30
419, 5, 33, 42
420, 27, 28, 31
421, 7, 15, 49
422, 16, 32, 45
423, 1, 2, 8, 25, 32
424, 12, 13
425, 16, 44
426, 0, 17, 24
427, 26
428, 4, 10, 27, 28, 43
429, 14
430, 4, 19, 47
431, 33, 42
432, 47
433, 8, 17, 23, 29, 43, 47
434, 0, 5, 33, 46
435, 9, 18, 35, 38, 40, 47
436, 2, 4, 11, 39
437, 4, 5, 9, 23, 24
438, 4, 24, 32
439, 8, 47
440, 2, 19
441, 2, 4, 9, 40
442, 0, 2, 46
443, 11, 19, 33, 42
444, 16, 32, 45
445, 5, 7, 22, 32, 42
446, 4, 33, 47
447, 19, 27, 36
448, 1, 28, 40
449, 23
450, 4, 9, 31, 33
451, 4, 28, 47
452, 25, 27, 34, 49
453, 3, 18, 35
454, 9, 27, 28
455, 14, 15, 35, 36
456, 14, 27, 28
457, 12, 16, 34
458, 0, 2, 13, 18, 24, 36, 46, 47
459, 14, 32, 44
460, 16, 32, 45
461, 2, 8, 16
462, 7, 15, 49
463, 14, 26, 30, 39, 44
464, 1, 23, 32
465, 0, 12, 20, 22, 49
466, 16, 20, 32, 49
467, 23, 24, 40, 41
468, 1, 3, 29, 41, 42, 43, 46
469, 5, 16, 25, 48
470, 17, 29, 47
471, 11, 16, 32, 45
472, 4, 9, 13
473, 17, 47
474, 11, 32
475, 12, 33, 42, 46
476, 23, 24, 40, 41
477, 9, 13, 14, 33, 38, 49
478, 14, 15, 26, 47
479, 3, 18, 35
480, 3, 21, 44
481, 3, 46
482, 9
483, 16, 24, 30, 31, 48
484, 19
485, 33, 42
486, 4, 34
487, 23, 24, 40, 41, 43
488, 5, 22
489, 10, 31
490, 4, 17, 40, 47
491, 39, 47
492, 14, 15, 31
493, 5, 26, 33, 42, 44
494, 13, 30, 38
495, 2, 3, 18, 35, 47
496, 12, 31, 36, 48
497, 17, 27, 28
498, 17, 29, 47
499, 14, 44
500, 0, 2, 46
501, 0, 21, 33, 39
502, 14, 27, 44
503, 12, 31, 36
504, 5, 18
505, 7, 11, 37, 45
506, 0, 20
507, 23, 30
508, 0, 14, 16, 32
509, 8
510, 13, 26, 27, 28, 46
511, 4, 9, 12, 17, 37
512, 10, 20, 37
513, 3, 7, 11, 21, 23
514, 21, 31, 32
515, 0, 1, 5, 23, 32, 42, 44
516, 0, 2, 46
517, 34, 37, 47
518, 16, 28, 32, 45
519, 3, 18, 35
520, 0, 5, 23, 33, 35, 46, 48
521, 14, 28, 29, 44
522, 13, 15, 24
523, 8, 22, 23
524, 11, 14, 26, 28, 41, 43, 45
525, 22, 39
526, 7, 11, 37, 45
527, 0, 2, 15, 16, 22, 35, 46
528, 41, 42
529, 6, 9, 30
530, 4, 13, 37, 45, 49
531, 0, 16
532, 14, 27, 28, 37, 48
533, 13, 28
534, 13, 18
535, 14, 29, 31, 35, 41, 49
536, 15, 34, 46, 48
537, 15, 28
538, 3, 18, 35
539, 12, 31, 36, 48
540, 33, 35, 42
541, 16, 32, 45
542, 7, 9, 15, 43
543, 7, 11, 37, 45
544, 12, 31, 36, 48
545, 14
546, 5, 8, 14, 21, 33, 38
547, 0, 3, 10, 38, 40
548, 10, 14, 18, 47
549, 11, 37, 38
550, 1, 3, 18, 19, 35
551, 3, 5, 22
552, 9, 24, 27, 34
553, 4, 7, 9
554, 0, 12, 35
555, 28, 29, 37
556, 9, 11, 28, 33, 41
557, 14, 44
558, 4, 11, 41, 43
559, 17, 47
560, 16, 20, 27, 32, 45
561, 2, 14, 42
562, 4, 9, 21, 43
563, 17, 29, 47
564, 23, 28, 33, 42
565, 6, 46
566, 3, 18, 35
567, 5, 9, 14, 18, 40
568, 4, 9, 34
569, 13, 14, 43, 44
570, 27, 28
571, 16, 32
572, 6, 14, 22
573, 0, 10, 41
574, 0, 18, 27, 31, 34, 37, 39, 47
575, 5, 7, 22
576, 4, 7
577, 7, 15, 49
578, 4, 15
579, 1, 16, 18, 20, 25, 29, 36, 46
580, 13, 30, 42
581, 24, 25, 31
582, 4, 7, 15, 18
583, 5, 22
584, 28
585, 23, 33, 42
586, 44
587, 15, 19, 27, 28
588, 0, 11, 18, 23, 47
589, 5, 22
590, 7, 11, 37
591, 7, 11, 37, 45
592, 6, 16, 33, 39, 42
593, 23, 24, 40, 41
594, 0, 9, 30, 31
595, 1, 32, 48
596, 2, 27, 28, 43
597, 1, 19
598, 35
599, 3, 10, 18, 35
600, 4, 9, 12, 48
601, 1, 6, 19
602, 7, 15, 49
603, 20, 22, 27
604, 13
605, 15, 16, 26
606, 3, 18, 35
607, 19
608, 5, 22
609, 1, 19
610, 12, 36, 37, 39
611, 2, 3, 10, 34
612, 30, 44
613, 13, 20, 22, 47
614, 4, 12, 26, 45
615, 13, 27, 35, 36
616, 0, 2, 46
617, 7, 10
618, 1, 2, 34, 39
619, 7, 11, 37, 45
620, 39, 43, 44
621, 3, 14, 30
622, 2, 12, 30, 32, 49
623, 1, 19, 32
624, 10, 21, 28, 31
625, 9, 17, 27, 28, 45
626, 10, 11, 16, 32
627, 12, 31, 36, 48
628, 3, 18, 29, 35, 40
629, 23, 24, 40, 41, 43
630, 14, 17, 29, 44, 47
631, 38
632, 0, 2, 46
633, 3, 18, 20, 35
634, 17, 29, 43, 47
635, 0, 6, 19, 26, 43
636, 3, 18, 35
637, 5, 21, 26, 27, 28, 39, 42, 49
638, 12, 31, 36, 48
639, 0, 2, 30, 46, 47
640, 5, 22, 33
641, 14, 33, 42
642, 0, 30, 43, 46
643, 11, 17, 35, 47
644, 0, 2, 46
645, 16, 32, 45
646, 20, 37, 39, 42
647, 14, 30, 44
648, 36, 39
649, 1, 17, 25, 37, 41, 49
650, 30, 36, 44
651, 12, 31, 36, 48
652, 24, 27, 28, 34, 39, 45, 48
653, 0, 37, 46
654, 18, 25, 44
655, 4, 9
656, 5, 22
657, 33, 42
658, 0, 26, 27, 28
659, 5
660, 2, 22, 33, 42
661, 7, 15, 49
662, 4, 9, 13, 17, 37
663, 5, 22, 35, 38, 47
664, 0, 2, 46
665, 27, 28, 39, 42
666, 12, 31, 36, 48
667, 27, 28
668, 16, 32, 45, 49
669, 21, 40
670, 5, 22, 48
671, 23, 33, 39, 42, 45
672, 1, 7, 25
673, 33, 42
674, 16, 19, 32, 38, 45
675, 7, 23, 30, 46
676, 42
677, 2, 10, 23, 42
678, 26, 30, 48
679, 7, 15, 49
680, 12, 25, 39, 48
681, 16, 17, 33, 47, 49
682, 49
683, 27, 45
684, 1, 11, 21, 42, 43
685, 8, 9, 22
686, 27, 28
687, 0, 2, 26, 43, 46
688, 16, 32, 45
689, 10, 22, 39
690, 7, 11, 14, 18, 23
691, 5, 22, 43
692, 17, 23, 29, 47
693, 1, 2, 15, 19
694, 7, 11, 37, 45
695, 5, 12, 22, 44
696, 8, 11, 22, 37
697, 20, 29
698, 7, 9, 11, 13, 37, 39, 45
699, 7, 11, 37, 45
700, 3, 36, 42, 49
701, 16, 32, 45
702, 21, 35
703, 0, 2, 46
704, 10, 18, 44
705, 6, 15, 21, 27, 28, 34, 41, 46
706, 7, 11, 37, 45
707, 16, 40
708, 37, 43
709, 7, 15, 29, 49
710, 12, 31, 36, 48
711, 5, 22, 24
712, 12, 13, 14, 32
713, 4, 5, 9, 32
714, 10, 40, 44
715, 3, 18, 35
716, 1, 13, 20, 31, 45
717, 37, 47, 48
718, 4, 9
719, 1, 19
720, 4, 12, 31
721, 10, 33
722, 14, 44
723, 5, 15, 34
724, 23, 36
725, 5, 29, 32, 35, 42
726, 3, 11, 29
727, 4, 6, 8, 10, 18, 22, 35, 37
728, 21, 24, 27
729, 33, 39
730, 27, 28
731, 6, 10, 44
732, 22
733, 0, 46
734, 16, 32, 45
735, 3, 9, 37, 42, 44
736, 19, 21, 23, 27, 30
737, 5, 19, 30
738, 5, 15, 19, 22, 23
739, 5, 22
740, 3, 18, 35
741, 17, 29, 47
742, 12, 26, 31, 36, 41, 48
743, 3, 18, 35
744, 10
745, 7, 15
746, 28, 43, 49
747, 12, 27, 28
748, 1, 19
749, 17, 21
750, 0, 18, 24, 43, 48
751, 16, 32, 45
752, 0, 11, 33
753, 3
754, 9, 15, 40
755, 12, 30, 49
756, 45
757, 23, 24, 40, 41, 43
758, 14, 40, 44
759, 15, 41, 47
760, 27, 28, 29
761, 0, 2, 23, 46
762, 7, 8, 11, 20
763, 6, 15, 22, 32
764, 5, 13, 26
765, 2, 20
766, 1, 7, 15, 48
767, 34, 45
768, 12, 24, 31, 46
769, 26, 27, 28
770, 23, 24, 40, 41
771, 20, 27, 29
772, 26
773, 27, 28
774, 1, 13, 29
775, 11, 23, 25, 36, 38, 45
776, 17, 29, 47
777, 22, 30, 39
778, 33, 42, 45, 47
779, 0, 2, 12, 39, 41, 46, 49
780, 12, 31, 36, 48
781, 12, 17, 27, 43, 45
782, 17, 47
783, 12, 31, 36, 48
784, 8, 20, 29, 32, 46
785, 10, 22, 23, 26, 36
786, 7, 20, 26, 44, 47
787, 2, 18, 27, 28, 33
788, 14, 21, 23, 24, 30, 42, 46
789, 18, 35
790, 1
791, 0, 14, 20, 44, 46
792, 5, 7, 11, 24, 27
793, 0, 18, 25, 39
794, 1, 19, 27
795, 23, 24, 40, 41, 43
796, 16
797, 0, 6, 28, 33, 35, 46, 48
798, 1, 19, 44
799, 17, 29, 47
800, 6, 31
801, 23, 24, 40, 41, 43
802, 33, 42
803, 1, 3, 19, 29, 31, 44
804, 17, 29, 47
805, 27, 31, 32, 36, 46
806, 16, 32, 45
807, 1, 19, 34, 44
808, 0, 2, 7, 18, 48
809, 14, 19
810, 7, 15, 49
811, 5, 13, 22, 30
812, 17, 29, 47
813, 13, 27, 28, 42, 48
814, 1, 5, 22
815, 4, 9
816, 7, 11, 37, 45
817, 0, 3, 7, 22, 37, 39, 40
818, 1, 19
819, 22, 26
820, 33, 37, 42
821, 4, 8, 13, 27, 28, 46
822, 27, 28, 40
823, 27, 28
824, 33, 42
825, 5, 22
826, 14, 22, 44
827, 16, 32, 45
828, 3, 16, 28, 48
829, 22, 23, 24, 39
830, 15, 26, 28, 33, 36
831, 7, 15, 49
832, 15, 22, 27, 31, 33, 40
833, 18, 35, 41, 43, 49
834, 4, 9
835, 12, 31, 36
836, 1, 19, 28, 31, 38, 44, 48
837, 4, 9, 18
838, 6, 32, 34
839, 15, 42
840, 13, 27, 28, 36
841, 7, 15, 22, 37, 43, 49
842, 7, 30, 43
843, 15, 25, 49
844, 22, 33, 41
845, 34
846, 4, 14, 21, 25, 41
847, 20, 23, 33, 42
848, 9, 13, 22, 42, 45
849, 0, 14, 22, 29, 46
850, 1, 19
851, 3, 16, 32, 45
852, 4, 9
853, 4, 9, 21, 39
854, 3, 18, 39
855, 4, 13, 32
856, 17, 29, 47
857, 12, 31, 36, 48
858, 14, 15
859, 23, 46
860, 5, 22
861, 3, 18, 35
862, 12, 22, 47, 48
863, 3, 5, 18, 35
864, 23, 24, 40, 41
865, 22
866, 8, 12, 16, 18, 29, 34, 49
867, 33, 34, 42, 44
868, 13, 23, 24, 41
869, 23, 24, 40, 41, 43
870, 13, 25
871, 5, 36, 41, 42
872, 5, 22, 31, 37, 47
873, 5, 7, 20
874, 4, 14, 16, 37, 44
875, 3, 18, 35
876, 8, 21, 29, 33, 42
877, 5, 14, 18
878, 27
879, 14, 20, 22, 24
880, 1, 3, 14, 31, 44, 45, 47
881, 29, 36, 41, 46
882, 16, 32, 45
883, 23, 24, 40, 41, 43
884, 7, 11, 19, 38, 42, 43, 47
885, 0, 27, 28, 32
886, 0, 2, 28, 41, 46
887, 23, 24, 40, 41, 43
888, 3, 18, 35
889, 14, 31
890, 8, 14, 44
891, 3, 14, 18, 22, 35
892, 1, 3, 30
893, 6, 12, 41, 44
894, 11, 30, 43
895, 7, 15, 49
896, 0, 2, 46
897, 1, 17, 38, 43
898, 25, 36
899, 7, 9, 27
900, 5, 13, 22
901, 12, 31, 36, 48
902, 26, 39, 43
903, 12, 26, 38
904, 1, 16, 20, 40
905, 23, 24, 40, 41, 43
906, 6, 28, 37, 38, 49
907, 14, 16, 17, 18, 26, 31
908, 37, 44
909, 11, 30
910, 22, 28, 41, 48
911, 12, 31, 36, 48
912, 4, 5, 7, 21
913, 3, 9, 18, 35, 41
914, 5, 27, 28, 42
915, 5, 22, 31
916, 33, 42
917, 4, 9, 15
918, 7, 11, 37, 45
919, 23, 24, 40, 41, 43
920, 40
921, 1, 8, 14, 19
922, 14, 31, 44
923, 6, 28, 33, 39
924, 0, 6, 16, 30, 39, 47
925, 12, 31, 36
926, 1, 36, 46
927, 27, 42, 44
928, 4, 9
929, 20
930, 5, 22
931, 10, 16, 26
932, 13, 36, 42
933, 12, 33, 42, 47
934, 4, 27, 28
935, 1, 19, 39
936, 1, 16, 32
937, 12, 31, 36
938, 16, 32, 41, 45
939, 16, 33, 42
940, 12, 13, 31, 35, 36, 41, 44
941, 6, 9, 19, 24, 44
942, 23, 24, 40, 41, 43
943, 5, 12, 31, 36, 48
944, 0, 2, 20, 44, 46
945, 3, 18, 35
946, 4, 12, 14, 19, 32, 48
947, 41
948, 0, 14, 44
949, 1, 10, 13, 19, 25, 26, 33, 39
950, 7, 15, 16
951, 4, 29, 46
952, 14, 20, 40, 43
953, 7, 38
954, 3, 18, 45, 47
955, 19, 29, 43
956, 9, 16
957, 0, 20, 42
958, 5, 10, 32, 33, 39, 44
959, 27, 28
960, 5, 22, 25, 29, 37, 43
961, 19
962, 4, 16, 34
963, 9, 17, 44, 45
964, 7, 15, 29
965, 7, 11, 27, 37, 39, 45, 49
966, 12, 31, 36, 48
967, 34, 49
968, 31
969, 25, 33, 42, 45
970, 16, 27, 32, 45
971, 25, 41
972, 6, 9, 15, 24, 25, 40, 45
973, 7, 8, 9, 11, 29, 37, 45
974, 21, 36
975, 5, 7, 14, 17, 22
976, 38, 40
977, 2, 7, 15, 22, 25, 33, 41
978, 27, 28
979, 16, 32, 45
980, 8, 18, 28
981, 13
982, 41
983, 27, 28, 38, 48
984, 3, 18, 35
985, 16, 20, 32, 36
986, 16, 32, 45
987, 4, 9
988, 22, 31, 39
989, 0, 2, 46
990, 34
991, 0, 2, 46
992, 0
993, 3, 6, 13
994, 3, 29, 44
995, 9, 27, 28, 45
996, 2, 10, 22, 31, 33, 46
997, 2, 6, 22, 32
998, 0, 4, 9, 30, 33
999, 3, 18, 35
1000, 15, 34, 47

package second;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set; /*
* date:2015-06-10
* by wenbaoli
*
*/
public class Apriori { private float min_sup;//minimum support
private float min_conf;//minimum confidence private Map<Integer,Set<String>> TransDataBase;//transaction database
private Integer DBnum; private Map<Integer,Map<Set<String>,Float>> freqItermSet;//frequent iterm set,from 1 to k...
private Map<Set<String>,Set<Map<Set<String>,Float>>> associationRules;//the final associate rules public Apriori(Map<Integer,Set<String>> DB , float minSup, float minConf){
this.TransDataBase = DB;
this.min_conf = minConf;
this.min_sup = minSup;
this.DBnum = DB.size();
freqItermSet =new HashMap<Integer,Map<Set<String>,Float>>();
associationRules = new HashMap<Set<String>,Set<Map<Set<String>,Float>>>(); } /**
* @param args
*/
public static void main(String[] args) {
// TODO Auto-generated method stub //initial String file = "data/data.txt";
String delimeter = ","; //load data
Map<Integer,Set<String>> data = new HashMap<Integer,Set<String>>(); int num = 0;
try {
File mFile = new File(file);
FileReader fr = new FileReader(mFile);
BufferedReader br = new BufferedReader(fr);
String line; while((line=br.readLine())!= null) {
line = line.trim();
String[] str = line.split(delimeter);
// int i = Integer.parseInt(str[0]);
Set<String> set = new HashSet<String>(); for (int i = 1; i < str.length; i++) {
set.add(str[i].trim());
}
num++;
data.put(num, set); }
br.close();
} catch(IOException ex) {
ex.printStackTrace();
}
Apriori ap = new Apriori(data,0.005f,0.6f);
ap.findAllFreqItermSet();
ap.findAssociationRules();
}
public void findAllFreqItermSet(){ //找出频繁一项集 Map<Set<String>,Float> f1 = new HashMap<Set<String>,Float>();
Map<Set<String>,Integer> OneItermSet = new HashMap<Set<String>,Integer>();
Iterator<Map.Entry<Integer, Set<String>>> it = this.TransDataBase.entrySet().iterator(); while(it.hasNext()){
Map.Entry<Integer, Set<String>> entry = it.next();
Set<String> itermSet = entry.getValue();
for(String iterm:itermSet){
Set<String> key = new HashSet<String>();
key.add(iterm);
if(!OneItermSet.containsKey(key)){
OneItermSet.put(key, 1);
}else{
int value = 1 + OneItermSet.get(key);
OneItermSet.put(key, value);
}
}
}
//找出支持度大于minSup的频繁一项集
Iterator<Map.Entry<Set<String>,Integer>> iter = OneItermSet.entrySet().iterator();
while(iter.hasNext()){
Map.Entry<Set<String>,Integer> entry = iter.next();
//计算支持度
Float support = new Float(entry.getValue().toString())/new Float(this.DBnum);
if(support >= this.min_sup){
f1.put(entry.getKey(), support);
}
} System.out.println("频繁1" + "项集:" + f1);//打印频繁1-项集
System.out.println("-------------------------------------------");
this.freqItermSet.put(1, f1); //由频繁k项集得到频繁k+1项集
int k = 2;
while(true){ Set<Set<String>> candFreItermSets = this.apriori_gen(k,this.freqItermSet.get(k-1).keySet());
Map<Set<String>,Float> freqKItermSetMap = this.getFreqKItermSet(k,candFreItermSets);
if(!freqKItermSetMap.isEmpty()){
this.freqItermSet.put(k, freqKItermSetMap);
} else {
break;
}
System.out.println("频繁" + k + "项集:" + freqKItermSetMap);
System.out.println("-------------------------------------------");
k++; } }
public Map<Set<String>, Float> getFreqKItermSet(int k,
Set<Set<String>> candFreItermSets) {
Map<Set<String>,Integer> candFreqKItermSetMap = new HashMap<Set<String>,Integer>(); //扫描事物数据库
Iterator<Map.Entry<Integer, Set<String>>> it = this.TransDataBase.entrySet().iterator();
//统计支持度计数
while (it.hasNext()){
Map.Entry<Integer, Set<String>> entry = it.next();
Iterator<Set<String>> iter = candFreItermSets.iterator();
while(iter.hasNext()){
Set<String> set = iter.next();
if(entry.getValue().containsAll(set)){
if(!candFreqKItermSetMap.containsKey(set)){
candFreqKItermSetMap.put(set, 1);
}else {
int value = 1+ candFreqKItermSetMap.get(set);
candFreqKItermSetMap.put(set, value);
}
}
}
} Iterator<Map.Entry<Set<String>, Integer>> iter = candFreqKItermSetMap.entrySet().iterator();
Map<Set<String>,Float> freqKIntermSet = new HashMap<Set<String>,Float>();
while(iter.hasNext()){
Map.Entry<Set<String>, Integer> entry = iter.next();
float support = new Float(entry.getValue().toString())/this.DBnum;
if(support < this.min_sup){
iter.remove();
} else {
freqKIntermSet.put(entry.getKey(), support);
}
} return freqKIntermSet;
} public Set<Set<String>> apriori_gen(int k, Set<Set<String>> fk){
Set<Set<String>> ck = new HashSet<Set<String>>();
Iterator<Set<String>> it1 = fk.iterator();
while (it1.hasNext()) {
Set<String> itermSet1 = it1.next();
Iterator<Set<String>> it2 = fk.iterator();
while (it2.hasNext()) {
Set<String> itermSet2 = it2.next();
if(!itermSet1.equals(itermSet2)) {
//连接步
Set<String> commIterms = new HashSet<String>();
commIterms.addAll(itermSet1);
commIterms.retainAll(itermSet2);
if(commIterms.size() == (k - 2)){
Set<String> candIterms = new HashSet<String>();
candIterms.addAll(itermSet1);
candIterms.removeAll(itermSet2);
candIterms.addAll(itermSet2);
//剪枝步骤
if(!this.has_infrequent_subset(candIterms, fk)){
ck.add(candIterms);
}
}
}
}
}
System.out.println(ck.size());
return ck;
}
public boolean has_infrequent_subset(Set<String> set,Set<Set<String>> fk){
Set<Set<String>> subItermSet = new HashSet<Set<String>>();
Iterator<String> itr = set.iterator();
while(itr.hasNext()){
//深拷贝
Set<String> subItem = new HashSet<String>();
Iterator<String> it = set.iterator();
while(it.hasNext()){
subItem.add(it.next());
} //去掉一个项后为k-1子集
subItem.remove(itr.next());
subItermSet.add(subItem);
} Iterator<Set<String>> it = subItermSet.iterator();
while(it.hasNext()){
if(!fk.contains(it.next())){
return true;
}
}
return false;
}
public void findAssociationRulesTemp(){ }
public void findAssociationRules(){
Iterator<Map.Entry<Integer, Map<Set<String>,Float>>> it = this.freqItermSet.entrySet().iterator();
while (it.hasNext()) {
Map.Entry<Integer, Map<Set<String>,Float>> entry = it.next();
for (Set<String> itemSet : entry.getValue().keySet()) {
int n = itemSet.size() / 2;
for (int i = 1; i <= n; i++) {
Set<Set<String>> subset = this.getProperSubset(i,itemSet); for(Set<String> conditionSet:subset){
Set<String> conclusionSet = new HashSet<String>();
conclusionSet.addAll(itemSet);
conclusionSet.removeAll(conditionSet);
int s1 = conditionSet.size();
int s2 = conclusionSet.size(); float supF = this.freqItermSet.get(s1).get(conditionSet);
float supS = this.freqItermSet.get(s2).get(conclusionSet);
float sup = this.freqItermSet.get(s1+s2).get(itemSet); float conf1 = sup/supF;
float conf2 = sup/supS; if(conf1 >= this.min_conf){
if(this.associationRules.get(conditionSet) == null){
Set<Map<Set<String>,Float>> conclusionSetSet = new HashSet<Map<Set<String>,Float>>();
Map<Set<String>,Float> sets = new HashMap<Set<String>,Float>();
sets.put(conclusionSet, conf1);
conclusionSetSet.add(sets); this.associationRules.put(conditionSet, conclusionSetSet); } else {
Map<Set<String>,Float> sets = new HashMap<Set<String>,Float>();
sets.put(conclusionSet, conf1);
associationRules.get(conditionSet).add(sets);
}
}
if(conf2 >= this.min_conf){
if(this.associationRules.get(conditionSet) == null){
Set<Map<Set<String>,Float>> conclusionSetSet = new HashSet<Map<Set<String>,Float>>();
Map<Set<String>,Float> sets = new HashMap<Set<String>,Float>();
sets.put(conclusionSet, conf2);
conclusionSetSet.add(sets); this.associationRules.put(conditionSet, conclusionSetSet); } else {
Map<Set<String>,Float> sets = new HashMap<Set<String>,Float>();
sets.put(conclusionSet, conf2);
associationRules.get(conditionSet).add(sets);
}
}
}
} } }
System.out.println("关联规则(强规则):" + associationRules);
} private Set<Set<String>> getProperSubset(int i, Set<String> itemSet) { Set<Set<String>> subset = new HashSet<Set<String>>();
if(itemSet.size() <= 1){
return null;
}else if(itemSet.size() == 2){
for(String s: itemSet){
Set<String> set = new HashSet<String>();
set.add(s);
if(!subset.contains(s)){
subset.add(set);
} }
return subset;
}else { Iterator<String> it = itemSet.iterator();
String s = it.next();
Set<String> temp = new HashSet<String>(itemSet);
temp.remove(s);
//包含s的子集
Set<Set<String>> subset0 = new HashSet<Set<String>>();
subset0 = this.getProperSubset(i-1, temp);
subset.addAll(subset0);
//不包含s的子集
Set<Set<String>> subset1 = new HashSet<Set<String>>(); subset1 = this.getProperSubset(i, temp);
subset.addAll(subset1);
return subset; } } }

关联规则挖掘之apriori算法的更多相关文章

  1. HAWQ + MADlib 玩转数据挖掘之(七)——关联规则方法之Apriori算法

    一.关联规则简介 关联规则挖掘的目标是发现数据项集之间的关联关系,是数据挖据中一个重要的课题.关联规则最初是针对购物篮分析(Market Basket Analysis)问题提出的.假设超市经理想更多 ...

  2. 关联规则推荐及Apriori算法

    参考这篇文章: http://blog.csdn.net/rongyongfeikai2/article/details/40457827 这条关联规则的支持度:support = P(A并B) 这条 ...

  3. 推荐系统第4周--- 基于频繁模式的推荐系统和关联规则挖掘Apriori算法

    数据挖掘:关联规则挖掘

  4. Apriori算法实现

    Apriori算法原理:http://blog.csdn.net/kingzone_2008/article/details/8183768 import java.util.HashMap; imp ...

  5. 数据挖掘算法之-关联规则挖掘(Association Rule)

    在数据挖掘的知识模式中,关联规则模式是比较重要的一种.关联规则的概念由Agrawal.Imielinski.Swami 提出,是数据中一种简单但很实用的规则.关联规则模式属于描述型模式,发现关联规则的 ...

  6. 数据挖掘算法之-关联规则挖掘(Association Rule)(购物篮分析)

    在各种数据挖掘算法中,关联规则挖掘算是比較重要的一种,尤其是受购物篮分析的影响,关联规则被应用到非常多实际业务中,本文对关联规则挖掘做一个小的总结. 首先,和聚类算法一样,关联规则挖掘属于无监督学习方 ...

  7. 数据挖掘系列(4)使用weka做关联规则挖掘

    前面几篇介绍了关联规则的一些基本概念和两个基本算法,但实际在商业应用中,写算法反而比较少,理解数据,把握数据,利用工具才是重要的,前面的基础篇是对算法的理解,这篇将介绍开源利用数据挖掘工具weka进行 ...

  8. 数据挖掘算法——Apriori算法

    Apriori算法  首先,Apriori算法是关联规则挖掘中很基础也很经典的一个算法. 转载来自:链接:https://www.jianshu.com/p/26d61b83492e 所以做如下补充: ...

  9. Python两步实现关联规则Apriori算法,参考机器学习实战,包括频繁项集的构建以及关联规则的挖掘

    .caret, .dropup > .btn > .caret { border-top-color: #000 !important; } .label { border: 1px so ...

随机推荐

  1. spring-boot支持双数据源mysql+mongo

    这里,首先想说的是,现在的web应用,处理的数据对象,有结构化的,也有非结构化的.同时存在.但是在spring-boot操作数据库的时候,若是在properties文件中配置数据源的信息,通过默认配置 ...

  2. javascript 与和非

    || :  在javascript中,返回第一个真值,除非都是假值返回最后一个值(也是假值). 1 || 0; 0 || 1; 0 || 0; 0 || undefined; // undefined ...

  3. systemd的原理和适用方法

    systemd的原理: https://www.linux.com/learn/tutorials/527639-managing-services-on-linux-with-systemd htt ...

  4. Python基础(二) —— 字符串、列表、字典等常用操作

    一.作用域 对于变量的作用域,执行声明并在内存中存在,该变量就可以在下面的代码中使用. 二.三元运算 result = 值1 if 条件 else 值2 如果条件为真:result = 值1如果条件为 ...

  5. JQUERY 拖拽 draggable droppable resizable selectable sortable

    今天用了jq ui的拖动碰撞功能,好不容易看到有详细的API解说,记录如下:   <script language="JavaScript" type="text/ ...

  6. LintCode "Coins in a Line II" !

    Nice one to learn: DP + Game Theoryhttps://lefttree.gitbooks.io/leetcode/content/dynamicProgramming2 ...

  7. 轻量级开源内存数据库SQLite性能测试

    [IT168 专稿]SQLite是一款轻型的数据库,它占用资源非常的低,同时能够跟很多程序语言相结合,但是支持的SQL语句不会逊色于其他开源数据库.它的设计目标是嵌入式的,而且目前已经在很多嵌入式产品 ...

  8. 51nod1313 完美串

    一个N长的字符串S(N<=3000),只由'R','G','B'三种字符组成,即串中不存在除了这3个字符以外的其他字符.字符串S的子串substr(L,R)指S[L]S[L+1]S[L+2].. ...

  9. Understanding and Managing SMTP Virtual Servers

    Simple Mail Transfer Protocol (SMTP) Service Overview The Simple Mail Transfer Protocol (SMTP) servi ...

  10. 黄聪:走进wordpress 详细说说template-loader.php

    再看template-laoder.php,这个文件总共只有45行.它的作用是基于访问的URL装载正确的模板. 文件第六行,也是第一条语句,如下: if ( defined('WP_USE_THEME ...