Understanding MySQL's COUNT() Function

Tags: MySQL5.7, COUNT() function

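Preface

Careful readers may have noticed, in everyday work and study, that MySQL's COUNT() function accepts several different kinds of argument, and each one counts in a different way. This article sets out to get to the bottom of that.

What this article covers and what you will learn

How COUNT(*), COUNT(1), COUNT(0), COUNT(column) and COUNT(DISTINCT column) differ, and what each is for.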

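Discussion

Basic understanding

At its simplest, the COUNT() function counts the rows (records) of a table. Here is what the MySQL 5.7 reference manual says.

The manual describes COUNT(expr) as follows: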

Returns a count of the number of non-NULL values of expr in the rows retrieved by a SELECT statement. The result is a BIGINT value. If there are no matching rows, COUNT() returns 0.

COUNT(*) is somewhat different in that it returns a count of the number of rows retrieved, whether or not they contain NULL values.

COUNT(DISTINCT expr,[expr...]): Returns a count of the number of rows with different non-NULL expr values. If there are no matching rows, COUNT(DISTINCT) returns 0.

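In a SELECT statement, COUNT(expr) returns the number of rows for which expr is non-NULL. COUNT(DISTINCT expr) returns the number of distinct non-NULL values of expr. The result is a BIGINT, which takes 8 bytes. If no rows match, the result is 0. COUNT(*) is the exception: it has no expression to evaluate, so it counts every retrieved row, including rows whose columns contain NULL values.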
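The following minimal pure-Python sketch makes the NULL rules above concrete. It models COUNT(*), COUNT(expr) and COUNT(DISTINCT expr[, expr ...]) over a list of rows. It is an illustration under simplifying assumptions, not how MySQL is implemented:

- Python's None stands in for NULL.
- An "expression" is a hypothetical Python function of the row.
- Values are compared with plain Python equality. MySQL instead compares strings under the column's collation, so with a case-insensitive collation 'a' and 'A' would count as one distinct value.

Following the manual's wording "different non-NULL expr values", the sketch skips any row where one of the DISTINCT expressions is NULL.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]          # one table row; None plays the role of SQL NULL
Expr = Callable[[Row], Any]   # hypothetical stand-in for a SQL expression


def count_star(rows: List[Row]) -> int:
    # COUNT(*): every retrieved row is counted, whatever its column values are.
    return len(rows)


def count_expr(rows: List[Row], expr: Expr) -> int:
    # COUNT(expr): only rows where expr evaluates to a non-NULL value are counted.
    return sum(1 for row in rows if expr(row) is not None)


def count_distinct(rows: List[Row], *exprs: Expr) -> int:
    # COUNT(DISTINCT e1, e2, ...): rows where any expression is NULL are skipped;
    # each remaining combination of values is counted once.
    seen = set()
    for row in rows:
        values = tuple(expr(row) for expr in exprs)
        if any(v is None for v in values):
            continue
        seen.add(values)
    return len(seen)


rows = [
    {"Id": 1, "Phone": "13317190688"},
    {"Id": 2, "Phone": "13317190688"},
    {"Id": 3, "Phone": None},
]

print(count_star(rows))                             # 3
print(count_expr(rows, lambda r: 1))                # 3: a constant is never NULL
print(count_expr(rows, lambda r: None))             # 0: like COUNT(NULL)
print(count_expr(rows, lambda r: r["Phone"]))       # 2
print(count_distinct(rows, lambda r: r["Phone"]))   # 1
print(count_star([]), count_expr([], lambda r: 1))  # 0 0: no rows gives 0, not NULL
```

The constant case shows why COUNT(1) and COUNT(*) agree: a non-NULL constant passes the "is not NULL" test on every row.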

Testing

Create a test table named users:

```sql
CREATE TABLE `users` (
  `Id` int(11) NOT NULL AUTO_INCREMENT,
  `LoginName` varchar(50) DEFAULT NULL,
  `LoginPwd` varchar(16) DEFAULT NULL,
  `Name` varchar(16) DEFAULT NULL,
  `Address` varchar(16) DEFAULT NULL,
  `Phone` varchar(16) DEFAULT NULL,
  `Mail` varchar(16) DEFAULT NULL,
  PRIMARY KEY (`Id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

# Insert data (the INSERT statements are not shown)
```

```
mysql> select * from users;
+----+------------+----------+------+----------+-------------+---------------+
| Id | LoginName  | LoginPwd | Name | Address  | Phone       | Mail          |
+----+------------+----------+------+----------+-------------+---------------+
|  1 | bb1        | 123      | 张三 | 湖北武汉 | 13317190688 | 123@gmail.com |
|  2 | bb3        | 123      | 李四 | 湖北武汉 | 13317190688 | 123@gmail.com |
|  3 | jj4        | 123      | 张三 | 湖北武汉 | 13317190688 | 123@gmail.com |
|  4 | kobeBryant | 123456   | NULL | LA       | NULL        | NULL          |
|  5 | kobe       | 456      | NULL | NULL     | NULL        | NULL          |
|  6 | Jay        | NULL     | NULL | GXI      | NULL        | NULL          |
|  7 | jj4        | NULL     | NULL | NULL     | NULL        | NULL          |
+----+------------+----------+------+----------+-------------+---------------+
7 rows in set
```

Run the query:

```
mysql> SELECT COUNT(*),COUNT(1),COUNT(0),COUNT(-1), COUNT(LoginPwd),COUNT(Phone),COUNT(DISTINCT Phone) FROM users;
+----------+----------+----------+-----------+-----------------+--------------+------------------------+
| COUNT(*) | COUNT(1) | COUNT(0) | COUNT(-1) | COUNT(LoginPwd) | COUNT(Phone) | COUNT(DISTINCT Phone)  |
+----------+----------+----------+-----------+-----------------+--------------+------------------------+
|        7 |        7 |        7 |         7 |               5 |            3 |                      1 |
+----------+----------+----------+-----------+-----------------+--------------+------------------------+
1 row in set
```

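From these results we can conclude:

- COUNT(*), COUNT(1), COUNT(0) and COUNT(-1) all count every row, and each returns 7.
- COUNT(LoginPwd) and COUNT(Phone) count the rows where LoginPwd and Phone, respectively, are non-NULL: 5 and 3.
- COUNT(DISTINCT Phone) counts only the distinct non-NULL values of Phone. All three non-NULL phone numbers are 13317190688, so the result is 1.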
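As a rough sanity check, the test can be re-run without a MySQL server. The sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL. This is an assumption: only the NULL-handling semantics of COUNT, which are standard SQL, are compared here. Storage engines, query plans and performance are not modeled. The rows are reconstructed from the SELECT output above, because the article does not show its INSERT statements.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption: only COUNT's
# NULL semantics are compared; engines, plans and timing are not modeled).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE users (
        Id INTEGER PRIMARY KEY,
        LoginName TEXT, LoginPwd TEXT, Name TEXT,
        Address TEXT, Phone TEXT, Mail TEXT
    )"""
)

# The 7 rows shown by "select * from users" in the article; None becomes NULL.
rows = [
    (1, "bb1", "123", "张三", "湖北武汉", "13317190688", "123@gmail.com"),
    (2, "bb3", "123", "李四", "湖北武汉", "13317190688", "123@gmail.com"),
    (3, "jj4", "123", "张三", "湖北武汉", "13317190688", "123@gmail.com"),
    (4, "kobeBryant", "123456", None, "LA", None, None),
    (5, "kobe", "456", None, None, None, None),
    (6, "Jay", None, None, "GXI", None, None),
    (7, "jj4", None, None, None, None, None),
]
conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

result = conn.execute(
    "SELECT COUNT(*), COUNT(1), COUNT(0), COUNT(-1), "
    "COUNT(LoginPwd), COUNT(Phone), COUNT(DISTINCT Phone) FROM users"
).fetchone()
print(result)  # (7, 7, 7, 7, 5, 3, 1), matching the MySQL output above

# A NULL argument is never counted, so COUNT(NULL) is 0 rather than 7.
count_null = conn.execute("SELECT COUNT(NULL) FROM users").fetchone()[0]
print(count_null)  # 0
```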

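Analysis

The results of COUNT(LoginPwd), COUNT(Phone) and COUNT(DISTINCT Phone) are easy to understand. The key question concerns COUNT(*), COUNT(1) and COUNT(0): how do they differ in practice, and do they differ at all?

From the official documentation: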

For MyISAM tables, COUNT(*) is optimized to return very quickly if the SELECT retrieves from one table, no other columns are retrieved, and there is no WHERE clause. 

This optimization only applies to MyISAM tables, because an exact row count is stored for this storage engine and can be accessed very quickly. COUNT(1) is only subject to the same optimization if the first column is defined as NOT NULL.

For transactional storage engines such as InnoDB, storing an exact row count is problematic because multiple transactions may be occurring, each of which may affect the count.

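To speed up queries, every MyISAM table stores an exact row count, so the total number of rows can be returned quickly. The stored count can only be used when the query meets certain conditions: it reads from one table, selects no other columns, and has no WHERE clause (e.g. "SELECT COUNT(*) FROM student;"). For COUNT(1), the manual adds that the same optimization applies only if the table's first column is defined as NOT NULL. Otherwise the rows are counted by a scan.

Transactional storage engines such as InnoDB do not keep such a counter. Several transactions may be running at the same time, and each may see a different set of rows, so a single stored row count could not be correct for all of them. InnoDB therefore does not apply this optimization to COUNT(*). It counts by scanning, and it uses an index for the scan when one is available.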
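The concurrency argument can be illustrated with a deliberately simplified toy model. The ToyTable class below is hypothetical and does not reflect how MyISAM or InnoDB are actually implemented. It only assumes that a transaction sees committed rows plus its own uncommitted inserts, which is roughly READ COMMITTED visibility; snapshots and deletes are ignored. Even under that simple assumption, two transactions can legitimately need different COUNT(*) answers at the same moment, so one stored counter cannot serve both.

```python
from typing import Dict, Set


class ToyTable:
    """Hypothetical toy model, not MyISAM or InnoDB internals: committed rows
    plus each transaction's own uncommitted inserts (no snapshots, no deletes)."""

    def __init__(self, committed_ids: Set[int]) -> None:
        self.committed: Set[int] = set(committed_ids)
        self.row_count = len(self.committed)     # a single MyISAM-style stored counter
        self.pending: Dict[str, Set[int]] = {}   # transaction name -> uncommitted row ids

    def insert(self, txn: str, row_id: int) -> None:
        # The new row is visible only to the inserting transaction until commit.
        self.pending.setdefault(txn, set()).add(row_id)

    def commit(self, txn: str) -> None:
        self.committed |= self.pending.pop(txn, set())
        self.row_count = len(self.committed)

    def visible_count(self, txn: str) -> int:
        # What COUNT(*) has to return inside txn: committed rows + its own inserts.
        return len(self.committed | self.pending.get(txn, set()))


table = ToyTable({1, 2, 3, 4, 5, 6, 7})
table.insert("T1", 8)  # T1 inserts a row but has not committed yet

print(table.visible_count("T1"))  # 8: T1 sees its own insert
print(table.visible_count("T2"))  # 7: T2 does not
print(table.row_count)            # 7: one shared counter cannot be right for both
table.commit("T1")
print(table.row_count)            # 8
```

A real engine would also need per-snapshot counts under REPEATABLE READ, which makes a single counter even less workable.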

How COUNT is executed

Run EXPLAIN on the combined query:

```
mysql> explain SELECT COUNT(*),COUNT(1),COUNT(0),COUNT(-1), COUNT(LoginPwd),COUNT(Phone),COUNT( DISTINCT Phone) FROM users;
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
|  1 | SIMPLE      | users | ALL  | NULL          | NULL | NULL    | NULL |    7 | NULL  |
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
1 row in set
```

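For the whole statement, the type column is ALL, which means a full table scan, and rows=7. This is expected: COUNT(LoginPwd) and COUNT(Phone) need the values of columns that are not in any index, so the full rows have to be read.

Run EXPLAIN on COUNT(*):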

```
mysql> explain SELECT COUNT(*) FROM users;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | users | index | NULL          | PRIMARY | 4       | NULL |    7 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set
```

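For COUNT(*), the type column is index rather than ALL. Instead of a table scan, MySQL performs a full scan of the PRIMARY index (Extra: Using index), and the estimated row count is still rows=7.

Run EXPLAIN on COUNT(1):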

```
mysql> explain SELECT COUNT(1) FROM users;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | users | index | NULL          | PRIMARY | 4       | NULL |    7 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set
```

For COUNT(1), the plan is identical: type is index, and MySQL scans the PRIMARY index instead of the table.

Run EXPLAIN on COUNT(0):

```
mysql> explain SELECT COUNT(0) FROM users;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | users | index | NULL          | PRIMARY | 4       | NULL |    7 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set
```

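For COUNT(0), the plan is again the same: type is index, and MySQL scans the PRIMARY index instead of the table.

For InnoDB, COUNT(*) and COUNT(1) are no different and perform the same. Both count by scanning, and the optimizer automatically uses an index when one is available.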

InnoDB handles SELECT COUNT(*) and SELECT COUNT(1) operations in the same way. There is no performance difference.

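Conclusion

|No.|Form|What it counts|Explanation|Recommended use|
|---|---|---|---|---|
|1|COUNT(*)|Total number of rows, including rows with NULL values|With MyISAM, if no other columns are selected and there is no WHERE clause, the stored row count is returned directly, which is fast. Otherwise the rows are scanned (using an index if one is available) to count the total.|If you only need the total number of rows, "SELECT COUNT(*) FROM table_name" is recommended in all cases.|
|2|COUNT(n)|Total number of rows, including rows with NULL values. n can be any integer or decimal constant (any non-NULL constant works; COUNT(NULL) returns 0)|For example COUNT(1): with MyISAM, if no other columns are selected, there is no WHERE clause, and the first column is defined as NOT NULL, the stored row count is returned directly, which is fast. Otherwise the rows are scanned (using an index if one is available).|If the table's first column is defined as NOT NULL, "SELECT COUNT(1) FROM table_name" is recommended.|
|3|COUNT(column)|Number of rows where the column is non-NULL|Simply counts the non-NULL values of the given column, regardless of storage engine|Depends on business needs|
|4|COUNT(DISTINCT column)|Number of distinct non-NULL values in the column|Simply counts the distinct non-NULL values of the given column, regardless of storage engine|Depends on business needs|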

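How to choose:

- COUNT(*) and COUNT(n) are essentially the same. The actual response time depends on the storage engine and the WHERE conditions. Personally, I tend to use COUNT(1).
- Indexes matter a lot for COUNT(). When an index can be used, MySQL's optimizer picks a suitable one automatically.
- Keep in mind that COUNT(column) counts only the rows where that column is non-NULL.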

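SUM(1) can also count all the rows of a table, NULL values included, but it is less efficient than COUNT(*). It also behaves differently when no rows match: SUM(1) then returns NULL, whereas COUNT(*) returns 0.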
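The empty-result difference between SUM(1) and COUNT(*) is easy to check. Below is a minimal sketch, again using Python's sqlite3 module as a stand-in for MySQL. That substitution is an assumption. MySQL's manual likewise states that SUM() returns NULL when there are no matching rows, while COUNT() returns 0. The sketch checks results only, not speed.

```python
import sqlite3

# In-memory SQLite table used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER)")


def counts():
    # Returns (COUNT(*), SUM(1)) for the current contents of t.
    return conn.execute("SELECT COUNT(*), SUM(1) FROM t").fetchone()


before = counts()
print(before)  # (0, None): on an empty table SUM(1) is NULL, COUNT(*) is 0

# Two rows whose only column is NULL, plus one non-NULL row.
conn.executemany("INSERT INTO t VALUES (?)", [(None,), (None,), (3,)])
after = counts()
print(after)   # (3, 3): both count every row, including the all-NULL ones
```

If SUM(1) is used anyway, wrapping it as COALESCE(SUM(1), 0) restores the 0 result for an empty table.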

References:

https://highdb.com/了解-select-count-count1-和-countfield/

Official manual: https://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_count