【ClickHouse】4:clickhouse基本操作二 建库建表导数据
背景介绍:
有三台CentOS7服务器安装了ClickHouse
HostName | IP | 安装程序 | 程序端口 |
centf8118.sharding1.db | 192.168.81.18 | clickhouse-server,clickhouse-client | 9000 |
centf8119.sharding2.db | 192.168.81.19 | clickhouse-server,clickhouse-client | 9000 |
centf8120.sharding3.db | 192.168.81.20 | clickhouse-server,clickhouse-client | 9000 |
1:创建数据库
CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster] [ENGINE = engine(...)]
CREATE DATABASE testdb; //创建数据库
DROP DATABASE testdb; //删除数据库
2:建表
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1] [compression_codec] [TTL expr1],
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [compression_codec] [TTL expr2],
...
) ENGINE = engine
CREATE TABLE test_table(
province String,
province_name String,
create_date date
) ENGINE = MergeTree(create_date, (province), 8192); 如果直接这样执行会报错,有两种方法解决:
1: 在每一行后面加右斜杠,比如:
CREATE TABLE test_table( \
province String, \
province_name String, \
create_date date \
) ENGINE = MergeTree(create_date, (province), 8192); 2: 在登录的时候加 -m参数支持多行模式,比如:
clickhouse-client -m
ENGINE:是表的引擎类型,最常用的MergeTree。还有一个Log引擎也是比较常用。MergeTree要求有一个日期字段,还有主键。Log没有这个限制。
create_date:是表的日期字段,一个表必须要有一个日期字段。
province:是表的主键,主键可以有多个字段,每个字段用逗号分隔
8192:是索引粒度,用默认值8192即可。
3:导入数据
3.1:普通的CSV文件导入。
cat > test_table.csv << EOF
WA,WA_NAME,2020-08-25
CA,CA_NAME,2020-09-25
OR,OR_NAME,2020-10-25
EOF
–导数
clickhouse-client --query "INSERT INTO testdb.test_table FORMAT CSV" < test_table.csv;
–或者用管道的方式
cat test_table.csv | clickhouse-client --query “INSERT INTO testdb.test_table FORMAT CSV”
查看结果数据:


centf8118.sharding1.db :) select * from test_table limit 2 ; SELECT *
FROM test_table
LIMIT 2 ┌─province─┬─province_name─┬─create_date─┐
│ WA │ WA_NAME │ 2020-08-25 │
└──────────┴───────────────┴─────────────┘
┌─province─┬─province_name─┬─create_date─┐
│ CA │ CA_NAME │ 2020-09-25 │
└──────────┴───────────────┴─────────────┘ 2 rows in set. Elapsed: 0.004 sec.
3.2:特殊的CSV文件导入(包含回车换行,转义符等)。
–建表(sql语句)
CREATE TABLE testdb.test_table3 ( id1 UInt32, id2 Float32, name1 String, name2 String, date1 Date, date2 DateTime) ENGINE = Log;
–测试数据(Linux命令)
[root@centf8118 tmp]# cat test_table3.csv
1,123.456,”abc 123”,” abc" "'123”,2020-08-26,2020-08-26 17:08:09
–执行导入(Linux命令)
clickhouse-client --query "INSERT INTO testdb.test_table3 FORMAT CSV" < test_table3.csv
–查看结果(sql语句)


centf8118.sharding1.db :) select * from test_table3; SELECT *
FROM test_table3 ┌─id1─┬─────id2─┬─name1─────┬─name2────────────┬──────date1─┬───────────────date2─┐
│ 1 │ 123.456 │ ”abc 123” │ ” abc" "'123” │ 2020-08-26 │ 2020-08-26 17:08:09 │
└─────┴─────────┴───────────┴──────────────────┴────────────┴─────────────────────┘
3.3:尝试导入官方提供的大数据表
官方文档: https://clickhouse.tech/docs/en/getting-started/tutorial/
1>>下载并提取表数据,提取的文件大小约为10GB。
我这里下载到/data/clickhouse/tmp/目录下。
curl https://clickhouse-datasets.s3.yandex.net/hits/tsv/hits_v1.tsv.xz | unxz --threads=`nproc` > hits_v1.tsv
curl https://clickhouse-datasets.s3.yandex.net/visits/tsv/visits_v1.tsv.xz | unxz --threads=`nproc` > visits_v1.tsv
[root@centf8118 tmp]# curl https://clickhouse-datasets.s3.yandex.net/hits/tsv/hits_v1.tsv.xz | unxz --threads=`nproc` > hits_v1.tsv
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 802M 100 802M 0 0 8074k 0 0:01:41 0:01:41 --:--:-- 9047k
[root@centf8118 tmp]# curl https://clickhouse-datasets.s3.yandex.net/visits/tsv/visits_v1.tsv.xz | unxz --threads=`nproc` > visits_v1.tsv
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 405M 100 405M 0 0 8790k 0 0:00:47 0:00:47 --:--:-- 9911k
查看目录下文件大小为9.8G。
[root@centf8118 ~]# du -sh /data/clickhouse/tmp
9.8G /data/clickhouse/tmp
2>>创建新的数据库:tutorial
clickhouse-client --query "CREATE DATABASE IF NOT EXISTS tutorial"
3>>创建要导入数据的表:tutorial.hits_v1和tutorial.visits_v1
创建tutorial.hits_v1表:


CREATE TABLE tutorial.hits_v1
(
`WatchID` UInt64,
`JavaEnable` UInt8,
`Title` String,
`GoodEvent` Int16,
`EventTime` DateTime,
`EventDate` Date,
`CounterID` UInt32,
`ClientIP` UInt32,
`ClientIP6` FixedString(16),
`RegionID` UInt32,
`UserID` UInt64,
`CounterClass` Int8,
`OS` UInt8,
`UserAgent` UInt8,
`URL` String,
`Referer` String,
`URLDomain` String,
`RefererDomain` String,
`Refresh` UInt8,
`IsRobot` UInt8,
`RefererCategories` Array(UInt16),
`URLCategories` Array(UInt16),
`URLRegions` Array(UInt32),
`RefererRegions` Array(UInt32),
`ResolutionWidth` UInt16,
`ResolutionHeight` UInt16,
`ResolutionDepth` UInt8,
`FlashMajor` UInt8,
`FlashMinor` UInt8,
`FlashMinor2` String,
`NetMajor` UInt8,
`NetMinor` UInt8,
`UserAgentMajor` UInt16,
`UserAgentMinor` FixedString(2),
`CookieEnable` UInt8,
`JavascriptEnable` UInt8,
`IsMobile` UInt8,
`MobilePhone` UInt8,
`MobilePhoneModel` String,
`Params` String,
`IPNetworkID` UInt32,
`TraficSourceID` Int8,
`SearchEngineID` UInt16,
`SearchPhrase` String,
`AdvEngineID` UInt8,
`IsArtifical` UInt8,
`WindowClientWidth` UInt16,
`WindowClientHeight` UInt16,
`ClientTimeZone` Int16,
`ClientEventTime` DateTime,
`SilverlightVersion1` UInt8,
`SilverlightVersion2` UInt8,
`SilverlightVersion3` UInt32,
`SilverlightVersion4` UInt16,
`PageCharset` String,
`CodeVersion` UInt32,
`IsLink` UInt8,
`IsDownload` UInt8,
`IsNotBounce` UInt8,
`FUniqID` UInt64,
`HID` UInt32,
`IsOldCounter` UInt8,
`IsEvent` UInt8,
`IsParameter` UInt8,
`DontCountHits` UInt8,
`WithHash` UInt8,
`HitColor` FixedString(1),
`UTCEventTime` DateTime,
`Age` UInt8,
`Sex` UInt8,
`Income` UInt8,
`Interests` UInt16,
`Robotness` UInt8,
`GeneralInterests` Array(UInt16),
`RemoteIP` UInt32,
`RemoteIP6` FixedString(16),
`WindowName` Int32,
`OpenerName` Int32,
`HistoryLength` Int16,
`BrowserLanguage` FixedString(2),
`BrowserCountry` FixedString(2),
`SocialNetwork` String,
`SocialAction` String,
`HTTPError` UInt16,
`SendTiming` Int32,
`DNSTiming` Int32,
`ConnectTiming` Int32,
`ResponseStartTiming` Int32,
`ResponseEndTiming` Int32,
`FetchTiming` Int32,
`RedirectTiming` Int32,
`DOMInteractiveTiming` Int32,
`DOMContentLoadedTiming` Int32,
`DOMCompleteTiming` Int32,
`LoadEventStartTiming` Int32,
`LoadEventEndTiming` Int32,
`NSToDOMContentLoadedTiming` Int32,
`FirstPaintTiming` Int32,
`RedirectCount` Int8,
`SocialSourceNetworkID` UInt8,
`SocialSourcePage` String,
`ParamPrice` Int64,
`ParamOrderID` String,
`ParamCurrency` FixedString(3),
`ParamCurrencyID` UInt16,
`GoalsReached` Array(UInt32),
`OpenstatServiceName` String,
`OpenstatCampaignID` String,
`OpenstatAdID` String,
`OpenstatSourceID` String,
`UTMSource` String,
`UTMMedium` String,
`UTMCampaign` String,
`UTMContent` String,
`UTMTerm` String,
`FromTag` String,
`HasGCLID` UInt8,
`RefererHash` UInt64,
`URLHash` UInt64,
`CLID` UInt32,
`YCLID` UInt64,
`ShareService` String,
`ShareURL` String,
`ShareTitle` String,
`ParsedParams` Nested(
Key1 String,
Key2 String,
Key3 String,
Key4 String,
Key5 String,
ValueDouble Float64),
`IslandID` FixedString(16),
`RequestNum` UInt32,
`RequestTry` UInt8
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate, intHash32(UserID))
SAMPLE BY intHash32(UserID)
SETTINGS index_granularity = 8192;
创建tutorial.visits_v1表:


CREATE TABLE tutorial.visits_v1
(
`CounterID` UInt32,
`StartDate` Date,
`Sign` Int8,
`IsNew` UInt8,
`VisitID` UInt64,
`UserID` UInt64,
`StartTime` DateTime,
`Duration` UInt32,
`UTCStartTime` DateTime,
`PageViews` Int32,
`Hits` Int32,
`IsBounce` UInt8,
`Referer` String,
`StartURL` String,
`RefererDomain` String,
`StartURLDomain` String,
`EndURL` String,
`LinkURL` String,
`IsDownload` UInt8,
`TraficSourceID` Int8,
`SearchEngineID` UInt16,
`SearchPhrase` String,
`AdvEngineID` UInt8,
`PlaceID` Int32,
`RefererCategories` Array(UInt16),
`URLCategories` Array(UInt16),
`URLRegions` Array(UInt32),
`RefererRegions` Array(UInt32),
`IsYandex` UInt8,
`GoalReachesDepth` Int32,
`GoalReachesURL` Int32,
`GoalReachesAny` Int32,
`SocialSourceNetworkID` UInt8,
`SocialSourcePage` String,
`MobilePhoneModel` String,
`ClientEventTime` DateTime,
`RegionID` UInt32,
`ClientIP` UInt32,
`ClientIP6` FixedString(16),
`RemoteIP` UInt32,
`RemoteIP6` FixedString(16),
`IPNetworkID` UInt32,
`SilverlightVersion3` UInt32,
`CodeVersion` UInt32,
`ResolutionWidth` UInt16,
`ResolutionHeight` UInt16,
`UserAgentMajor` UInt16,
`UserAgentMinor` UInt16,
`WindowClientWidth` UInt16,
`WindowClientHeight` UInt16,
`SilverlightVersion2` UInt8,
`SilverlightVersion4` UInt16,
`FlashVersion3` UInt16,
`FlashVersion4` UInt16,
`ClientTimeZone` Int16,
`OS` UInt8,
`UserAgent` UInt8,
`ResolutionDepth` UInt8,
`FlashMajor` UInt8,
`FlashMinor` UInt8,
`NetMajor` UInt8,
`NetMinor` UInt8,
`MobilePhone` UInt8,
`SilverlightVersion1` UInt8,
`Age` UInt8,
`Sex` UInt8,
`Income` UInt8,
`JavaEnable` UInt8,
`CookieEnable` UInt8,
`JavascriptEnable` UInt8,
`IsMobile` UInt8,
`BrowserLanguage` UInt16,
`BrowserCountry` UInt16,
`Interests` UInt16,
`Robotness` UInt8,
`GeneralInterests` Array(UInt16),
`Params` Array(String),
`Goals` Nested(
ID UInt32,
Serial UInt32,
EventTime DateTime,
Price Int64,
OrderID String,
CurrencyID UInt32),
`WatchIDs` Array(UInt64),
`ParamSumPrice` Int64,
`ParamCurrency` FixedString(3),
`ParamCurrencyID` UInt16,
`ClickLogID` UInt64,
`ClickEventID` Int32,
`ClickGoodEvent` Int32,
`ClickEventTime` DateTime,
`ClickPriorityID` Int32,
`ClickPhraseID` Int32,
`ClickPageID` Int32,
`ClickPlaceID` Int32,
`ClickTypeID` Int32,
`ClickResourceID` Int32,
`ClickCost` UInt32,
`ClickClientIP` UInt32,
`ClickDomainID` UInt32,
`ClickURL` String,
`ClickAttempt` UInt8,
`ClickOrderID` UInt32,
`ClickBannerID` UInt32,
`ClickMarketCategoryID` UInt32,
`ClickMarketPP` UInt32,
`ClickMarketCategoryName` String,
`ClickMarketPPName` String,
`ClickAWAPSCampaignName` String,
`ClickPageName` String,
`ClickTargetType` UInt16,
`ClickTargetPhraseID` UInt64,
`ClickContextType` UInt8,
`ClickSelectType` Int8,
`ClickOptions` String,
`ClickGroupBannerID` Int32,
`OpenstatServiceName` String,
`OpenstatCampaignID` String,
`OpenstatAdID` String,
`OpenstatSourceID` String,
`UTMSource` String,
`UTMMedium` String,
`UTMCampaign` String,
`UTMContent` String,
`UTMTerm` String,
`FromTag` String,
`HasGCLID` UInt8,
`FirstVisit` DateTime,
`PredLastVisit` Date,
`LastVisit` Date,
`TotalVisits` UInt32,
`TraficSource` Nested(
ID Int8,
SearchEngineID UInt16,
AdvEngineID UInt8,
PlaceID UInt16,
SocialSourceNetworkID UInt8,
Domain String,
SearchPhrase String,
SocialSourcePage String),
`Attendance` FixedString(16),
`CLID` UInt32,
`YCLID` UInt64,
`NormalizedRefererHash` UInt64,
`SearchPhraseHash` UInt64,
`RefererDomainHash` UInt64,
`NormalizedStartURLHash` UInt64,
`StartURLDomainHash` UInt64,
`NormalizedEndURLHash` UInt64,
`TopLevelDomain` UInt64,
`URLScheme` UInt64,
`OpenstatServiceNameHash` UInt64,
`OpenstatCampaignIDHash` UInt64,
`OpenstatAdIDHash` UInt64,
`OpenstatSourceIDHash` UInt64,
`UTMSourceHash` UInt64,
`UTMMediumHash` UInt64,
`UTMCampaignHash` UInt64,
`UTMContentHash` UInt64,
`UTMTermHash` UInt64,
`FromHash` UInt64,
`WebVisorEnabled` UInt8,
`WebVisorActivity` UInt32,
`ParsedParams` Nested(
Key1 String,
Key2 String,
Key3 String,
Key4 String,
Key5 String,
ValueDouble Float64),
`Market` Nested(
Type UInt8,
GoalID UInt32,
OrderID String,
OrderPrice Int64,
PP UInt32,
DirectPlaceID UInt32,
DirectOrderID UInt32,
DirectBannerID UInt32,
GoodID String,
GoodName String,
GoodQuantity Int32,
GoodPrice Int64),
`IslandID` FixedString(16)
)
ENGINE = CollapsingMergeTree(Sign)
PARTITION BY toYYYYMM(StartDate)
ORDER BY (CounterID, StartDate, intHash32(UserID), VisitID)
SAMPLE BY intHash32(UserID)
SETTINGS index_granularity = 8192;
4>>导入数据
clickhouse-client --query "INSERT INTO tutorial.hits_v1 FORMAT TSV" --max_insert_block_size=100000 < hits_v1.tsv clickhouse-client --query "INSERT INTO tutorial.visits_v1 FORMAT TSV" --max_insert_block_size=100000 < visits_v1.tsv
数据导入完后,优化一下表
clickhouse-client --query "OPTIMIZE TABLE tutorial.hits_v1 FINAL" clickhouse-client --query "OPTIMIZE TABLE tutorial.visits_v1 FINAL"
优化完之后,查看下表数据量
clickhouse-client --query "SELECT COUNT(*) FROM tutorial.hits_v1" clickhouse-client --query "SELECT COUNT(*) FROM tutorial.visits_v1"


[root@centf8118 tmp]# clickhouse-client --query "SELECT COUNT(*) FROM tutorial.hits_v1"
8873898
[root@centf8118 tmp]# clickhouse-client --query "SELECT COUNT(*) FROM tutorial.visits_v1"
1676861
查询数据
SELECT
StartURL AS URL,
AVG(Duration) AS AvgDuration
FROM tutorial.visits_v1
WHERE StartDate BETWEEN '2014-03-23' AND '2014-03-30'
GROUP BY URL
ORDER BY AvgDuration DESC
LIMIT 10 SELECT
sum(Sign) AS visits,
sumIf(Sign, has(Goals.ID, 1105530)) AS goal_visits,
(100. * goal_visits) / visits AS goal_percent
FROM tutorial.visits_v1
WHERE (CounterID = 912887) AND (toYYYYMM(StartDate) = 201403) AND (domain(StartURL) = 'yandex.ru')
【ClickHouse】4:clickhouse基本操作二 建库建表导数据的更多相关文章
- MySQL建库建表
一直使用SQL SERVER 数据库:最近项目使用MY SQL感觉还是有一点不适应.不过熟悉之后就会好很多. MY SQL 安装之后会有一个管理工具MySQL Workbench 感觉不太好用,数据库 ...
- 【ITOO 2】.NET 动态建库建表:使用SQL字符串拼接方式
导读:在最近接手的项目(高效云平台)中,有一个需求是要当企业用户注册时,给其动态的新建一个库和表.刚开始接手的时候,是一点头绪都没有,然后查了一些资料,也问了问上一版本的师哥师姐,终于有了点头绪.目前 ...
- 【ITOO 3】.NET 动态建库建表:实用EF框架提供的codeFirst实现动态建库
导读:在上篇博客中,介绍了使用SQL字符拼接的方式,实现动态建库建表的方法.这样做虽然也能够实现效果,但是,太麻烦,而且,如果改动表结构,字段的话,会对代码修改很多.但是EF给我们提供了一种代码先行的 ...
- SQL Server建库-建表-建约束
----------------------------------------SQL Server建库-建表-建约束创建School数据库------------------------------ ...
- 使用T-sql建库建表建约束
为什么要使用sql语句建库建表? 现在假设这样一个场景,公司的项目经过测试没问题后需要在客户的实际环境中进行演示,那就需要对数据进行移植,现在问题来了:客户的数据库版本和公司开发阶段使用的数据库不兼容 ...
- mysql那些事(4)建库建表编码的选择
mysql建数据库或者建表的时候会遇到选择编码的问题,以前我们都是习惯性的选择utf8,但是在mysql在5.5.3版本后加了utf8mb4的编码,utf8mb4可以存4个字节Unicode,mb4就 ...
- C# 利用*.SQL文件自动建库建表等的类
/// <summary> /// 自动建库建表 /// </summary> public class OperationSqlFile { SqlConnection sq ...
- Mysql建库建用户建表等常用命令
格式: mysql -h主机地址 -u用户名 -p用户密码 1.连接到本机上的MYSQL.首先打开DOS窗口,然后进入目录mysql\bin,再键入命令mysql -u root -p,回车后提示你输 ...
- oracle 11g 建库 建表 增 删 改 查 约束
一.建库 1.(点击左上角带绿色+号的按钮) 2.(进入这个界面,passowrd为密码.填写完后点击下面一排的Test按钮进行测试,无异常就点击Connect) 二.建表 1-1. create t ...
- 【平台开发】— 4.mysql建库建表
本想着把前端脚手架run起来了,然后就可以借着登录来捋一下前后端交互的过程.但是后端导入JPA的时候就发现了,还没有数据库. 既然是本着学习的目的,那咱也不想只在后端写死返回的数据,要做就做全套. 一 ...
随机推荐
- Mac Docker 挂载数据卷失败
问题描述: docker: Error response from daemon: Mounts denied: The path /srv/docker/bind is not shared fro ...
- golang cron定时任务简单实现
目录 星号(*) 斜线(/) 逗号(,) 连字符 (-) 问好 (?) 常用cron举例 使用说明 golang 实现定时服务很简单,只需要简单几步代码便可以完成,不需要配置繁琐的服务器,直接在代码中 ...
- ContextCapture-硬件配置推荐
ContextCapture倾斜摄影的空三计算.三维建模应用.非常耗费硬件资源,适当调整硬件配置,可以显著提高模型处理时间. 硬件常见问题 随着倾斜摄影建模算法成熟,应用越来越广泛,数据量越来越大,需 ...
- 西门子PLC设备如何接入AIRIOT物联网低代码平台 ?
西门子PLC设备广泛应用于工业控制领域,高性能和稳定是它最大的优势.下面我们要把西门子300 1200 1500 PLC设备连接到AIRIOT物联网低代码平台,具体操作如下所示: 西门子驱动配置(配套 ...
- smtplib详解,发送邮件
创建邮箱账号 1.官网登录邮箱. 2.在邮箱的主界面找到"设置",新版的主界面与旧版稍有不同,一般位于上方,齿轮状的即是. 3.点击齿轮状的设置标志,会弹出一个下拉菜单,在最后有我 ...
- 记一次asp.net 8 服务器爆满的解决过程
1.描述一下服务器配置: 一台2c4g的centos,做api接口反代 一台8c16g的windows 2019 作为实际服务器,跑了iis,sql server,mongodb,redis 2.业务 ...
- Base64 java图片工具类
import org.springframework.util.Assert; import javax.imageio.ImageIO; import java.awt.image.Buffered ...
- 关于sass(scss)、less、postcss、stylus的简介与区别
为什么会出现css预处理器 CSS不是一种编程语言,仅仅只能用来编写网站样式,在web初期时,网站的搭建还比较基础,所需要的样式往往也很简单.但是随着用户需求的增加以及网站技术的升级,css一成不 ...
- Ceph配置与认证授权
目录 Ceph配置与认证授权 1. 为什么现在不采用修改配置文件的方式了呢? 2. Ceph元变量 3. 使用命令行修改配置 3.1 全部修改(使用服务名) 3.2 部分修改(修改进程) 3.3 临时 ...
- 授权调用: 介绍 Transformers 智能体 2.0
简要概述 我们推出了 Transformers 智能体 2.0! ⇒ 在现有智能体类型的基础上,我们新增了两种能够 根据历史观察解决复杂任务的智能体. ⇒ 我们致力于让代码 清晰.模块化,并确保最终提 ...