SQL Support and Workarounds
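This article comes from the official documentation. It describes workarounds for SQL queries that Citus does not support relative to standard PostgreSQL, and it is very useful guidance in practice.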
As Citus provides distributed functionality by extending PostgreSQL, it is compatible with PostgreSQL constructs. This means that users can use the tools and features that come with the rich and extensible PostgreSQL ecosystem for distributed tables created with Citus.
Citus supports all SQL queries on distributed tables, with only these exceptions:
- Correlated subqueries
- Recursive/modifying CTEs
- TABLESAMPLE
- SELECT … FOR UPDATE
- Grouping sets
- Window functions that do not include the distribution column in PARTITION BY
Furthermore, in multi-tenant applications, when a query is filtered on the table's distribution column down to a single tenant, all SQL features work, including the ones above.
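As a minimal sketch of the last two points, assume a table github_events distributed by user_id (the same table used in the temp-table example later in this article). The tenant value 1234 is a hypothetical placeholder.

```sql
-- Supported across tenants: the distribution column (user_id)
-- is part of PARTITION BY.
SELECT user_id, repo_id,
       count(*) OVER (PARTITION BY user_id, repo_id)
FROM github_events;

-- Supported for a single tenant: the filter on the distribution
-- column restricts the query to one tenant, so partitioning by a
-- non-distribution column is allowed. 1234 is a made-up tenant id.
SELECT repo_id,
       count(*) OVER (PARTITION BY repo_id)
FROM github_events
WHERE user_id = 1234;
```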
To learn more about PostgreSQL and its features, you can visit the PostgreSQL documentation.
For a detailed reference of the PostgreSQL SQL command dialect (which can be used as is by Citus users), you can see the SQL Command Reference.
Workarounds
Before attempting workarounds consider whether Citus is appropriate for your situation. Citus’ current version works well for real-time analytics and multi-tenant use cases.
Citus supports all SQL statements in the multi-tenant use case. Even in real-time analytics use cases, with queries that span nodes, Citus supports the majority of statements. The few types of unsupported queries are listed in “Are there any PostgreSQL features not supported by Citus?”. Many of the unsupported features have workarounds; below are a number of the most useful.
JOIN a local and a distributed table
Attempting to execute a JOIN between a local table “local” and a distributed table “dist” causes an error:
SELECT * FROM local JOIN dist USING (id);

/*
ERROR: relation local is not distributed
STATEMENT: SELECT * FROM local JOIN dist USING (id);
ERROR: XX000: relation local is not distributed
LOCATION: DistributedTableCacheEntry, metadata_cache.c:711
*/
Although you can’t join such tables directly, by wrapping the local table in a subquery or CTE you can make Citus’ recursive query planner copy the local table data to worker nodes. By colocating the data this allows the query to proceed.
-- either

SELECT *
FROM (SELECT * FROM local) AS x
JOIN dist USING (id);

-- or

WITH x AS (SELECT * FROM local)
SELECT * FROM x
JOIN dist USING (id);
Remember that the coordinator will send the results of the subquery or CTE to every worker that requires them for processing. Thus it’s best either to make the filters and limits on the inner query as specific as possible, or else to aggregate the table. That reduces the network overhead which such a query can cause. More about this in Subquery/CTE Network Overhead.
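The two options above can be sketched as follows, reusing the tables local and dist from this section. The created_at column and the one-day cutoff are hypothetical and only illustrate the idea.

```sql
-- Option 1: filter the inner query so that fewer rows are
-- shipped to the workers. created_at is an assumed column.
SELECT *
FROM (
  SELECT id, created_at
  FROM local
  WHERE created_at >= now() - interval '1 day'
) AS x
JOIN dist USING (id);

-- Option 2: aggregate the local table first, so only one row
-- per id is sent to the workers.
SELECT *
FROM (
  SELECT id, count(*) AS n
  FROM local
  GROUP BY id
) AS x
JOIN dist USING (id);
```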
INSERT…SELECT upserts lacking distribution column
Citus supports INSERT…SELECT…ON CONFLICT statements between co-located tables when the distribution column is among the columns selected and inserted. In addition, any aggregation in the statement must include the distribution column in its GROUP BY clause. Failing to meet these conditions raises an error:
ERROR: ON CONFLICT is not supported in INSERT ... SELECT via coordinator
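For contrast, a statement that does meet these conditions might look like the sketch below. The tables page_views and daily_counts and all of their columns are hypothetical. The sketch assumes both tables are distributed by tenant_id and co-located, and that daily_counts has a unique constraint on (tenant_id, day).

```sql
-- The distribution column (tenant_id) is selected, inserted,
-- and part of the GROUP BY clause.
INSERT INTO daily_counts (tenant_id, day, views)
SELECT tenant_id, viewed_at::date AS day, count(*) AS views
FROM page_views
GROUP BY tenant_id, viewed_at::date
ON CONFLICT (tenant_id, day)
DO UPDATE SET views = EXCLUDED.views;
```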
If the upsert is an important operation in your application, the ideal solution is to model the data so that the source and destination tables are co-located, and so that the distribution column can be part of the GROUP BY clause in the upsert statement (if aggregating). However, if this is not feasible, the workaround is to materialize the select query in a temporary distributed table and upsert from there.
-- workaround for
-- INSERT INTO dest_table <query> ON CONFLICT <upsert clause>

BEGIN;
CREATE UNLOGGED TABLE temp_table (LIKE dest_table);
SELECT create_distributed_table('temp_table', 'tenant_id');
INSERT INTO temp_table <query>;
INSERT INTO dest_table SELECT * FROM temp_table <upsert clause>;
DROP TABLE temp_table;
END;
Temp Tables: the Workaround of Last Resort
There are still a few queries that are unsupported even with the use of push-pull execution via subqueries. One of them is running window functions that partition by a non-distribution column.
Suppose we have a table called github_events, distributed by the column user_id. Then the following window function will not work:
-- this won't work

SELECT repo_id, org->'id' as org_id, count(*)
OVER (PARTITION BY repo_id) -- repo_id is not distribution column
FROM github_events
WHERE repo_id IN (8514, 15435, 19438, 21692);
There is another trick though. We can pull the relevant information to the coordinator as a temporary table:
-- grab the data, minus the aggregate, into a local table

CREATE TEMP TABLE results AS (
SELECT repo_id, org->'id' as org_id
FROM github_events
WHERE repo_id IN (8514, 15435, 19438, 21692)
);

-- now run the aggregate locally

SELECT repo_id, org_id, count(*)
OVER (PARTITION BY repo_id)
FROM results;
Creating a temporary table on the coordinator is a last resort. It is limited by the disk size and CPU of the node.