一、什么是杂项维度

简单地说，杂项维度就是一种包含的数据具有很少可能值的维度。事务型商业过程通常产生一系列混杂的、低基数的标志位或状态信息。与其为每个标志或属性定义不同的维度，不如建立单独的将不同维度合并到一起的杂项维度。这些维度，通常在一个模式中标记为事务型概要维度，一般不需要所有属性可能值的笛卡尔积，但应该至少包含实际发生在源数据中的组合值。
例如，在销售订单中，可能存在有很多离散数据（yes-no这种开关类型的值），如：

verification_ind（如果订单已经被审核，值为yes）
credit_check_flag（表示此订单的客户信用状态是否已经被检查）
new_customer_ind（如果这是新客户的首个订单，值为yes）
web_order_flag（表示一个订单是在线上订单还是线下订单）

这类数据常被用于增强销售分析，其特点是属性可能很多但每种属性的可能值却很少。

二、处理杂项维度的常用方法

在建模复杂的操作型源系统时，经常会遭遇大量五花八门的标志或状态信息，它们包含小范围的离散值。处理这些较低基数的标志或状态位通常可以采用以下几种方法。

1. 忽略这些标志和指标

姑且将这种回避问题的处理方式也算作方法之一吧。在开发ETL系统时，ETL开发小组可以向业务用户询问有关忽略这些标志的必要问题，如果它们是微不足道的。但是这样的方案通常立即就被否决了，因为有人偶尔还需要它们。

2. 保持事实表行中的标志位不变

还以销售订单为例，和源数据库一样，我们可以在事实表中也建立这四个标志位字段。在装载事实表时，除了订单号以外，同时装载这四个字段的数据，这些字段没有对应的维度表，而是作为订单的属性保留在事实表中。
这种处理方法简单直接，装载程不需要做大量的修改，也不需要建立相关的维度表。但是一般我们不希望在事实表中存储难以识别的标志位，尤其是当每个标志位还配有一个文字描述字段时。不要在事实表行中存储包含大量字符的描述符，因为每一行都会有文字描述，它们可能会使表快速地膨胀。在行中保留一些文本标志是令人反感的，比较好的做法是分离出单独的维度表保存这些标志位字段的数据，它们的数据量很小，并且极少改变。事实表通过维度表的代理键引用这些标志。

3. 将每个标志位放入其自己的维度中

例如，为销售订单的四个标志位分别建立四个对应的维度表。在装载事实表数据前先处理这四个维度表，必要时生成新的代理键，然后在事实表中引用这些代理键。这种方法是将杂项维度当做普通维度来处理，多数情况下这也是不合适的。
首先，当类似的标志或状态位字段比较多时，需要建立很多的维度表，其次事实表的外键数也会大量增加。处理这些新增的维度表和外键需要大量修改数据装载脚本，还会增加出错的机会，同时会给ETL的开发、维护、测试过程带来很大的工作量。最后，杂项维度的数据有自己明显的特点，即属性多但每个属性的值少，并且极少修改，这种特点决定了它应该与普通维度的处理区分开。
作为一个经验值，如果外键的数量处于合理的范围中，即不超过20个，则在事实表中增加不同的外键是可以接受的。但是，若外键列表已经很长，则应该避免将更多的外键加入到事实表中。

4. 将标志位字段存储到订单维度中

可以将标志位字段添加到订单维度表中。上一篇我们将订单维度表作为退化维度删除了，因为它除了订单号，没有其它任何属性。与其将订单号当成是退化维度，不如视其为将低基数标志或状态作为属性的普通维度。事实表通过引用订单维度表的代理键，关联到所有的标志位信息。
尽管该方法精确地表示了数据关系，但依然存在前面讨论的问题。在订单维度表中，每条业务订单都会存在对应的一条销售订单记录，该维度表的记录数会膨胀到跟事实表一样多，而在如此多的数据中，每个标志位字段都存在大量的冗余。通常维度表应该比事实表小得多。

5. 使用杂项维度

处理这些标志位的适当替换方法是将它们包装为一个杂项维度，其中放置各种离散的标志或状态数据。
对杂项维度数据量的估算会影响其建模策略。如果某个简单的杂项维度包含10个二值标识，则最多将包含1024（2^10）行。杂项维度可提供所有标识的组合，并用于基于这些标识的约束和报表。事实表与杂项维度之间存在一个单一的、小型的代理键。
另一方面，如果具有高度非关联的属性，包含更多的数量值，则将它们合并为单一的杂项维度是不合适的。如果存在5个标识，每个仅包含3个值，则单一杂项维度是这些属性的最佳选择，因为维度最多仅有243（3^5）行。但是如果5个没有关联的标识，每个具有100个可能值，建议建立不同维度，因为单一杂项维度表最大可能存在1亿（100^5）行。
关于杂项维度的一个微妙的问题是，在杂项维度中行的组合确定并已知的前提下，是应该事先为所有组合的完全笛卡尔积建立行，还是建立杂项维度行，只用于保存那些在源系统中出现的组合情况的数据。答案要看大概有多少可能的组合，最大行数是多少。一般来说，理论上组合的数量较小，比如只有几百行时，可以预装载所有组合的数据，而组合的数量大，那么在数据获取时，当遇到新标志或指标时再建立杂项维度行。当然，如果源数据中用到了全体组合时，那别无选择只能预先装载好全部杂项维度数据。

三、新增销售订单属性杂项维度

图1显示了增加杂项维度表后的数据仓库模式。

图1

给现有的数据仓库新增一个销售订单属性杂项维度。需要新增一个名为sales_order_attribute_dim的杂项维度表，该表包括四个yes-no列：verification_ind、credit_check_flag、new_customer_ind和web_order_flag，各列的含义已经在本篇开头说明。每个列可以有两个可能值中的一个，Y 或 N，因此sales_order_attribute_dim表最多有16（2^4）行。假设这16行已经包含了所有可能的组合，并且不考虑杂项维度修改的情况，则可以预装载这个维度，并且只需装载一次。
执行下面的脚本修改数据库结构。这个脚本做了工作：

给源数据库里的sales_order表增加对应的四个属性列。
重建外部表，增加杂项属性。
给销售订单原始数据存储表增加杂项属性。
建立sales_order_attribute_dim表。
向表中预装载全部16种可能的数据。
给销售订单事实表添加杂项维度代理键字段。

-- 给源库的销售订单表增加对应的属性
use source;
alter table sales_order
  add verification_ind char (1) after product_code,
  add credit_check_flag char (1) after verification_ind,
  add new_customer_ind char (1) after credit_check_flag,
  add web_order_flag char (1) after new_customer_ind ; 

-- 重建外部表，增加杂项属性，列的顺序必须和源表一致
set search_path=ext;
drop external table sales_order;
create external table sales_order
(
  order_number int,
  customer_number int,
  product_code int,
  verification_ind char(1),
  credit_check_flag char(1),
  new_customer_ind char(1),
  web_order_flag char(1),
  order_date timestamp,
  request_delivery_date timestamp,
  entry_date timestamp,
  order_amount decimal(10 , 2 ),
  order_quantity int
)
location ('pxf://mycluster/data/ext/sales_order?profile=hdfstextsimple')
  format 'text' (delimiter=e',', null='null');     

comment on table sales_order is '销售订单外部表';
comment on column sales_order.order_number is '订单号';
comment on column sales_order.customer_number is '客户编号';
comment on column sales_order.product_code is '产品编码';
comment on column sales_order.verification_ind is '审核标志';
comment on column sales_order.credit_check_flag is '信用检查标志';
comment on column sales_order.new_customer_ind is '客户首个订单标志';
comment on column sales_order.web_order_flag is '线上订单标志';
comment on column sales_order.order_date is '订单日期';
comment on column sales_order.request_delivery_date is '请求交付日期';
comment on column sales_order.entry_date is '登记日期';
comment on column sales_order.order_amount is '销售金额';
comment on column sales_order.order_quantity is '销售数量';

-- 给销售订单过渡表增加对应的属性
set search_path=rds;
alter table sales_order add column verification_ind char(1) default null;
alter table sales_order add column credit_check_flag char(1) default null;
alter table sales_order add column new_customer_ind char(1) default null;
alter table sales_order add column web_order_flag char(1) default null;

comment on column sales_order.verification_ind is '审核标志';
comment on column sales_order.credit_check_flag is '信用检查标志';
comment on column sales_order.new_customer_ind is '客户首个订单标志';
comment on column sales_order.web_order_flag is '线上订单标志'; 

set search_path=tds;
-- 建立杂项维度表
create table sales_order_attribute_dim (
    sales_order_attribute_sk int,
    verification_ind char(1),
    credit_check_flag char(1),
    new_customer_ind char(1),
    web_order_flag char(1)
);    

comment on table sales_order_attribute_dim is '杂项维度表';
comment on column sales_order_attribute_dim.sales_order_attribute_sk is '杂项维度代理键';
comment on column sales_order_attribute_dim.verification_ind is '审核标志';
comment on column sales_order_attribute_dim.credit_check_flag is '信用检查标志';
comment on column sales_order_attribute_dim.new_customer_ind is '客户首个订单标志';
comment on column sales_order_attribute_dim.web_order_flag is '线上订单标志';    

-- 生成杂项维度数据，共插入16条记录
insert into sales_order_attribute_dim values (1, 'n', 'n', 'n', 'n');
insert into sales_order_attribute_dim values (2, 'n', 'n', 'n', 'y');
insert into sales_order_attribute_dim values (3, 'n', 'n', 'y', 'n');
insert into sales_order_attribute_dim values (4, 'n', 'n', 'y', 'y');
insert into sales_order_attribute_dim values (5, 'n', 'y', 'n', 'n');
insert into sales_order_attribute_dim values (6, 'n', 'y', 'n', 'y');
insert into sales_order_attribute_dim values (7, 'n', 'y', 'y', 'n');
insert into sales_order_attribute_dim values (8, 'n', 'y', 'y', 'y');
insert into sales_order_attribute_dim values (9, 'y', 'n', 'n', 'n');
insert into sales_order_attribute_dim values (10, 'y', 'n', 'n', 'y');
insert into sales_order_attribute_dim values (11, 'y', 'n', 'y', 'n');
insert into sales_order_attribute_dim values (12, 'y', 'n', 'y', 'y');
insert into sales_order_attribute_dim values (13, 'y', 'y', 'n', 'n');
insert into sales_order_attribute_dim values (14, 'y', 'y', 'n', 'y');
insert into sales_order_attribute_dim values (15, 'y', 'y', 'y', 'n');
insert into sales_order_attribute_dim values (16, 'y', 'y', 'y', 'y');

-- 建立杂项维度外键
alter table sales_order_fact add column sales_order_attribute_sk int default null;
comment on column sales_order_fact.sales_order_attribute_sk is '杂项维度代理键';

四、修改定期数据装载函数

由于有了一个新的维度，必须修改定期数据装载函数。下面显示了修改后的fn_regular_load函数。

create or replace function fn_regular_load ()
returns void as
$$
declare
    -- 设置scd的生效时间
    v_cur_date date := current_date;
    v_pre_date date := current_date - 1;
    v_last_load date;
begin
    -- 分析外部表
    analyze ext.customer;
    analyze ext.product;
    analyze ext.sales_order;          

    -- 将外部表数据装载到原始数据表
    truncate table rds.customer;
    truncate table rds.product;           

    insert into rds.customer select * from ext.customer;
    insert into rds.product select * from ext.product;
    insert into rds.sales_order
    select order_number,
           customer_number,
           product_code,
           order_date,
           entry_date,
           order_amount,
           order_quantity,
           request_delivery_date,
           verification_ind,
           credit_check_flag,
           new_customer_ind,
           web_order_flag
      from ext.sales_order;          

    -- 分析rds模式的表
    analyze rds.customer;
    analyze rds.product;
    analyze rds.sales_order;          

    -- 设置cdc的上限时间
    select last_load into v_last_load from rds.cdc_time;
    truncate table rds.cdc_time;
    insert into rds.cdc_time select v_last_load, v_cur_date;          

    -- 装载客户维度
    insert into tds.customer_dim
    (customer_number,
     customer_name,
     customer_street_address,
     customer_zip_code,
     customer_city,
     customer_state,
     shipping_address,
     shipping_zip_code,
     shipping_city,
     shipping_state,
     isdelete,
     version,
     effective_date)
    select case flag
                when 'D' then a_customer_number
                else b_customer_number
            end customer_number,
           case flag
                when 'D' then a_customer_name
                else b_customer_name
            end customer_name,
           case flag
                when 'D' then a_customer_street_address
                else b_customer_street_address
            end customer_street_address,
           case flag
                when 'D' then a_customer_zip_code
                else b_customer_zip_code
            end customer_zip_code,
           case flag
                when 'D' then a_customer_city
                else b_customer_city
            end customer_city,
           case flag
                when 'D' then a_customer_state
                else b_customer_state
            end customer_state,
           case flag
                when 'D' then a_shipping_address
                else b_shipping_address
            end shipping_address,
           case flag
                when 'D' then a_shipping_zip_code
                else b_shipping_zip_code
            end shipping_zip_code,
           case flag
                when 'D' then a_shipping_city
                else b_shipping_city
            end shipping_city,
           case flag
                when 'D' then a_shipping_state
                else b_shipping_state
            end shipping_state,
           case flag
                when 'D' then true
                else false
            end isdelete,
           case flag
                when 'D' then a_version
                when 'I' then 1
                else a_version + 1
            end v,
           v_pre_date
      from (select a.customer_number a_customer_number,
                   a.customer_name a_customer_name,
                   a.customer_street_address a_customer_street_address,
                   a.customer_zip_code a_customer_zip_code,
                   a.customer_city a_customer_city,
                   a.customer_state a_customer_state,
                   a.shipping_address a_shipping_address,
                   a.shipping_zip_code a_shipping_zip_code,
                   a.shipping_city a_shipping_city,
                   a.shipping_state a_shipping_state,
                   a.version a_version,
                   b.customer_number b_customer_number,
                   b.customer_name b_customer_name,
                   b.customer_street_address b_customer_street_address,
                   b.customer_zip_code b_customer_zip_code,
                   b.customer_city b_customer_city,
                   b.customer_state b_customer_state,
                   b.shipping_address b_shipping_address,
                   b.shipping_zip_code b_shipping_zip_code,
                   b.shipping_city b_shipping_city,
                   b.shipping_state b_shipping_state,
                   case when a.customer_number is null then 'I'
                        when b.customer_number is null then 'D'
                        else 'U'
                    end flag
              from v_customer_dim_latest a
              full join rds.customer b on a.customer_number = b.customer_number
             where a.customer_number is null -- 新增
                or b.customer_number is null -- 删除
                or (a.customer_number = b.customer_number
                    and not
                           (coalesce(a.customer_name,'') = coalesce(b.customer_name,'')
                        and coalesce(a.customer_street_address,'') = coalesce(b.customer_street_address,'')
                        and coalesce(a.customer_zip_code,0) = coalesce(b.customer_zip_code,0)
                        and coalesce(a.customer_city,'') = coalesce(b.customer_city,'')
                        and coalesce(a.customer_state,'') = coalesce(b.customer_state,'')
                        and coalesce(a.shipping_address,'') = coalesce(b.shipping_address,'')
                        and coalesce(a.shipping_zip_code,0) = coalesce(b.shipping_zip_code,0)
                        and coalesce(a.shipping_city,'') = coalesce(b.shipping_city,'')
                        and coalesce(a.shipping_state,'') = coalesce(b.shipping_state,'')
                        ))) t
             order by coalesce(a_customer_number, 999999999999), b_customer_number limit 999999999999;          

    -- 重载PA客户维度
    truncate table pa_customer_dim;
    insert into pa_customer_dim
    select customer_sk,
           customer_number,
           customer_name,
           customer_street_address,
           customer_zip_code,
           customer_city,
           customer_state,
           isdelete,
           version,
           effective_date,
           shipping_address,
           shipping_zip_code,
           shipping_city,
           shipping_state
      from customer_dim
     where customer_state = 'pa';         

    -- 装载产品维度
    insert into tds.product_dim
    (product_code,
     product_name,
     product_category,
     isdelete,
     version,
     effective_date)
    select case flag
                when 'D' then a_product_code
                else b_product_code
            end product_code,
           case flag
                when 'D' then a_product_name
                else b_product_name
            end product_name,
           case flag
                when 'D' then a_product_category
                else b_product_category
            end product_category,
           case flag
                when 'D' then true
                else false
            end isdelete,
           case flag
                when 'D' then a_version
                when 'I' then 1
                else a_version + 1
            end v,
           v_pre_date
      from (select a.product_code a_product_code,
                   a.product_name a_product_name,
                   a.product_category a_product_category,
                   a.version a_version,
                   b.product_code b_product_code,
                   b.product_name b_product_name,
                   b.product_category b_product_category,
                   case when a.product_code is null then 'I'
                        when b.product_code is null then 'D'
                        else 'U'
                    end flag
              from v_product_dim_latest a
              full join rds.product b on a.product_code = b.product_code
             where a.product_code is null -- 新增
                or b.product_code is null -- 删除
                or (a.product_code = b.product_code
                    and not
                           (a.product_name = b.product_name
                        and a.product_category = b.product_category))) t
             order by coalesce(a_product_code, 999999999999), b_product_code limit 999999999999;          

    -- 装载销售订单事实表
    insert into sales_order_fact
    select a.order_number,
           customer_sk,
           product_sk,
           e.date_sk,
           e.year * 100 + e.month,
           order_amount,
           order_quantity,
           f.date_sk,
           g.sales_order_attribute_sk
      from rds.sales_order a,
           v_customer_dim_his c,
           v_product_dim_his d,
           date_dim e,
           date_dim f,
           sales_order_attribute_dim g,
           rds.cdc_time h
     where a.customer_number = c.customer_number
       and a.order_date >= c.effective_date
       and a.order_date < c.expiry_date
       and a.product_code = d.product_code
       and a.order_date >= d.effective_date
       and a.order_date < d.expiry_date
       and date(a.order_date) = e.date
       and date(a.request_delivery_date) = f.date
       and a.verification_ind = g.verification_ind
       and a.credit_check_flag = g.credit_check_flag
       and a.new_customer_ind = g.new_customer_ind
       and a.web_order_flag = g.web_order_flag
       and a.entry_date >= h.last_load and a.entry_date < h.current_load;                        

    -- 分析tds模式的表
    analyze customer_dim;
    analyze product_dim;
    analyze sales_order_fact;          

    -- 更新时间戳表的last_load字段
    truncate table rds.cdc_time;
    insert into rds.cdc_time select v_cur_date, v_cur_date;          

end;
$$
language plpgsql;

函数做了以下两点修改：

装载rds.sales_order时增加了四个杂项属性。
装载事实表时，关联了sales_order_attribute_dim维度表，为事实表中装载杂项维度代理键。

注意，杂项属性维度数据已经预装载，所以在定期装载脚本中只需要修改处理事实表的部分。源数据中有四个属性列，而事实表中只对应一列，因此需要使用四列关联条件的组合确定杂项维度表的代理键值，并装载到事实表中。

五、测试

1. 准备测试数据

使用下面的脚本添加八个销售订单。

use source;
drop table if exists temp_sales_order_data;
create table temp_sales_order_data as select * from sales_order where 1=0;   

set @start_date := unix_timestamp(date_add(current_date, interval -1 day));
set @end_date := unix_timestamp(current_date);       

set @order_date := from_unixtime(@start_date + rand() * (@end_date - @start_date));
set @request_delivery_date := from_unixtime(unix_timestamp(date_add(current_date, interval 5 day)) + rand() * 86400);
set @amount := floor(1000 + rand() * 9000);
set @quantity := floor(10 + rand() * 90);
insert into temp_sales_order_data
values (1, 1, 1, 'y', 'y', 'n', 'y', @order_date, @request_delivery_date, @order_date, @amount, @quantity);  

set @order_date := from_unixtime(@start_date + rand() * (@end_date - @start_date));
set @request_delivery_date := from_unixtime(unix_timestamp(date_add(current_date, interval 5 day)) + rand() * 86400);
set @amount := floor(1000 + rand() * 9000);
set @quantity := floor(10 + rand() * 90);
insert into temp_sales_order_data
values (2, 2, 2, 'n', 'n', 'n', 'n', @order_date, @request_delivery_date, @order_date, @amount, @quantity);

set @order_date := from_unixtime(@start_date + rand() * (@end_date - @start_date));
set @request_delivery_date := from_unixtime(unix_timestamp(date_add(current_date, interval 5 day)) + rand() * 86400);
set @amount := floor(1000 + rand() * 9000);
set @quantity := floor(10 + rand() * 90);
insert into temp_sales_order_data
values (3, 3, 3, 'y', 'y', 'n', 'n', @order_date, @request_delivery_date, @order_date, @amount, @quantity);

set @order_date := from_unixtime(@start_date + rand() * (@end_date - @start_date));
set @request_delivery_date := from_unixtime(unix_timestamp(date_add(current_date, interval 5 day)) + rand() * 86400);
set @amount := floor(1000 + rand() * 9000);
set @quantity := floor(10 + rand() * 90);
insert into temp_sales_order_data
values (4, 4, 4, 'y', 'n', 'n', 'n', @order_date, @request_delivery_date, @order_date, @amount, @quantity);

set @order_date := from_unixtime(@start_date + rand() * (@end_date - @start_date));
set @request_delivery_date := from_unixtime(unix_timestamp(date_add(current_date, interval 5 day)) + rand() * 86400);
set @amount := floor(1000 + rand() * 9000);
set @quantity := floor(10 + rand() * 90);
insert into temp_sales_order_data
values (5, 11, 1, 'n', 'y', 'y', 'y', @order_date, @request_delivery_date, @order_date, @amount, @quantity);

set @order_date := from_unixtime(@start_date + rand() * (@end_date - @start_date));
set @request_delivery_date := from_unixtime(unix_timestamp(date_add(current_date, interval 5 day)) + rand() * 86400);
set @amount := floor(1000 + rand() * 9000);
set @quantity := floor(10 + rand() * 90);
insert into temp_sales_order_data
values (6, 12, 2, 'n', 'y', 'y', 'n', @order_date, @request_delivery_date, @order_date, @amount, @quantity);

set @order_date := from_unixtime(@start_date + rand() * (@end_date - @start_date));
set @request_delivery_date := from_unixtime(unix_timestamp(date_add(current_date, interval 5 day)) + rand() * 86400);
set @amount := floor(1000 + rand() * 9000);
set @quantity := floor(10 + rand() * 90);
insert into temp_sales_order_data
values (7, 13, 3, 'y', 'y', 'y', 'n', @order_date, @request_delivery_date, @order_date, @amount, @quantity);

set @order_date := from_unixtime(@start_date + rand() * (@end_date - @start_date));
set @request_delivery_date := from_unixtime(unix_timestamp(date_add(current_date, interval 5 day)) + rand() * 86400);
set @amount := floor(1000 + rand() * 9000);
set @quantity := floor(10 + rand() * 90);
insert into temp_sales_order_data
values (8, 14, 4, 'y', 'n', 'y', 'n', @order_date, @request_delivery_date, @order_date, @amount, @quantity);

insert into sales_order
select null,
       customer_number,
       product_code,
       verification_ind,
       credit_check_flag,
       new_customer_ind,
       web_order_flag,
       order_date,
       request_delivery_date,
       entry_date,
       order_amount,
       order_quantity
  from temp_sales_order_data t1
 order by t1.order_date;
commit;

2. 执行定期装载函数并查看结果

~/regular_etl.sh

可以使用下面的分析性查询确认装载是否正确。该查询分析出检查了信用状态的新用户有所占的比例。

select round(cast(checked as float) / (checked + not_checked) * 100)||' % '
  from (select
sum(case when credit_check_flag='y' then 1 else 0 end) checked,
         sum(case when credit_check_flag='n' then 1 else 0 end) not_checked
          from sales_order_fact a, sales_order_attribute_dim b
         where new_customer_ind = 'y'
          and a.sales_order_attribute_sk = b.sales_order_attribute_sk) t;

查询结果如图2所示。

图2

HAWQ取代传统数仓实践（十）——维度表技术之杂项维度的更多相关文章

HAWQ取代传统数仓实践（十九）——OLAP
一.OLAP简介 1. 概念 OLAP是英文是On-Line Analytical Processing的缩写,意为联机分析处理.此概念最早由关系数据库之父E.F.Codd于1993年提出.OLAP允 ...
HAWQ取代传统数仓实践（十六）——事实表技术之迟到的事实
一.迟到的事实简介数据仓库通常建立于一种理想的假设情况下,这就是数据仓库的度量(事实记录)与度量的环境(维度记录)同时出现在数据仓库中.当同时拥有事实记录和正确的当前维度行时,就能够从容地首先维护维 ...
HAWQ取代传统数仓实践（十三）——事实表技术之周期快照
一.周期快照简介周期快照事实表中的每行汇总了发生在某一标准周期,如一天.一周或一月的多个度量.其粒度是周期性的时间段,而不是单个事务.周期快照事实表通常包含许多数据的总计,因为任何与事实表时间范围一 ...
HAWQ取代传统数仓实践（十一）——维度表技术之维度合并
有一种合并维度的情况,就是本来属性相同的维度,因为某种原因被设计成重复的维度属性.例如,在销售订单示例中,随着数据仓库中维度的增加,我们会发现有些通用的数据存在于多个维度中.客户维度的客户地址相关信息 ...
HAWQ取代传统数仓实践（十八）——层次维度
一.层次维度简介大多数维度都具有一个或多个层次.例如,示例数据仓库中的日期维度就有一个四级层次:年.季度.月和日.这些级别用date_dim表里的列表示.日期维度是一个单路径层次,因为除了年-季度- ...
HAWQ取代传统数仓实践（十二）——维度表技术之分段维度
一.分段维度简介在客户维度中,最具有分析价值的属性就是各种分类,这些属性的变化范围比较大.对某个个体客户来说,可能的分类属性包括:性别.年龄.民族.职业.收入和状态,例如,新客户.活跃客户.不活跃客 ...
HAWQ取代传统数仓实践（八）——维度表技术之角色扮演维度
单个物理维度可以被事实表多次引用,每个引用连接逻辑上存在差异的角色维度.例如,事实表可以有多个日期,每个日期通过外键引用不同的日期维度,原则上每个外键表示不同的日期维度视图,这样引用具有不同的含义.这 ...
HAWQ取代传统数仓实践（十五）——事实表技术之无事实的事实表
一.无事实事实表简介在多维数据仓库建模中,有一种事实表叫做"无事实的事实表".普通事实表中,通常会保存若干维度外键和多个数字型度量,度量是事实表的关键所在.然而在无事实的事实表中 ...
HAWQ取代传统数仓实践（九）——维度表技术之退化维度
退化维度技术减少维度的数量,简化维度数据仓库模式.简单的模式比复杂的更容易理解,也有更好的查询性能. 有时,维度表中除了业务主键外没有其它内容.例如,在本销售订单示例中,订单维度表除了订 ...

随机推荐

C#驱动级模拟按键操作
C#驱动级模拟按键操作 2013-09-26 03:17 ·AB叔 447 3 <- 点击左侧的数字“攒”一个吧昨天遇到一个程序自动输入财付通密码的任务. 因为财付通密码控件是有安全保护的,所 ...
(转)Nginx反向代理设置从80端口转向其他端口
from :http://www.cnblogs.com/wuyou/p/3455381.html Nginx反向代理设置从80端口转向其他端口反向代理(Reverse Proxy)方式是指以 ...
LeetCode：累加数【306】
LeetCode:累加数[306] 题目描述累加数是一个字符串,组成它的数字可以形成累加序列. 一个有效的累加序列必须至少包含 3 个数.除了最开始的两个数以外,字符串中的其他数都等于它之前两个数相 ...
LLServer--》对LevelDB的应用
http://code.google.com/p/llserver/ 查看libs path的路径 LD_DEBUG=libs /usr/bin/llserver -h
nginx2
Nginx的高可用是keeplived,keeplived是为lvs服务的. Nginx上分别安装keepalived,keepalived之间通过心跳交流,主节点宕机备节点起来.keepalived ...
对于C++指针的详细理解
(1)每一个变量都有一个内存位置,每一个内存位置都定义了可使用连字号(&)运算符访问的地址,它表示了在内存中的一个地址. eg: int var1; &var1 表示var1的地址 ...
Python3.x：日期库dateutil简介
Python3.x:日期库dateutil简介安装 pip install python-dateutil 关于parser #字符串可以很随意,可以用时间日期的英文单词,可以用横线.逗号.空格等做 ...
Hadoop:eclipse配置hadoop-eclipse-plugin(版本hadoop2.7.3)
配置hadoop-eclipse-plugin(版本hadoop2.7.3): 1:首先下载我们需要的 hadoop-eclipse-plugin-2.7.3.jar,winutils.exe 和 ...
NOIP 数字游戏
描述丁丁最近沉迷于一个数字游戏之中.这个游戏看似简单,但丁丁在研究了许多天之后却发觉原来在简单的规则下想要赢得这个游戏并不那么容易.游戏是这样的,在你面前有一圈整数(一共n个),你要按顺序将其分为m ...
fabric动态获取远程目录列表
#!/usr/bin/pythonfrom fabric.api import *env.user='root'env.hosts=['172.10.224.183','172.10.224.132' ...

HAWQ取代传统数仓实践（十）——维度表技术之杂项维度