(1). ReplacingMergeTree介绍

该引擎与MergeTree的不同之处在于它会去除区内,主键相同的重复项(仅在合并的时候执行).

(2). 案例

CREATE DATABASE test10;

USE test10;

# 1. 创建:ReplacingMergeTree引擎表(merchant_id+create_time+order_id组合即可去重)
CREATE TABLE IF NOT EXISTS t_order (
   order_id          Int32,
   order_no          String,
   customer_id       Int32,
   customer_name     String,
   customer_phone    String,
   customer_address  String,
   merchant_id        Int32,
   total_money        Decimal(9, 4),
   create_time        DateTime,
   order_status       Enum('S' = 1 , 'E' = 2)
) ENGINE = ReplacingMergeTree(create_time)   
  PARTITION BY merchant_id
  ORDER BY order_id ;
  

# 2. 插入3条数据(以租户ID进行分区)
INSERT INTO t_order(order_id,order_no,customer_id,customer_name,customer_phone,customer_address,merchant_id,total_money,order_status,create_time) 
VALUES (1,'000000000010',1000,'张三','13799999999','广东省深圳市南山区',7,500.50,'S','2021-05-16 12:12:12'),
       (2,'000000000011',1001,'李四','137999999998','广东省广州市',8,600.50,'S','2020-04-01 11:11:11'),
	   (3,'000000000012',1002,'赵六','137999999997','广东省珠海市',9,700.50,'E','2021-05-01 11:11:11');  


# 3. 插入3条数据后,目录结构如下:
root@9e40ca366829:/var/lib/clickhouse/data/test10/t_order# ll
drwxr-x--- 11 root       root       352 May 22 07:56 7_1_1_0/
drwxr-x--- 11 root       root       352 May 22 07:56 8_2_2_0/
drwxr-x--- 11 root       root       352 May 22 07:56 9_3_3_0/

# 4. 尝试再插入一条数所(分区要相同,主键要相同,才会保留最大版本)
INSERT INTO t_order(order_id,order_no,customer_id,customer_name,customer_phone,customer_address,merchant_id,total_money,order_status,create_time) 
VALUES(1,'000000000010',1000,'张三','13799999999','广东省深圳市南山区',7,500.80,'S','2021-05-16 12:12:18');


# 5. 尝试再插入一条数所(分区要相同,主键要相同,才会保留最大版本),的目录结构.
root@9e40ca366829:/var/lib/clickhouse/data/test10/t_order# ll
drwxr-x--- 11 root       root       352 May 22 07:56 7_1_1_0/
drwxr-x--- 11 root       root       352 May 22 07:59 7_4_4_0/
drwxr-x--- 11 root       root       352 May 22 07:56 8_2_2_0/
drwxr-x--- 11 root       root       352 May 22 07:56 9_3_3_0/


# 6. 优化表
9e40ca366829 :) OPTIMIZE TABLE t_order;


# 7. 查看目录(7_1_1_0/7_4_4_0) --> 7_1_4_1
root@9e40ca366829:/var/lib/clickhouse/data/test10/t_order# ll
drwxr-x--- 11 root       root       352 May 22 07:56 7_1_1_0/
drwxr-x--- 11 root       root       352 May 22 08:01 7_1_4_1/
drwxr-x--- 11 root       root       352 May 22 07:59 7_4_4_0/
drwxr-x--- 11 root       root       352 May 22 07:56 8_2_2_0/
drwxr-x--- 11 root       root       352 May 22 07:56 9_3_3_0/

(3). 总结

通过:ReplacingMergeTree实际是可以对数据进行更新的,更新的条件是:数据在同一个区,并且,是根据主键进行更新.
从这么多次的INSERT和目录对比来看,ClickHouse比较适合于批量的插入数据,而不是,单行单行的插入(因为:单行数据会产生一个目录).