行列转置一般由由行转为列,或者由列转为行。
CREATE TABLE datasets.t_city
(
`province` String,
`city` String,
`createtime` DateTime,
`city_level` Int8
)
ENGINE = MergeTree()
ORDER BY province
SETTINGS index_granularity = 8192;
insert into t_city values('Hubei','Wuhan',now(),2),('Hubei','Xiangyang',now(),3),('Shanghai','Shanghai',now(),1),('Guangdong','Guangzhou',now(),1),('Guangdong','Shenzhen',now(),1),('Guangdong','DOngguan',now(),2),('Guangdong','Zhuhai',now(),3);
Clickhouse> select * from t_city;
SELECT *
FROM t_city
┌─province──┬─city──────┬──────────createtime─┬─city_level─┐
│ Guangdong │ Guangzhou │ 2020-07-07 14:02:53 │ 1 │
│ Guangdong │ Shenzhen │ 2020-07-07 14:02:53 │ 1 │
│ Guangdong │ DOngguan │ 2020-07-07 14:02:53 │ 2 │
│ Guangdong │ Zhuhai │ 2020-07-07 14:02:53 │ 3 │
│ Hubei │ Wuhan │ 2020-07-07 14:02:53 │ 2 │
│ Hubei │ Xiangyang │ 2020-07-07 14:02:53 │ 3 │
│ Shanghai │ Shanghai │ 2020-07-07 14:02:53 │ 1 │
└───────────┴───────────┴─────────────────────┴────────────┘
7 rows in set. Elapsed: 0.002 sec.
列转行:
查询每个省份用户的城市:groupArray
Clickhouse> select province, groupArray(city) from t_city group by province;
SELECT
province,
groupArray(city)
FROM t_city
GROUP BY province
┌─province──┬─groupArray(city)─────────────────────────────┐
│ Shanghai │ ['Shanghai'] │
│ Hubei │ ['Wuhan','Xiangyang'] │
│ Guangdong │ ['Guangzhou','Shenzhen','Dongguan','Zhuhai'] │
└───────────┴──────────────────────────────────────────────┘
插入一条重复的记录:
insert into t_city values('Hubei','Wuhan',now(),2);
可以看到Hubei有一个重复的wuhan
Clickhouse> select province, groupArray(city) from t_city group by province;
SELECT
province,
groupArray(city)
FROM t_city
GROUP BY province
┌─province──┬─groupArray(city)─────────────────────────────┐
│ Shanghai │ ['Shanghai'] │
│ Hubei │ ['Wuhan','Xiangyang','Wuhan'] │
│ Guangdong │ ['Guangzhou','Shenzhen','Dongguan','Zhuhai'] │
└───────────┴──────────────────────────────────────────────┘
3 rows in set. Elapsed: 0.002 sec.
可以使用函数groupUniqArray进行去重:
Clickhouse> select province, groupUniqArray(city) from t_city group by province;
SELECT
province,
groupUniqArray(city)
FROM t_city
GROUP BY province
┌─province──┬─groupUniqArray(city)─────────────────────────┐
│ Shanghai │ ['Shanghai'] │
│ Hubei │ ['Wuhan','Xiangyang'] │
│ Guangdong │ ['Zhuhai','Dongguan','Guangzhou','Shenzhen'] │
└───────────┴──────────────────────────────────────────────┘
3 rows in set. Elapsed: 0.003 sec.
字符串拆分:splitByChar
Clickhouse> select splitByChar('#','Hubei_Wuhan#Hubei_Xiangyang#Guangdong_Shenzhen') col;
SELECT splitByChar('#', 'Hubei_Wuhan#Hubei_Xiangyang#Guangdong_Shenzhen') AS col
┌─col────────────────────────────────────────────────────┐
│ ['Hubei_Wuhan','Hubei_Xiangyang','Guangdong_Shenzhen'] │
└────────────────────────────────────────────────────────┘
1 rows in set. Elapsed: 0.002 sec.
可以使用ArrayJoin 函数将一行转为列:
Clickhouse> select arrayJoin(splitByChar('#', 'Hubei_Wuhan#Hubei_Xiangyang#Guangdong_Shenzhen')) col;
SELECT arrayJoin(splitByChar('#', 'Hubei_Wuhan#Hubei_Xiangyang#Guangdong_Shenzhen')) AS col
┌─col────────────────┐
│ Hubei_Wuhan │
│ Hubei_Xiangyang │
│ Guangdong_Shenzhen │
└────────────────────┘
3 rows in set. Elapsed: 0.002 sec.
实际示例:
SELECT
province,
groupArray(city)
FROM t_city
GROUP BY province
┌─province──┬─groupArray(city)─────────────────────────────┐
│ Shanghai │ ['Shanghai'] │
│ Hubei │ ['Wuhan','Xiangyang','Wuhan'] │
│ Guangdong │ ['Guangzhou','Shenzhen','Dongguan','Zhuhai'] │
└───────────┴──────────────────────────────────────────────┘
3 rows in set. Elapsed: 0.003 sec.
Clickhouse> SELECT province, arrayJoin(groupArray(city)) new_city FROM t_city GROUP BY province;
SELECT
province,
arrayJoin(groupArray(city)) AS new_city
FROM t_city
GROUP BY province
┌─province──┬─new_city──┐
│ Shanghai │ Shanghai │
│ Hubei │ Wuhan │
│ Hubei │ Xiangyang │
│ Hubei │ Wuhan │
│ Guangdong │ Guangzhou │
│ Guangdong │ Shenzhen │
│ Guangdong │ Dongguan │
│ Guangdong │ Zhuhai │
└───────────┴───────────┘
8 rows in set. Elapsed: 0.003 sec.
结论:在clickhouse中主要使用两个函数groupArray 和ArrayJoin。
原文链接:https://blog.csdn.net/u010002184/article/details/124782457
最新评论: