ClickHouse String Functions

2022-10-25 leiting


1. Basic string operations

1.1 Checking whether a string is empty

clickhouse> select empty(''),empty(' '),notEmpty(''),notEmpty(' ');

┌─empty('')─┬─empty(' ')─┬─notEmpty('')─┬─notEmpty(' ')─┐
│         1 │          0 │            0 │             1 │
└───────────┴────────────┴──────────────┴───────────────┘

1 rows in set. Elapsed: 0.007 sec.

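Most ClickHouse function names are case-sensitive; the few that accept any casing are flagged in the case_insensitive column of system.functions.

'' is the empty string.

' ' is a string containing a single space, so it is not empty.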

1.2 Concatenation

concat

clickhouse> select concat('China',' Huawei',' Shenzhen') str;

┌─str───────────────────┐
│ China Huawei Shenzhen │
└───────────────────────┘

clickhouse> select concat('China', null) str;

┌─str──┐
│ ᴺᵁᴸᴸ │
└──────┘

If any argument is NULL, concat returns NULL.

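1.3 Substrings:

substring(s, offset, length), mid(s, offset, length), substr(s, offset, length)

The three functions are aliases of one another and behave identically. offset is 1-based, and both offset and length are counted in bytes.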

clickhouse> select substring('china',1,2) str;

┌─str─┐
│ ch  │
└─────┘

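1.4 String length

length

char_length, CHAR_LENGTH

character_length, CHARACTER_LENGTH

length counts bytes, while char_length and character_length count Unicode code points (assuming valid UTF-8). That is why the results differ for the Chinese string in the second query.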

clickhouse> select length('China'),char_length('China'),character_length('China');

┌─length('China')─┬─char_length('China')─┬─character_length('China')─┐
│               5 │                    5 │                         5 │
└─────────────────┴──────────────────────┴───────────────────────────┘

clickhouse> select length('中国'),char_length('中国'),character_length('中国');

┌─length('中国')─┬─char_length('中国')─┬─character_length('中国')─┐
│              6 │                   2 │                        2 │
└────────────────┴─────────────────────┴──────────────────────────┘

1.5 Repeating a string:

repeat(s, n)

clickhouse> select repeat('China ',5);

┌─repeat('China ', 5)────────────┐
│ China China China China China  │
└────────────────────────────────┘

1.6 Reversing a string:

reverse
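As a small illustration, the sketch below contrasts reverse, which reverses bytes, with reverseUTF8 (listed in section 5), which reverses Unicode code points. The sample strings are only illustrative.

```sql
-- reverse works on bytes, so it is only safe for single-byte (ASCII) text.
SELECT reverse('China') AS reversed;           -- 'anihC'

-- reverseUTF8 reverses code points and keeps multi-byte characters intact.
SELECT reverseUTF8('中国') AS reversed_utf8;    -- '国中'
```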

1.7 Appending a trailing character:

appendTrailingCharIfAbsent(s, c)
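A minimal sketch of how appendTrailingCharIfAbsent behaves: the character is appended only when the string does not already end with it. The path values are made up for the example.

```sql
SELECT
    appendTrailingCharIfAbsent('/var/log', '/')  AS added,      -- '/var/log/'
    appendTrailingCharIfAbsent('/var/log/', '/') AS unchanged;  -- '/var/log/'
```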

1.8 Checking how a string starts or ends:

endsWith(s, suffix), startsWith(str, prefix)
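Both functions are predicates that return 1 or 0 rather than a piece of the string. A short sketch with illustrative values:

```sql
SELECT
    startsWith('clickhouse', 'click') AS has_prefix,  -- 1
    endsWith('clickhouse', 'house')   AS has_suffix,  -- 1
    endsWith('clickhouse', 'mouse')   AS no_match;    -- 0
```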

1.9 Case conversion:

lower, lcase

upper, ucase
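A brief example, assuming ASCII input. lower and upper only convert ASCII letters; for other scripts the lowerUTF8 and upperUTF8 variants listed in section 5 are the ones to reach for.

```sql
SELECT
    lower('ClickHouse') AS lowered,   -- 'clickhouse'
    upper('ClickHouse') AS uppered;   -- 'CLICKHOUSE'
```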

1.10 Trimming: trim

trim([[LEADING|TRAILING|BOTH] trim_character FROM] input_string)

trimLeft(input_string) == ltrim(input_string)

trimRight(input_string) == rtrim(input_string)

trimBoth(input_string) == trim(input_string)
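The variants above can be sketched as follows. The input strings are illustrative; the one-argument forms remove only ordinary spaces.

```sql
SELECT
    trim(BOTH ' ' FROM '  China  ')     AS both_sides,     -- 'China'
    trim(LEADING 'x' FROM 'xxChinaxx')  AS leading_only,   -- 'Chinaxx'
    trim(TRAILING 'x' FROM 'xxChinaxx') AS trailing_only,  -- 'xxChina'
    trimLeft('  China  ')               AS left_trimmed,   -- 'China  '
    trimRight('  China  ')              AS right_trimmed;  -- '  China'
```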

1.11 Encoding, decoding, and checksums:

base64Encode(s)

base64Decode(s)

tryBase64Decode(s)

CRC32(s)

CRC32IEEE(s)

CRC64(s)
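These functions encode data or compute checksums; none of them encrypts anything. A hedged sketch, assuming a build with Base64 support:

```sql
SELECT
    base64Encode('China')          AS encoded,   -- 'Q2hpbmE='
    base64Decode('Q2hpbmE=')       AS decoded,   -- 'China'
    tryBase64Decode('not base64!') AS lenient,   -- '' (empty string instead of an exception)
    CRC32('China')                 AS checksum;  -- a UInt32 checksum value
```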

1.12 Character set conversion:

convertCharset(s, from, to)

1.13 Other functions:

concatAssumeInjective

format(pattern, s0, s1, …)

isValidUTF8

toValidUTF8

normalizeQuery

normalizedQueryHash
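As one example from this group, format fills {} placeholders either in order or by zero-based index, and isValidUTF8 returns 1 for well-formed UTF-8. The literals are illustrative.

```sql
SELECT
    format('{} {}', 'Hello', 'World')   AS positional,  -- 'Hello World'
    format('{1} {0}', 'World', 'Hello') AS indexed,     -- 'Hello World'
    isValidUTF8('China')                AS valid;       -- 1
```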

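2. String search

2.1 Locating a substring:

position(haystack, needle), locate(haystack, needle), positionCaseInsensitive

Positions are 1-based and counted in bytes; 0 means the needle was not found. In ClickHouse 24.3 and later, locate defaults to the MySQL argument order, locate(needle, haystack[, start_pos]).

Syntax: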

position(haystack, needle[, start_pos])

locate(haystack, needle[, start_pos])

positionCaseInsensitive(haystack, needle[, start_pos])
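A short sketch of the position family with illustrative strings. The returned position is always relative to the start of the haystack, even when start_pos is given.

```sql
SELECT
    position('I love China', 'China')                AS found,        -- 8
    position('I love China', 'Japan')                AS missing,      -- 0
    positionCaseInsensitive('I love China', 'china') AS ignore_case,  -- 8
    position('China China', 'China', 2)              AS from_pos_2;   -- 7
```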

2.2 Locating multiple substrings:

multiSearchAllPositions, multiSearchAllPositionsCaseInsensitive

multiSearchAllPositions(haystack, [needle1, needle2, ..., needlen])

multiSearchFirstPosition(haystack, [needle1, needle2, …, needlen])

multiSearchFirstIndex(haystack, [needle1, needle2, …, needlen])

multiSearchAny(haystack, [needle1, needle2, …, needlen])
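The four functions can be compared on a single haystack. This is a sketch with made-up needles; a needle that is not found is reported as 0.

```sql
SELECT
    multiSearchAllPositions('I love China', ['love', 'China', 'Japan']) AS positions,  -- [3,8,0]
    multiSearchFirstPosition('I love China', ['love', 'China'])         AS first_pos,  -- 3
    multiSearchFirstIndex('I love China', ['love', 'China'])            AS first_idx,  -- 1
    multiSearchAny('I love China', ['Japan', 'China'])                  AS any_found;  -- 1
```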

2.3 Regular expression matching:

match(haystack, pattern)

multiMatchAny(haystack, [pattern1, pattern2, …, patternn])

multiMatchAnyIndex(haystack, [pattern1, pattern2, …, patternn])

multiMatchAllIndices(haystack, [pattern1, pattern2, …, patternn])

multiFuzzyMatchAny(haystack, distance, [pattern1, pattern2, …, patternn])

multiFuzzyMatchAnyIndex(haystack, distance, [pattern1, pattern2, …, patternn])

multiFuzzyMatchAllIndices(haystack, distance, [pattern1, pattern2, …, patternn])
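A minimal sketch of regular expression matching. Backslashes are doubled because they must first be escaped inside the SQL string literal. The multiMatch* functions depend on the build including the multi-pattern regex library, so treat the second and third columns as an assumption about your installation.

```sql
SELECT
    match('clickhouse-20.12', '^clickhouse-\\d+\\.\\d+$')       AS is_versioned,   -- 1
    multiMatchAny('clickhouse-20.12', ['^mysql', '\\d+$'])      AS any_pattern,    -- 1
    multiMatchAnyIndex('clickhouse-20.12', ['^mysql', '\\d+$']) AS which_pattern;  -- 2
```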

2.4 Extracting substrings:

extract(haystack, pattern)

extractAll(haystack, pattern)

extractAllGroupsHorizontal

extractAllGroupsVertical

Note: extractAllGroupsHorizontal is slower than extractAllGroupsVertical.
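The difference between the extraction functions is easiest to see side by side. In this sketch the input strings and patterns are illustrative. The Vertical form groups results per match, and the Horizontal form groups them per capture group.

```sql
SELECT
    extract('version 20.12', '\\d+')                                AS first_match,  -- '20'
    extractAll('a1b22c333', '\\d+')                                 AS all_matches,  -- ['1','22','333']
    extractAllGroupsVertical('abc=111, def=222', '(\\w+)=(\\d+)')   AS per_match,    -- [['abc','111'],['def','222']]
    extractAllGroupsHorizontal('abc=111, def=222', '(\\w+)=(\\d+)') AS per_group;    -- [['abc','def'],['111','222']]
```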

2.5 Pattern matching with LIKE:

like(haystack, pattern), haystack LIKE pattern operator

notLike(haystack, pattern), haystack NOT LIKE pattern operator

ilike(haystack, pattern), haystack ILIKE pattern operator (case-insensitive LIKE)
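A quick sketch of the LIKE family, where % matches any number of bytes and _ matches exactly one. The sample values are illustrative.

```sql
SELECT
    'China' LIKE 'Ch%'      AS like_op,      -- 1
    like('China', '_hina')  AS like_fn,      -- 1
    'China' NOT LIKE 'ch%'  AS not_like_op,  -- 1, because LIKE is case-sensitive
    ilike('China', 'ch%')   AS ilike_fn;     -- 1
```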

2.6 N-gram similarity:

ngramDistance(haystack, needle)

ngramSearch(haystack, needle)
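Both functions return a floating-point score between 0 and 1. For ngramDistance a smaller value means the strings are more alike; for ngramSearch a larger value means the needle is more likely to occur in the haystack. A sketch with illustrative strings, with exact scores left out on purpose:

```sql
SELECT
    ngramDistance('ClickHouse', 'ClickHouse')      AS identical,    -- 0
    ngramDistance('ClickHouse', 'ClickHome')       AS similar,      -- small, but above 0
    ngramSearch('I love ClickHouse', 'ClickHouse') AS containment;  -- close to 1
```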

2.7 Counting substring occurrences:

countSubstrings(haystack, needle[, start_pos])

Related functions:

countSubstrings()/countSubstringsCaseInsensitive()/countSubstringsCaseInsensitiveUTF8()

These functions are available in ClickHouse 20.12 and later.

clickhouse> select countSubstrings('I love China,he love china too,she also love china','china') china_count;
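The query above returns 2, because countSubstrings is case-sensitive and the capitalized 'China' is not counted. The sketch below puts it next to the case-insensitive variant on the same input.

```sql
SELECT
    countSubstrings('I love China,he love china too,she also love china', 'china')
        AS case_sensitive,    -- 2
    countSubstringsCaseInsensitive('I love China,he love china too,she also love china', 'china')
        AS case_insensitive;  -- 3
```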

3. String replacement

replaceOne(haystack, pattern, replacement)

replaceAll(haystack, pattern, replacement), replace(haystack, pattern, replacement)

replaceRegexpOne(haystack, pattern, replacement)

replaceRegexpAll(haystack, pattern, replacement)

regexpQuoteMeta(s)

regexpQuoteMeta adds a backslash before each of these predefined characters: \0, \\, |, (, ), ^, $, ., [, ], ?, *, +, {, :, -.
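A hedged sketch of the replacement functions. replaceOne and replaceAll treat the pattern as a literal substring, while the Regexp variants take a regular expression and can refer to capture groups as \1, \2, and so on in the replacement. The inputs are illustrative.

```sql
SELECT
    replaceOne('a-b-c', '-', '+') AS first_only,  -- 'a+b-c'
    replaceAll('a-b-c', '-', '+') AS every,       -- 'a+b+c'
    replaceRegexpAll('2022-10-25', '(\\d+)-(\\d+)-(\\d+)', '\\3/\\2/\\1')
                                  AS reordered,   -- '25/10/2022'
    regexpQuoteMeta('a.b')        AS quoted;      -- the string a\.b
```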

4. Splitting strings

splitByChar(separator, s)

splitByString(separator, s)

arrayStringConcat(arr[, separator])

alphaTokens(s)

extractAllGroups(text, regexp)
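The splitting functions return arrays, and arrayStringConcat goes the other way by joining an array back into a string. A minimal sketch with illustrative input:

```sql
SELECT
    splitByChar(',', 'a,b,c')               AS by_char,    -- ['a','b','c']
    splitByString(', ', 'a, b, c')          AS by_string,  -- ['a','b','c']
    arrayStringConcat(['a', 'b', 'c'], '-') AS joined,     -- 'a-b-c'
    alphaTokens('abc1def2ghi')              AS tokens;     -- ['abc','def','ghi']
```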

5. UTF-8 related functions:

Clickhouse> select * from system.functions where name like '%UTF%';

SELECT *

FROM system.functions

WHERE name LIKE '%UTF%'

┌─name────────────────────────────────────────┬─is_aggregate─┬─case_insensitive─┬─alias_to─┐
│ ngramSearchCaseInsensitiveUTF8              │            0 │                0 │          │
│ ngramDistanceCaseInsensitiveUTF8            │            0 │                0 │          │
│ multiSearchFirstIndexCaseInsensitiveUTF8    │            0 │                0 │          │
│ multiSearchAnyUTF8                          │            0 │                0 │          │
│ positionCaseInsensitiveUTF8                 │            0 │                0 │          │
│ positionUTF8                                │            0 │                0 │          │
│ reverseUTF8                                 │            0 │                0 │          │
│ lengthUTF8                                  │            0 │                0 │          │
│ ngramSearchUTF8                             │            0 │                0 │          │
│ randomStringUTF8                            │            0 │                0 │          │
│ javaHashUTF16LE                             │            0 │                0 │          │
│ multiSearchFirstIndexUTF8                   │            0 │                0 │          │
│ isValidUTF8                                 │            0 │                0 │          │
│ multiSearchAllPositionsCaseInsensitiveUTF8  │            0 │                0 │          │
│ multiSearchFirstPositionCaseInsensitiveUTF8 │            0 │                0 │          │
│ lowerUTF8                                   │            0 │                0 │          │
│ multiSearchAllPositionsUTF8                 │            0 │                0 │          │
│ multiSearchFirstPositionUTF8                │            0 │                0 │          │
│ substringUTF8                               │            0 │                0 │          │
│ multiSearchAnyCaseInsensitiveUTF8           │            0 │                0 │          │
│ toValidUTF8                                 │            0 │                0 │          │
│ ngramDistanceUTF8                           │            0 │                0 │          │
│ upperUTF8                                   │            0 │                0 │          │
└─────────────────────────────────────────────┴──────────────┴──────────────────┴──────────┘

23 rows in set. Elapsed: 0.016 sec.

References:

https://clickhouse.tech/docs/en/sql-reference/functions/string-functions/

https://clickhouse.tech/docs/en/sql-reference/functions/string-search-functions/

https://clickhouse.tech/docs/en/sql-reference/functions/string-replace-functions/

https://clickhouse.tech/docs/en/sql-reference/functions/splitting-merging-functions/

Original article: https://blog.csdn.net/vkingnew/article/details/108229090
