1.基本字符串操作
1.1判断是否为空
clickhouse> select empty(''),empty(' '),notEmpty(''),notEmpty(' ');
┌─empty('')─┬─empty(' ')─┬─notEmpty('')─┬─notEmpty(' ')─┐
│ 1 │ 0 │ 0 │ 1 │
└───────────┴────────────┴──────────────┴───────────────┘
1 rows in set. Elapsed: 0.007 sec.
clickhouse的函数名严格区分大小写。
'' 表示空字符串
' ' 表示的一个空格
1.2拼接
concat
clickhouse> select concat('China',' Huawei',' Shenzhen') str;
┌─str───────────────────┐
│ China Huawei Shenzhen │
└───────────────────────┘
clickhouse> select concat('China', null) str;
┌─str──┐
│ ᴺᵁᴸᴸ │
└──────┘
1.3截取:
substring(s, offset, length), mid(s, offset, length), substr(s, offset, length)
三个函数的用法和含义是一样的。
clickhouse> select substring('china',1,2) str;
┌─str─┐
│ ch │
└─────┘
1.4字符串长度
length
char_length, CHAR_LENGTH
character_length, CHARACTER_LENGTH
clickhouse> select length('China'),char_length('China'),character_length('China');
┌─length('China')─┬─char_length('China')─┬─character_length('China')─┐
│ 5 │ 5 │ 5 │
└─────────────────┴──────────────────────┴───────────────────────────┘
clickhouse> select length('中国'),char_length('中国'),character_length('中国');
┌─length('中国')─┬─char_length('中国')─┬─character_length('中国')─┐
│ 6 │ 2 │ 2 │
└────────────────┴─────────────────────┴──────────────────────────┘
1.5字符串重复:
repeat(s, n)
clickhouse> select repeat('China ',5)
┌─repeat('China ', 5)────────────┐
│ China China China China China │
└────────────────────────────────┘
1.6字符串反转:
reverse
1.7追加:
appendTrailingCharIfAbsent(s, c)
1.8获取字符串的开头结尾:
endsWith(s, suffix) startsWith(str, prefix)
1.9 字符串大小写转换:
lower, lcase
upper, ucase
1.10 字符串trim函数:trim
trim([[LEADING|TRAILING|BOTH] trim_character FROM] input_string)
trimLeft(input_string) ==ltrim(input_string)
trimRight(input_string) ==rtrim(input_string)
trimBoth(input_string) ==trim(input_string)
1.11 字符串加密和解密:
base64Encode(s)
base64Decode(s)
tryBase64Decode(s)
CRC32(s)
CRC32IEEE(s)
CRC64(s)
1.12 转换函数:
convertCharset(s, from, to)
1.13其他函数:
concatAssumeInjective
format(pattern, s0, s1, …)
isValidUTF8
toValidUTF8
normalizeQuery
normalizedQueryHash
2.字符串查找
2.1 字符串定位:
position(haystack, needle), locate(haystack, needle),positionCaseInsensitive
语法:
position(haystack, needle[, start_pos])
locate(haystack, needle[, start_pos])
positionCaseInsensitive(haystack, needle[, start_pos])
2.2 多个字符串定位:
multiSearchAllPositions,multiSearchAllPositionsCaseInsensitive
multiSearchAllPositions(haystack, [needle1, needle2, ..., needlen])
multiSearchFirstPosition(haystack, [needle1, needle2, …, needlen])
multiSearchFirstIndex(haystack, [needle1, needle2, …, needlen])
multiSearchAny(haystack, [needle1, needle2, …, needlen])
2.3 字符串匹配:
match(haystack, pattern)
multiMatchAny(haystack, [pattern1, pattern2, …, patternn])
multiMatchAnyIndex(haystack, [pattern1, pattern2, …, patternn])
multiMatchAllIndices(haystack, [pattern1, pattern2, …, patternn])
multiFuzzyMatchAny(haystack, distance, [pattern1, pattern2, …, patternn])
multiFuzzyMatchAnyIndex(haystack, distance, [pattern1, pattern2, …, patternn])
multiFuzzyMatchAllIndices(haystack, distance, [pattern1, pattern2, …, patternn])
2.4 字符串提取:
extract(haystack, pattern)
extractAll(haystack, pattern)
extractAllGroupsHorizontal
extractAllGroupsVertical
注意:extractAllGroupsHorizontal function is slower than extractAllGroupsVertical.
2.5字符串模糊查询:
like(haystack, pattern), haystack LIKE pattern operator
notLike(haystack, pattern), haystack NOT LIKE pattern operator
ilike
ilike(haystack, pattern)
2.6
ngramDistance(haystack, needle)
ngramSearch(haystack, needle)
2.7字符串出现的次数:
countSubstrings(haystack, needle)
countSubstrings(haystack, needle[, start_pos])
类似的函数:
countSubstrings()/countSubstringsCaseInsensitive()/countSubstringsCaseInsensitiveUTF8()
该功能在clickhouse 20.12.+版本实现。
clickhouse> select countSubstrings('I love China,he love china too,she also love china','china') china_count;
3.字符串替换
replaceOne(haystack, pattern, replacement)
replaceAll(haystack, pattern, replacement), replace(haystack, pattern, replacement)
replaceRegexpOne(haystack, pattern, replacement)
replaceRegexpAll(haystack, pattern, replacement)
regexpQuoteMeta(s)
Predefined characters: \0, \\, |, (, ), ^, $, ., [, ], ?, *, +, {, :, -.
4.字符串切割
splitByChar(separator, s)
splitByString(separator, s)
arrayStringConcat(arr[, separator])
alphaTokens(s)
extractAllGroups(text, regexp)
5.UTF8 相关的函数:
Clickhouse> select * from system.functions where name like '%UTF%';
SELECT *
FROM system.functions
WHERE name LIKE '%UTF%'
┌─name────────────────────────────────────────┬─is_aggregate─┬─case_insensitive─┬─alias_to─┐
│ ngramSearchCaseInsensitiveUTF8 │ 0 │ 0 │ │
│ ngramDistanceCaseInsensitiveUTF8 │ 0 │ 0 │ │
│ multiSearchFirstIndexCaseInsensitiveUTF8 │ 0 │ 0 │ │
│ multiSearchAnyUTF8 │ 0 │ 0 │ │
│ positionCaseInsensitiveUTF8 │ 0 │ 0 │ │
│ positionUTF8 │ 0 │ 0 │ │
│ reverseUTF8 │ 0 │ 0 │ │
│ lengthUTF8 │ 0 │ 0 │ │
│ ngramSearchUTF8 │ 0 │ 0 │ │
│ randomStringUTF8 │ 0 │ 0 │ │
│ javaHashUTF16LE │ 0 │ 0 │ │
│ multiSearchFirstIndexUTF8 │ 0 │ 0 │ │
│ isValidUTF8 │ 0 │ 0 │ │
│ multiSearchAllPositionsCaseInsensitiveUTF8 │ 0 │ 0 │ │
│ multiSearchFirstPositionCaseInsensitiveUTF8 │ 0 │ 0 │ │
│ lowerUTF8 │ 0 │ 0 │ │
│ multiSearchAllPositionsUTF8 │ 0 │ 0 │ │
│ multiSearchFirstPositionUTF8 │ 0 │ 0 │ │
│ substringUTF8 │ 0 │ 0 │ │
│ multiSearchAnyCaseInsensitiveUTF8 │ 0 │ 0 │ │
│ toValidUTF8 │ 0 │ 0 │ │
│ ngramDistanceUTF8 │ 0 │ 0 │ │
│ upperUTF8 │ 0 │ 0 │ │
└─────────────────────────────────────────────┴──────────────┴──────────────────┴──────────┘
23 rows in set. Elapsed: 0.016 sec.
参考:
https://clickhouse.tech/docs/en/sql-reference/functions/string-functions/
https://clickhouse.tech/docs/en/sql-reference/functions/string-search-functions/
https://clickhouse.tech/docs/en/sql-reference/functions/string-replace-functions/
https://clickhouse.tech/docs/en/sql-reference/functions/splitting-merging-functions/
————————————————
版权声明:本文为CSDN博主「vkingnew」的原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接及本声明。
原文链接:https://blog.csdn.net/vkingnew/article/details/108229090
最新评论: