clickhouse实时大数据分析引擎的SQL写法详解

绝地灬酷狼 2022-03-26 04:18 1036阅读 0赞

ClickHouse的优点

  • 单个查询的并行处理(利用多个核心)
  • 多个服务器上的分布式处理
  • 非常快速的扫描(参见下面的基准测试),可用于实时查询
  • 列存储非常适合使用“宽”/“非规范化”表(许多列)
  • 压缩性好
  • SQL支持(有限制)
  • 良好的功能集,包括支持近似计算
  • 不同的存储引擎(磁盘存储格式)
  • 非常适合结构日志/事件数据以及时间序列数据(引擎MergeTree需要日期字段)
  • 索引支持(仅限主键,不是所有存储引擎)
  • 漂亮的命令行界面,具有用户友好的进度条和格式

以下是ClickHouse功能的完整列表

ClickHouse的缺点

  • 没有真正的删除/更新支持,也没有事务(与Spark和大多数大数据系统相同)
  • 没有二级密钥(与Spark和大多数大数据系统相同)
  • 自己的协议(没有MySQL协议支持)
  • 有限的SQL支持,以及连接实现是不同的。如果要从MySQL或Spark迁移,则可能必须使用连接重新编写所有查询。
  • 没有窗口功能

此处为clickhouse的sql写法详解:

clickhouse的SQL查询语句与mysql,presto的SQL大致相同,但是也有少许不同的地方,此文仅记录与我日常编写的clickhouseSQL查询语句,包含计算函数,聚合函数,关联语句写法,和一些需要注意的地方。

clickhouse官方网站链接地址:https://clickhouse.yandex/

一:简单查询语句

简单查询语句与mysql等数据库并无差异

select * from 库名.表名 where 条件

ex:select snow,sname,sage from student where sno= 1

需要注意的地方:clickhouse 是用于实时大数据分析引擎,所以,一般使用clickhouse表内数据很大,也有很多分区,所以,查询时如果查询数据太大,where条件限制作用很小的情况下需要使用 limit做查询限制。否则会提示查询数据超过XXGB的报错提示。

二:关联查询语句

首先:clickhouse查询不支持大于两个表以上的直接join,像如下这种mysql等常用的多表关联写法

第二:关联条件从on改为 using ,using 字段必须在各表中名称一致,如果不一致可以通过select 字段 as 别名,将字段名统一

第三:关联 关键字 常用的有一下几种

1.ALL LEFT JOIN

2.ANY LEFT JOIN

3.ALL FULL JOIN

一般的sql关联查询语句如下

ex:select * from table as a left join table as b on a.id=b.aid left join table as c on a.id=c.aid

这种写法在clickhouse中会报错 ,两张表以上的表关联,可以通过子查询的方式来处理

ex: 先将TABLEA表与TABLEB表通过left join 关联之后的结果再left join TABLEC 关联条件从on改为 using using 字段必须在各表中名称一致,如果不一致可以通过select 字段 as 别名,将字段名统一

  1. SELECT
  2. *
  3. FROM
  4. ( SELECT
  5. *
  6. FROM
  7. (
  8. (select *
  9. FROM
  10. TABLEA )
  11. ALL LEFT JOIN
  12. (select *
  13. FROM
  14. TABLEB )using aid
  15. )
  16. ALL left JOIN
  17. (select *
  18. FROM
  19. TABLEC) USING aid
  20. )

三:常用函数或表达式

1.sum(字段) 求和

2.avg(字段) 求平均

3.round(字段/sum(字段)/(计算公式a/b),2) 四舍五入取2位小数

4.case when 字段B= 0 then null/0 else round( 字段A/字段B ,4) 判断语句 如果被除数为0 那么返回null或者0中的一个

5.toString()转化为字符串

6.concat(字段值,’要拼接的内容如%’)

示例:concat(toString(round(round(a/b,4) * 100 ,2)),’%’)

四:相同部分

在where条件与group by ,order by,limit 这些的使用上同mysql的SQL写法

五:实际示例:

  1. SELECT
  2. account_id,
  3. '2018-12-12~2018-12-15' AS date,
  4. account,
  5. ad_click
  6. FROM
  7. (
  8. SELECT
  9. account_id,
  10. fr,
  11. fr_name,
  12. account,
  13. account_balance,
  14. account_budget,
  15. account_exclude_ip,
  16. account_budget_offline_time,
  17. account_status
  18. FROM
  19. marketing.sem_account_type
  20. WHERE
  21. 1 = 1
  22. AND lower(fr) IN ('bd_sem')
  23. AND account_id IN (
  24. '18091503',
  25. '18091505',
  26. '18091501'
  27. )
  28. ) ALL
  29. LEFT JOIN (
  30. SELECT
  31. account_id,
  32. fr,
  33. round(sum(ad_cost) / 3, 2) AS ad_cost,
  34. round(sum(ad_cost_real) / 3, 2) AS ad_cost_real,
  35. round(sum(ad_impression) / 3, 2) AS ad_impression,
  36. round(sum(ad_click) / 3, 2) AS ad_click,
  37. round(sum(clue_all) / 3, 2) AS clue_all,
  38. round(sum(clue_all_new) / 3, 2) AS clue_all_new,
  39. round(
  40. sum(
  41. c1_kpi_daily_new_customer_amount
  42. ) / 3,
  43. 2
  44. ) AS c1_kpi_daily_new_customer_amount,
  45. round(
  46. sum(c1_kpi_new_customer_amount) / 3,
  47. 2
  48. ) AS c1_kpi_new_customer_amount,
  49. round(
  50. sum(
  51. c2_kpi_daily_new_customer_amount
  52. ) / 3,
  53. 2
  54. ) AS c2_kpi_daily_new_customer_amount,
  55. round(
  56. sum(c2_kpi_new_customer_amount) / 3,
  57. 2
  58. ) AS c2_kpi_new_customer_amount,
  59. round(sum(c2c_c1_create) / 3, 2) AS c2c_c1_create,
  60. round(sum(c2b_c1_create) / 3, 2) AS c2b_c1_create,
  61. round(sum(c2c_c1_onsite) / 3, 2) AS c2c_c1_onsite,
  62. round(sum(c2b_c1_onsite) / 3, 2) AS c2b_c1_onsite,
  63. round(sum(c2c_c1_onsale) / 3, 2) AS c2c_c1_onsale,
  64. round(sum(c2b_c1_onsale) / 3, 2) AS c2b_c1_onsale,
  65. round(sum(c2c_c2_appoint) / 3, 2) AS c2c_c2_appoint,
  66. round(sum(b2c_c2_appoint) / 3, 2) AS b2c_c2_appoint,
  67. round(sum(ssss_c2_appoint) / 3, 2) AS ssss_c2_appoint,
  68. round(
  69. sum(c2c_c2_finish_appoint) / 3,
  70. 2
  71. ) AS c2c_c2_finish_appoint,
  72. round(
  73. sum(b2c_c2_finish_appoint) / 3,
  74. 2
  75. ) AS b2c_c2_finish_appoint,
  76. round(
  77. sum(ssss_c2_finish_appoint) / 3,
  78. 2
  79. ) AS ssss_c2_finish_appoint,
  80. round(sum(c2c_c2_order) / 3, 2) AS c2c_c2_order,
  81. round(sum(b2c_c2_order) / 3, 2) AS b2c_c2_order,
  82. round(sum(weighting_number) / 3, 2) AS weighting_number,
  83. round(sum(ssss_c2_order) / 3, 2) AS ssss_c2_order
  84. FROM
  85. (
  86. SELECT
  87. fr,
  88. keyword_id,
  89. account_id,
  90. cost AS ad_cost,
  91. cost_real AS ad_cost_real,
  92. impression AS ad_impression,
  93. click AS ad_click
  94. FROM
  95. marketing.sem_keyword_report
  96. WHERE
  97. 1 = 1
  98. AND the_day >= '2018-12-12'
  99. AND the_day <= '2018-12-15'
  100. AND lower(fr) IN ('bd_sem')
  101. AND account_id IN (
  102. '18091503',
  103. '18091505',
  104. '18091501'
  105. )
  106. AND (
  107. campaign_city IN (
  108. '上海',
  109. '东莞',
  110. '中山',
  111. '临沂',
  112. '乌鲁木齐',
  113. '伊犁',
  114. '佛山',
  115. '保定',
  116. '全国',
  117. '兰州',
  118. '包头',
  119. '北京',
  120. '南京',
  121. '南宁',
  122. '南昌',
  123. '南通',
  124. '南阳',
  125. '厦门',
  126. '合肥',
  127. '呼和浩特',
  128. '咸阳',
  129. '哈尔滨',
  130. '唐山',
  131. '嘉兴',
  132. '大同',
  133. '大连',
  134. '天津',
  135. '太原',
  136. '宁波',
  137. '宜昌',
  138. '宿迁',
  139. '常州',
  140. '广州',
  141. '廊坊',
  142. '徐州',
  143. '惠州',
  144. '成都',
  145. '扬州',
  146. '新乡',
  147. '无锡',
  148. '昆明',
  149. '杭州',
  150. '武汉',
  151. '沈阳',
  152. '泉州',
  153. '泰州',
  154. '泸州',
  155. '洛阳',
  156. '济南',
  157. '济宁',
  158. '淮安',
  159. '深圳',
  160. '温州',
  161. '澳门',
  162. '烟台',
  163. '珠海',
  164. '盐城',
  165. '石家庄',
  166. '福州',
  167. '绵阳',
  168. '芜湖',
  169. '苏州',
  170. '襄阳',
  171. '西安',
  172. '许昌',
  173. '贵阳',
  174. '达州',
  175. '郑州',
  176. '重庆',
  177. '金华',
  178. '银川',
  179. '镇江',
  180. '长春',
  181. '长沙',
  182. '青岛'
  183. )
  184. )
  185. ) ALL
  186. FULL JOIN (
  187. SELECT
  188. fr,
  189. keyword_id,
  190. account_id,
  191. clue_all,
  192. clue_all_new,
  193. c1_kpi_daily_new_customer_amount,
  194. c1_kpi_new_customer_amount,
  195. c2_kpi_daily_new_customer_amount,
  196. c2_kpi_new_customer_amount,
  197. c2c_c1_create,
  198. c2b_c1_create,
  199. c2c_c1_onsite,
  200. c2b_c1_onsite,
  201. c2c_c1_onsale,
  202. c2b_c1_onsale,
  203. c2c_c2_appoint,
  204. b2c_c2_appoint,
  205. ssss_c2_appoint,
  206. c2c_c2_finish_appoint,
  207. b2c_c2_finish_appoint,
  208. ssss_c2_finish_appoint,
  209. c2c_c2_order,
  210. b2c_c2_order,
  211. weighting_number,
  212. ssss_c2_order
  213. FROM
  214. (
  215. SELECT
  216. fr,
  217. kid AS keyword_id,
  218. sum(clue_all) AS clue_all,
  219. sum(clue_all_new) AS clue_all_new,
  220. sum(
  221. c1_kpi_daily_new_customer_amount
  222. ) AS c1_kpi_daily_new_customer_amount,
  223. sum(c1_kpi_new_customer_amount) AS c1_kpi_new_customer_amount,
  224. sum(
  225. c2_kpi_daily_new_customer_amount
  226. ) AS c2_kpi_daily_new_customer_amount,
  227. sum(c2_kpi_new_customer_amount) AS c2_kpi_new_customer_amount,
  228. sum(c2c_c1_create) AS c2c_c1_create,
  229. sum(c2b_c1_create) AS c2b_c1_create,
  230. sum(c2c_c1_onsite) AS c2c_c1_onsite,
  231. sum(c2b_c1_onsite) AS c2b_c1_onsite,
  232. sum(c2c_c1_onsale) AS c2c_c1_onsale,
  233. sum(c2b_c1_onsale) AS c2b_c1_onsale,
  234. sum(c2c_c2_appoint) AS c2c_c2_appoint,
  235. sum(b2c_c2_appoint) AS b2c_c2_appoint,
  236. sum(ssss_c2_appoint) AS ssss_c2_appoint,
  237. sum(c2c_c2_finish_appoint) AS c2c_c2_finish_appoint,
  238. sum(b2c_c2_finish_appoint) AS b2c_c2_finish_appoint,
  239. sum(ssss_c2_finish_appoint) AS ssss_c2_finish_appoint,
  240. sum(c2c_c2_order) AS c2c_c2_order,
  241. sum(b2c_c2_order) AS b2c_c2_order,
  242. sum(weighting_number) AS weighting_number,
  243. sum(ssss_c2_order) AS ssss_c2_order
  244. FROM
  245. marketing.market_kid_stat_new_v5
  246. WHERE
  247. 1 = 1
  248. AND dts >= '2018-12-12'
  249. AND dts <= '2018-12-15'
  250. AND lower(fr) IN ('bd_sem')
  251. AND (
  252. city IN (
  253. '上海',
  254. '东莞',
  255. '中山',
  256. '临沂',
  257. '乌鲁木齐',
  258. '伊犁',
  259. '佛山',
  260. '保定',
  261. '全国',
  262. '兰州',
  263. '包头',
  264. '北京',
  265. '南京',
  266. '南宁',
  267. '南昌',
  268. '南通',
  269. '南阳',
  270. '厦门',
  271. '合肥',
  272. '呼和浩特',
  273. '咸阳',
  274. '哈尔滨',
  275. '唐山',
  276. '嘉兴',
  277. '大同',
  278. '大连',
  279. '天津',
  280. '太原',
  281. '宁波',
  282. '宜昌',
  283. '宿迁',
  284. '常州',
  285. '广州',
  286. '廊坊',
  287. '徐州',
  288. '惠州',
  289. '成都',
  290. '扬州',
  291. '新乡',
  292. '无锡',
  293. '昆明',
  294. '杭州',
  295. '武汉',
  296. '沈阳',
  297. '泉州',
  298. '泰州',
  299. '泸州',
  300. '洛阳',
  301. '济南',
  302. '济宁',
  303. '淮安',
  304. '深圳',
  305. '温州',
  306. '澳门',
  307. '烟台',
  308. '珠海',
  309. '盐城',
  310. '石家庄',
  311. '福州',
  312. '绵阳',
  313. '芜湖',
  314. '苏州',
  315. '襄阳',
  316. '西安',
  317. '许昌',
  318. '贵阳',
  319. '达州',
  320. '郑州',
  321. '重庆',
  322. '金华',
  323. '银川',
  324. '镇江',
  325. '长春',
  326. '长沙',
  327. '青岛'
  328. )
  329. )
  330. GROUP BY
  331. keyword_id,
  332. fr
  333. ) ANY
  334. LEFT JOIN (
  335. SELECT
  336. fr,
  337. keyword_id,
  338. account_id
  339. FROM
  340. marketing.sem_keyword_type
  341. ) USING keyword_id,
  342. fr
  343. WHERE
  344. 1 = 1
  345. AND lower(fr) IN ('bd_sem')
  346. AND account_id IN (
  347. '18091503',
  348. '18091505',
  349. '18091501'
  350. )
  351. ) USING keyword_id,
  352. fr
  353. GROUP BY
  354. account_id,
  355. fr
  356. ) USING account_id,
  357. fr
  358. ORDER BY
  359. account_id
  360. LIMIT 0,
  361. 50

发表评论

表情:
评论列表 (有 0 条评论,1036人围观)

还没有评论,来说两句吧...

相关阅读