數據分析工作中常用的3類SQL開窗函數詳解
當前位置:點晴教程→知識管理交流
→『 技術文檔交流 』
在日常工作中,我們經常要處理各種數據排名、累計求和、環比等問題。SQL窗口函數為我們提供了強大的工具來處理這些任務。 接下來,我們一起看看SQL常見窗口函數及其應用案例吧。 窗口函數中必須有一個over子句。 01 1.1 函數說明
1.2 案例說明 這里通過表data_learning.product_order(商品銷量表)、data_learning.product(商品信息表)、data_learning.product_category(商品二級分類信息表)進行舉例,兩張表建表語句和示例數據插入語句可以查看我之前的文章。 數據表示例數據分別如下: data_learning.product_order(商品銷量表): data_learning.product(商品信息表): data_learning.product_category(商品二級分類信息表):
問題:請將數據集按照商品二級類別(category_id)字段進行分組后,根據上面列舉的窗口函數并按照sales_volume字段降序返回排名。 SQL: SELECT a.product_id ,b.product_name ,a.category_id ,c.category_name ,a.sales_volume ,ROW_NUMBER() OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS ro_result ,RANK() OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS rank_result ,DENSE_RANK() OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS dense_rank_result ,NTILE(10) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS ntile_resultFROM(SELECT *FROM data_learning.product_order)aLEFT JOIN(SELECT *FROM data_learning.product )bon a.product_id = b.product_idLEFT JOIN(SELECT *FROM data_learning.product_category)con a.category_id = c.category_id; 結果如下(部分截圖):
上面的案例顯示,童裝類別分組下,商品毛衣針織衫和兒童羽絨服銷量都是1617,開窗函數row_number的排序結果是順延,rank的排序結果是相同值并列、后續排序序號不連續,dense_rank的排序結果是相同值并列、后續排序序號連續。 開窗函數NTILE(n) OVER(PARTITION BY 分組列 ORDER BY 排序列)用于將一個結果集分割成指定數量的桶(buckets),并分配一個桶號給每個值。它通常用于在分組數據中將數據等分為多個組。 在上面的案例中,我們對商品分類分組數據集分割成10個桶,桶號從1開始。以童裝類別為例,童裝類別共17行數據,17不能被10整除,數據集無法均分成10個桶,所以結果集中前面7個桶包含2行數據,后面3個桶包含1行數據。
02 2.1 函數說明
2.2 案例說明 問題:請將數據集按照商品二級類別(category_id)字段進行分組、sales_volume字段進行降序后,根據上面列舉的窗口函數對sales_volume進行聚合。 SQL: SELECT a.product_id ,b.product_name ,a.category_id ,c.category_name ,a.sales_volume ,COUNT(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS cnt_result ,SUM(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS sum_result ,AVG(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS avg_result ,MAX(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS max_result ,MIN(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS min_result FROM(SELECT *FROM data_learning.product_order)aLEFT JOIN(SELECT *FROM data_learning.product)bon a.product_id = b.product_idLEFT JOIN(SELECT *FROM data_learning.product_category )con a.category_id = c.category_id; 結果如下(部分截圖): 從案例查詢結果看,數據集按照category_id進行分組后,按照sales_volume降序排序,將每組中的sales_volume依次聚合;
這涉及到窗口函數的窗口幀。窗口幀(Window Frame)是在SQL中用于定義開窗函數計算時要考慮的行的范圍,也就是開窗函數作用的范圍。窗口幀使用在OVER子句中,與PARTITION BY和ORDER BY聯合使用來分組,對組內的數據進行排序和聚合。語句緊接著寫在ORDER BY之后。 窗口幀包含以下類型:
窗口幀表達式的語法如下:
具體x、y可取值如下:
若ORDER BY 后未指定框架,那么默認框架將采用 range unbounded preceding and current row,表示從開窗后的第一行到當前行。 若分組后不加ORDER BY 或者在ORDER BY 之后加上語句rows between unbounded preceding and unbounded following,也就是對分組中所有的數據進行聚合運算。具體示例和結果如下: SELECT a.product_id ,b.product_name ,a.category_id ,c.category_name ,a.sales_volume ,COUNT(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS cnt_result ,COUNT(a.sales_volume) OVER(PARTITION BY a.category_id) AS cnt_result1 ,COUNT(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC rows between unbounded preceding and unbounded following) AS cnt_result2 ,SUM(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS sum_result ,SUM(a.sales_volume) OVER(PARTITION BY a.category_id ) AS sum_result1 ,SUM(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC rows between unbounded preceding and unbounded following) AS sum_result2 ,AVG(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS avg_result ,AVG(a.sales_volume) OVER(PARTITION BY a.category_id) AS avg_result1 ,AVG(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC rows between unbounded preceding and unbounded following) AS avg_result2 ,MAX(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS max_result ,MAX(a.sales_volume) OVER(PARTITION BY a.category_id) AS max_result1 ,MAX(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC rows between unbounded preceding and unbounded following) AS max_result2 ,MIN(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS min_result ,MIN(a.sales_volume) OVER(PARTITION BY a.category_id) AS min_result1 ,MIN(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC rows between unbounded preceding and unbounded following) AS min_result2FROM(SELECT *FROM data_learning.product_order)aLEFT JOIN (SELECT *FROM data_learning.product)bon a.product_id = b.product_idLEFT JOIN(SELECT *FROM data_learning.product_category)con a.category_id = c.category_id; 運行結果如下(部分截圖): 可以看到,不加ORDER BY的字段如sum_result1和 ORDER BY 后+rows between unbounded preceding and unbounded following的字段sum_result2結果一樣。 03 3.1 函數說明
3.2 案例說明 問題:請將數據集按照商品二級類別(category_id)字段進行分組、并利用其他常見窗口函數對sales_volume字段進行處理(主要是為了理解上述函數的作用效果)。 SQL: SELECT a.product_id ,b.product_name ,a.category_id ,c.category_name ,a.sales_volume ,LAG(a.sales_volume,1) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS lag_result ,LEAD(a.sales_volume,1) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS lead_result ,FIRST_VALUE(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS firstvalue_result ,LAST_VALUE(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS lastvalue_result FROM(SELECT *FROM data_learning.product_order)aLEFT JOIN(SELECT *FROM data_learning.product)bon a.product_id = b.product_idLEFT JOIN(SELECT *FROM data_learning.product_category)con a.category_id = c.category_id; 結果如下(部分截圖): 其他常見窗口函數同聚合類型的窗口函數一樣,也可以結合窗口幀實現不同的數據統計效果。 04 簡單總結下今天分享的內容。 我們詳細介紹了SQL中的三類窗口函數,并且在每個函數下使用了相關的示例演示函數的使用效果。同時,介紹了窗口幀的使用,讓我們可以在SQL查詢中進行更靈活和精確的數據分析和聚合。 該文章在 2024/1/13 17:33:11 編輯過 |
關鍵字查詢
相關文章
正在查詢... |