Recursive WITH, part III: IS_LEAF

Natalka Roshak's picture
articles: 

The CONNECT BY syntax provides a useful pseudocolumn, CONNECT_BY_ISLEAF, which identifies leaf nodes in the data: it’s 1 when a row has no further children, 0 otherwise. In this post, I’ll look at emulating this pseudocolumn using recursive WITH.

Let’s continue with the example from my previous posts about hierarchical data: the skeleton from the old song “Dem Dry Bones”.

UPDATE skeleton SET connected_to_the=NULL WHERE bone='head';
SELECT * FROM skeleton;

BONE                                     CONNECTED_TO_THE
---------------------------------------- ----------------------------------------
shoulder                                 neck
back                                     shoulder
hip                                      back
thigh                                    hip
knee                                     thigh
leg                                      knee
foot                                     heel
head
neck                                     head
toe                                      foot
arm                                      shoulder
wrist                                    arm
ankle                                    leg
heel                                     ankle
finger                                   wrist
a rib                                    back
b rib                                    back
c rib                                    back

With CONNECT BY, we can use the CONNECT_BY_ISLEAF pseudocolumn to identify leaf nodes:

SELECT bone, level, 
ltrim(sys_connect_by_path(bone,' -> '),' -> ') AS path
FROM skeleton
WHERE connect_by_isleaf=1
START WITH connected_to_the IS NULL
CONNECT BY prior bone=connected_to_the 
ORDER siblings BY 1;

BONE      LEVEL PATH                                                                                            
--------- ----- ----------------------------------------------------------------------------------------------- 
finger        6 head -> neck -> shoulder -> arm -> wrist -> finger                                              
a rib         5 head -> neck -> shoulder -> back -> a rib                                                       
b rib         5 head -> neck -> shoulder -> back -> b rib                                                       
c rib         5 head -> neck -> shoulder -> back -> c rib                                                       
toe          12 head -> neck -> shoulder -> back -> hip -> thigh -> knee -> leg -> ankle -> heel -> foot -> toe

This pseudocolumn takes a little more thought to replicate using recursive WITH than the LEVEL pseudocolumn and the SYS_CONNECT_BY_PATH, which, as we saw in my last post, fall naturally out of the recursion.

We can imitate CONNECT_BY_ISLEAF by searching DEPTH FIRST and using the LEAD function to peek at the next row’s the_level value. If the next row’s level is higher than the current row, then it’s a child of the current row; otherwise, it’s not a child. Since, with DEPTH FIRST, all the children of a row come out before any siblings, if the next row isn’t a child, then the current row is a leaf.

WITH skellarchy (bone, parent, the_level) AS
 ( SELECT bone, connected_to_the, 0  FROM skeleton 
   WHERE bone = 'head'                         
 UNION ALL
   SELECT s.bone, s.connected_to_the , r.the_level + 1
   FROM skeleton s, skellarchy r
   WHERE r.bone = s.connected_to_the           
 )
SEARCH DEPTH FIRST BY bone SET bone_order
CYCLE bone SET is_a_cycle TO 'Y' DEFAULT 'N'
SELECT lpad(' ',2*the_level, ' ') || bone AS bone_tree , the_level,
  lead(the_level) OVER (ORDER BY bone_order) AS next_level,
  CASE 
    WHEN the_level < lead(the_level) OVER (ORDER BY bone_order) THEN NULL
    ELSE 'LEAF'
  END is_leaf
FROM skellarchy
ORDER BY bone_order;

BONE_TREE                                      THE_LEVEL NEXT_LEVEL IS_L
--------------------------------------------- ---------- ---------- ----
head                                                   0          1
  neck                                                 1          2
    shoulder                                           2          3
      arm                                              3          4
        wrist                                          4          5
          finger                                       5          3 LEAF
      back                                             3          4
        a rib                                          4          4 LEAF
        b rib                                          4          4 LEAF
        c rib                                          4          4 LEAF
        hip                                            4          5
          thigh                                        5          6
            knee                                       6          7
              leg                                      7          8
                ankle                                  8          9
                  heel                                 9         10
                    foot                              10         11
                      toe                             11            LEAF

Watch out for Cycles

The first point of caution about this solution concerns cycles. In my last post, I had created a cycle by making the ‘head’ node’s parent the ‘toe’ node. If I’d left the cycle in the data, the toe node wouldn’t be a leaf any more, but this query would falsely identify the head as a leaf:

UPDATE skeleton SET connected_to_the='toe' WHERE bone='head';

BONE_TREE                                      THE_LEVEL NEXT_LEVEL IS_L
--------------------------------------------- ---------- ---------- ----
head                                                   0          1
  neck                                                 1          2
    shoulder                                           2          3
      arm                                              3          4
        wrist                                          4          5
          finger                                       5          3 LEAF
      back                                             3          4
        a rib                                          4          4 LEAF
        b rib                                          4          4 LEAF
        c rib                                          4          4 LEAF
        hip                                            4          5
          thigh                                        5          6
            knee                                       6          7
              leg                                      7          8
                ankle                                  8          9
                  heel                                 9         10
                    foot                              10         11
                      toe                             11         12
                        head                          12            LEAF
 
19 rows selected.

This can be corrected for by adding WHERE IS_A_CYCLE=’N’ to the query.

Respect the order of evaluation…

A second point of caution: if I add a WHERE clause to the query that limits the number of levels, the last line of the resultset will always be identified as a leaf.

WITH skellarchy (bone, parent, the_level) AS
 ( SELECT bone, connected_to_the, 0  FROM skeleton 
   WHERE bone = 'head'                         
 UNION ALL
   SELECT s.bone, s.connected_to_the , r.the_level + 1
   FROM skeleton s, skellarchy r
   WHERE r.bone = s.connected_to_the           
 )
SEARCH DEPTH FIRST BY bone SET bone_order
CYCLE bone SET is_a_cycle TO 'Y' DEFAULT 'N'
SELECT lpad(' ',2*the_level, ' ') || bone AS bone_tree , the_level,
  lead(the_level) OVER (ORDER BY bone_order) AS next_level,
  CASE 
    WHEN the_level < lead(the_level) OVER (ORDER BY bone_order) THEN NULL
    ELSE 'LEAF'
  END is_leaf
FROM skellarchy
WHERE the_level < 8 
ORDER BY bone_order;

BONE_TREE                                                     THE_LEVEL NEXT_LEVEL IS_L
------------------------------------------------------------ ---------- ---------- ----
head                                                                  0          1
  neck                                                                1          2
    shoulder                                                          2          3
      arm                                                             3          4
        wrist                                                         4          5
          finger                                                      5          3 LEAF
      back                                                            3          4
        a rib                                                         4          4 LEAF
        b rib                                                         4          4 LEAF
        c rib                                                         4          4 LEAF
        hip                                                           4          5
          thigh                                                       5          6
            knee                                                      6          7
              leg                                                     7            LEAF      <<<=====

The leg is falsely identified as a leaf, and NEXT_LEVEL comes out as NULL, even though the ‘leg’ row has a child row. Why is that? It’s because this solution uses the LEAD analytic function. With analytic functions, WHERE clauses are evaluated before the analytic functions.

Highlighting the relevant bits from the query:

WITH skellarchy AS ...[recursive WITH subquery]...
SELECT ... LEAD(the_level) OVER (ORDER BY bone_order) AS next_level ... --analytic function
FROM skellarchy
WHERE the_level < 8 ...                                                 --where clause

To quote the documentation:

Analytic functions compute an aggregate value based on a group of rows…. The group of rows is called a window and is defined by the analytic_clause. For each row, a sliding window of rows is defined. The window determines the range of rows used to perform the calculations for the current row…. Analytic functions are the last set of operations performed in a query except for the final ORDER BY clause. All joins and all WHERE, GROUP BY, and HAVING clauses are completed before the analytic functions are processed.

In the query above, “where the_level < 8" will be evaluated before LEAD(the_level). The EXPLAIN PLAN shows this very clearly:

-----------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |          |     2 |    76 |     8  (25)| 00:00:01 |
|   1 |  WINDOW BUFFER                           |          |     2 |    76 |     8  (25)| 00:00:01 |  <<=== LEAD
|*  2 |   VIEW                                   |          |     2 |    76 |     8  (25)| 00:00:01 |  <<=== filter("THE_LEVEL"<8)
|   3 |    UNION ALL (RECURSIVE WITH) DEPTH FIRST|          |       |       |            |          |
|*  4 |     TABLE ACCESS FULL                    | SKELETON |     1 |    24 |     2   (0)| 00:00:01 |
|*  5 |     HASH JOIN                            |          |     1 |    49 |     5  (20)| 00:00:01 |
|   6 |      RECURSIVE WITH PUMP                 |          |       |       |            |          |
|   7 |      TABLE ACCESS FULL                   | SKELETON |    18 |   432 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter("THE_LEVEL"<8)
   4 - filter("BONE"='head')
   5 - access("R"."BONE"="S"."CONNECTED_TO_THE")

The WINDOW BUFFER (analytic window) is evaluated after the VIEW which filters on “THE_LEVEL”<8. So, "lead(the_level) over (order by bone_order)" will be null where the_level=7, and the 'leg' wrongly identified as a leaf node. What we actually want is for the analytic function LEAD to run over the whole resultset, and only then limit the results to show the levels 0-7. The obvious way to do this is to wrap the query in a second SELECT statement:

SELECT * FROM (
  WITH skellarchy (bone, parent, the_level) AS
   ( SELECT bone, connected_to_the, 0  FROM skeleton 
     WHERE bone = 'head'                         
   UNION ALL
     SELECT s.bone, s.connected_to_the , r.the_level + 1
     FROM skeleton s, skellarchy r
     WHERE r.bone = s.connected_to_the           
   )
  SEARCH DEPTH FIRST BY bone SET bone_order
  CYCLE bone SET is_a_cycle TO 'Y' DEFAULT 'N'
  SELECT lpad(' ',2*the_level, ' ') || bone AS bone_tree , the_level,
    lead(the_level) OVER (ORDER BY bone_order) AS next_level,
    CASE 
      WHEN the_level < lead(the_level) OVER (ORDER BY bone_order) THEN NULL
      ELSE 'LEAF'
    END is_leaf
  FROM skellarchy
  ORDER BY bone_order
) WHERE the_level < 8;

BONE_TREE                                                     THE_LEVEL NEXT_LEVEL IS_L
------------------------------------------------------------ ---------- ---------- ----
head                                                                  0          1
  neck                                                                1          2
    shoulder                                                          2          3
      arm                                                             3          4
        wrist                                                         4          5
          finger                                                      5          3 LEAF
      back                                                            3          4
        a rib                                                         4          4 LEAF
        b rib                                                         4          4 LEAF
        c rib                                                         4          4 LEAF
        hip                                                           4          5
          thigh                                                       5          6
            knee                                                      6          7
              leg                                                     7          8

Now, the analytic function in the inner query is evaluated first, before the WHERE clause in the outer query. We can see this in the EXPLAIN PLAN too, of course. Now the WINDOW BUFFER (analytic window) is evaluated before the VIEW with filter(“THE_LEVEL”<8) :

------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |          |     2 |  4068 |     8  (25)| 00:00:01 |
|*  1 |  VIEW                                     |          |     2 |  4068 |     8  (25)| 00:00:01 |  <<=== filter("THE_LEVEL"<8)
|   2 |   WINDOW BUFFER                           |          |     2 |    76 |     8  (25)| 00:00:01 |  <<=== LEAD
|   3 |    VIEW                                   |          |     2 |    76 |     8  (25)| 00:00:01 |
|   4 |     UNION ALL (RECURSIVE WITH) DEPTH FIRST|          |       |       |            |          |
|*  5 |      TABLE ACCESS FULL                    | SKELETON |     1 |    24 |     2   (0)| 00:00:01 |
|*  6 |      HASH JOIN                            |          |     1 |    49 |     5  (20)| 00:00:01 |
|   7 |       RECURSIVE WITH PUMP                 |          |       |       |            |          |
|   8 |       TABLE ACCESS FULL                   | SKELETON |    18 |   432 |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("THE_LEVEL"<8)
   5 - filter("BONE"='head')
   6 - access("R"."BONE"="S"."CONNECTED_TO_THE")

This is one case of the general point that, as Tom Kyte explains in this Ask Tom answer,“select analytic_function from t where CONDITION” is NOT THE SAME AS “select * from (select analytic_function from t) where CONDITION”.

So, to sum up my last few posts, we can do everything that CONNECT BY can do with the 11g recursive WITH syntax. Plus, the recursive WITH syntax makes it easy to express simple recursive algorithms in SQL.


Republished with permission. Original URL: http://rdbms-insight.com/wp/?p=135