2010-04-07

More looping

For context, see Nathan's blog. [Edit: I've just learned on IRC from Maciej Katafiasz that iterate also has a feature built in which renders this hack unnecessary. Even more iterate awesomeness! I'll leave this post up though, because I still think it' s pretty cute that SBCL optimizes this case so nicely.]
MY-USER> (defun outer-collect-in-inner-loop ()
           (declare (optimize speed (safety 0) (debug 0)))
           (iter (for outer from 0 below 10)
                 (declare (type (integer -1 10) outer))
                 (flet ((outer-collect (x) (collect x)))
                   (iter (for inner from 0 below 10)
                         (declare (type (integer -1 10) inner))
                         (outer-collect (+ (* 10 outer) inner))))))
OUTER-COLLECT-IN-INNER-LOOP
Note that the COLLECT in the outer loop is ITERATE's collecting feature. The trick is to use it in a FLET before starting the inner loop, thereby making it available further down. Declarations added only to get a nicer disassembly.
MY-USER> (OUTER-COLLECT-IN-INNER-LOOP)
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99)
Okay, it works. But how is it being compiled?
MY-USER> (disassemble 'OUTER-COLLECT-IN-INNER-LOOP)
; disassembly for OUTER-COLLECT-IN-INNER-LOOP
; 069222FF:       31C9             XOR ECX, ECX               ; no-arg-parsing entry point
;      301:       BE17001020       MOV ESI, 537919511
;      306:       BF17001020       MOV EDI, 537919511
;      30B:       BA17001020       MOV EDX, 537919511
;      310:       48C7C1F8FFFFFF   MOV RCX, -8
;      317:       E977000000       JMP L5
;      31C:       90               NOP
;      31D:       90               NOP
;      31E:       90               NOP
;      31F:       90               NOP
;      320: L0:   31DB             XOR EBX, EBX
;      322:       48C7C3F8FFFFFF   MOV RBX, -8
;      329:       EB5B             JMP L4
;      32B:       90               NOP
;      32C:       90               NOP
;      32D:       90               NOP
;      32E:       90               NOP
;      32F:       90               NOP
;      330: L1:   488BD1           MOV RDX, RCX
;      333:       486BD20A         IMUL RDX, RDX, 10
;      337:       4C8BCB           MOV R9, RBX
;      33A:       4E8D040A         LEA R8, [RDX+R9]
;      33E:       4989AC24B8000000 MOV [R12+184], RBP
;      346:       4D8B5C2468       MOV R11, [R12+104]
;      34B:       498D5310         LEA RDX, [R11+16]
;      34F:       4939542470       CMP [R12+112], RDX
;      354:       765C             JBE L7
;      356:       4989542468       MOV [R12+104], RDX
;      35B:       498D5307         LEA RDX, [R11+7]
;      35F: L2:   4931AC24B8000000 XOR [R12+184], RBP
;      367:       7402             JEQ L3
;      369:       CC09             BREAK 9                    ; pending interrupt trap
;      36B: L3:   4C8942F9         MOV [RDX-7], R8
;      36F:       48C7420117001020 MOV QWORD PTR [RDX+1], 537919511
;      377:       4881FE17001020   CMP RSI, 537919511
;      37E:       7529             JNE L6
;      380:       488BF2           MOV RSI, RDX
;      383:       488BFA           MOV RDI, RDX
;      386: L4:   4883C308         ADD RBX, 8
;      38A:       488BD3           MOV RDX, RBX
;      38D:       4883FA50         CMP RDX, 80
;      391:       7C9D             JL L1
;      393: L5:   4883C108         ADD RCX, 8
;      397:       488BD1           MOV RDX, RCX
;      39A:       4883FA50         CMP RDX, 80
;      39E:       7C80             JL L0
;      3A0:       488BD6           MOV RDX, RSI
;      3A3:       488BE5           MOV RSP, RBP
;      3A6:       F8               CLC
;      3A7:       5D               POP RBP
;      3A8:       C3               RET
;      3A9: L6:   48895701         MOV [RDI+1], RDX
;      3AD:       488BFA           MOV RDI, RDX
;      3B0:       EBD4             JMP L4
;      3B2: L7:   6A10             PUSH 16
;      3B4:       4C8D1C25601B4200 LEA R11, [#x421B60]        ; alloc_tramp
;      3BC:       41FFD3           CALL R11
;      3BF:       5A               POP RDX
;      3C0:       488D5207         LEA RDX, [RDX+7]
;      3C4:       EB99             JMP L2

2 comments:

Levente Mészáros said...

IMO this one looks somewhat better:

(defun outer-collect-in-inner-loop ()
(declare (optimize speed (safety 0) (debug 0)))
(iter outer
(for outer from 0 below 10)
(declare (type (integer -1 10) outer))
(iter (for inner from 0 below 10)
(declare (type (integer -1 10) inner))
(in outer (collect (+ (* 10 outer) inner))))))

Attila Lendvai said...

bah, i was too slow... :)