For context, see Nathan's blog.
[
Edit: I've just learned on IRC from Maciej Katafiasz that iterate also has a
feature built in which renders this hack unnecessary. Even more iterate awesomeness! I'll leave this post up though, because I still think it' s pretty cute that SBCL optimizes this case so nicely.]
MY-USER> (defun outer-collect-in-inner-loop ()
(declare (optimize speed (safety 0) (debug 0)))
(iter (for outer from 0 below 10)
(declare (type (integer -1 10) outer))
(flet ((outer-collect (x) (collect x)))
(iter (for inner from 0 below 10)
(declare (type (integer -1 10) inner))
(outer-collect (+ (* 10 outer) inner))))))
OUTER-COLLECT-IN-INNER-LOOP
Note that the COLLECT in the outer loop is ITERATE's collecting feature. The trick is to use it in a FLET before starting the inner loop, thereby making it available further down.
Declarations added only to get a nicer disassembly.
MY-USER> (OUTER-COLLECT-IN-INNER-LOOP)
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99)
Okay, it works. But how is it being compiled?
MY-USER> (disassemble 'OUTER-COLLECT-IN-INNER-LOOP)
; disassembly for OUTER-COLLECT-IN-INNER-LOOP
; 069222FF: 31C9 XOR ECX, ECX ; no-arg-parsing entry point
; 301: BE17001020 MOV ESI, 537919511
; 306: BF17001020 MOV EDI, 537919511
; 30B: BA17001020 MOV EDX, 537919511
; 310: 48C7C1F8FFFFFF MOV RCX, -8
; 317: E977000000 JMP L5
; 31C: 90 NOP
; 31D: 90 NOP
; 31E: 90 NOP
; 31F: 90 NOP
; 320: L0: 31DB XOR EBX, EBX
; 322: 48C7C3F8FFFFFF MOV RBX, -8
; 329: EB5B JMP L4
; 32B: 90 NOP
; 32C: 90 NOP
; 32D: 90 NOP
; 32E: 90 NOP
; 32F: 90 NOP
; 330: L1: 488BD1 MOV RDX, RCX
; 333: 486BD20A IMUL RDX, RDX, 10
; 337: 4C8BCB MOV R9, RBX
; 33A: 4E8D040A LEA R8, [RDX+R9]
; 33E: 4989AC24B8000000 MOV [R12+184], RBP
; 346: 4D8B5C2468 MOV R11, [R12+104]
; 34B: 498D5310 LEA RDX, [R11+16]
; 34F: 4939542470 CMP [R12+112], RDX
; 354: 765C JBE L7
; 356: 4989542468 MOV [R12+104], RDX
; 35B: 498D5307 LEA RDX, [R11+7]
; 35F: L2: 4931AC24B8000000 XOR [R12+184], RBP
; 367: 7402 JEQ L3
; 369: CC09 BREAK 9 ; pending interrupt trap
; 36B: L3: 4C8942F9 MOV [RDX-7], R8
; 36F: 48C7420117001020 MOV QWORD PTR [RDX+1], 537919511
; 377: 4881FE17001020 CMP RSI, 537919511
; 37E: 7529 JNE L6
; 380: 488BF2 MOV RSI, RDX
; 383: 488BFA MOV RDI, RDX
; 386: L4: 4883C308 ADD RBX, 8
; 38A: 488BD3 MOV RDX, RBX
; 38D: 4883FA50 CMP RDX, 80
; 391: 7C9D JL L1
; 393: L5: 4883C108 ADD RCX, 8
; 397: 488BD1 MOV RDX, RCX
; 39A: 4883FA50 CMP RDX, 80
; 39E: 7C80 JL L0
; 3A0: 488BD6 MOV RDX, RSI
; 3A3: 488BE5 MOV RSP, RBP
; 3A6: F8 CLC
; 3A7: 5D POP RBP
; 3A8: C3 RET
; 3A9: L6: 48895701 MOV [RDI+1], RDX
; 3AD: 488BFA MOV RDI, RDX
; 3B0: EBD4 JMP L4
; 3B2: L7: 6A10 PUSH 16
; 3B4: 4C8D1C25601B4200 LEA R11, [#x421B60] ; alloc_tramp
; 3BC: 41FFD3 CALL R11
; 3BF: 5A POP RDX
; 3C0: 488D5207 LEA RDX, [RDX+7]
; 3C4: EB99 JMP L2
2 comments:
IMO this one looks somewhat better:
(defun outer-collect-in-inner-loop ()
(declare (optimize speed (safety 0) (debug 0)))
(iter outer
(for outer from 0 below 10)
(declare (type (integer -1 10) outer))
(iter (for inner from 0 below 10)
(declare (type (integer -1 10) inner))
(in outer (collect (+ (* 10 outer) inner))))))
bah, i was too slow... :)
Post a Comment