Skeetendo

’Cause all games were better on the GBC

You are not logged in.

#1 2016-09-19 17:05:50

MechanicalPen
Member
Registered: 2016-06-22
Post 13/29

[z80] Fast Square Root

If calculating your Stat Exp takes too long (maybe you are trying to add Stat Exp to enemy trainers like me!) then you can use this function to speed it up. It is roughly 8x faster than Gen1's method in the worst case and maybe 3x-2x faster than Gen2's method (I didn't test it yet). Keep in mind that the default Stat Exp formula actually 'rounds up' where as this one gives you an exact number.

Special thanks to the TI-83 and ZX Spectrum communities for pointing me in the right direction.

;Square Root
;Inputs:
;DE = number to be square rooted
;Outputs:
;E  = square root
;111 bytes
;555 t-states worst case
FastSquareRoot:
;zero some registers
    ld h, d
    ld l, e
    xor a
    ld c,a
    ld d,a

;move the LSB of the input into E for later use, then shift the LSB into L and load H with 0.
;H will be a carry register, where the bits in L are rotated in
    ld e,l
    ld l,h
    ld h,c

;Iteration 1 is optimised
; C is treated as the accumulator
    add hl,hl
    add hl,hl
    sub h
    jr nc,.iteration2
    inc c
    cpl
    ld h,a

.iteration2
; rotate in 2 more bits from the MSB of the input into H
    add hl,hl
    add hl,hl
; shift the accumulator
    rl c
    ld a,c
    rla
; A is now double the shifted accumulator
    sub h
; doubles as a comparison of the carry register (H) to double the accumulator
    jr nc,.iteration3
; If the carry is > 2*accumulator, the bit in the accumulator needs to be 1:
    inc c
; We need to perform H-(2C+1), but A=2C-H.
; We could do NEG to get A=H-2C, then DEC A, but NEG = CPL \ INC A
; NEG \ DEC A  =  CPL \ INC A \ DEC A
; So just use CPL, saving 8 t-states, 1 byte
    cpl
    ld h,a

.iteration3
    add hl,hl
    add hl,hl
    rl c
    ld a,c
    rla
    sub h
    jr nc,.iteration4
    inc c
    cpl
    ld h,a

;Iteration 4
.iteration4
    add hl,hl
    add hl,hl
    rl c
    ld a,c
    rla
    sub h
    jr nc,.iteration5
    inc c
    cpl
    ld h,a

.iteration5
;L is 0, H is the current carry
;E is the lower 8 bits
; Load the next set of bits (LSB of input) into L so that they can be rotated into H
    ld l,e

    add hl,hl
    add hl,hl
    rl c
    ld a,c
    rla
    sub h
    jr nc,.iteration6
    inc c
    cpl
    ld h,a

.iteration6
    add hl,hl
    add hl,hl
    rl c
    ld a,c
    rla
    sub h
    jr nc,.iteration7
    inc c
    cpl
    ld h,a

.iteration7
; Now we need to start worrying about 8 bit overflow.
; In particular, the carry register, H should be ideally 9 bits for this iteration, 10 for the last.
; The accumulator, C, is 8 bits, but we need to compare H to 2*C, and 2*C is up to 9 bits on the last iteration.
;l has 4 more bits to rotate into h

    sla c
    ld a,c
    add a,a
    add hl,hl
    add hl,hl
    jr nc,.subOnce;$+6
    sub h
    jp .accumulate;$+6
.subOnce
    sub h
    jr nc,.iteration8
.accumulate
    inc c
    cpl
    ld h,a

.iteration8
;Iteration 8
; A lot of fancy stuff here
; D is 0, from way back at the beginning
; now I put H->E so that DE can hold the potentially 10-bit number
; Now C->A, L->H
; H thus has the last two bits of the input that need to be rotated into DE
; L has the value of the accumualtor which needs to be multiplied by 4 for a comparison to DE
; So 2 shifts of HL into DE results in DE holding the carry, HL holding 4*accumulated result!    
    ld e,h
    ld h,l
    ld l,c
       ld a,l
    add hl,hl
    rl e
    rl d
    add hl,hl
    rl e
    rl d
    
    push af
    ld a, l
    sub e
    ld a, h
    sbc d
    pop bc
    ld a, b
;the c flag now has the state of the last bit of the result, HL does not need to be restored.
    rla
    ld e, a ;output in e.
    ret

Offline

#2 2016-10-08 21:20:36

Rangi
Member
Registered: 2016-05-09
Post 243/870

Re: [z80] Fast Square Root

Interesting! Gen 2 uses a lookup table, which I would expect to be the fastest method.

Since SquareRoot is only used for calculating stat experience, at least in Crystal, another option (more effort though) is to replace it with EVs, which are kind of "pre-sqrt'd", being 0–255 instead of 0–65536. Which also saves space, taking up half as many bytes. You would have to add EV gains to Pokémon's base data.

From pokecrystal:

GetSquareRoot:
; Return the square root of de in b.

; Rather than calculating the result, we take the index of the
; first value in a table of squares that isn't lower than de.

    ld hl, Squares
    ld b, 0
.loop
; Make sure we don't go past the end of the table.
    inc b
    ld a, b
    cp $ff
    ret z

; Iterate over the table until b**2 >= de.
    ld a, [hli]
    sub e
    ld a, [hli]
    sbc d

    jr c, .loop
    ret

Squares:
root    set 1
    rept $ff
    dw root*root
root    set root+1
    endr

Which is slower for large square roots, so depending on the data yours could end up being faster. Anyway, it's a nice algorithm.


Pokémon Polished Crystal (GitHub) — version 2.2.0 released
Pokémon Red★ and Blue★: Space World Edition (GitHub) — updated August 19!
Polished Map: pokered+pokecrystal map, tileset, and palette editor — version 3.5.1 released!

Offline

Board footer

Powered by FluxBB