# Skeetendo

’Cause all games were better on the GBC

You are not logged in.

## #1 2016-09-19 17:05:50

MechanicalPen
Member
Registered: 2016-06-22
Post 13/29

### [z80] Fast Square Root

If calculating your Stat Exp takes too long (maybe you are trying to add Stat Exp to enemy trainers like me!) then you can use this function to speed it up. It is roughly 8x faster than Gen1's method in the worst case and maybe 3x-2x faster than Gen2's method (I didn't test it yet). Keep in mind that the default Stat Exp formula actually 'rounds up' where as this one gives you an exact number.

Special thanks to the TI-83 and ZX Spectrum communities for pointing me in the right direction.

``````;Square Root
;Inputs:
;DE = number to be square rooted
;Outputs:
;E  = square root
;111 bytes
;555 t-states worst case
FastSquareRoot:
;zero some registers
ld h, d
ld l, e
xor a
ld c,a
ld d,a

;move the LSB of the input into E for later use, then shift the LSB into L and load H with 0.
;H will be a carry register, where the bits in L are rotated in
ld e,l
ld l,h
ld h,c

;Iteration 1 is optimised
; C is treated as the accumulator
sub h
jr nc,.iteration2
inc c
cpl
ld h,a

.iteration2
; rotate in 2 more bits from the MSB of the input into H
; shift the accumulator
rl c
ld a,c
rla
; A is now double the shifted accumulator
sub h
; doubles as a comparison of the carry register (H) to double the accumulator
jr nc,.iteration3
; If the carry is > 2*accumulator, the bit in the accumulator needs to be 1:
inc c
; We need to perform H-(2C+1), but A=2C-H.
; We could do NEG to get A=H-2C, then DEC A, but NEG = CPL \ INC A
; NEG \ DEC A  =  CPL \ INC A \ DEC A
; So just use CPL, saving 8 t-states, 1 byte
cpl
ld h,a

.iteration3
rl c
ld a,c
rla
sub h
jr nc,.iteration4
inc c
cpl
ld h,a

;Iteration 4
.iteration4
rl c
ld a,c
rla
sub h
jr nc,.iteration5
inc c
cpl
ld h,a

.iteration5
;L is 0, H is the current carry
;E is the lower 8 bits
; Load the next set of bits (LSB of input) into L so that they can be rotated into H
ld l,e

rl c
ld a,c
rla
sub h
jr nc,.iteration6
inc c
cpl
ld h,a

.iteration6
rl c
ld a,c
rla
sub h
jr nc,.iteration7
inc c
cpl
ld h,a

.iteration7
; Now we need to start worrying about 8 bit overflow.
; In particular, the carry register, H should be ideally 9 bits for this iteration, 10 for the last.
; The accumulator, C, is 8 bits, but we need to compare H to 2*C, and 2*C is up to 9 bits on the last iteration.
;l has 4 more bits to rotate into h

sla c
ld a,c
jr nc,.subOnce;\$+6
sub h
jp .accumulate;\$+6
.subOnce
sub h
jr nc,.iteration8
.accumulate
inc c
cpl
ld h,a

.iteration8
;Iteration 8
; A lot of fancy stuff here
; D is 0, from way back at the beginning
; now I put H->E so that DE can hold the potentially 10-bit number
; Now C->A, L->H
; H thus has the last two bits of the input that need to be rotated into DE
; L has the value of the accumualtor which needs to be multiplied by 4 for a comparison to DE
; So 2 shifts of HL into DE results in DE holding the carry, HL holding 4*accumulated result!
ld e,h
ld h,l
ld l,c
ld a,l
rl e
rl d
rl e
rl d

push af
ld a, l
sub e
ld a, h
sbc d
pop bc
ld a, b
;the c flag now has the state of the last bit of the result, HL does not need to be restored.
rla
ld e, a ;output in e.
ret``````

Offline

## #2 2016-10-08 21:20:36

Rangi Registered: 2016-05-09
Post 243/977

### Re: [z80] Fast Square Root

Interesting! Gen 2 uses a lookup table, which I would expect to be the fastest method.

Since SquareRoot is only used for calculating stat experience, at least in Crystal, another option (more effort though) is to replace it with EVs, which are kind of "pre-sqrt'd", being 0–255 instead of 0–65536. Which also saves space, taking up half as many bytes. You would have to add EV gains to Pokémon's base data.

From pokecrystal:

``````GetSquareRoot:
; Return the square root of de in b.

; Rather than calculating the result, we take the index of the
; first value in a table of squares that isn't lower than de.

ld hl, Squares
ld b, 0
.loop
; Make sure we don't go past the end of the table.
inc b
ld a, b
cp \$ff
ret z

; Iterate over the table until b**2 >= de.
ld a, [hli]
sub e
ld a, [hli]
sbc d

jr c, .loop
ret

Squares:
root    set 1
rept \$ff
dw root*root
root    set root+1
endr``````

Which is slower for large square roots, so depending on the data yours could end up being faster. Anyway, it's a nice algorithm.

Offline