’Cause all games were better on the GBC

You are not logged in.

- Index
- → Tools/Resources
- →
**[z80] Fast Square Root**

Pages: **1**

**MechanicalPen****Member**- Registered: 2016-06-22
- Post 13/29

If calculating your Stat Exp takes too long (maybe you are trying to add Stat Exp to enemy trainers like me!) then you can use this function to speed it up. It is roughly 8x faster than Gen1's method in the worst case and maybe 3x-2x faster than Gen2's method (I didn't test it yet). Keep in mind that the default Stat Exp formula actually 'rounds up' where as this one gives you an exact number.

Special thanks to the TI-83 and ZX Spectrum communities for pointing me in the right direction.

```
;Square Root
;Inputs:
;DE = number to be square rooted
;Outputs:
;E = square root
;111 bytes
;555 t-states worst case
FastSquareRoot:
;zero some registers
ld h, d
ld l, e
xor a
ld c,a
ld d,a
;move the LSB of the input into E for later use, then shift the LSB into L and load H with 0.
;H will be a carry register, where the bits in L are rotated in
ld e,l
ld l,h
ld h,c
;Iteration 1 is optimised
; C is treated as the accumulator
add hl,hl
add hl,hl
sub h
jr nc,.iteration2
inc c
cpl
ld h,a
.iteration2
; rotate in 2 more bits from the MSB of the input into H
add hl,hl
add hl,hl
; shift the accumulator
rl c
ld a,c
rla
; A is now double the shifted accumulator
sub h
; doubles as a comparison of the carry register (H) to double the accumulator
jr nc,.iteration3
; If the carry is > 2*accumulator, the bit in the accumulator needs to be 1:
inc c
; We need to perform H-(2C+1), but A=2C-H.
; We could do NEG to get A=H-2C, then DEC A, but NEG = CPL \ INC A
; NEG \ DEC A = CPL \ INC A \ DEC A
; So just use CPL, saving 8 t-states, 1 byte
cpl
ld h,a
.iteration3
add hl,hl
add hl,hl
rl c
ld a,c
rla
sub h
jr nc,.iteration4
inc c
cpl
ld h,a
;Iteration 4
.iteration4
add hl,hl
add hl,hl
rl c
ld a,c
rla
sub h
jr nc,.iteration5
inc c
cpl
ld h,a
.iteration5
;L is 0, H is the current carry
;E is the lower 8 bits
; Load the next set of bits (LSB of input) into L so that they can be rotated into H
ld l,e
add hl,hl
add hl,hl
rl c
ld a,c
rla
sub h
jr nc,.iteration6
inc c
cpl
ld h,a
.iteration6
add hl,hl
add hl,hl
rl c
ld a,c
rla
sub h
jr nc,.iteration7
inc c
cpl
ld h,a
.iteration7
; Now we need to start worrying about 8 bit overflow.
; In particular, the carry register, H should be ideally 9 bits for this iteration, 10 for the last.
; The accumulator, C, is 8 bits, but we need to compare H to 2*C, and 2*C is up to 9 bits on the last iteration.
;l has 4 more bits to rotate into h
sla c
ld a,c
add a,a
add hl,hl
add hl,hl
jr nc,.subOnce;$+6
sub h
jp .accumulate;$+6
.subOnce
sub h
jr nc,.iteration8
.accumulate
inc c
cpl
ld h,a
.iteration8
;Iteration 8
; A lot of fancy stuff here
; D is 0, from way back at the beginning
; now I put H->E so that DE can hold the potentially 10-bit number
; Now C->A, L->H
; H thus has the last two bits of the input that need to be rotated into DE
; L has the value of the accumualtor which needs to be multiplied by 4 for a comparison to DE
; So 2 shifts of HL into DE results in DE holding the carry, HL holding 4*accumulated result!
ld e,h
ld h,l
ld l,c
ld a,l
add hl,hl
rl e
rl d
add hl,hl
rl e
rl d
push af
ld a, l
sub e
ld a, h
sbc d
pop bc
ld a, b
;the c flag now has the state of the last bit of the result, HL does not need to be restored.
rla
ld e, a ;output in e.
ret
```

Offline

**Rangi****Administrator**- Registered: 2016-05-09
- Post 243/984

Interesting! Gen 2 uses a lookup table, which I would expect to be the fastest method.

Since SquareRoot is *only* used for calculating stat experience, at least in Crystal, another option (more effort though) is to replace it with EVs, which are kind of "pre-sqrt'd", being 0–255 instead of 0–65536. Which also saves space, taking up half as many bytes. You would have to add EV gains to Pokémon's base data.

From pokecrystal:

```
GetSquareRoot:
; Return the square root of de in b.
; Rather than calculating the result, we take the index of the
; first value in a table of squares that isn't lower than de.
ld hl, Squares
ld b, 0
.loop
; Make sure we don't go past the end of the table.
inc b
ld a, b
cp $ff
ret z
; Iterate over the table until b**2 >= de.
ld a, [hli]
sub e
ld a, [hli]
sbc d
jr c, .loop
ret
Squares:
root set 1
rept $ff
dw root*root
root set root+1
endr
```

Which is slower for large square roots, so depending on the data yours *could* end up being faster. Anyway, it's a nice algorithm.

My projects on GitHub:

• **Polished Map** 4.7.1 or 2.7.1++

• **Tilemap Studio** 4.0.1

• **Pokémon Polished Crystal** 2.2.0 or 3.0.0 beta

• **Pokémon Red★/Blue★:** Space World Edition 2020-11-01

Offline

Pages: **1**

- Index
- → Tools/Resources
- →
**[z80] Fast Square Root**