I used Ramsoft's floating bus and interrupt test program to test the new contention model: the measurements were good, but not perfect.
The Harlequin boots okay, but the first video byte during uncontended floating bus read is returned at T-state 14343, which is four T-states earlier than a real ZX Spectrum.
The contended floating bus read (say port 16639) does not return the usual 67, 255 x 7, 69, 255 x 7, 71 ... sequence but instead returns 67 x 8, 69 x 8, 71 x 8, etc. (where V x N means V repeats N times).
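To make the "V x N" notation concrete, here is a minimal Python sketch that expands both sequences and counts where they disagree. The helper name `expand` and the choice to compare only the first three groups are mine, purely for illustration; the test program itself is not reproduced here.

```python
def expand(runs):
    """Expand (value, count) pairs into a flat list of bus values."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


# Usual sequence: an attribute byte followed by seven idle-bus reads of 255.
expected = expand([(67, 1), (255, 7), (69, 1), (255, 7), (71, 1), (255, 7)])

# Sequence seen on the Harlequin: each attribute byte repeated eight times.
observed = expand([(67, 8), (69, 8), (71, 8)])

# Positions where the observed value differs from the usual one.
mismatches = [i for i, (e, o) in enumerate(zip(expected, observed)) if e != o]

print(len(expected), len(observed), len(mismatches))
```

Both sequences have the same length and agree at the start of every group of eight, so the attribute byte arrives at the right position but is then repeated where 255 should appear.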
Looking closely at the generated signals, I noticed that the CPU clock showed a variable phase during the video fetch, and most probably continuously. In all previous tests the clock had made a low-to-high transition at this point. See the ULA contention diagram.
I suspected that holding the CPU clock high instead of activating the CPU's WAIT signal was the cause of this instability. I know from previous tests that I want the CPU clock to make an upward transition during the video fetch latches AL1/2 to get an effective floating bus read (i.e. the downward T3 clock transition at the end of the fetch).
During an IO instruction, IOREQ goes low shortly after the rising edge of the CPU clock at the start of the second T-state. It is at this point that we decide whether or not to pause the CPU. Until now I've been labelling the start of the T-states so that they line up with the video latch pulses. This is not correct, as a T-state starts with the rising edge of the CPU clock. Shifting the labelling one half CPU cycle left leaves a diagram that looks odd, even if it is technically correct:
This may well explain the unstable CPU clock. If our 'wait' period begins half way through a T-state (as it does at '7' above), then as IOREQ goes low at the start of a T-state, we will have lost half a period's worth of 'wait' and thus won't hold the clock for a whole number of clock cycles. Therefore the clock transitions will become inconsistent and the clock phase will be seen to flip. Notice that in the diagram above, TW and T3 do not line up with the markers at 7 and 8.
The CLKcpu shows the effect of responding to the IOREQ during T2. CLKcpu is created by dividing CLK7 by 2 via a D-Type flip-flop, which we hold high during our 'wait'. Notice that CLKcpu is held high for an extra half cycle after IO WAIT is removed (position 5 above). This is because the output of the CLK3.5 D-Type changes state at the next positive edge of CLK7, which will be half a CLK3.5 period later.
The following scope picture shows IO WAIT aligned to the second AL2 latch signal:
In order to guarantee a whole number of clock cycles are held high during a 'wait', the WAIT signal should be synchronised to the start of a T-state. We can do this by delaying the wait by half a cycle. The following diagram shows the effect of this on the generated CLKcpu:
Notice how the phase of CLKcpu is maintained, and the clock is held high for an exact number of cycles.
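The effect of the half-cycle delay can be reproduced with a toy model of the divide-by-2 flip-flop. This is only a sketch under my own assumptions: the flip-flop toggles on every rising edge of CLK7, it is forced high on any edge where the wait is active, and the edge numbers are arbitrary. It is not the Harlequin's actual circuit.

```python
def clk_cpu(n_edges, wait=()):
    """Return the CLKcpu level after each rising edge of CLK7.

    Toy model: the divide-by-2 flip-flop toggles on every CLK7 edge,
    except that it is forced high on any edge index listed in `wait`.
    """
    q = 0
    levels = []
    for edge in range(n_edges):
        q = 1 if edge in wait else 1 - q
        levels.append(q)
    return levels


def phase_kept(levels, reference, first_free_edge):
    """True if the clock matches the free-running clock once the hold ends."""
    return levels[first_free_edge:] == reference[first_free_edge:]


N = 12
# Free-running clock: goes high at edges 0, 2, 4, ... (the start of each T-state).
free_running = clk_cpu(N)
# Hold that begins half way through a T-state (edge 3).
mid_t_state = clk_cpu(N, range(3, 6))
# The same hold delayed by one CLK7 period, i.e. half a CPU clock cycle.
delayed = clk_cpu(N, range(4, 7))

for name, levels, release in (("mid T-state", mid_t_state, 6),
                              ("delayed", delayed, 7)):
    print(name, levels, "phase kept:", phase_kept(levels, free_running, release))
```

In this model the first hold keeps the clock high from edge 2 to edge 6, which is one and a half T-states longer than a normal high period, so the clock comes out inverted. The delayed hold keeps it high from edge 4 to edge 7, exactly one T-state longer than normal, and the phase is preserved.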
This picture shows the WAIT signal delayed by half a CPU clock cycle:
There are two methods of aligning the WAIT signals (including MEM WAIT) with the start of a T-state:
- CLK3.5 starting low, WAIT delayed half a cycle to bring it in line with the rising edge of CLK3.5. (Solution above).
- CLK3.5 inverted (starting high), WAIT not delayed as it will already be in line with rising edge of clock.
I tried both these options, with interesting results:
- The first video byte on floating bus read occurs at T-state 14346.
  The contended floating bus read (say port 16639) returns 65, 255 x 1, 67, 255 x 5, 69, 255 x 7, 71, 255 x 7, 73, 255 x 7 ... starting at 14345.
- The first video byte on floating bus read occurs at T-state 14343 (i.e. early), and the read returns intermittent values.
  The contended floating bus read returns repeating attribute values with no 255 bytes between them.
With solution 1, the floating bus auto test failed on a few reads, which was a surprise. However, the downward edge of CLKcpu lags behind AL2 by approximately 17ns.
This bothered me a bit, as any fluctuation in timing could cause a misread of the floating bus, which is probably what is happening. The Z80 performs an IO read slightly before the downward clock transition of T3. It is essential that the value on the data bus is stable by this time, so if our clock transitions are too early, we will read intermittent values. I have not yet been able to pinpoint exactly when the IO read takes place in the Harlequin, but it must happen soon after the data bus becomes stable during the video memory read, as doing the read 17ns earlier is enough to occasionally miss the byte.
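To put the 17ns figure in context, the short calculation below compares it with the clock periods involved. It assumes the clocks run at exactly their nominal frequencies (14MHz, 7MHz and 3.5MHz), which is an assumption on my part rather than a measurement.

```python
def period_ns(frequency_hz):
    """Period of a clock in nanoseconds."""
    return 1e9 / frequency_hz


# Nominal frequencies (assumed exact for this illustration).
clk14_ns = period_ns(14_000_000)
clk7_ns = period_ns(7_000_000)
t_state_ns = period_ns(3_500_000)  # one T-state of CLK3.5

LAG_NS = 17  # offset between the CLKcpu falling edge and AL2 quoted in the text
lag_fraction = LAG_NS / t_state_ns

print(f"CLK14 period: {clk14_ns:.1f} ns")
print(f"CLK7 period:  {clk7_ns:.1f} ns")
print(f"T-state:      {t_state_ns:.1f} ns")
print(f"17 ns is {lag_fraction:.1%} of a T-state")
```

On these assumptions the offset is only a few percent of a T-state, which gives an idea of how narrow the margin around the floating bus read is.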
To fix this, instead of dividing CLK7 by 2 to give CLK3.5, I derived CLK3.5 from HC0 (which oscillates at 3.5MHz) and passed it through the old dividing D-Type, which I clocked on the downward edge of CLK14. This delays CLK3.5 slightly and brings it in line with the video byte fetches.
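The delay introduced by re-clocking can be sketched with a small simulation. It rests on assumptions that are mine, not taken from the schematics: HC0 is modelled as an ideal 3.5MHz square wave that changes on rising edges of CLK14, CLK14 is taken as exactly 14MHz, and the D-Type simply copies HC0 on each falling edge of CLK14.

```python
# One simulation step is half a CLK14 period (CLK14 assumed to be exactly 14 MHz).
STEP_NS = 1e9 / 14_000_000 / 2


def hc0(step):
    """Ideal 3.5 MHz square wave: 8 steps per period, changing on even steps."""
    return (step // 4) % 2


def is_falling_clk14(step):
    """CLK14 is modelled as high on even steps and low on odd steps."""
    return step % 2 == 1


def reclocked(n_steps):
    """Output of a D-Type that samples HC0 on each falling edge of CLK14."""
    q = 0
    out = []
    for step in range(n_steps):
        if is_falling_clk14(step):
            q = hc0(step)
        out.append(q)
    return out


def first_rise(signal):
    """Index of the first 0 -> 1 transition in a list of levels."""
    for i in range(1, len(signal)):
        if signal[i - 1] == 0 and signal[i] == 1:
            return i
    return None


N = 16
source = [hc0(s) for s in range(N)]
output = reclocked(N)
delay_steps = first_rise(output) - first_rise(source)
print(f"CLK3.5 lags HC0 by {delay_steps * STEP_NS:.1f} ns")
```

In this idealised model the re-clocked signal lags HC0 by half a CLK14 period. The real delay will also include gate propagation times, which are not modelled.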
Note: Because CLK3.5 is now being generated from a stable 3.5MHz source (HC0) it will always maintain phase, so the half cycle delay of IOWait discussed on this page no longer performs this function. See "Improving Stability" for a more accurate analysis of IOWait alignment.
It is important that the memory access, clock and clock-affecting signals are synchronised if spurious timing issues are to be avoided. For instance, if the WAIT signal and CLK are out of sync, our CLKcpu will be full of glitches!
This new delayed CLK3.5 cleared up the occasional misreads completely, and the auto test succeeds every time! And I'm much happier about the relationship between CLK3.5 and AL1 and AL2.
Note: The misreads are in fact due to sampling the first idle floating bus value after a video fetch whilst the bus is stabilising. It is more appropriate and stable to pull the data bus up to VCC harder instead of delaying the clock. See Clock Alignment Stability for details.
Examining the combined memory and IO wait implemented in waitGen version 1.11, I found a very visible glitch.
This was caused by the memory wait occurring slightly before the IO wait. This was easily resolved by buffering the memory wait through a spare gate before combining with the IO wait:
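The kind of glitch described above can be pictured with a rough timing model. Everything in it is hypothetical: the signals are treated as active-high and combined with an OR, the memory wait is assumed to end 5ns before the IO wait begins, and the spare gate is given a 10ns delay. The real polarities, gate type and timings in waitGen may differ.

```python
def pulse(start, end, n):
    """Signal sampled every 1 ns: high for start <= t < end, low elsewhere."""
    return [1 if start <= t < end else 0 for t in range(n)]


def delayed(signal, d):
    """Shift a signal later by d ns, as a buffering gate would."""
    return [0] * d + signal[:len(signal) - d]


def combine(a, b):
    """OR two signals sample by sample."""
    return [x | y for x, y in zip(a, b)]


def count_pulses(signal):
    """Number of separate high pulses in a signal."""
    pulses = 0
    prev = 0
    for level in signal:
        if level and not prev:
            pulses += 1
        prev = level
    return pulses


N = 250
mem_wait = pulse(0, 100, N)    # hypothetical: memory wait ends at 100 ns
io_wait = pulse(105, 200, N)   # hypothetical: IO wait begins 5 ns later

raw = combine(mem_wait, io_wait)
buffered = combine(delayed(mem_wait, 10), io_wait)  # hypothetical 10 ns gate delay

print("pulses without buffer:", count_pulses(raw))
print("pulses with buffer:   ", count_pulses(buffered))
```

Without the buffer the combined wait drops out briefly between the two signals. With the memory wait delayed, the two overlap and the combined wait is a single clean pulse.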
All of these features have been incorporated into the Clock Generator and Wait Generator schematics version 1.12.