The issue of there seemingly being two contention models within the ZX Spectrum, one for IO and one for memory, was raised on the World Of Spectrum Forums, as was the question of the documented first contended T-state.
After much testing and discussion it transpires that the Spectrum memory contention model is not as I first thought. Thank you to Patrik Rak for working this out, and other members of the WOS forum (click here to see who they are) for helping to identify exactly what is happening at a timing level.
Our analysis has shown that only T1 of an instruction fetch, memory read or write cycle is contended, the other cycles are not. Previously I was under the impression that contention occurred whenever there was a valid address on the address bus, and was therefore contending T1 and T2 of an instruction fetch, T1 to T3 of a memory read or write.
I blindly assumed that the entire duration that the ULA was fetching video bytes would need to be contended, with the six contended cycles spanning the four byte fetch cycles quite nicely. So if only the first cycle is contended, how does the ZX Spectrum avoid conflicting with the video byte fetches?
I've always been bothered by a number of issues with the current design, not least that having two contention models seemed rather complex. The ZX Spectrum's behaviour arises from its simplicity, so I've had a little "mission statement" that I've tried to adhere to in reverse-engineering the ULA: "If it's complex, it's probably wrong".
The troublesome issues are:
- Why does all existing documentation point to a single contention model, equivalent to the Harlequin IOWait pattern, and how could this work?
- Why is the video bytefetch so late (4th cycle), wasting all those cycles before it?
- Why does it look like there are two contention models?
- Why is the IO contention so clearly based on what's happening at a T-state level, while memory contention does not appear to be?
- Why is there a crossover between contention patterns, IO and memory, when MREQ becomes active?
I was particularly nervous of the IO and memory contention cross-over. This was unusually complicated and subject to timing problems, and all the existing documentation supports a single contention pattern.
Single T-state Contention
If the ZX Spectrum indeed contends just T1 of a memory access, I needed to look at the datasheet to see where in a fetch or memory read/write operation the memory access really takes place. The datasheet shows this to be at the end of T2 / beginning of T3 for the instruction fetch cycle (M1), and halfway through T3 for a memory read or write cycle.
In order for a memory access clash to be avoided, the ZX Spectrum ULA must arrange for the Z80 cycle that actually reads or writes data to occur after it has finished its byte fetches and the buses are free. The ULA releases the memory at cycle 8, followed by cycles 1, 2 and 3 of the next pass, so to ensure that T2 or T3 of a memory access executes no earlier than cycle 8 (once a ULA video bytefetch has begun), T1 must be delayed until cycle 7.
This is identical to the IOWait contention model identified earlier. By contending T1 until cycle 7, the ULA gives the Z80 4 consecutive cycles within which to perform a memory read or write. Should T1 however occur after cycle 8 (i.e. cycle 1 of the next pass or later), there will be at most 2 cycles to complete a memory access, which may not be enough, so the ULA holds T1 until cycle 7.
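As a rough illustration of the timing described above, this Python sketch works out how long T1 would be held for each position in the ULA's 8-cycle pass. The names (t1_delay, RELEASE_CYCLE) are my own and purely illustrative, and it assumes that a video bytefetch is in progress and that a T1 arriving at cycle 7 or 8 passes through without delay.

```python
# Sketch only: models the "hold T1 until cycle 7" rule, not the real ULA logic.
CYCLES_PER_PASS = 8   # the ULA pattern repeats every 8 CPU cycles
RELEASE_CYCLE = 7     # T1 is held until this cycle (assumed from the text)


def t1_delay(cycle: int) -> int:
    """Number of cycles T1 is held if it starts at the given cycle (1..8)."""
    if not 1 <= cycle <= CYCLES_PER_PASS:
        raise ValueError("cycle must be in the range 1..8")
    if cycle >= RELEASE_CYCLE:
        # Cycles 7 and 8: T2/T3 will fall where the ULA has released the bus.
        return 0
    # Cycles 1..6: hold the clock until cycle 7 comes around.
    return RELEASE_CYCLE - cycle


# Delay for T1 starting at cycles 1, 2, ... 8 of the pass.
delays = [t1_delay(c) for c in range(1, CYCLES_PER_PASS + 1)]
print(delays)
```

Under these assumptions the longest hold is six cycles, for a T1 that starts at cycle 1, which ties in with the six contended cycles mentioned earlier.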
Just as the ZX Spectrum designers contend IO in order for the read or write to occur when the ULA has released the bus, they have cleverly worked out how to use the same contention timing to avoid a Z80 memory access whilst the ULA is fetching bytes. And, as with the IO contention, they contend just the right cycle of a memory operation to achieve this. It may have been the other way around, but either way it again demonstrates the ingenuity of the ZX Spectrum designers and their knowledge of the Z80!
In order for the Harlequin to contend cycle T1, a look at the relative timings is required. Here we're showing an instruction fetch operation:
The important points to note are that the address bus becomes active just after the start of T1, and MREQ is active shortly after the downward clock transition of T1.
In order to contend just T1, we have nothing on which to act but the address on the address bus. This is active almost immediately, and well before the downwards clock transition, so if it is in the range 0x4000 to 0x7FFF then we can hold the clock in a high state.
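The address test itself is cheap: every address from 0x4000 to 0x7FFF has A15 low and A14 high, so only the top two address lines need to be examined. The Python sketch below shows the idea; the function name is hypothetical, and whether the real ULA or the Harlequin decodes the range in exactly this way is an assumption on my part.

```python
# Sketch only: decode the contended 16K page from the top two address lines.
def in_contended_range(address: int) -> bool:
    """True if a 16-bit address lies in 0x4000..0x7FFF."""
    a15 = (address >> 15) & 1
    a14 = (address >> 14) & 1
    return a15 == 0 and a14 == 1


# Cross-check the two-line decode against a plain range comparison.
matches_range = all(
    in_contended_range(a) == (0x4000 <= a <= 0x7FFF) for a in range(0x10000)
)
print(matches_range)
```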
The clock hold signal (memWait) must be released at the start of T2. We can identify this by looking for the low transition of MREQ, which occurs halfway through T1, and delay it until the next upward clock transition which will signify the start of T2. This is almost exactly as we have done with the ULAIO signal in order to contend T2 of an IO operation. See here for IO contention details.
The Snow Effect
This is pretty much what the ULA must be doing, as there is very little else to use. Interestingly, cycles T3 and T4 will be incorrectly processed by this arrangement and never be contended, even if a refresh address between 0x4000 and 0x7FFF appears on the bus.
The reason for this is that MREQ is low at the start of both T3 and T4, and as MREQ being low at the start of a T-state cancels the contention, the waitReq signal is continually cancelled before it gets a chance to be reactivated. This is the cause of the famous "snow effect" ULA bug.
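To make the cancellation easier to follow, here is a small Python model of one instruction fetch (M1). It is a sketch of my reading of the behaviour, not the actual ULA logic: the names (TState, wait_requested, m1_cycle) are hypothetical, the MREQ levels at the start of each T-state are taken from the description above, and it assumes the ULA is in its contended window throughout. The refresh address is formed from the I register (high byte) and the R register (low byte).

```python
# Sketch only: why T3/T4 of an M1 cycle never raise a wait request.
from dataclasses import dataclass


def in_contended_range(address: int) -> bool:
    return 0x4000 <= address <= 0x7FFF


@dataclass
class TState:
    name: str
    address: int             # address on the Z80 bus during this T-state
    mreq_low_at_start: bool   # MREQ level sampled at the rising clock edge


def m1_cycle(pc: int, i_reg: int, r_reg: int) -> list:
    """T-states of an instruction fetch, simplified to one address each."""
    refresh = ((i_reg & 0xFF) << 8) | (r_reg & 0xFF)
    return [
        TState("T1", pc, False),      # MREQ only falls halfway through T1
        TState("T2", pc, True),
        TState("T3", refresh, True),  # refresh address now on the bus
        TState("T4", refresh, True),
    ]


def wait_requested(t: TState, ula_window: bool = True) -> bool:
    """Wait is requested only if MREQ was still high when the T-state began."""
    return ula_window and in_contended_range(t.address) and not t.mreq_low_at_start


# Fetch from contended memory: only T1 is held.
fetch_trace = {t.name: wait_requested(t) for t in m1_cycle(0x4000, 0x00, 0x12)}
# Fetch from upper memory with I = 0x40: the refresh address 0x4012 is in the
# contended range, yet no wait is ever requested.
refresh_trace = {t.name: wait_requested(t) for t in m1_cycle(0x8000, 0x40, 0x12)}
print(fetch_trace)
print(refresh_trace)
```

In the second trace the refresh address sits in the lower 16K of RAM while the ULA may be fetching, and nothing holds the CPU clock.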
The Z80 address bus and the ULA address bus are separated by resistors, and the data bus is in a high impedance state, so it is not two addresses appearing on the bus that causes the ULA byte fetch to get corrupted, as the ULA never sees the Z80 address.
My suspicion as to the cause of the "snow" is as follows:
The ULA is responsible for generating the CAS and RAS dynamic RAM control signals for itself and the Z80 when the lower 16K of memory is being accessed. It does this by watching the address bus and waiting for MREQ to go low. Because the CPU clock is not held during the refresh cycle, MREQ will be allowed to go low. If the ULA is performing byte fetches during this time, and the Z80 refresh address happens to be between 0x4000 and 0x7FFF, the ULA will get confused by MREQ and try to generate CAS and RAS signals for the Z80. This will prevent the ULA address being latched into the dynamic memory successfully, resulting in a read from a bad address and "snow". It cannot be the CPU address alone that causes the "snow" because it would then be seen every time the lower 16K was accessed.
The Harlequin will not suffer from this problem as it currently uses static memory and there is no shared address multiplexer.