I've had difficulty explaining the results of the Shock MegaDemo 3 test. All my calculations, careful timing and reasoning say that the Harlequin timings should be exactly the same as those of a 48K ZX Spectrum. It is very puzzling.
I've been going back to basics and experimenting with different things for over a week now, trying to make sense of the problem.
Previously I've spent considerable time focusing on the floating bus behaviour, as that is so often exploited by Spectrum software. The one area I've not concentrated on is the effect and timing of the memory contention, as tests indicated that all was right in this area. Perhaps a closer look is in order.
The C.S.S. FAQ on memory contention describes a 6-cycle delay beginning at T-state 14335. These figures do not match the Harlequin design, which should theoretically look like:
This table assumes that the Z80 responds to the interrupt immediately, which cannot be the case. If, however, the Z80 takes 3 cycles to acknowledge the interrupt, then we would lose 3 T-states between the interrupt generation and the first contended cycle, making the above table start at 14333 and the first contended cycle at 14335, exactly as the FAQ and test programs report. This is pure speculation, but as the interrupt is honoured at the end of the current instruction, it is probable.
25 Sep 2007: Whether the timings reflect a start at 14335 or 14336 depends on whether test programs start their timing from the issue of the interrupt or from its receipt.
Note: The very different timings shown by the Harlequin are due to its memory contention start cycle being incorrect, and to there being one contention model, not two. Even though it appears to give the correct results, it does not accurately mirror the ZX Spectrum. See "Memory Contention Revision" for further details.
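For reference, the FAQ figures can be written as a lookup. This is a minimal Python sketch, assuming the values commonly quoted for a stock 48K machine rather than anything measured on the Harlequin: a 224 T-state scanline, 192 contended display lines, 128 contended T-states per line, and the repeating 6,5,4,3,2,1,0,0 delay pattern starting at 14335. The names are illustrative only.

```python
# Hypothetical lookup of the 48K contention delays as commonly quoted from
# the C.S.S. FAQ. All constants are assumptions about a stock 48K machine.
FIRST_CONTENDED = 14335      # first contended T-state after the interrupt
T_PER_LINE = 224             # T-states per scanline
CONTENDED_LINES = 192        # display lines that suffer contention
CONTENDED_T_PER_LINE = 128   # contended T-states at the start of each line
PATTERN = (6, 5, 4, 3, 2, 1, 0, 0)  # delay for each position in an 8 T-state group


def contention_delay(t: int) -> int:
    """Extra T-states a contended memory access at T-state t would wait."""
    if t < FIRST_CONTENDED:
        return 0
    line, pos = divmod(t - FIRST_CONTENDED, T_PER_LINE)
    if line >= CONTENDED_LINES or pos >= CONTENDED_T_PER_LINE:
        return 0
    return PATTERN[pos % 8]


if __name__ == "__main__":
    for t in range(14333, 14345):
        print(t, contention_delay(t))
```

Moving FIRST_CONTENDED by one T-state is all it takes to model the "issue versus receipt" question: the whole pattern simply shifts.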
In order to analyse the memory contention displayed by the Harlequin and compare that to a ZX Spectrum I modified Jan Bobrowski's ULA Test 3 to show both IO and Memory contention on the same screen, along with the floating bus read.
I also modified the display of the IO contention to appear 2 T-states earlier than test programs would normally report it. This shows the contention experienced at IO Instruction Cycle T2, where the Harlequin actually applies contention, rather than the point at which the IO instruction finally performs its read or write (which is what test programs usually show). The modified version may be downloaded here.
The following screenshot needs some explaining. It is a grid of T-states, in which each T-state may show three things:
- The value on the floating bus if read at that T-state
- Whether that T-state is IO contended (Blue)
- Whether that T-state is Memory contended (Red)
This result is very revealing!
Counting the IO Contention (Blue or Black) cells in a row, starting at 14336 and ending at 14341, we see that there are six of them, followed by two non-IO-contended (non-Blue) T-states at 14342 and 14343. The 6,2 sequence then starts again.
Doing the same for the Memory Contention (Red or Black) cells, starting at 14338 and ending at 14344, we see that there are seven of them, followed by one non-memory-contended T-state at 14345.
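The counting above can be reproduced mechanically. This sketch is hypothetical: it assumes a Black cell marks a T-state that is both IO and memory contended (inferred from the "Blue or Black" and "Red or Black" groupings), and it rebuilds the observed rows from the start T-states and run lengths read off the screenshot, assuming both patterns repeat every 8 T-states.

```python
from itertools import groupby


def cell_colour(io_contended: bool, mem_contended: bool) -> str:
    """Colour of one cell in the modified ULA Test 3 grid.

    Blue = IO contended, Red = memory contended; Black is assumed to mean both.
    """
    if io_contended and mem_contended:
        return "Black"
    if io_contended:
        return "Blue"
    if mem_contended:
        return "Red"
    return "none"


def periodic_flags(first: int, period: int, active: int, count: int) -> list:
    """Hypothetical helper: contention flags for T-states first..first+count-1,
    contended for the first `active` T-states of every `period`."""
    return [(t - first) % period < active for t in range(first, first + count)]


def run_lengths(flags) -> list:
    """Collapse a boolean sequence into (contended, run length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(flags)]


# Rows as read off the screenshot (before the fix).
io_row = periodic_flags(14336, 8, 6, 16)   # IO: 14336..14341 contended
mem_row = periodic_flags(14338, 8, 7, 16)  # memory: 14338..14344 contended

print(run_lengths(io_row))   # [(True, 6), (False, 2), (True, 6), (False, 2)]
print(run_lengths(mem_row))  # [(True, 7), (False, 1), (True, 7), (False, 1)]
```

A correct memory row would collapse to (True, 6), (False, 2) pairs, matching the IO row.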
There should not be seven contended memory T-states. This is wrong. Checking the Harlequin schematic confirms that there are indeed two uncontended T-states produced by IC U6, so either they are not being applied correctly to the clock, or they are not being detected correctly by the Z80.
Why is the Z80 clock being held high for seven T-states? I rigged up the oscilloscope to find out.
Measuring memWait against the clock shows that it is active for six cycles, as expected.
I tried measuring the clock wait signal whilst MREQ was active, but it was difficult to pin down a stable trace, so I turned to the Z80 data sheet (Memory Read and Write timing diagram) and was immediately struck by something.
It appears that during a memory read/write MREQ goes low half a cycle earlier than IOREQ does during an equivalent IO instruction, and stays active for two complete cycles. IOREQ, on the other hand, stays active for two and a half cycles.
(Figure: Z80 timing diagrams, Memory Access and IO Access.)
- MREQ goes low halfway through T1, on the falling edge of the clock, and stays low for four half cycles (2 clock cycles).
- IOREQ goes low at the start of T2 at the rising edge of the clock, and stays low for five half cycles (2.5 clock cycles).
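The half-cycle offset is easier to see with edges counted in half-cycles. A small sketch, assuming index 0 is the rising edge that starts T1, each T-state spans two half-cycles (high, then low), and no extra wait states are inserted; the edge positions follow the data sheet behaviour described above.

```python
from fractions import Fraction

# T-states in each machine cycle, with no extra wait states inserted.
MEM_CYCLE = ("T1", "T2", "T3")
IO_CYCLE = ("T1", "T2", "TW", "T3")


def edge(cycle: tuple, state: str, falling: bool) -> int:
    """Half-cycle index of a clock edge within a machine cycle.

    Index 0 is the rising edge that starts T1; each T-state is two
    half-cycles long (high, then low)."""
    return 2 * cycle.index(state) + (1 if falling else 0)


def active_cycles(window: tuple) -> Fraction:
    """Length of an active-low strobe window, in whole clock cycles."""
    start, end = window
    return Fraction(end - start, 2)


MREQ = (edge(MEM_CYCLE, "T1", falling=True), edge(MEM_CYCLE, "T3", falling=True))
IOREQ = (edge(IO_CYCLE, "T2", falling=False), edge(IO_CYCLE, "T3", falling=True))

print(MREQ, active_cycles(MREQ))    # (1, 5) 2
print(IOREQ, active_cycles(IOREQ))  # (2, 7) 5/2
```

MREQ starts on an odd (falling) half-cycle and IOREQ on an even (rising) one, which is the misalignment the wait logic has to respect.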
We've already seen that the IO period being a partial number of clock cycles causes problems when generating the WAIT signal, and that we had to delay it by half a cycle so that the whole of T2, Tw and T3 was held high during the WAIT. Basically, the onset of the IOWait or MemWait request should be aligned with the point at which IOREQ or MREQ becomes active within an instruction if the clock phase is to be maintained and a whole number of cycles held.
Note: This paragraph is no longer true. See "Improving Stability" for a more accurate analysis of MemWait alignment.
IOWait and MemWait go low on the falling edge of the clock. MREQ also goes low on the falling edge of the clock; IOREQ, on the other hand, does not: it goes low on the rising edge. By delaying IOWait by half a cycle, we brought it in line with IOREQ and maintained a whole number of cycles held during a wait. However, the current implementation delays both IOWait and MemWait by half a cycle.
This misaligns MemWait and MREQ. Because the WAIT request is taken from either IOWait or MemWait depending on MREQ, the first half cycle of wait is requested from IOWait instead of MemWait, increasing the wait request length by half a cycle.
The diagram above illustrates the problem. WAIT is lengthened by half a cycle, the first coming from IOWait, the rest from MemWait half a cycle after MREQ goes low. This extended WAIT produces the clock seen in CPU CLK, which is held high for seven cycles.
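The arithmetic behind the seven-cycle hold can be sketched as follows. This is a simplified model, not the waitGen logic: it assumes the clock can only be released on a whole-cycle boundary, so a WAIT request of n half-cycles holds the clock for n/2 cycles rounded up, and it takes MemWait to be the six cycles measured on the oscilloscope. The function names are hypothetical.

```python
MEMWAIT_HALVES = 12  # MemWait active for six whole cycles, as measured


def wait_request_halves(memwait_delayed: bool) -> int:
    """Half-cycles of WAIT requested during one contended memory access.

    Simplified model: if MemWait is delayed by half a cycle, the first
    half-cycle after MREQ goes low is supplied by IOWait, and MemWait then
    contributes its full length."""
    io_prefix = 1 if memwait_delayed else 0
    return io_prefix + MEMWAIT_HALVES


def cycles_held(wait_halves: int) -> int:
    """Whole clock cycles held high, assuming release only on a cycle boundary."""
    return -(-wait_halves // 2)  # ceiling division


print(cycles_held(wait_request_halves(memwait_delayed=True)))   # 7
print(cycles_held(wait_request_halves(memwait_delayed=False)))  # 6
```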
If we do not delay MemWait, then it will be selected precisely as MREQ goes low, and for the correct number of cycles. The modification, delaying only IOWait by half a cycle so that it is synchronised with the active edge of IOREQ, is shown in the waitGen schematic version 1.14.
Note: This modification works, but causes stability problems. See "Improving Stability" for a more accurate analysis.
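The fix amounts to a phase check between each wait source and the request it serves. A hedged sketch using half-cycle parity (1 for a falling edge, 0 for a rising edge); the configuration names are illustrative rather than taken from the schematic.

```python
# Half-cycle parity: 1 = falling edge, 0 = rising edge.
WAIT_SOURCE_EDGE = 1  # IOWait and MemWait change on a falling edge
MREQ_EDGE = 1         # MREQ goes low on a falling edge
IOREQ_EDGE = 0        # IOREQ goes low on a rising edge

# Half-cycle delays applied to each wait source (illustrative names).
CONFIGS = {
    "both delayed": {"IOWait": 1, "MemWait": 1},
    "IOWait only (waitGen v1.14)": {"IOWait": 1, "MemWait": 0},
}


def onset_phase(delay_halves: int) -> int:
    """Edge parity on which a wait source becomes active after its delay."""
    return (WAIT_SOURCE_EDGE + delay_halves) % 2


def alignment(config: dict) -> dict:
    """Whether each wait source starts on the same edge type as its request."""
    return {
        "IOWait": onset_phase(config["IOWait"]) == IOREQ_EDGE,
        "MemWait": onset_phase(config["MemWait"]) == MREQ_EDGE,
    }


for name, config in CONFIGS.items():
    print(name, alignment(config))
```

This checks phase alignment only; it does not model the stability problems mentioned in the note.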
After making this simple modification, I ran ULATest3 again, giving the following output:
Notice that there are now six contended memory T-states (Red or Black) and six contended IO T-states (Blue or Black).
There is one final test. Shock MegaDemo 3.....
A perfect test! It would be interesting to see what Floatspy reports the floating bus timings to be.
Floatspy reports the first floating bus value at T-state 14347, and the contended floating bus timings are now:
These are exactly as reported by a real ZX Spectrum! Good news indeed!
Now I need to proceed with the long-awaited sound and Flash attribute mode...