Whoa! It's all got a bit mathematical! Hopefully it makes some sense. I've been reading it over and I think it looks okay. However, I've realised that there's a bit of a chicken-and-egg situation right at the start, with the very first screen bytes. The memory contention section of the comp.sys.sinclair FAQ gives the observed timing for this first read.
The problem is this: we need to read the screen and attribute bytes just before we're ready to shift them onto the screen. For the very first screen byte this means reading from memory whilst we're still outputting the border. Okay, fine. This means that we need to issue all reads and relevant control signals 8 pixel clock cycles earlier. This is possible in theory, but complicated by the fact that all our key counter points have been calculated to be multiples of 16 (i.e. the lower 4 bits, HC0 to HC3, are all zero) to make the decoding simpler. We lose this simplification if we have to move our timing earlier by 8 clock cycles.
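To make the decoding point concrete, here's a quick Python sketch. It's only an illustration: the line length and the list of counter points are values I've assumed for the example, not the actual points worked out earlier. It just shows that any multiple of 16 has HC0 to HC3 all zero, and that moving such a point 8 clocks earlier sets HC3, so the low bits would have to be decoded too.

```python
# Sketch only: LINE_LENGTH and KEY_POINTS are assumed values for illustration,
# not the counter points calculated earlier in this series.
LINE_LENGTH = 448                 # assumed pixel clocks per scan line
KEY_POINTS = [0, 256, 320, 416]   # hypothetical decode points, all multiples of 16


def low_nibble(hc: int) -> int:
    """Return the value of counter bits HC3..HC0."""
    return hc & 0xF


def shifted_earlier(hc: int, clocks: int = 8) -> int:
    """Move a counter point earlier by `clocks`, wrapping around the line."""
    return (hc - clocks) % LINE_LENGTH


for point in KEY_POINTS:
    early = shifted_earlier(point)
    print(f"{point:3d}: HC3-HC0 = {low_nibble(point):04b}   "
          f"8 earlier -> {early:3d}: HC3-HC0 = {low_nibble(early):04b}")
```

With the original points the decoder can ignore HC0 to HC3 entirely. With the shifted points it can't.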
I'd been pondering this problem for ages until I realised that the one signal we don't want to generate early is the signal that tells the video output circuit that we're outputting display pixels. My flash of inspiration was simply to delay this signal until we've read the first screen bytes of the row, so the output circuit continues to output the border until we're ready with the pixel data.
The net effect is that we don't have to do anything early, at the expense of the left-hand border being a tiny bit wider (probably a pixel or less) - not that you'd notice anyway.
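Here's a toy Python model of that idea, just to show the effect on one scan line. Everything in it is an assumption for illustration: the function name, the tiny 32-clock line, and the 2-clock delay are all made up, and I've assumed the whole pixel window simply moves right by the delay while keeping its width. `B` is a border clock and `P` is a display pixel clock.

```python
def line_output(display_start: int, display_width: int,
                enable_delay: int, line_length: int) -> str:
    """Return one character per pixel clock: 'P' for display, 'B' for border.

    The pixel-enable signal is asserted `enable_delay` clocks after the
    nominal display start, so border is output until then.
    """
    first = display_start + enable_delay   # first clock that shows a pixel
    last = first + display_width           # one past the last pixel clock
    return "".join("P" if first <= hc < last else "B"
                   for hc in range(line_length))


# Toy numbers (hypothetical): 32-clock line, display nominally starts at clock 8.
nominal = line_output(display_start=8, display_width=16, enable_delay=0, line_length=32)
delayed = line_output(display_start=8, display_width=16, enable_delay=2, line_length=32)

print(nominal)
print(delayed)
```

The delayed line has the same number of `P` clocks; the left border just grows by the delay, which gives the first read time to complete.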
Our control timing diagram will need a bit of adjustment, as we want to read the first screen and attribute bytes as soon as possible. Our initial timing diagram had the reads taking place at the very end of one 8-cycle period, ready for the next. This is no longer suitable. We can shift the reads over to the left so that they happen sooner, and then we only have to slightly delay the pixel enable signal (when HC8 is low).
Next up! A schematic implementing the horizontal counter and signal generation.