The CPU and the MMU share access to the RAM bus: 2 cycles each every 4
cycles.

The wake-up state is determined randomly at cold reset. Some states are more
probable than others, and the odds also depend on temperature.

It affects the order in which cycles are shared within each group of 4 clock
cycles.

The reference is cycle 0 at cold reset.

The more practical reference is the start of the scanline in the 'shifter trick
reckoning', that is, the cycles that are targeted to obtain certain effects.
E.g. cycle 376 for right border removal. The important point is that the
wake-up state influences those tricks.

We call the wake-up states '1' and '2'.
To reduce confusion (if possible), we will try to define wake-up states so
that the demo 'Forest' by Paolo gives us the same numbers as those we use.

Problem: later Paolo identified more wake-up states.


Different concepts
====================

ijor first identified 2 wake-up states in which the ST can be at power-up.

State 1 (cold)
+-----+
| CPU |
+-----+
| MMU | 
+-----+


State 2 (warm)
+-----+
| MMU |
+-----+
| CPU |
+-----+

A simplistic schema like this explains why the timings of CPU instructions
generally need to be rounded up to 4. The CPU only gets 2 cycles of bus access
every 4 cycles.
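
As a minimal illustration of the rounding rule, assuming it applies uniformly
(the text only says 'generally'), the helper below rounds a nominal cycle count
up to the next multiple of 4. The name round_up_to_4 is made up for this sketch.

```python
def round_up_to_4(cycles):
    """Round a nominal cycle count up to the next multiple of 4."""
    return (cycles + 3) // 4 * 4


# A few nominal timings and what they become once aligned on 4 cycles.
for nominal in (4, 6, 8, 10):
    print(nominal, "->", round_up_to_4(nominal))
```

A count that is already a multiple of 4 is left unchanged.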

ijor created a test program, wakeup.tos, that checks whether top border
removal works at timing 496/504 (WU1) or 504/512 (WU2).

This doesn't test whether the write happens on a cycle that is a multiple of
four. Does it test when HBLANK starts?

LJBK (Paolo), based on multiple tests on an STF, identified 4 wake-up states
(WS, to avoid confusion with WU) and found the timings at which shifter tricks
work in those 4 states.

Dio, based on actual traces, explained those wake-up states by a latency
between the DE signal and the LOAD signal.


[Dio:


There's two data buses in the ST, the CPU bus and the DRAM / Shifter bus. 
The two are bridged by a bus gateway - a buffer and a latch, controlled by 
RDAT, WDAT and LATCH on the MMU.


The DRAM bus is segmented by MMU into two phases each of two 8MHz cycles,
one of which is a CPU phase and the other of which is a video / refresh
(and sound on the STE) phase. If the CPU tries to access DRAM or the Shifter
on a video phase, DTACK is withheld for two clock cycles to insert a pair of
wait states and align the access onto the CPU phase. But anything only on
the CPU bus (including Glue and the MMU itself, at least in the STFM) can
be accessed on any phase.

On both STE and STFM writes to the resolution register are always delayed 
until a video phase, since they need to be transferred through the bus gateway 
onto the DRAM/Shifter bus which is only available on video phases.


On the STFM writes to the sync register go only to the Glue, and can happen
on either CPU phases or video phases. Similarly, reads from screencurrent go
only to MMU and can happen on either phase.

On the STE reads from screencurrent can only happen on the video phases.
I observed this when I was writing the instruction timing tester. It's easy
to write a program that detects this behaviour.]
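
The phase rules in Dio's description can be summarised in a small sketch. The
names (REQUIRED_PHASE, phase_at, completion_cycle) are made up for illustration.
Two assumptions go beyond the quoted text: every access that lands on the wrong
phase is pushed back by exactly 2 cycles (Dio states this only for DRAM/Shifter
accesses), and accesses start on an even cycle. STE writes to the sync register
are not described above and are left out.

```python
# Phase required by each kind of access, following Dio's description.
# None means the access can complete on either phase.
REQUIRED_PHASE = {
    ("STFM", "dram"): "cpu",
    ("STE", "dram"): "cpu",
    ("STFM", "resolution_write"): "video",
    ("STE", "resolution_write"): "video",
    ("STFM", "sync_write"): None,
    ("STFM", "screencurrent_read"): None,
    ("STE", "screencurrent_read"): "video",
}


def phase_at(cycle, cpu_phase_first):
    """Return 'cpu' or 'video' for a cycle, given which phase comes first."""
    in_first_half = (cycle % 4) < 2
    return "cpu" if in_first_half == cpu_phase_first else "video"


def completion_cycle(machine, access, cycle, cpu_phase_first):
    """Cycle at which the access goes through (assumed 2-cycle delay)."""
    required = REQUIRED_PHASE[(machine, access)]
    if required is None or phase_at(cycle, cpu_phase_first) == required:
        return cycle
    return cycle + 2


# A DRAM access started on a video phase gets two wait states.
print(completion_cycle("STFM", "dram", 58, cpu_phase_first=True))
# A sync register write on the STFM goes through on any phase.
print(completion_cycle("STFM", "sync_write", 374, cpu_phase_first=True))
```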


Troed:
The known detectable wakestates are the result of GLUE being offset 0-3 cycles
compared to the CPU - which fits with theory (unsynchronized initialisation
at same clock) as well as observation (DE-to-LOAD, visible pixel position on
screen) and empirical testing of when changes to FREQ and RES have to be made
for GLUE to detect them.
Default RES values are for WS3/4.
In WS1 all RES state checks happen 2 cycles earlier, in WS2 2 cycles later.
Default FREQ values are for WS1/3, in WS2/4 all FREQ state checks happen 2
cycles later.


  DL Latency       WU             WS       RES    FREQ      
     (Dio)        (ijor)        (LJBK)
-----------------------------------------------------------
        3           2 (warm)       2        +2     +2  
        4           2              4               +2
        5           1 (cold)       3  
        6           1              1        -2
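
For use in code, the table can be written as a lookup keyed by DE-to-LOAD
latency. This is only a sketch: the names WAKE_STATES and offsets_for_ws are
made up, and the blank RES/FREQ cells are read as 'no offset' (0), which matches
Troed's remark that the default values apply to those states.

```python
# One row per DE-to-LOAD latency (DL). Blank RES/FREQ cells in the table
# are taken to mean "no offset" (0); that reading is an assumption.
WAKE_STATES = {
    3: {"wu": 2, "ws": 2, "res": +2, "freq": +2},
    4: {"wu": 2, "ws": 4, "res": 0, "freq": +2},
    5: {"wu": 1, "ws": 3, "res": 0, "freq": 0},
    6: {"wu": 1, "ws": 1, "res": -2, "freq": 0},
}


def offsets_for_ws(ws):
    """Return (RES offset, FREQ offset) in cycles for LJBK wake-up state ws."""
    for row in WAKE_STATES.values():
        if row["ws"] == ws:
            return row["res"], row["freq"]
    raise ValueError("unknown wake-up state: %r" % (ws,))


for ws in (1, 2, 3, 4):
    print("WS%d" % ws, offsets_for_ws(ws))
```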


SS:
So even if the GLUE is on the CPU bus, wake-up state applies to both mode and
sync registers, both in the GLUE.
But where does the DE-to-LOAD latency come from? DE is sent by GLUE, LOAD
is sent by MMU, and what's been observed is the latency between both, 
corresponding to WS1-4.

Is this compatible with the first WU theory (CPU/MMU cycles)?
When the MMU raises LOAD, we're in an MMU pair of cycles: it has fetched the
video data, put it on the bus, and tells the Shifter to take it now.
All of this must happen within 2 cycles.
Question: does the MMU write on the address bus to fetch video?

Certain:
DL 4 -> DE MMU, LOAD MMU  (WU2,WS4)
DL 6 -> DE CPU, LOAD MMU  (WU1,WS1)

We suppose:
DL 3 -> DE MMU, LOAD MMU  (WU2,WS2)
DL 5 -> DE CPU, LOAD MMU  (WU1,WS3)

According to the traces, only LOAD varies, DE is always the same.
The question is, at which emulator cycle?
If we say 56, that would mean that cycle 56 is a CPU cycle in WU1, MMU in WU2.
This aspect would agree with the first wake-up theory.


Timing adjustment (for Steem):

RES:  5-DL
FREQ: 6-DL
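
Worked out for the four latencies, the two formulas give the values printed by
the sketch below. The function name steem_adjustments is made up. These notes
do not say how the raw values are applied or how they relate to the +2/-2
offsets in the table above, so no rounding is done here.

```python
def steem_adjustments(dl):
    """Raw RES and FREQ adjustments for a DE-to-LOAD latency dl."""
    return {"res": 5 - dl, "freq": 6 - dl}


for dl in (3, 4, 5, 6):
    adj = steem_adjustments(dl)
    print("DL", dl, "RES", adj["res"], "FREQ", adj["freq"])
```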


Shifter tricks
================


Sync 0-byte line
------------------

In Steem in STF mode, 'Forest' will return state 1 or 2 according to the
following rule:

In wake-up state 1, changing frequency to 60Hz at cycle 56 then back to
50Hz at cycle 64 will confuse the GLUE and produce a 0-byte line (no
memory fetching, a black line is drawn).

In wake-up state 2, changing frequency to 60Hz at cycle 58 then back to
50Hz at cycle 68 will confuse the GLUE and produce a 0-byte line (no
memory fetching, a black line is drawn).

State (STF)            1                      2
Switch to 60          56                     58                      
Switch to 50          64                     68
CPU cycles         56-57                  58-59
Shifter cycles     58-59                  56-57

From this we derive the schemas that show how each group of 4 cycles is
shared. A group starts at cycle 0 and at every multiple of 4 after that.


State 1
+-----+
| CPU |
+-----+
| MMU | 
+-----+


State 2
+-----+
| MMU |
+-----+
| CPU |
+-----+
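
The two schemas amount to a simple rule on the cycle number modulo 4. A minimal
sketch follows; the name bus_owner is made up, and the rule is only the
2-state model drawn above, not the later 4-state picture.

```python
def bus_owner(cycle, state):
    """Return 'CPU' or 'MMU' for an emulator cycle in wake-up state 1 or 2."""
    if state not in (1, 2):
        raise ValueError("state must be 1 or 2")
    first_pair = (cycle % 4) < 2  # cycles 0-1 of each group of 4
    if state == 1:
        return "CPU" if first_pair else "MMU"
    return "MMU" if first_pair else "CPU"


# The cycles used by 'Forest' for the 0-byte line.
for state in (1, 2):
    print("state", state, [(c, bus_owner(c, state)) for c in (56, 58)])
```

With this rule, cycles 56-57 belong to the CPU in state 1 and to the MMU in
state 2, as in the table for the 0-byte line.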


Right border off
------------------

The 'magic cycle' to remove the right border is 376.

In wake-up state 2, a write on cycle 374 will not work.
But in wake-up state 1, it will be delayed by the MMU and hit cycle 376.

-> Nostalgia menu


State 1
+-----+
| CPU |   372
+-----+
| MMU |   374 -> delayed to 376, right border trick will work
+-----+


State 2
+-----+
| MMU |   372
+-----+
| CPU |   374 -> taken, right border trick missed
+-----+


Update:
But if writes to the Sync register aren't delayed, how do we explain this?
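
For reference, the delay model drawn in the two schemas can be sketched as
below. It assumes, as the schemas do, that a write landing on an MMU pair of
cycles is pushed back by 2 cycles. The update above questions exactly that
assumption for the Sync register, so this models the schemas, not necessarily
the hardware. The names are made up, and only the two cases drawn above (a
write on cycle 374 in each state) are shown.

```python
RIGHT_BORDER_CYCLE = 376  # the 'magic cycle' for right border removal


def effective_write_cycle(cycle, state):
    """Cycle at which a write takes effect under the schema's delay model."""
    in_first_pair = (cycle % 4) < 2
    # State 1: CPU owns the first pair of each group of 4; state 2: the second.
    cpu_slot = in_first_pair if state == 1 else not in_first_pair
    return cycle if cpu_slot else cycle + 2


for state in (1, 2):
    hit = effective_write_cycle(374, state)
    print("state", state, "write at 374 lands on", hit,
          "hit" if hit == RIGHT_BORDER_CYCLE else "missed")
```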