SpeA-FPGA: FPGA design for the SPE PHY layer, 10 Mbit/s

1. Approach

This FPGA design supports SPE communication at 10 Mbit/s with Manchester-2 coding, either in Ring Topology or in Star or Bus (Drop) Topology. Ring Topology is the primary use case, but Bus and Star Topology can be used as well. The design implements the physical layer.

The data are not stored in this physical layer. Instead, they are read and written immediately via SPI. The SPI communication should run at 10 Mbit/s.

For Ring Topology the frequency and the phase of the telegram bits are tuned so that jitter is minimal. This is also necessary for exact time synchronization of all stations in the ring. Time synchronization uses the time of the Start Frame Delimiter (SFD), which follows the sync bits at the start of the data. This timing signal is output as frame_out.

For the Ring Topology the FPGA can act as Master or Slave.

The Master starts the communication with the frame_in signal and then requests data via SPI. The SPI should be initialized before frame_in arrives.
The Slave should also be started with the frame_in signal. It reads two data words (32 bit) via SPI from the controller as command and sender identification. Then the FPGA waits for receive data. While a telegram is being received, it is forwarded at the same time via the second SPE output. During that time the SPI is used to write the received data to the controller and also to read data that can replace data in the forwarded telegram. The replacement can be controlled for each byte of data by a configuration word.
The Master receives data via its second SPE pins. This is the answer in the closed ring.

2. Meaning of data via SPI in the controller

The first data word (16 bit) is generally a command word. It is not transmitted. This command word determines whether the FPGA is master or slave, unless it is set as master by hardware wiring (master input).

The second data word is transmitted. For the Slave in Ring Topology the high byte is determined before received bytes are evaluated. Hence it is never echoed from the received data.

All other words are free for the user. The data content of the telegram is determined directly by the RAM content in the controller.

For the Slave in Ring Topology a configuration phase determines which bytes are taken from the RAM of the controller and which bytes are forwarded from the received data.

If the FPGA works in Star or Bus Topology, it is always Master. The time of the answer is determined by the controller, which initializes the SPI for the data transfer and outputs the frame_in signal to start the telegram.

2.1. cmd word

Bit 15: should be 1 if the SPI provides this.
Bit 14: 1=Special meaning, configuration word (not yet ready)
Bit 13: 1=master
Bit 12: 1=continuous carrier for Ring, 0=no tx bits in gap, for Bus topology
Bit 11: not yet used
Bit 10..0: Length of the data in 16-bit words, not counting the CRC and the first 3 words. For Bit 10..0 == 0, three data words + CRC (4 bytes) are transmitted.
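
The bit layout above can be illustrated with a small controller-side sketch in Python. It is only an illustration of the table: the constant and function names are hypothetical and do not come from the project, and the byte count simply restates the rule "three data words + length + 4 bytes CRC".

```python
# Bit positions as listed in section 2.1; all names are illustrative only.
CMD_SPI_VALID = 1 << 15      # Bit 15: should be 1 if the SPI provides this
CMD_CONFIG = 1 << 14         # Bit 14: configuration word (not yet ready)
CMD_MASTER = 1 << 13         # Bit 13: 1 = master
CMD_CONT_CARRIER = 1 << 12   # Bit 12: 1 = continuous carrier for Ring
CMD_LENGTH_MASK = 0x07FF     # Bits 10..0: data length in 16-bit words


def make_cmd_word(master, continuous_carrier, length_words):
    """Build a cmd word from its fields (Bit 14 and Bit 11 stay 0)."""
    if not 0 <= length_words <= CMD_LENGTH_MASK:
        raise ValueError("length must fit into bits 10..0")
    word = CMD_SPI_VALID | length_words
    if master:
        word |= CMD_MASTER
    if continuous_carrier:
        word |= CMD_CONT_CARRIER
    return word


def transmitted_bytes(cmd_word):
    """Bytes sent for this cmd word: 3 fixed words + length words + 4 CRC bytes."""
    length_words = cmd_word & CMD_LENGTH_MASK
    return 2 * (3 + length_words) + 4
```

For a ring master with continuous carrier and length 0, this sketch yields the word 0xB000 and 10 transmitted bytes.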

2.2. first data word

The first data word is always transmitted by the master. For non-Ring Topology the station is always master.

Only in Ring Topology can a station (FPGA) be used as slave. Then the following applies:

Bit 15..8: Sender identification; it should be evaluated by the next station.
Bit 7..0: Data length for the next station to forward the data. This data length information is transmitted by the Master from RAM and forwarded by all slaves. It is necessary to determine the data length independently of the slave programming, because a slave controller may not work in the initial phase or on a fault.
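
As a sketch of this layout, the two fields can be packed and unpacked as follows. The function names are hypothetical and only illustrate the bit positions given above.

```python
def make_first_data_word(sender_id, forward_length):
    """Pack sender identification (bits 15..8) and data length (bits 7..0)."""
    if not 0 <= sender_id <= 0xFF or not 0 <= forward_length <= 0xFF:
        raise ValueError("both fields are 8 bits wide")
    return (sender_id << 8) | forward_length


def split_first_data_word(word):
    """Return (sender_id, forward_length) of a 16-bit first data word."""
    return (word >> 8) & 0xFF, word & 0xFF
```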

3. Modules

3.1. All modules

todo Graphic

3.2. RxSpe preparing the receive data input from SPE

Receives data, offers the data to both modules #SpiA and #TxSpe

3.2.1. Inputs and Outputs

Input Pin:

=>source: main/vhdl/modules/common/speA/RxSpe_SpeA.vhd[tag=InOutPins]
  --=============================== input from pin ==================================
  rxDin : IN STD_LOGIC;  -- The data pin port (difference input on FPGA)
  --=============================== outputs of pin:
  -- nothing

The differential FPGA input for the two SPE signals is connected directly as module input.

Inputs from other module:

=>source: main/vhdl/modules/common/speA/RxSpe_SpeA.vhd[tag=Inputs]
  --=============================== inputs of module:
  cmdSpi_Inp: IN BIT_VECTOR(15 downto 0); --command word

Only the command word is used as input information. From it, the master/slave state (Bit 13) and the expected length of the telegram (Bits 9..0) are used.

Outputs to other module:

=>source: main/vhdl/modules/common/speA/RxSpe_SpeA.vhd[tag=InOutputs]
  --=============================== output to for other module ======================
  dataBit_Out:   OUT BIT; --one data bit either to send in ring or write in the controller via Spi
  dataCE_Out:    OUT BIT; --clock enable to validate the dataBit_Out
  carrier_Out:   OUT BIT; --1 = there are coming in bits. 0= no receive signal, gap.
  dataState_Out: OUT BIT; --1 if the dataState_Out pattern 5d is detected, data bits.
  bytemark_Out:  OUT BIT; --marks the end of a byte
  rxNeg_Out:     OUT BIT; --shows negated inputs

dataBit_Out is the received data bit after Manchester decoding. If the data are forwarded in Ring Topology, it is passed directly to the module TxSpe transmit data input to SPE. It is also used by the module SpiA - interface to the controller to store the data; shifting is done in SpiA. The data are valid with the following dataCE_Out.
dataCE_Out is the clock of the received data. It is used to synchronize the Tx clock of this module if it is the slave in Ring Topology. Then the central CE0 clock can be used for everything. For the Master in Ring Topology this clock is not synchronized to the central CE clock. This means that delays for processing the received data must be derived from the original 100 MHz (10 ns) system clock.
carrier_Out: This signal indicates that a valid data stream is being received on the SPE input. It is 0 if the SPE input stream is invalid. A valid stream has edges at least once per 100 ns bit width, with possible inaccuracies caused by disturbances on the line. If there are no edges within 150 ns, or too many disturbances, carrier_Out =0 is indicated, meaning no valid data. Note that an undriven line can produce nondeterministic input data, because the differential input for SPE signals is sensitive. After a disturbance that forces carrier_Out =0, at least some correct sync pattern bits must arrive before carrier_Out =1 is indicated again.

Note that in Ring communication the data flow is uninterrupted. Hence carrier_Out =1 holds for the whole time of communication, and this signal cannot be used to detect the end of a telegram. See next.
dataState_Out: This signal indicates valid received telegram data.
- It becomes dataState_Out =1 after detection of the SFD (Start Frame Delimiter) as end of the synchronization phase, that is, once the receiver is synchronized. At least 32 sync bits are needed.

It returns to dataState_Out =0 either
- if a gap (no data) is detected in the received SPE data stream, also indicated by carrier_Out =0. This is used for Bus Topology, especially for Multi Drop communication,
- or if the number of data words given in cmdSpi_Inp has elapsed. This is used for transmission without gap in Ring Topology. To switch off this function for Bus Topology, the number of expected data words should be set to 0x3FE. This is longer than the longest telegram in the Ethernet specification (0x3FE = 1022 data words = 2044 bytes).
Hence this signal is deterministic for the data phase.
bytemark_Out: This signal is only used as test output.
rxNeg_Out: This signal is only used as test output, especially to show the negated data state on an LED. The SPE wires can in general also be connected swapped. In that case a transmitted sync signal does not end with ..0101011 but with ..1010100. The SFD is therefore not defined as a 11 pattern but as a repetition of the same bit value, so swapped wiring is admissible. It is detected and handled in the RxSpe module itself, which means dataBit_Out has the correct value. But the swapped state may be of interest for checking the wiring, hence it is shown.

3.2.2. Detection of Bit edges and Databits

This is done in main/vhdl/modules/common/speA/RxSpe_SpeA.vhd by the following process:

=>source: main/vhdl/modules/common/speA/RxSpe_SpeA.vhd[tag=detectInputBit_Prc_start]
-- == This process detects the bits received from SPE.
--
detectInputBit_Prc: PROCESS (clk100)
  VARIABLE d: QRx_REC;
  VARIABLE rxDpin: BIT;
  VARIABLE rxDchg: BIT; --Impulse for 1 clk100 if rxD changes its state.
BEGIN IF (rising_edge(clk100)) THEN
  d := qRx;
  IF ( rxDin= '1') THEN rxDpin := '1'; ELSE rxDpin := '0'; END IF; --convert only from STD_LOGIC to BIT
  d.rxD0 := rxDpin;           --clocked input unconditional
  IF(rxDpin = qRx.rxD0) THEN  --clock input only if at least two times the same.
    d.rxD1 := qRx.rxD0;       --Note: Use FF output to clock, use rxDpin only for Clock Enable
  END IF;
  d.rxD2 := qRx.rxD1;         --shifted ckock input for edge detection
  rxDchg := qRx.rxD1 XOR qRx.rxD2;  --edge detection for change on input

First of all, every process uses a RECORD type to hold its values as states, that is, as flip-flops (FF) in the real hardware. Here the record type is QRx_REC, and the flip-flop instance (the output values of the FFs) is qRx. The variable d is used internally to prepare the state for the next clock edge of this process. This is a clear and proven style in VHDL. Accordingly, the default value of d is the current FF state, line d := qRx;.

The next IF ( rxDin= '1') …. line only converts STD_LOGIC to BIT, which is sufficient and sometimes easier to handle.

The qRx.rxD0 is always the clocked input data pin. However, the FPGA input pin may carry very short spikes. These are captured by the clock and stretched to a 10 ns bit time. They should not be processed further.
Hence the next line IF(rxDpin = qRx.rxD0) compares the direct input pin with the pin value stored at the previous clock edge. Only if both are equal is the stored value valid. A short spike is therefore suppressed already here and does not reach qRx.rxD1.
The comparison rxDpin = qRx.rxD0 is asynchronous combinational logic that is used as clock enable of only one FF, qRx.rxD1.
If this signal has hazards but is sampled as 1 at the clock edge, it is nevertheless valid and used. It should drive exactly one input; anything else is bad design.
Hence the D input uses the stable signal qRx.rxD0, so no spike-induced differences can occur. This is the first important spike filter.
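
The filter behaviour can be reproduced with a small behavioural model in Python. This is only a sketch under simplifying assumptions: the pin is represented by one sample per clk100 edge, so hazards between clock edges are not modelled, and the function name is hypothetical.

```python
def glitch_filter(samples, rxd0=0, rxd1=0, rxd2=0):
    """Behavioural model of rxD0/rxD1/rxD2; one list entry per clk100 edge.

    Returns two lists: the rxD1 value after each clock edge and the
    rxDchg pulse computed at that edge from the previous FF states.
    """
    rxd1_trace, chg_trace = [], []
    for pin in samples:
        chg = rxd1 ^ rxd2                          # rxDchg := qRx.rxD1 XOR qRx.rxD2
        next_rxd1 = rxd0 if pin == rxd0 else rxd1  # clock enable only if pin equals stored rxD0
        rxd0, rxd1, rxd2 = pin, next_rxd1, rxd1    # all FFs update on the same edge
        rxd1_trace.append(rxd1)
        chg_trace.append(chg)
    return rxd1_trace, chg_trace
```

In this model a spike that is visible at only one clock edge, for example the samples 0, 0, 1, 0, 0, 0, never reaches rxD1, while a level that is stable for two consecutive edges is taken over.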

RxSpeA Din Clk

This is a simulation result (Simulink) with a strongly disturbed signal.

Track 1 is the input with disturbance as analog value.
Track 2 shows the situation after the digital input. Blue is the 10 ns clock. The signal itself seems to be unusable: every change of the analog signal around 1.65 V here (the simulation has no differential input) produces a change of the binary signal.
Track 3 is the digital input after the first flip-flop, clocked with the blue 10 ns clock. The result in the disturbance phases is random.
Track 4 is the CE generation from the comparison of RxD0 and the input.
Track 5 is the binary signal after the correct RxD1. It is usable because the input is compared with the clocked input on every clock edge.
Track 6 is the binary signal after the faulty RxD1_bad, which uses the unclocked input on D. The problem is the different delays, here simulated with 1.3 ns and 2.3 ns. The simulation resolution for this simple model is 0.1 ns.

The associated model is shown below:

RxSpeA Din Clk mdl

Furthermore, in this VHDL code snippet (not in the Simulink model) a second FF, qRx.rxD2, stores the same signal as qRx.rxD1 but one 10 ns clock later. With both FFs the edge of the input is detected.

3.2.3. Detection of Carrier, Sync pattern and Start Frame Delimiter

This is done in main/vhdl/modules/common/speA/RxSpe_SpeA.vhd by the following part of the process:

=>source: main/vhdl/modules/common/speA/RxSpe_SpeA.vhd[tag=detectInputBit_Prc_carrier]
    IF ( rxDchg='1') THEN           --========== rxDchg indicates an edge on RxD input
      IF (qRx.syncBitStartTime='1' ) THEN   --rxDchg rxD changes the state, during less time, it is an error.
        d.expectBit := '0';         -- start newly with the next edge,
        d.syncBitStartTime := '1';
        d.ctBitTime :="10000";      -- but only after 3/4 of a bit time.
      ELSIF ( qRx.expectBit = '0') THEN   --rxD changes the state outside startTime, as first bit edge:
        d.syncBitStartTime := '1';  --the next edge of a valid bit should be change after the start time.
        d.ctBitTime :="10000";      --counts bit length
        d.expectBit := '1';         --to detect the next valid edge as second.
      ELSIF ( qRx.expectBit = '1') THEN  --rxD changes the state outside startTime, as second bit edge:
        d.ctBitTime :="00000";     --then the next bit begins
        d.carrier := '1';          --the second valid edge is carrier <=1, telegram starts with dataState_Out bits.
        d.expectBit := '0';        --from here: normal bit edge detection
      END IF;
    ELSIF ( (qRx.expectBit OR qRx.syncBitStartTime) = '1' ) THEN   -- after edge, during stable bit
      d.ctBitTime := qRx.ctBitTime +1; -- increment till "11101"
      IF qRx.ctBitTime  = "10110" THEN -- if bittime = 7 then the next edge is a valid bit
        d.syncBitStartTime := '0';     -- the next edge is not an error
      ELSIF( qRx.ctBitTime >="11101") THEN ------------ --ct till 1101 = D = 140 ns
        d.expectBit :='0';         -- reaching 140 ns after edge, it is not a valid start bit.
      END IF;
    END IF;

This is the continuation of detectInputBit_Prc. Note that this document does not contain all details; refer to the original source. Only the relationships are explained here. The source itself contains further comments.

Some test results are shown below. They were produced with the simulation project in src\test\Lattice_pj\Test_RxClkSync_SpeA.

The following image shows the functionality:

RxSpeA DinA wave

The two input edges on the left are rejected by the first input filter, see the chapter above. The simulation has input stimuli only in 10 ns steps, but this is sufficient to show the effect of the small-glitch suppression.
To the right of the blue cursor the inputs are accepted by the glitch filter and are hence visible on qRx.rxD1/2. After the edge detection, qRx.syncBitStartTime and qRx.expectBit are set with the next clock. Now the next edge is expected after the start time has elapsed.
Unfortunately this is also a bad input signal: it changes within the minimal expected duration. After such edges it is checked that the minimum bit width is kept. This is done by clearing expectBit and retriggering syncBitStartTime on every edge inside this minimal bit time. This means that at least 70 ns without edges must pass.
This minimal bit time is reached at the red cursor. Hence the next edge sets qRx.syncBitStartTime and qRx.expectBit in expectation of a correct bit.
But this bit is too long. The qRx.expectBit signal ends when the internal ctBitTime reaches 1D, which is 140 ns. This means the first bit of a sync pattern is valid only in the range 70..140 ns; the nominal value is 100 ns. Anything else is not accepted.

In this case the elongation is also caused by the flickering at the end of the bit.

The continuation of this simulation run shows a second data bit that is too long. Later it starts with the following sequence:

RxSpeA Din760 wave

This is also a flickering input, but it is accepted.

The edges on the left are ignored by the glitch filter.
Then a valid bit with 100 ns length arrives. The first bit only switches to expectBit.
The next bit, to the right of the red cursor, is a valid second bit. After one correct bit width the next edge sets qRx.carrier, although the width of this bit (and of all following bits) is not tested. That is the task of the synchronization detection, see next chapter.
The single glitch in this next bit is filtered at the input.
The next bit, to the right of the blue cursor, is interesting. It has a longer glitch which is not filtered by the input. But this glitch is accepted, because in the carrier state signal changes inside the minimal time of a bit (70 ns) are accepted. This is the difference: before carrier is detected, such signals are treated as errors. Once carrier has been detected, such disturbances are accepted. They can occur during transmission and should not abort the transmission unnecessarily.

This example is a borderline case. Such a disturbance typically occurs in the start phase of a bit; here it occurs shortly before the end of the minimal time.

Such disturbances on the bits may be present in the real world, especially as oscillation on the line after the necessary edges. The stricter check before carrier is detected prevents a false carrier detection on random inputs. The more relaxed check in the carrier state prevents unnecessary aborts of the communication on disturbed signals. Last but not least, the CRC check ensures the consistency of the data.

3.2.4. Detect the bit value

In continuation of the detectInputBit_Prc the time of the bit is observed:

=>source: main/vhdl/modules/common/speA/RxSpe_SpeA.vhd[tag=detectInputBit_Prc_Bitvalue1/2]
  ELSE -- carrier ='1'--======================= --inside a telegram
    d.ctBitTime := qRx.ctBitTime +1;              -- increment countBitTime till max "01100", not wrapping arround!
    IF (qRx.ctBitTime(3 downto 1) = "111"  ) THEN --ctBitTime reaches 01110, = 14, then too long, end of telegram.
      d.carrier := '0';
      d.dataState := '0';
      d.expectBit := '0';
      d.ce := '1';               -- it produces one rxCE to output afterwards one time to detect dataState =0
      d.ctBitTime :="10000";
    ELSIF (qRx.ctBitTime(2 downto 0) = "110" ) THEN  --ctBitTime reaches 00110, next = 7, then new edge is valid as bit edge.
      d.expectBit := '1';
      d.ce := '1';               -- this is the CE for validation data. It is 1 exact from 70 to 80 ns after edge detection.
    END IF;                      --Note: This ce is essential for clock synhronization. Should be come evenly, only with jitter of data edge

Inside a bit, ctBitTime is incremented.
If no edge was detected and the bit is too long (it reaches 15 in this cycle, the value before is 14 = 1110), the bit is 140 ns long and hence faulty. It is a gap or a longer bit marking the "end of data frame". Hence both dataState and carrier are set to 0. A last ce signal is necessary to process this end state.
If the bit time of the last cycle is 6 (this cycle counts to 7), 70 ns have passed since a bit edge. This is the time of a stable bit. Hence the ce signal for the bit is set for 1 clock (10 ns) to process the bit value. The bit value itself was stored at a first edge while expectBit=1. This is also the time from which the next valid bit edge is expected, indicated by expectBit := '1'.

    -- Note: a short bit edge time is not detected as error to abort, it is accepted, but count.
    IF ( rxDchg='1' ) THEN         --rxDchg rxD changes the state, after 2 clocks
      IF (qRx.expectBit='0' ) THEN      --==== rxDin changes before the 3/4 bit width, > 7 clock periods:
        d.ctError := qRx.ctError +1;    -- one edge is expectable, it is the small bit width for repeated same bit values.
        IF (qRx.ctError(2) = '1') THEN  -- two edges restores the old value, should be accepted as possible disturbance.
          d.carrier := '0';             -- but more then 3 edges are an error. It ends the carrier detection.
        END IF;
      ELSE                              --==== rxDin changes after the 3/4 bit width, > 7 clock periods:
        d.ctBitTime :="00000";          --then the next bit begins
        d.expectBit := '0';             --Note: expectBit has another meaning in carrier as outside. 0 = before 3/4 bit time.
        d.rxDs := qRx.rxD1;
        d.rxD := qRx.rxD1  XOR qRx.negBit; -- databit with edge, the rxD1 is the state of the new bit value.

Next, it is checked whether an edge occurs.

If an edge arrives, the essential state is expectBit = 0 or = 1. The latter is a bit edge after 70 ns, which means a new bit.
If expectBit = 0, this is either a necessary edge to present the same bit value twice on the active bit edge (later, on expectBit = 1), or it is a disturbance.
If it is a disturbance, a second edge back to the original level should also arrive within this time. The disturbance is accepted but counted. Too many disturbances switch off the carrier.
If expectBit = 1, this edge is taken as a bit edge. The count starts again from 0.
The bit value itself is taken from the current level after the edge.
In addition, the 'negate' information is taken into account. qRx.negBit is set on detection of the Start Frame Delimiter. It is 1 if the wires of the SPE line are swapped, which is admissible. Once negBit has been detected in a previous telegram, it remains active in the sync phase of the next telegram. It is set on each SFD, usually of course with the same value. Only if the hardware wiring is reconnected may this bit change.

3.2.5. Start frame delimiter detection

The start frame delimiter detection is essential for the frame_out signal also for time synchronization. This is done in the continuation of detectInputBit_Prc:

=>source: main/vhdl/modules/common/speA/RxSpe_SpeA.vhd[tag=detectInputBit_Prc_SFD]
        IF ((qRx.rxDs XOR qRx.rxD1) ='1') THEN --Note: compare the old with the new databit, if  is changed
          IF(qRx.ctSyncBits(6)='0') THEN      --then count up the syncBits counter to detect sync pattern
            d.ctSyncBits := qRx.ctSyncBits +1;
          ELSIF ( qRx.dataState ='1') THEN    --ctSyncBits is >=32
            d.dataState := '0';               --then sync pattern in data state is detected, removes dataState
          END IF;
        ELSE                                  --same old as new bit, set ctSync to 0 in next state, but test of SFD:
          IF( d.dataState ='0' AND (qRx.ctSyncBits(5) = '1' OR qRx.ctSyncBits(6) = '1')) THEN
            d.sfd := '1';               --it is the start frame delimiter.
            d.negBit := NOT qRx.rxD1;         -- if 0101011 is received, no negate, 1010100 then negate.
          END IF;
          d.ctSyncBits := "0000000";
        END IF;
        --
        IF(qRx.sfd = '1') THEN
          d.dataState := '1';
          d.sfd := '0';
        ELSIF (rxCE.ctData(10) = '1') THEN
          d.dataState := '0';
        END IF;
        --

This is the continuation of detectInputBit_Prc, after a new bit value has been detected following an edge.

The new bit value is compared with the older one. The new value is in qRx.rxD1, the value after the edge. The old value is still in qRx.rxDs, because that is the state before the clock edge of this process.
If the bit toggles, ctSyncBits is incremented independently of dataState. A value >=32 indicates that four sync bytes have been received, corresponding to two 16-bit values 0xAAAA or 0x5555 in a row. This should not occur in an active data phase, but it may be the pattern after a telegram whose data end was not detected. Hence the data phase ends when the counter has saturated (ctSyncBits(6) = 1 in the excerpt above) and the bits still toggle.
The ctSyncBits counter stops at 1000000 (64) to prevent overflow on further sync bits.
If the bit does not toggle (two equal values), dataState is not yet reached (that is, it is the sync preamble of a telegram or the time between telegrams) and at least 32 sync bits have been counted, then it is the *Start Frame Delimiter* SFD.
The following IF(qRx.sfd = '1') is independent of the bit value; it starts the data state.
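
The SFD rule can be tried out with a simplified Python model. It is a sketch, not the implementation: it works on a list of already detected raw bit values (one entry per accepted bit edge), it only covers the sync phase with dataState = 0, it assumes the previously stored bit is 0 at the start, and the function name is hypothetical.

```python
def detect_sfd(raw_bits, min_sync=32):
    """Return (index, neg_bit) of the first SFD in raw_bits, or None.

    index is the position of the repeated bit that forms the SFD.
    neg_bit is 1 if the wires appear swapped (sync ends with ..1010100).
    """
    prev = 0          # assumption: stored old bit (rxDs) starts at 0
    ct_sync = 0
    for i, bit in enumerate(raw_bits):
        if bit != prev:
            ct_sync = min(ct_sync + 1, 64)   # counter saturates at 1000000
        else:
            if ct_sync >= min_sync:          # bit 5 or bit 6 of the counter is set
                return i, 1 - bit            # negBit := NOT rxD1
            ct_sync = 0
        prev = bit
    return None
```

With 33 toggling bits followed by a repeated 1 the model reports an SFD with neg_bit = 0; the inverted sequence yields neg_bit = 1, and a preamble that is too short is ignored.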

3.3. TxSpe transmit data input to SPE

Transmits data, taking the data either from #SpiA or from #RxSpe

Inputs:

=>source: main/vhdl/modules/common/speA/TxSpe_SpeA.vhd[tag=Inputs]
  --=============== inputs of module:
  txReq_Inp: in BIT;     --control transmission of bits as master, 0 as slave
  carrierRx_Inp: IN BIT; --echo and replace transmission of bits as slave, 0 as master
  dataStateRx_Inp: IN BIT; --1 then the SFD start frame delimiter will be send.
  rxDataBit_Inp: in BIT;    --one data bit to transmit as forward from Rx module. It comes with the RxDataCE clock, sync ~ to CE4
  rxDataCE_Inp: in BIT;    --one data bit to transmit as forward from Rx module. It comes with the RxDataCE clock, sync ~ to CE4
  txDataBitSpi_Inp: IN BIT; --one data bit to transmit read from Spi module or from the data1 register
  cmdSpi_Inp: IN BIT_VECTOR(15 downto 0); --command word

The transmission starts

either with txReq_Inp, connected to txMaster in Main_SpeA.vhd,
or with carrierRx_Inp or dataStateRx_Inp, connected to txSlave in Main_SpeA.vhd.

The first case applies to the master of the communication. The txReq_Inp comes from the frame_Inp signal. The second case applies if a slave FPGA receives data and should forward them.

See code snippet in the Main_SpeA:

=>source: main/vhdl/modules/common/speA/TxSpe_SpeA.vhd[tag=txMasterSlave]
-- handle the frame_in signal from pin and build txMaster, txSlave: 
FrameInpQ_Prc: PROCESS(clk100, shCE(0)) BEGIN IF(shCE(0)='1' AND rising_edge(clk100)) THEN
  frameInpQ <= NOT frame_Inp;   -- negate (it is lo active from controller) and sync it with the clk100 edge as CE0 time.
  IF(cmdMasterBit ='1' OR master_Pin ='1') THEN  -- if master_Pin is pulled to hi, anytime master.
    setMaster <= '1';
	ELSE
    setMaster <= '0';
  END IF;	
END IF; END PROCESS;

--txMaster and txSlave are used as inputs of the TxSpe module txReq_Inp and carrierRx_Inp
txMaster <= '1' WHEN frameInpQ='1' AND setMaster='1' ELSE '0';
txSlave <= '1' WHEN carrier='1' AND setMaster='0' ELSE '0';

dataStateRx_Inp: This comes from RxSpe preparing the receive data input from SPE and indicates that the rx module is receiving data after the sync phase and before the end of data. If the station is slave, the data on txDataBitRx_Inp should be forwarded immediately with little delay, but first the SFD (start frame delimiter) must be transmitted as end of the synchronization phase.
txDataBitRx_Inp (listed as rxDataBit_Inp in the port list above) and txDataBitSpi_Inp: These are the two possible sources of data bits.
- Either the data come from the rx module, to forward received data in the slave. Then it is important to forward them immediately, with little delay from rx to tx. Hence only the current bit is input.
- Or the data come from the SPI module, read from the RAM of the controller. In this case the bits received by SPI must be reordered in the SPI module from SPI order (MS bit first) to Ethernet order (LS bit first). Hence the SPI read must start slightly earlier, so that one complete byte is available; see SpiA - interface to the controller.
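
The bit order conversion mentioned in the second case can be illustrated as follows. This is only a sketch of the ordering, with hypothetical function names; it does not describe how the SpiA module implements the shifting.

```python
def spi_arrival_order(byte):
    """Bits of one byte in the order they arrive via SPI (MS bit first)."""
    return [(byte >> i) & 1 for i in range(7, -1, -1)]


def wire_order(byte):
    """Bits of the same byte in Ethernet order on the line (LS bit first)."""
    return [(byte >> i) & 1 for i in range(8)]
```

The first bit to transmit is the last bit that arrives via SPI. This illustrates why a complete byte must have been read before its transmission can start.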

Outputs:

=>source: main/vhdl/modules/common/speA/TxSpe_SpeA.vhd[tag=Outputs]
  --=============== output to control the other module:
  txAct_Out: OUT BIT;   -- 1 if tx is active, and after active phase till gap-time end
  reqDataTx_Out: OUT BIT;

Pin-Outputs:

=>source: main/vhdl/modules/common/speA/TxSpe_SpeA.vhd[tag=OutputPins]
  --=============== Output pins to outside.
  SpeTxDlo, SpeTxDhi : OUT STD_LOGIC;
	SpeTxTristate: OUT BIT

SpeTxDlo/hi are the two data bits to the wire. They are sometimes in tristate: if the direction of these pins is set to input for Ring Topology, or if the transmitter does not send, that is, in the gap between telegrams.
SpeTxTristate forces the tristate mode of these pins outside the module.

See the code snippet that connects these pins in Main_SpeA.vhd:

=>source: main/vhdl/modules/common/speA/TxSpe_SpeA.vhd[tag=txPins]
--Output to tx pins, tristate if not selected or tristate from pin of Tx module. That works.
--Note: Immediately forwarding a 'Z' on the SpeTxDlo/hi does not work on FPGA-Routing Lattice Diamond.
--rx1En is the selection which is the ty port of both rx/tx In/Outputs
SpeTxDlo1_Pin <= 'Z' WHEN (rx1En='1' OR SpeTxTristate ='1') ELSE SpeTxDlo; 
SpeTxDhi1_Pin <= 'Z' WHEN (rx1En='1' OR SpeTxTristate ='1') ELSE SpeTxDhi; 
SpeTxDlo2_Pin <= 'Z' WHEN (rx1En='0' OR SpeTxTristate ='1') ELSE SpeTxDlo; 
SpeTxDhi2_Pin <= 'Z' WHEN (rx1En='0' OR SpeTxTristate ='1') ELSE SpeTxDhi;

3.4. Synchronization of the CE clock to the Rx data in slave

This module is active in every Slave in Ring Topology. It generates the globally used CE vector for clock enabling. The master contains the same module to produce the CE vector, but there rxSync_Inp is always 0.

In/Outputs:

=>source: main/vhdl/modules/common/speA/RxClkSync_SpeA.vhd[tag=Inout]
ENTITY RxClkSync_SpeA IS
PORT (
  --general clock signals:
	clr, clk100 : IN STD_LOGIC;
	--inputs
	rxSync_Inp: IN BIT;        -- 1=The sync pattern is receiving, Clk Sync should only occuring during sync pattern
	rxDataClk_Inp: IN BIT;     -- 1 active for 1 clk100-cycle, as CE for received data.
	                           --   This CE will be synchronized with CE4 to have >=50 ns before clk100 on CE0 possible delay time.
	--outputs
	CE_Out: OUT bit_vector(9 downto 0)    -- CE for the 100 ns bit width. It is sometimes 90 or 110 ns while correcture.
	                                      -- if corr occurrs, then either CE_Out(9) is missing
																				-- or there is a gap 10 ns between CE_out(9) and the follwoing CE_Out(0)
                                        -- do never use CE_Out(9) for a clock enabling.
);
END RxClkSync_SpeA;

clr, clk100: as in every module, these are the central clear (0-active) and the central 100 MHz clock.
The other signals are self-explanatory. In the master, rxSync_Inp of this module is always 0, so no adjustment is done and the output clock is always even.

The ClkSync_Prc does the following.

Firstly measurement:

=>source: main/vhdl/modules/common/speA/RxClkSync_SpeA.vhd[tag=ClkSync_Prc_Meas]
    --Check abbreviation rxDataClk in relation to CE4, set shCorr value.
    --
    IF(rxSync_Inp='1' AND rxDataClk_Inp='1') THEN  --dataClk comes, check phase compared with CE
			IF( Q.CE(4)='1') THEN                      --hits exact:
				D.shCorr := Q.shCorr srl 1 ;             -- no correcture, decrease counter resp shifter for corr necessity to "0000"
			ELSIF( Q.CE(5)='1' OR Q.CE(3)='1') THEN    --10 ns error left or right: does not correcture immediately, it may be jittering
        D.shCorr := Q.shCorr(2 downto 0) & '1';  -- correct only if 3x the same.
				D.slower := Q.CE(5);                     -- slower if after CE4. CE4 should be elongated.
			ELSE                                        --else: dataClk is outside CE 5..4..3
			  IF(Q.shCorr /="0000") THEN               -- if the corr counter is not 0, then
				  D.shCorr := Q.shCorr srl 1;            -- wait 1..2 period, do only any 2. or 3. period.
				ELSE
          D.shCorr := "1010";                    --corr immediately (waiting period is expired), set for 2 periods to wait after corr
          IF Q.CE(9 downto 6)/="0000" THEN D.slower := '1'; ELSE D.slower := '0'; END IF;  -- slower if after CE4 to elongate CE
        END IF;
			END IF;
		ELSE        -- Outside of the rxDataClk or outside sync phase, then change nothing.
		  D := Q;   -- hold the value.
		END IF;
    --

If the rxDataClk_Inp clock is exactly synchronous to CE4, the CE is stable.

A deviation can occur because of disturbances on the line. Only if rxDataClk_Inp is off by one 10 ns step to the left or right at least 3 times in a row is the single necessary adjustment made. Three times in a row is a significant indication that the clock should be adjusted. If it happens only once or twice and then returns to exactly CE4, it is regarded as a temporary disturbance. This prevents unnecessary CE jitter.

If the clock is outside the range CE 5..4..3, the adjustment is necessary. But here, too, the adjustment is not done hastily: only every fourth clock is adjusted. As a result, the mean period of the CE, and hence of all outputs, stays within 97..103 ns. This helps to prevent jitter propagation.
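
The measurement step can be followed with a small Python model of the shCorr update. It is a sketch that mirrors only the excerpt above: the active CE is given as an index 0..10 instead of a one-hot vector, the function is called once per rxDataClk_Inp pulse during the sync phase, and the function name is hypothetical.

```python
def measure(ce_index, sh_corr, slower):
    """One rxDataClk_Inp pulse in the sync phase; returns (sh_corr, slower).

    ce_index is the index of the active CE bit, sh_corr a 4-bit integer.
    """
    if ce_index == 4:                     # hits exactly: decay the corr shifter
        return sh_corr >> 1, slower
    if ce_index in (3, 5):                # 10 ns off: shift in a '1'
        return ((sh_corr & 0b0111) << 1) | 1, ce_index == 5
    if sh_corr != 0:                      # far off, but still in waiting period
        return sh_corr >> 1, slower
    return 0b1010, 6 <= ce_index <= 9     # far off: request correction at once
```

Starting from 0000, repeated hits on CE5 give the values 0001, 0011, 0111, 1111 in this model, so bit 3, which triggers the correction in the actuating step, is set by the fourth consecutive hit.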

Second, the actuating part:

=>source: main/vhdl/modules/common/speA/RxClkSync_SpeA.vhd[tag=ClkSync_Prc_Act]
    --Executes the correction on the proper edge of CE8, CE9, CE10
    --
		IF(Q.CE(8)='1' AND Q.slower='0' AND Q.shCorr(3)='1') THEN
			D.CE := "00000000001";              -- faster, switch from CE8->CE0
			D.shCorr := Q.shCorr AND "0111";    --after corr, remaining gap periods without corr.
		ELSIF(Q.CE(9)='1' AND Q.shCorr(3)='0') THEN -- no corr necessity, divide by 10,
			D.CE := "00000000001";              -- normal case CE9->CE0, (do not use CE10)
		ELSIF(Q.CE(10)='1') THEN              -- ELSE: it is slower='1' AND corr(3)='1') THEN
			D.CE := "00000000001";              -- slower , CE10->CE0
			D.shCorr := Q.shCorr AND "0111";    --after corr, remaining gap periods without corr.
		ELSE
			D.CE := Q.CE rol 1;           --CE0->CE1 etc. normal shifting
		END IF;

If bit 3 of the shCorr shift register is set, either a 10 ns deviation has been recognized three times, or the adjustment should be done immediately because of a larger deviation. In the faster case (Q.slower='0'), CE0 follows directly after CE8. CE9 (or CE(9)) is omitted, so this CE period has only 90 ns. CE9 is not used otherwise in the design.

If CE9 is active and no adjustment is necessary (the normal case), CE0 follows.

If an adjustment is necessary and the clock should become slower, the last ELSIF is reached, where CE10 is checked. This already is the elongation of the CE period to 110 ns. CE0 follows.

In the case of an adjustment, bit 3 of shCorr is cleared. Together with the shift this prevents too much correction.

See the image:

Test RxClkSync.awc

When the synchronization starts, the rxDataCE_Inp (currently named rxDataClk_Inp) is by chance left of CE0, so it should be corrected.
The shCorr is set to A = 1010 to adjust immediately, and then to adjust once more in the bit after next. This adjustment is done: you see the next CE_Out(0) one step (10 ns) further right.
In the next 3 bit times no adjustment follows because the shCorr is not 0. This is the calming time to avoid hectic edges.
Then shCorr=1010 (gray) occurs again with adjustment and immediate shift, repeated two times.

Then, in the middle of the image, nearly the correct CE is reached. Now the fine adjustment comes:

The last adjustment is on the shCorr=B.
After that the rxDataCE_Inp is exact. The shCorr shifts right through 3, 1, 0 and no more adjustment occurs.

3.5. SpiA - interface to the controller

The SpiA module processes the communication via SPI interface, especially bit shifting.

3.5.1. Inputs and Outputs

I/O for SPI:

=>source: main/vhdl/modules/common/speA/SpiA_SpeA.vhd[tag=InOutPins]
--=================== Signals to ports of the chip:
  spi_MiSo_Pin: IN BIT;
  spi_MoSi_Pin : OUT STD_LOGIC;
  spi_Clk_Pin : OUT STD_LOGIC := '0';

The spi_MoSi_Pin and spi_Clk_Pin are STD_LOGIC because it should be possible to set them to tristate 'Z' for test purposes. The spi_MiSo_Pin is always an input, hence BIT is sufficient.

The meaning of these pins should be familiar, see the known SPI documentation.

Inputs from other modules:

=>source: main/vhdl/modules/common/speA/SpiA_SpeA.vhd[tag=Inputs]
  --=================== input from other modules:
  spi_Cmd_Inp: IN BIT;  --comes from frame signal on input pin
  spi_reqTxData_Inp: IN BIT; -- comes from TxSpe_SpeA module, after SFD, this module should prepare and offer the txData
  spi_rxDataState_Inp: IN BIT;  -- 1 then sync is reached, all data should be gathered.
  spi_rxData_Inp: in BIT;  --received data from RxSpe_SpeA module
  spi_rxDataCE_Inp: in BIT;  --received data clock enable from RxSpe_SpeA module

spi_Cmd_Inp: This is the frame_in from the controller to trigger reading the first two words.
spi_reqTxData_Inp: This comes from the TxSpe module, output reqDataTx_Out.
spi_rxData…_Inp: These three signals come with the receive clock. The spi_rxDataCE_Inp is the relevant clock enable. The spi_rxData_Inp is shifted into registers and used by the other logic, which switches with the internal CE0 clock enable (100 ns cycle). This means only 10 ns (100 MHz central clock) is available for this transfer.
The spi_rxDataState_Inp comes exactly in the time where data are received, also with the spi_rxDataCE_Inp as validating clock enable, see RxSpe preparing the receive data input from SPE. It controls writing to the RAM via SPI.

Outputs to other modules:

=>source: main/vhdl/modules/common/speA/SpiA_SpeA.vhd[tag=Outputs]
--=================== outputs to other modules:
  spi_cmd_Out: OUT BIT_VECTOR(15 downto 0);    --the command word from SPI after spiCmd
  --data1_Out: OUT BIT_VECTOR(15 downto 0);  --the first data word from SPI after spiCmd
  --data_Out: OUT BIT_VECTOR(15 downto 0);   --the current data word from SPI while transmitting
  spi_txD_Out: OUT BIT;                    --the data bit to Tx via SPE

spi_cmd_Out: A 16 bit command word is read via SPI on start of frame and presented here. The bits have special meaning in other modules, see cmd word.
Todo: more info read via SPI
spi_txD_Out: This is the data bit, just in time to be transmitted via SPE. See TxSpe transmit data input to SPE.

3.5.2. Variables of SpeRx_prc: SPE rx data processing

=>source: main/vhdl/modules/common/speA/SpiA_SpeA.vhd[tag=SpeRx_REC]
TYPE SpeRxVars IS RECORD
  ctBitsSpeRx: STD_LOGIC_VECTOR(3 downto 0);
  shDataSpeRx: BIT_VECTOR(7 downto 0);
  dataSpeRxHi: BIT_VECTOR(7 downto 0);
  dataSpeRxLo: BIT_VECTOR(7 downto 0);
END RECORD SpeRxVars;

SIGNAL qRx: SpeRxVars;

This is for the SpeRx_prc: PROCESS (clk100, CE0):

qRx.shDataSpeRx: immediate shift register for spi_rxData_Inp:

d.shDataSpeRx := spi_rxData_Inp & qRx.shDataSpeRx(7 downto 1);

qRx.dataSpeRxHi/Lo: They store the two bytes from shDataSpeRx as soon as a byte is filled. These registers are used to transport the data to the shDataSpi in the given time.
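The effect of the shift expression can be checked with a small self-checking simulation. This is only a sketch: the test bench name RxShiftSketch_tb and the sample byte 07 are chosen for illustration and are not part of SpiA_SpeA.vhd. It assumes that the bits arrive LSB first, as on the SPE line.

```vhdl
-- Hypothetical self-checking sketch, not part of the design sources.
ENTITY RxShiftSketch_tb IS
END ENTITY RxShiftSketch_tb;

ARCHITECTURE sim OF RxShiftSketch_tb IS
BEGIN
  PROCESS
    -- Bits of the byte 07 in their order of arrival (LSB first), index 0 arrives first.
    CONSTANT lineBits : BIT_VECTOR(0 TO 7) := "11100000";
    VARIABLE sh : BIT_VECTOR(7 downto 0) := (OTHERS => '0');
  BEGIN
    FOR i IN 0 TO 7 LOOP
      sh := lineBits(i) & sh(7 downto 1);   -- same form as the expression for d.shDataSpeRx
    END LOOP;
    -- After 8 shifts the bit that arrived first has moved down to bit 0.
    ASSERT sh = x"07" REPORT "unexpected byte after 8 shifts" SEVERITY failure;
    WAIT;
  END PROCESS;
END ARCHITECTURE sim;
```

The new bit always enters at bit 7 and moves one position down per shift, so after 8 shifts the register holds the byte in its natural order.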

3.5.3. Variables of Spi_prc: PROCESS: SPI read and write from and to controller

=>source: main/vhdl/modules/common/speA/SpiA_SpeA.vhd[tag=Spi_Variables]
--====== volatile working FF
TYPE Spi_Variables IS RECORD
  cmd: BIT_VECTOR(15 downto 0);
  data1: BIT_VECTOR(15 downto 0);
  state: BIT_VECTOR(5 downto 0);


  cmdQ: Bit;           --delayed spi_cmd_Inp to edge detection.
  ctBitsSpi: STD_LOGIC_VECTOR(3 downto 0);
  spiBit0: BIT;    -- 1 then the next ct underflows, 0 is reached.
  spiOn: BIT;      --1 then generates spiClk

  shDataSpi: BIT_VECTOR(15 downto 0);   --data read from controller
  testDataSpiRx: BIT_VECTOR(7 downto 0);   --data read from controller
  rxWord1: BIT;   --set on slave to save first lo word as length

  shDataSpeTx: BIT_VECTOR(7 downto 0);
  testDataSpeTx: BIT_VECTOR(7 downto 0);   --it is only for test, not used in the routed FPGA

END RECORD Spi_Variables;

SIGNAL qSpi: Spi_Variables;  -- stateMachine
ALIAS qSpistateIdle: BIT IS qSpi.state(0);  --expected either cmd or carrier
ALIAS qSpistateCmd: BIT IS qSpi.state(1);   --read cmd
ALIAS qSpistateData1: BIT IS qSpi.state(2); --read data1
ALIAS qSpistateData: BIT IS qSpi.state(3);  --read/write data
ALIAS qSpisubstatePrep: BIT IS qSpi.state(4);
ALIAS qSpisubstateMaster: BIT IS qSpi.state(5);

This is for the spi_Prc: PROCESS (clk100, CE0), see Process spi_Prc

qSpi.shDataSpi: shifted in bits from SPI read
qSpi.shDataSpeTx: shifted out bits for Tx
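Each state alias above selects one bit of qSpi.state, so the hexadecimal state values named in this chapter (for example 38, 28 and 18) are combinations of these bits. The following sketch only illustrates this arithmetic. The package name SpiStateSketch_pkg and all constant names are hypothetical and do not exist in SpiA_SpeA.vhd.

```vhdl
-- Illustration only: package and constant names are hypothetical.
PACKAGE SpiStateSketch_pkg IS
  -- One bit per alias, same bit positions as in Spi_Variables.state(5 downto 0)
  CONSTANT kIdle   : BIT_VECTOR(5 downto 0) := "000001";  -- bit 0, hex 01
  CONSTANT kCmd    : BIT_VECTOR(5 downto 0) := "000010";  -- bit 1, hex 02
  CONSTANT kData1  : BIT_VECTOR(5 downto 0) := "000100";  -- bit 2, hex 04
  CONSTANT kData   : BIT_VECTOR(5 downto 0) := "001000";  -- bit 3, hex 08
  CONSTANT kPrep   : BIT_VECTOR(5 downto 0) := "010000";  -- bit 4, hex 10
  CONSTANT kMaster : BIT_VECTOR(5 downto 0) := "100000";  -- bit 5, hex 20
  -- Combined values as they appear in the wave view
  CONSTANT kMasterDataPrep : BIT_VECTOR(5 downto 0) := kMaster OR kData OR kPrep;  -- "111000", hex 38
  CONSTANT kMasterData     : BIT_VECTOR(5 downto 0) := kMaster OR kData;           -- "101000", hex 28
  CONSTANT kSlaveDataPrep  : BIT_VECTOR(5 downto 0) := kData OR kPrep;             -- "011000", hex 18
END PACKAGE SpiStateSketch_pkg;
```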

3.5.4. Process spi_Prc

This process handles the data read from SPI: SPISOMI. Note that the FPGA is the SPI master, it supplies the SPICLK. Hence this process determines the SPICLK output too. It is a concurrent process because it should work twice in the 100 ns basic cycle: The SPICLK is produced while qSpi.spiOn is set and comes with the shown CE signals.

=>source: main/vhdl/modules/common/speA/SpiA_SpeA.vhd[tag=spiClk_Prc]
-- Output SPI clk with CE2..7
--
spiClkHi_Prc: PROCESS (clk100, CE2, CE7) begin IF ((CE2='1' OR CE7='1') AND rising_edge(clk100)) THEN
  IF( qSpi.spiOn ='1' AND CE2 ='1') THEN
    spi_Clk_Pin <= '1';        -- spiClk: positive leading edge after CE2: data request to controller,
  ELSE
    spi_Clk_Pin <= '0';        -- negative falling edge after CE7: controller latches the data.
  END IF;
END IF;  END PROCESS;

The Spi_Prc runs entirely with the CE0 clock enable signal. This means all flip-flops of the Spi_Variables have 100 ns available to switch, as timing constraint in the FPGA:

=>source: main/vhdl/modules/common/speA/SpiA_SpeA.vhd[tag=spi_Prc_Start]
--SPI read and write from and to controller, one shift register shDataSpi for both.
--
spi_Prc: PROCESS (clk100, CE0)
  VARIABLE d: Spi_Variables := qSpi;   --default values same as q, means hold, don't change.
  ALIAS dstateIdle: BIT IS d.state(0);  --expected either cmd or carrier
  ALIAS dstateCmd: BIT IS d.state(1);   --read cmd
  ALIAS dstateData1: BIT IS d.state(2); --read data1
  ALIAS dstateData: BIT IS d.state(3);  --read/write data
  ALIAS dsubstatePrep: BIT IS d.state(4);
  ALIAS dsubstateMaster: BIT IS d.state(5);
BEGIN IF (CE0='1' and clk100='1') AND clk100'event THEN
  d.cmdQ := spi_Cmd_Inp; -- OR spi_rxCarrier_Inp; --start condition for SPI activity, detect edge 0->1
  --
  IF( clr = '0'   --clr is lo active
    OR qSpi.state = "000000" ) THEN
    --=== Only first time on start with 0 or on clr
      dstateIdle := '1';
      d.data1 := x"FFFF";   --on non initialized slave, first data high byte is sender identification
      d.cmd := x"0000";     --default

This start block shows also the reset condition: Either the clr input is given or all states are zero, condition on hardware reset.

This process reads the data via SPI and shifts it into the qSpi.shDataSpi register:

IF(qSpi.spiOn ='1') THEN
  d.shDataSpi := qSpi.shDataSpi(14 downto 0) & spi_MiSo_Pin;
END IF;

It shifts the data bits to the left towards the MSB; the first bit read lands in the MSB after 16 shift operations. This is correct if the SPI interface on the controller is programmed to shift out the MSB (most significant bit) first with 16 bit access. This is supported by probably all controllers. The 16 bit access should be used to reduce effort on the controller (byte access needs double the number of accesses). Shifting out the MSB first and not the LSB was the original intention of SPI. SPI is not standardized, it is only a "quasi standard". Hence this mode is used here as the only one. More variants seem to be unnecessary. See also chapter Endian approaches: Ethernet Big endian LSB first, SPI: word access, MSB first

To transmit this data it should be shifted in the requested order for the Ethernet bit order rules:
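A minimal sketch of such a shift, assuming an 8 bit register that is loaded with one byte and then shifted right, so that bit 0 leaves first. The entity TxShiftSketch and its ports load, byteIn and txD are hypothetical. In the design this logic is part of spi_Prc and works on qSpi.shDataSpeTx.

```vhdl
-- Hypothetical sketch, not the code of SpiA_SpeA.vhd.
ENTITY TxShiftSketch IS PORT (
  clk100 : IN  BIT;                      -- 100 MHz system clock
  CE0    : IN  BIT;                      -- clock enable, one pulse per 100 ns bit time
  load   : IN  BIT;                      -- '1': take a new byte
  byteIn : IN  BIT_VECTOR(7 downto 0);   -- byte to transmit
  txD    : OUT BIT                       -- serial data, LSB first
);
END ENTITY TxShiftSketch;

ARCHITECTURE rtl OF TxShiftSketch IS
  SIGNAL sh : BIT_VECTOR(7 downto 0);
BEGIN
  PROCESS (clk100)
  BEGIN
    IF (clk100'event AND clk100 = '1') THEN
      IF (CE0 = '1') THEN
        IF (load = '1') THEN
          sh <= byteIn;                   -- bit 0 appears on txD after this clock edge
        ELSE
          sh <= '0' & sh(7 downto 1);     -- shift right: the next higher bit moves to bit 0
        END IF;
      END IF;
    END IF;
  END PROCESS;
  txD <= sh(0);                           -- Ethernet bit order: LSB of each byte first
END ARCHITECTURE rtl;
```

With load set in every eighth CE0 cycle, the bytes leave in the order in which they are loaded, each byte with bit 0 first.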

3.5.5. Sequence cmd and data between controller and FPGA/SPI

Test SpiCmd Awave

The sequence for the master starts with the frame_inp of the FPGA, here in the simulation tbM_frame, red line. This signal is compared with the qSpi.cmdQ, line below, to detect the leading edge. The trailing edge has no meaning. The leading edge of the frame_inp is Hi→Lo.
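The edge detection with a delayed copy of the signal can be sketched as follows. The entity FrameEdgeSketch and the names frameQ and startCmd are hypothetical. The polarity (idle level high, leading edge Hi→Lo) is taken from the description of the pin above; the polarity used inside SpiA_SpeA.vhd for spi_Cmd_Inp may differ.

```vhdl
-- Hypothetical sketch of the edge detection, not the code of SpiA_SpeA.vhd.
ENTITY FrameEdgeSketch IS PORT (
  clk100   : IN  BIT;   -- 100 MHz system clock
  CE0      : IN  BIT;   -- clock enable, 100 ns cycle
  frame_in : IN  BIT;   -- frame signal from the controller, idle level '1'
  startCmd : OUT BIT    -- '1' from the Hi->Lo edge until the next CE0 cycle has taken over the new level
);
END ENTITY FrameEdgeSketch;

ARCHITECTURE rtl OF FrameEdgeSketch IS
  SIGNAL frameQ : BIT := '1';   -- frame_in delayed by one CE0 cycle (role of qSpi.cmdQ)
BEGIN
  PROCESS (clk100)
  BEGIN
    IF (clk100'event AND clk100 = '1') THEN
      IF (CE0 = '1') THEN
        frameQ <= frame_in;
      END IF;
    END IF;
  END PROCESS;
  -- Leading edge Hi->Lo: the delayed value is still '1', the current value is already '0'.
  startCmd <= frameQ AND NOT frame_in;
END ARCHITECTURE rtl;
```

The trailing edge (Lo→Hi) gives frameQ='0' and frame_in='1', so startCmd stays '0', which matches the statement that the trailing edge has no meaning.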

After this the SPI reads 2 * 16 bit from the controller, one word for cmd, the next for the first data. It is seen on spi_Clk_Pin. The result is then stored in

qSpi.cmd: It changes from 0000 to A003, and to
qSpi.data1, changed from BAD0 to 8143 (only as example data).

Both values remain in the register, also for the next cycle, because the same values are read in any cycle. That is typical for the application software.

To compare with the sequence on slave:

In the left telegram the slave has no frame_inp, the signal tbS_frame remains high. This is the case in practice if the slave controller is not programmed yet. But for the second telegram in this image tbS_frame comes. Meanwhile (possibly some time later) the slave controller has synchronized its interrupt to the SPE data stream, has initialized its SPI interface and produces the frame_inp signal, typically via a PWM output (pulse width modulation). The time synchronization between the SPE data stream, the PWM module and the interrupt that programs the SPI interface is done with the signal spi_rxDataState_Inp, which is the same as dataState_Out from the RxSpeA module driving the pin frame_out to the controller. Independent of telegram content and SPI communication, the frame_out to the controller can be used for synchronization with a PLL control software (Phase-Locked Loop). This is a necessary condition for the Ring topology in a fast cycle, because the cycle of the SPE communication and the interrupt cycle in the controller, which initializes the SPI communication and reads and writes data, should be synchronized. There is no "telegram stack", in contrast to an ordinary Ethernet adapter, which does not support such a fast data exchange.

For the slave in the second (right) telegram, the cmd and data1 are now read via SPI. The cmd changes from 0000 (initial) to, in this example, 2003. It determines that the FPGA furthermore acts as slave. The data1 changes from BAD0 to, in this example, 4203. That is the sender address.

In the master FPGA the signal txReq_Inp comes at that moment if the cmd bit 13 is set. This is a signal used in the TxSpeA module. For the next telegram (right side) this signal comes immediately with the frame_inp, this is the red tbM_frame. This is the normal sequence for the master: The transmission sequence starts with frame_inp. But because the FPGA should first read the cmd and data1 via SPI after frame_inp, the signal spi_reqTxData_Inp is delayed in the TxSpeA module, see TxSpe transmit data input to SPE. The TxSpeA module waits a defined time for reading both data words, cmd and data1, via SPI.

At the moment where spi_reqTxData_Inp comes, the TxSpeA starts with the transmission of the SFD (start frame delimiter) and the first data1 word. This word is loaded before; it is a constant data word with usually the same content for all telegrams, it contains the sender identification and the length of the datagram.

Now, after spi_reqTxData_Inp, the SPI starts working with further accesses to the controller for the following data words. But this access is fine tuned between the timing to transmit via SPE, the timing of receiving in the RxSpeA module (which should be written via SPI) and the SPI timing. For that the short state substatePrep is set (number not visible, between 01 and 08 in the green line qSpi.state).

For that a code snippet from SpiA_SpeA.vhd:

=>source: main/vhdl/modules/common/speA/SpiA_SpeA.vhd[tag=spi_Prc_reqTxData]
    ELSIF( spi_reqTxData_Inp='1') THEN  --STATE 1->18  should deliver Spi_TxD to Tx_TxD
      dstateIdle := '0';              --Note: spi_rxDataState_Inp will behandled in
      dsubstateMaster := NOT spi_rxDataState_Inp;  --it is 0, slave if spi_rxDataState comes similar, not later.
      d.rxWord1 := spi_rxDataState_Inp; -- used to set length of telegram form first received word
      dstateData :='1';
      dsubstatePrep := '1';
      d.shDataSpeTx := qSpi.data1(15 downto 8);     -- load 1. data word to SpeTx
      d.testDataSpeTx := qSpi.data1(15 downto 8);   -- only for test, not used in FPGA
      d.shDataSpi := qSpi.data1;  --as first word write data1, hint: bit 7..0 used for shDataSpeTx later
      d.ctBitsSpi :="0111";       --time till start spiClk. Only 7 is admissible, correct time to reload

This fine tuning of time depends on the shift and bit approaches, see next chapter.

The signal spi_reqTxData_Inp (light green line) comes from the module TxSpe transmit data input to SPE

- either in the master as time condition after the SFD output (start frame delimiter). On the master the request from tx comes ultimately from the frame_inp signal, here tbM_frame.
- or in the slave at the moment of receiving the SFD bit. This is seen in the relation of the signal spi_rxDataState_Inp to the signal spi_reqTxData_Inp on the slave. On the slave the request for tx comes from the received signal.

Both conditions indicate the need for new data. Hence the SPI communication starts, seen on the spi_Clk_Pin, after the correct time. For the first telegram of the slave it hits an uninitialized SPI interface on the controller. This means the controller may not react to the spi_Clk_Pin and the spi_MiSo_Pin delivers a 0 signal. But this is not used in that state. In the second telegram of the slave everything is prepared, as on the master.

On the master the signal spi_rxDataState_Inp comes a while after starting the transmission via SPE, here seen on the SPI clock or on the qTx.state. This is the echo in the Ring after passing all stations. It is important to consider these relations too: Storing of the received data is executed via SPI as well, with the same clock. For the slaves these relations are simple: Storing received data and reading the next data to send are hard synchronized anyway, because the received data determine the SPE activity and the sent data follow this.

But on the master, sending data is primary for the SPI activity. The received data come some time later. The received data should be aligned correctly to the SPI bit shifting. Then the data cannot be stored in the first prepared RAM locations, but only in later locations. One 16 bit word needs 1.6 µs. The delay in the ring between two stations, seen here as the SPI clk time between master and slave, is approximately 0.3 µs. With 10 stations in the ring this means about 3 µs delay for the echo on the master. It also means that on the master the SPI clock should work for at least 2 or 3 data words more than necessary for the transmission data, in order to store the received data.
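The numbers above can be put together in a short worked estimate. The symbols are introduced only for this illustration; the station delay of 0.3 µs is the approximate value read from the simulation:

```latex
\begin{align*}
t_{\text{word}} &= 16 \cdot 100\,\text{ns} = 1.6\,\mu\text{s} \\
t_{\text{echo}} &\approx N \cdot t_{\text{station}} = 10 \cdot 0.3\,\mu\text{s} = 3\,\mu\text{s} \\
n_{\text{words}} &= \left\lceil \frac{t_{\text{echo}}}{t_{\text{word}}} \right\rceil
                 = \left\lceil \frac{3\,\mu\text{s}}{1.6\,\mu\text{s}} \right\rceil
                 = \lceil 1.875 \rceil = 2
\end{align*}
```

So at least 2 additional words are needed for 10 stations. A third word gives margin for jitter and for the alignment to the 16 bit border.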

3.5.6. Shifting data to Tx, timing

The SpiA module prepares the data read via SPI completely down to the bit to transmit, offered on the spi_txD_Out pin of this module. Shifting the data to transmit and providing the data bit at the correct time (clock cycle) is the task of this SpiA module.

The module TxSpe transmit data input to SPE offers the start signal for transmission: spi_reqTxData_Inp as input on SpiA, output on TxSpeA: reqDataTx_Out. This signal determines the start of the SPI access.

The second task of SpiA is shifting and storing the data from the module RxSpe preparing the receive data input from SPE, to write them to the controller's RAM. The data come to the spi_rxData_Inp input of the SpiA from the RxSpe module:

On a Slave for Ring Topology these data are in the same correct timing sequence, because the received data determine the timing and hence the reading of the transmission data. Note that generally there is only one SPI access to read and write data, of course from and to different addresses in RAM, but with the same SPI clock signal, using the SPIMOSI (master out slave in) to write and simultaneously SPISOMI (slave out master in) to read.
But the master in Ring topology receives these data as echo through the whole Ring, some time later. The delay is …3 µs for 10 stations in the Ring. Additionally there can be jitter.
- The first approach for that is: The data should be stored with 16 bit to synchronize data writing with the correct bit, so that the received data are correctly aligned in the RAM.
- For that, no more than 16 bit are stored. This means that with greater delays in the Ring the first data written via SPI (the first 1..3 words) do not come from receiving. They are not valid. Hence 0xFFFF is written first instead of received data, and after that the real received data.
- To recognize this in the controller's software, these first words can be distinguished from the really first received data word by their value 0xFFFF.

Test SpiShDataTxMastr

This first image, the result of the test Test_SpiShDataTxMastr.awc, shows a master in the Ring topology and an uninitialized slave. The communication in the ring, forwarding data, should also work properly if a slave station is not working at the moment.

The sequence for the master starts with the frame_inp of the FPGA, here in the simulation tbM_frame, red line. This signal immediately forces reading of two data words, the command word and the first data word. See also chapter above: Sequence cmd and data between controller and FPGA/SPI.
After a determined time to read the first data words, the signal spi_reqTxData_Inp (light green) comes from the TxSpe module to the SpiA module. Immediately with this signal the module TxSpe transmit data input to SPE starts to output the Start Frame Delimiter SFD as end of the sync pattern and afterwards the first data word, which is already stored.
The qSpi.state changes from 01 = idle to 38 = master + dataPrep one CE clock later. The state machine switches with the CE0 clock enable (100 ns cycle). This state waits exactly 8 CE cycles = 8 bit times, then switches qSpi.state to 28 = master data.
In this state 28 = master data the SPI clock comes and reads and writes data from and to the controller, see signal spiMastr.spi_Clk_Pin.
For the master these first data to write are 0xFFFF because nothing is received till now. It is seen on spiMastr.spi_MoSi_Pin.
But for Tx the next data are read, see spiMastr.spi_MiSo_Pin. This is the second data word. The first data word was already read in the first SPI_CLK phase into 'data1' and it is shifted out via SPE.
These next data are shifted into spiMastr.qSpi.shDataSpi. In the image you see only a gray bit stream, it is working. But in the next track, spiMastr.qSpi.testDataSpeTx, you see a snapshot after every 8 shift operations. The data to transmit always come from the high byte from SPI. Hence you see 81 03 from the first data word, and then 07 0F 1F etc., which are all read from the RAM via spiMastr.spi_MiSo_Pin. You can follow the serial signal on this pin and the shifted result after every 8 bits. Note that SPI reads the MSB, bit 15, first.

See code snippet:

=>source: main/vhdl/modules/common/speA/SpiA_SpeA.vhd[tag=spi_Prc_DataBitMid]
    -- === Handling of data bits ================================================================================
    -- Handling is only necessary in the mid and on end of a spi word.
    IF ( qSpi.ctBitsSpi = "1000") THEN             --==== in the mid of ctBitsSpi, proper to 8 bit shifted out with shDataSpeTx
      IF ( qSpistateData ='1' ) THEN            --==== stateTxData
        d.shDataSpeTx := d.shDataSpi(7 downto 0);
        d.testDataSpeTx := d.shDataSpi(7 downto 0);
        d.shDataSpi(15 downto 8) := qRx.dataSpeRxLo;
        d.testDataSpiRx := qRx.dataSpeRxLo;
        IF(qSpi.rxWord1 = '1') THEN
          d.cmd(7 downto 0) := qRx.dataSpeRxLo;  --determines the telegram length from received data word, it is received from master in Ring.
          d.rxWord1 := '0';
        END IF;
      END IF;

The testDataSpeTx are only for visualization, they are removed in the FPGA design. But at the same time the shDataSpeTx is also loaded with that value. These are the data offered to the module TxSpe transmit data input to SPE to transmit. You see exactly these data serially in txMastr.databit. This is the data bit immediately before Manchester-2 coding, which is then output. Note that on SPE the Ethernet standard is used, which means output LSB first, but with big endian.

3.5.7. Handling of received data in the Ring-master for SPI-shifting

The first image shows the handling of received data in the master of the Ring topology. This corresponds to the chapter above about getting transmission data with the same SPI access. It is the same simulation run, only showing other signals.

Test SpiShDataRxMastr

The SPI sequence is controlled by the spiMastr.spi_reqTxData_Inp because the transmission data should be read just in time. This is valid for both master and slave of the Ring topology.

Hence this sequence starts as in the chapter above with spiMastr.spi_reqTxData_Inp, green line.
The signals spiMastr.spi_rxData* (next tracks) from the receiver module RxSpe preparing the receive data input from SPE come asynchronously with the receiver clock: rxDataCE. This receiver clock can have any phase position in comparison with the CE0 clock enable. That is true for the master. In the slave it is synchronized.
The received signals are shifted with the receiver CE in an extra PROCESS SpeRx_Prc. Hence this flip-flop (FF) group has the same possible timing constraint as the module RxSpe, with 70 ns between received data and this shifting FF. You see the shifted results in spiMastr.dataSpeRxHi/Lo as 16 bit, but separated into two registers.
The spiMastr.qSpi.testDataSpiRx are the data which are loaded into the SPI shift register. That timing is determined by the transmission data and clock enabled with the transmission CE0 signal. See the following code snippet:

=>source: main/vhdl/modules/common/speA/SpiA_SpeA.vhd[tag=spi_Prc_DataBitMid]
    -- === Handling of data bits ================================================================================
    -- Handling is only necessary in the mid and on end of a spi word.
    IF ( qSpi.ctBitsSpi = "1000") THEN             --==== in the mid of ctBitsSpi, proper to 8 bit shifted out with shDataSpeTx
      IF ( qSpistateData ='1' ) THEN            --==== stateTxData
        d.shDataSpeTx := d.shDataSpi(7 downto 0);
        d.testDataSpeTx := d.shDataSpi(7 downto 0);
        d.shDataSpi(15 downto 8) := qRx.dataSpeRxLo;
        d.testDataSpiRx := qRx.dataSpeRxLo;
        IF(qSpi.rxWord1 = '1') THEN
          d.cmd(7 downto 0) := qRx.dataSpeRxLo;  --determines the telegram length from received data word, it is received from master in Ring.
          d.rxWord1 := '0';
        END IF;
      END IF;

This is the same code snippet as shown in the chapter above for tx. The line
```
d.shDataSpi(15 downto 8) := qRx.dataSpeRxLo;
```
is the essential one for receive data. The assignment from the FF group qRx to the FF group of this PROCESS has a maximal timing delay of 10 ns, the system clock of the FPGA, because the two clock enables are not related. In the image you see the testDataSpiRx set at the same time.
The other byte is handled by:

=>source: main/vhdl/modules/common/speA/SpiA_SpeA.vhd[tag=spi_Prc_DataBitHi]
      -- ==== stateData: finish or load next data
      ELSIF ( qSpistateData ='1' ) THEN           --===== STATE 8:
        IF ( spi_reqTxData_Inp='0' AND spi_rxDataState_Inp ='0') THEN
          dstateIdle := '1';
          d.spiOn := '0';                        -- finish
        ELSE                                     -- get next dataSpeTx from spiRd
          d.shDataSpeTx := d.shDataSpi(7 downto 0);
          d.testDataSpeTx := d.shDataSpi(7 downto 0);
          d.shDataSpi(15 downto 8) := qRx.dataSpeRxHi;
          d.testDataSpiRx := qRx.dataSpeRxHi;
        END IF;

It is a part of the handling after a 16 bit border. The statements are similar to the handling in the middle of the shifted word above, only the other half word, qRx.dataSpeRxHi, is used. The split between high and low byte, also for Tx, comes from the rule that Ethernet shifts out the LSB first and SPI shifts out the MSB first, but both start with the high byte (big endian).

The moment where the shifted data from Rx (orange) are used for shifting in SPI depends on the time delay between the start of transmitting and the moment of receiving. It is undefined and can jitter. Hence the received data are stored with 16 bit to have them available at any moment.

3.5.8. Handling of received data in the Ring-slave for SPI-shifting

The algorithm is the same, it does not distinguish between master and slave. But the conditions are different.

For the slave in Ring topology the first data byte is already received via SPE and can be written via SPI. For that see on the

…TODO more simulation results available, also with different stimuli using the StimuliSelection tool.

4. Functionality over all modules

4.1. Real scope of data signals Master and Slave

TxRxDatabits Scope

D15: slave-a-A46: frame_out
D14: Infineon calctime
D7: Mastr-q-A28: test1_Out spi_TxD 11111
D4: Mastr-J-A22: rxDataState_Out
D8: Mastr-r:B22: test2_Out Tx_TxD = tx.txD_Test (output before manchester-coding)
D10: Slave-t:A27: dataRx_Out : rxSpe*_dataBit_Out = Spi.spi_rxData_Inp
D11: Slave-r:B22: test2_Out Tx_TxD = tx.txD_Test (output before manchester-coding)
D9: Slave-N:A24: SpeRxD2_out_T: SpeRxD2_Pin = exSpe2.rxDin Input after Diff-Pin
D1: Mastr-N:A24: SpeRxD2_out_T: SpeRxD2_Pin = exSpe2.rxDin Input after Diff-Pin
D5: Mastr-t-A27: dataRx_Out = rxDataBit = rx*.dataBit_Out  spi.spi_RxData_Inp = TxSpe.rxDataBit_Inp
D12: Slave-e-A38: SPI CLK
D13: Slave-c-B33: SPI_MOSI
D6: Mastr-p RxCE
D3: Mastr-L-A23: SPI CLK
D2: Mastr-c-B33: SPI MOSI
D0: Mastr-b-B35 frame_in ( from TI)

The following data are transmitted:

Master: 8103 070f 1f2f 3f4f 5f...
Slave:  FF03 070f 1f2f 3f4f 5f ...

The left cursor is on D10, Slave dataRx_Out. It shows 11110100 = 2F, LSB first. It is in the slave, you see the same pattern above and below.
The right cursor is on D2, Master SPI data. It shows 00101111 = 2F, MSB first. It is the corresponding byte received on the master.

This is the corresponding simulation result, src/test/Lattice_pj/Test_All_SpeA/Test_DataBitScope.awc:

TxRxDatabits Sim

Hint: the simulation should be executed with 0 delay rxMd0 in Select Stimuli.

4.2. Time stamp and Ring Cycle Synchronization

4.2.1. Standard time stamp approaches

For Ethernet with >= 100 Mbit/s, NTP "Network Time Protocol" (https://en.wikipedia.org/wiki/Network_Time_Protocol) or better PTP (Precision Time Protocol) (https://en.wikipedia.org/wiki/Precision_Time_Protocol) can be used, according to the TSN (Time Sensitive Networking) rules (https://en.wikipedia.org/wiki/Time-Sensitive_Networking).

But these technologies work only for connections >= 100 Mbit/s, because they are not considered in the original Ethernet 10 Mbit/s topology. The problem of the original 10 Mbit/s Ethernet was: The transfer time of a telegram depends on the status of the line. For the line a Bus Topology is used. This means all stations access the line. Transmission is deterministic in a time range of milliseconds, regarding displacement mechanisms, but not for exact time stamps. Due to the presence of the star topology with its continuous data stream, which can work with TSN and PTP, a proper time synchronization for 10 Mbit/s was never developed.

Now with SPE the Bus Topology is also in focus, with the so-called "Multidrop Technology", and that does not support time synchronization yet and by default.

But, the Ring Topology offers another approach for Time Synchronization:

The primary goal is not to offer the time stamp to all stations as accurately as possible, but:
All stations should work synchronously.

This means the stations in the ring have the same time, but they do not know (and need not know) the exact absolute time. They work together, apart from a global time.

It is possible to tie these stations to a global time (UTC, master clock etc.), but this is the task of the master only in the Ring Topology. The master can do so by using one of the possible approaches: Standard Ethernet with TSN and PTP or NTP, or specific variants of SPE (with higher bit rates and Star Topology), which also support TSN, NTP, PTP. Then the master can forward this absolute time in the Ring to all other stations with the Ring Cycle Synchronization, described in the following.

4.2.2. Ring Cycle Synchronization - approach

The approach of the Ring Cycle Synchronization is:

All Stations in the Ring should work synchronous. It means

they have an synchronous interrupt cycle (or a hardware interrupt of the processor),
initialize the SPI for data communication with SPE with this FPGA approach,
expect an receive telegram to a proper time, necessary after the SPI initialization (in Slave mode) and send it forward to the next station.
The master in ring transmits its telegram to an expected time.
A PWM (Pulse Width Modulation) often use for output of data, for example Firing pulses for power valves or also simple outputs should work exact synchronous in this cycle to refresh the current data in each cycle.
The average time for Measurements should exact follow this cycle. For Measurements always the middle value (average, integral) of the pre-period should be used as characteristic of input signals, not a accidently time for a fast measurement. This is necessary to suppress technical noise which is usual synchronous to the actuator period (from a PWM), or also, to suppress accident disturbance with a higher frequency as the measurement one.

All of this is necessary, for example, in the control of electrical signals.
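The averaging over the preceding period can be sketched as follows. This is a minimal sketch, not the implementation of a specific controller: the type `CycleAverage` and both function names are hypothetical, and it assumes that fast samples (for example from an ADC) are delivered several times per cycle.

```c
#include <stdint.h>

/* Hypothetical accumulator for one input signal; all names are illustrative. */
typedef struct {
    int32_t  sum;    /* running sum of the samples of the current cycle */
    uint16_t count;  /* number of samples accumulated so far */
} CycleAverage;

/* Called for every fast sample within the cycle. */
static void cycleAverage_addSample(CycleAverage *avg, int16_t sample) {
    avg->sum += sample;
    avg->count++;
}

/* Called once in the cycle interrupt:
 * returns the mean of the preceding period and restarts the accumulation. */
static int16_t cycleAverage_takeMean(CycleAverage *avg) {
    int16_t mean = (avg->count > 0) ? (int16_t)(avg->sum / avg->count) : 0;
    avg->sum = 0;
    avg->count = 0;
    return mean;
}
```

Because `cycleAverage_takeMean()` is called in the synchronized cycle interrupt, the averaging window follows the ring cycle and covers whole actuator periods.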

The cycle for this can be, for example, 50 µs, or also somewhat longer if possible, up to 1 ms.

The goal is to do this with as little jitter as possible and/or necessary.

In the Ring cycle, the time stamp that exists in the master can be forwarded to all slaves if they need an absolute time for processing. Often, however, only the master needs it.

4.2.3. Solution frame_in/out and PLL

The master should transmit the telegram starting with the frame_in hardware signal. This signal can and should be produced in the controller by a PWM output. The PWM, running on the master's time, determines the moment of transmission.

On frame_in, the master first transmits the preamble with the sync bits. After a defined number of sync bits, hence after a defined time, the SFD (Start Frame Delimiter) is sent, followed by the data.

The first slave in the ring first receives the sync bits. During the sync bits the slave tunes its central CE (clock enable) signal to the sync bit edges (the dataCE_Out of the module RxSpe, which prepares the receive data input from SPE). This is done in the module described in Synchronization of the CE clock to the Rx data in slave. This means that the clocks are already synchronized when the SFD is received.

With the SFD, the signal frame_out is produced as an output of the FPGA, to be used as an input to the controller. This signal has a minimal jitter due to the clock sync capability (measured: 30 ns) and a known delay. With frame_out as input to the controller, either an interrupt to process the data can be started directly or, better, an internal PWM can be controlled by a PLL (Phase-Locked Loop) algorithm, see the next chapter. In both cases the interrupt starts synchronously (with the PLL already before the SFD is received, with the directly forced interrupt immediately after the SFD). This interrupt can put and get data to and from the RAM where the data for SPE are written and read (via SPI). This means the data exchange is also related to the ring cycle: always newly received data are processed, and always newly processed data are transmitted. Of course, this data timing should be tuned to the telegram timing.

The next slave stations in the ring receive the SFD at a time that is exactly related to the receive moment in the preceding slave, because the transmission is done with the central CE clock, which is synchronized to the dataCE_Out of the receiver module with a measured jitter of 30 ns.

However, because the jitter adds up, the next station has a jitter with twice the spread, although with a Gaussian (normal) distribution, because the jitter contributed by each station can be regarded as random. Due to the real clock frequencies, beating can also occur. All in all, the jitter grows by one step with each further station. With 10 stations, a jitter of up to 300 ns is to be expected. But this is not too much.
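The jitter accumulation can be written down as a rough estimate. It assumes (this is not part of the measurements) that each station contributes the same jitter of 30 ns, named J_1 here. The second line additionally assumes that the contributions are statistically independent and that the 30 ns are taken as the standard deviation sigma_1.

```latex
% Worst case: the per-station jitter J_1 adds linearly over N stations
J_N^{\max} = N \cdot J_1 = 10 \cdot 30\,\mathrm{ns} = 300\,\mathrm{ns}

% Statistically independent contributions with standard deviation \sigma_1
\sigma_N = \sqrt{N} \cdot \sigma_1, \qquad \sqrt{10} \cdot 30\,\mathrm{ns} \approx 95\,\mathrm{ns}
```

The 300 ns for 10 stations correspond to the linear worst case. Beating between the clock frequencies is not covered by either formula.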

The jitter can even be reduced from cycle to cycle by the PLL control.

4.2.4. PLL control of the interrupt cycle

The implementation of the PLL control is specific to each controller. The principle is:

Measure or capture the time at which step_in occurs, as a relative time (the value of a wrapping counter).
Compare this time information with the time information from the same source that triggers the interrupt. This is very simple if the interrupt is triggered by the same timer reaching its zero value that is also used to capture step_in.
Adjust the timer for the interrupt according to the measured difference. The exact measured difference is not applied in full, only a part of it; this is a simple P controller. It is better to have a real PI controller (proportional and integral part) which takes the time difference as input. Its output can be used as the reload value for the timer that generates the interrupt.

This means that under PLL control the interrupt cycle is sometimes slightly shorter or longer; it is not constant with the same reload value. A constant cycle usually does not have the same duration as the master cycle because of quartz frequency tolerances, so it would drift away as the time error accumulates. Hence the time has to be adjusted slightly from time to time.
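The principle of the PI control can be sketched in C as follows. This is a minimal sketch under assumptions, not the code of a specific controller: the type `CyclePll`, the function names and the gain values are hypothetical. It assumes an up-counting timer that wraps at the reload value, with the capture of the frame signal taken from the same timer. Anti-windup for the integral part is omitted.

```c
#include <stdint.h>

/* Hypothetical state of the PI controller; all names are illustrative. */
typedef struct {
    int32_t nominalReload;  /* reload value of the nominal cycle in timer ticks */
    int32_t integral;       /* accumulated time difference in timer ticks */
    int32_t kpDiv;          /* proportional gain = 1 / kpDiv */
    int32_t kiDiv;          /* integral gain = 1 / kiDiv */
    int32_t maxCorrection;  /* limit of the correction in timer ticks */
} CyclePll;

/* Map the difference of two wrapping counter values to -period/2 .. +period/2. */
static int32_t cyclePll_wrap(int32_t diff, int32_t period) {
    if (diff > period / 2)  { diff -= period; }
    if (diff < -period / 2) { diff += period; }
    return diff;
}

/* One control step per cycle.
 * captured: timer value captured at the edge of the frame signal
 * expected: timer value where this edge should occur
 * returns:  reload value for the next timer period */
static int32_t cyclePll_step(CyclePll *pll, int32_t captured, int32_t expected) {
    int32_t error = cyclePll_wrap(captured - expected, pll->nominalReload);
    pll->integral += error;                           /* I part: removes the frequency offset */
    int32_t correction = error / pll->kpDiv           /* P part: only a part of the difference */
                       + pll->integral / pll->kiDiv;
    if (correction > pll->maxCorrection)  { correction = pll->maxCorrection; }
    if (correction < -pll->maxCorrection) { correction = -pll->maxCorrection; }
    return pll->nominalReload + correction;
}
```

With a 10 MHz timer and a 50 µs cycle the nominal reload value would be 500 ticks. An edge that is captured later than expected lengthens the next period, so the own cycle falls back towards the ring cycle.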

The varying length of the interrupt cycle is a jitter between the Ring cycle (telegram cycle) and the internal timer and interrupt. The range of this jitter depends on the resolution of the timer. If the timer for both interrupt and capture runs at only 10 MHz (100 ns), then the jitter is of course > 100 ns. Because of some small inaccuracies the jitter may be in the range of 300 ns. This adds to the telegram jitter (from 30 ns in the first station to 300 ns in the 10th station). A sum of 600 ns of jitter for a 50 µs interrupt cycle is not excessive. But for analog measurement and also for the PWM output this can be at the limit.

Another possibility for synchronizing the controller with the Ring cycle is to use a 10 MHz or 5 MHz output signal derived from the internal central CE clock as the clock input for the processor. Processors often have a PLL for the internal clock to generate a higher clock frequency from a quartz with a lower frequency. If the FPGA output is used as the clock input for the controller, the controller does not have its own deviating quartz frequency, and the PLL control of the interrupt cycle is not necessary. Measuring the time of frame_in should then be used only for the initial synchronization and for checks.

For this solution it should be noted that the 10 MHz or 5 MHz clock output can deviate from the constant frequency because of the clock synchronization in the FPGA itself (see Synchronization of the CE clock to the Rx data in slave). The CE period switches between 9 and 10 internal clock periods (90..100 ns) or between 10 and 11 (100..110 ns). This unsteadiness has to be smoothed by the controller’s PLL.

Hence the best solution is: the FPGA clock itself does not come directly from the quartz but from a PLL in the FPGA. Then each FPGA can be synchronous to the master’s quartz frequency. In addition, the controller in the master is clocked by the FPGA quartz or vice versa (maybe better). Hence only one quartz frequency exists in the whole ring, and everything in one ring is synchronous. Another related ring can also use the same controller quartz if it is mastered by the same controller. This is the best solution for exact cycle synchronization.

Because of the cycle synchronization, all stations have the same time as far as possible, independent of whether they use (and know) the time stamp itself or not.

4.3. Data exchange between the application in the controller and the FPGA via SPI

The data exchange is the important interface between FPGA-PHY and the controller.

4.3.1. Timing with step_in and SPI data for the SPE-Master

The trigger for the data exchange on the master is the frame_in signal on the FPGA. It is produced by hardware, for example by a PWM output, see chapter Solution frame_in/out and PLL.

The FPGA is the master for the SPI communication. This means the FPGA as SPE-master starts the SPI data exchange immediately after frame_in. To serve this SPI transfer, the controller has to be initialized for SPI data transfer before the frame_in signal comes. For the timing of a cyclic interrupt this means that the initialization of the SPI transfer should be done first. The moment at which this happens depends on the program flow (machine code) and should not jitter too much. The frame_in signal should be produced by PWM (pulse width modulation) hardware, which should be part of the controller. Then the moment of the falling edge of frame_in can be tuned by the compare value of the PWM. It can be tuned in software, but the value should be fixed at run time.

After the leading falling edge of frame_in, the SPI should output the data. How this works depends on the controller. Tip: use 16 bits per data word. Often either a FIFO (first in, first out) register is part of the SPI interface in the controller, or DMA (direct memory access) can be used, or both in combination.

With a FIFO, the data which are to be read by the FPGA should be written into the FIFO register structure before the frame_in edge comes.

With DMA, the DMA control registers should be initialized for this cycle before the frame_in edge comes.

If both are used, both should be initialized.

The following rough diagram shows the timing:

------ccciiiiiiccccccdddccccccccccDDDccccccccccccccccc--------------- one cycle interrupt
++++++++++++++++++|______________________|++++++++++++++++++++++++++ The PWM output for frame_in

ccc represents arbitrary statements of the cyclic interrupt.
iii are the initializing statements for SPI, programming the FIFO and/or DMA.
ddd are data which are calculated on the fly for transmitting, if possible, see below.
DDD are data which are used on the fly from the SPE communication, see below.
|___| represents the hardware signal for frame_in; be careful with the moment of the edge.

4.3.2. Timing with step_in and SPI data for the SPE-Slave

On the slave the timing should be similar. However, step_in is not the trigger; the data come when a telegram is received. Precisely for this reason SPE-master and SPE-slave should be synchronized in the cycle time, see chapter Ring Cycle Synchronization - approach. Then the receive telegram does not arrive unexpectedly; it arrives in the expected time range.

Before the reception is expected, the SPI interface of the controller should be initialized as well (as in the SPE-master). Everything else is analogous.

4.3.3. Data on the fly

As shown in the diagram, it is possible to write data for the DMA transfer only shortly before they are read by SPI for transmission. The same holds for receiving: data can be used immediately after reception.

But this requires that both the timing of the machine code execution in the interrupt and the position of the data in the SPE telegram are known, stable and tuned. For some simple control applications this can be done (it needs constant calculation time and little jitter). In this way the dead time in the control cycle can be reduced. But this is an option whose use depends on the application. Also keep in mind that FIFO data have to be written (and hence known) before the telegram starts; only DMA data can use this option.

In any case it is not possible to handle the SPI data in software. The data rate is too fast for that (10 Mbit/s, 1.6 µs for 16 bit).

For receiving it is similar. With a FIFO, the first data are written into the FIFO, but sometimes the FIFO size is too small, and the FIFO is emptied via DMA to the RAM. Then these just-in-time data can be used. For just-in-time performance, either no FIFO should be used or one with a small size. The DMA should have priority and be fast enough for 1.6 µs per access (16 bit SPI access, 10 Mbit/s).
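The 1.6 µs per word follow directly from the bit rate. The following minimal sketch computes the word time and the moment at which a given word is clocked, which is the deadline for data calculated on the fly. The function names are hypothetical, and it is assumed that the 16 bit words follow each other without gaps, which should be checked for the concrete SPI timing.

```c
#include <stdint.h>

#define SPI_BIT_RATE_HZ   10000000u  /* 10 Mbit/s */
#define SPI_BITS_PER_WORD 16u        /* 16 bit per SPI access */

/* Duration of one SPI word in nanoseconds: 16 bit / 10 Mbit/s = 1600 ns. */
static uint32_t spiWordTime_ns(void) {
    return (uint32_t)((uint64_t)SPI_BITS_PER_WORD * 1000000000u / SPI_BIT_RATE_HZ);
}

/* Start of word ixWord relative to the first SPI bit,
 * assuming that the words follow back to back (assumption, see text). */
static uint32_t spiWordStart_ns(uint32_t ixWord) {
    return ixWord * spiWordTime_ns();
}
```

For example, data for the word with index 10 have to be in the RAM at the latest 16 µs after the first SPI bit, minus the latency of the DMA access.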

4.3.4. Meaning of the data words

The first data word (16 bit) is used as the command, as described in …TODO. The second data word is the sender address. All other data words are user-specific: the payload of the telegram.

The user should have a data struct (C language, also in C++) which contains the whole data for SPI, beginning with the command and sender word as int16, followed by the user data.
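Such a struct could look like the following minimal sketch. The type name `SpeSpiData`, the member names and the payload length of 30 words are assumptions for illustration only; `int16_t` from `<stdint.h>` stands for the int16 type.

```c
#include <stddef.h>
#include <stdint.h>

#define SPE_USER_WORDS 30  /* hypothetical payload length in 16 bit words */

/* Whole data for one SPI transfer, in the order in which the words are read. */
typedef struct {
    int16_t command;               /* first word: command, not transmitted on SPE */
    int16_t sender;                /* second word: sender address */
    int16_t user[SPE_USER_WORDS];  /* user-specific payload of the telegram */
} SpeSpiData;

/* The DMA reads the struct word by word, hence no padding is expected. */
_Static_assert(sizeof(SpeSpiData) == 2u * (2u + SPE_USER_WORDS),
               "SpeSpiData must not contain padding");
```

Because all members are 16 bit wide, the memory layout matches the sequence of 16 bit SPI accesses.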

4.3.5. Received data, meaning

The received data are written via DMA to the RAM. The first two words contain the own command and sender; then the received data follow, starting with the sender address. Between both there may be fill words with content 0000 on the SPE-master if the response time in the ring is a little longer.
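Finding the start of the received data behind the fill words can be sketched as follows. This is a minimal sketch with a hypothetical function name. It assumes that the sender address of a received telegram is never 0000, because otherwise it cannot be distinguished from a fill word.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first received word (the sender address of the
 * received telegram) in the DMA buffer, or nWords if only fill words follow.
 * Assumption: a valid sender address is never 0x0000. */
static size_t speRx_findReceived(const uint16_t *rx, size_t nWords) {
    size_t ix = 2;  /* words 0 and 1 hold the own command and sender */
    if (nWords < ix) {
        return nWords;
    }
    while (ix < nWords && rx[ix] == 0x0000u) {
        ix++;       /* skip fill words */
    }
    return ix;
}
```

The application can then evaluate the received data starting at the returned index.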

4.4. Endian approaches: Ethernet Big endian LSB first, SPI: word access, MSB first

The output on SPE is defined in accordance with the Ethernet standard: the LSB is output first, but starting with the high byte of the 16 bit access (big endian approach).

The big endian approach should be understood correctly: if a memory word (16 bit) is accessed, the processor reads the memory word correctly. There is no endianness for reading whole words from memory; endianness only exists for the serialization of a word. This means the 16 bit access to a 16 bit word is always correct, independent of endian considerations.

But for a 32 bit word which is accessed by two 16 bit reads from SPI, first bits 31..16 should be accessed from memory in order to transmit first bits 24..31 and then bits 16..23 via SPE. This is followed by the access to bits 15..0 via SPI to transmit bits 8..15 and then bits 0..7 via SPE. To do so, it is necessary to swap the content in memory. This is similar to the well-known hton and ntoh operations ("host to net" and "net to host").

However, for fast data transfer often only 16 bit words are used. The order of the words is kept independent of big or little endian accesses. And if both partners accept a 32 bit or longer word without word swapping, this is also fine. The swapping is only necessary if the information is evaluated in an unknown or general way; then the big endian approach is valid and recommended. This is an effort in software, not in the PHY layer of this FPGA.

Examples to explain the big endian approach:

 //data example:
 int16 data = 0xabcd;
 int16 datb = 0x1234;
 int32 dat32 = 0xfe0a5678;

Memory content:

  cd ab 34 12 78 56 0a fe      content in memory, byte-wise, for little endian approach
  abcd  1234  5678  fe0a       content in memory, 16 bit wide, for little endian approach
  abcd  1234  fe0a  5678       changed content to transmit for big endian approach
  ab cd 12 34 fe 0a 56 78      transmit order; the LSB of each byte is transmitted first
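The word swapping for a 32 bit value and the resulting byte order on the line can be sketched in C. This is a minimal sketch with hypothetical function names. It only covers the byte order; the bit order within each byte (LSB first) is handled by the FPGA. For the example values it yields the byte sequence ab cd 12 34 fe 0a 56 78.

```c
#include <stddef.h>
#include <stdint.h>

/* Store a 32 bit value as two 16 bit SPI words, bits 31..16 first. */
static void spe_put32(uint16_t *dst, uint32_t value) {
    dst[0] = (uint16_t)(value >> 16);
    dst[1] = (uint16_t)(value & 0xffffu);
}

/* Byte order on the SPE line for a sequence of 16 bit SPI words:
 * the high byte of each word is transmitted first. */
static void spe_txBytes(const uint16_t *words, size_t nWords, uint8_t *bytes) {
    for (size_t i = 0; i < nWords; i++) {
        bytes[2 * i]     = (uint8_t)(words[i] >> 8);
        bytes[2 * i + 1] = (uint8_t)(words[i] & 0xffu);
    }
}
```

Using `spe_put32()` when filling the SPI data keeps the swapping in one place, similar to the hton operations.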