This example is directly derived from John Cooley’s design contest at SNUG’95 (Synopsys Users Group meeting). The contest was intended to oppose VHDL and Verilog designers on the same design problem. What John had in mind was probably to determine what language was the most efficient. The results were that 8 out of the 9 Verilog designers managed to complete the design contest yet none of the 5 VHDL designers could. Hopefully, using the proposed method, we will do a much better job.

Our goal is to design in plain synthesizable VHDL (entity and architecture) a synchronous up-by-3, down-by-5, loadable, modulus 512 counter, with carry output, borrow output and parity output. The counter is a 9 bits unsigned counter so it ranges between 0 and 511. The interface specification of the counter is given in the following table:

Name | Bit-width | Direction | Description |
---|---|---|---|

CLOCK | 1 | Input | Master clock; the counter is synchronized on the rising edge of CLOCK |

DI | 9 | Input | Data input bus; the counter is loaded with DI when UP and DOWN are both low |

UP | 1 | Input | Up-by-3 count command; when UP is high and DOWN is low the counter increments by 3, wrapping around its maximum value (511) |

DOWN | 1 | Input | Down-by-5 count command; when DOWN is high and UP is low the counter decrements by 5, wrapping around its minimum value (0) |

CO | 1 | Output | Carry out signal; high only when counting up beyond the maximum value (511) and thus wrapping around |

BO | 1 | Output | Borrow out signal; high only when counting down below the minimum value (0) and thus wrapping around |

DO | 9 | Output | Output bus; the current value of the counter; when UP and DOWN are both high the counter retains its value |

PO | 1 | Output | Parity out signal; high when the current value of the counter contains an even number of 1’s |

When counting up beyond its maximum value or when counting down below its minimum value the counter wraps around:

Counter current value | UP DOWN | Counter next value | Next CO | Next BO | Next PO |
---|---|---|---|---|---|

x | 00 | DI | 0 | 0 | parity(DI) |

x | 11 | x | 0 | 0 | parity(x) |

0 ≤ x ≤ 508 | 10 | x+3 | 0 | 0 | parity(x+3) |

509 | 10 | 0 | 1 | 0 | 1 |

510 | 10 | 1 | 1 | 0 | 0 |

511 | 10 | 2 | 1 | 0 | 0 |

5 ≤ x ≤ 511 | 01 | x-5 | 0 | 0 | parity(x−5) |

4 | 01 | 511 | 0 | 1 | 0 |

3 | 01 | 510 | 0 | 1 | 1 |

2 | 01 | 509 | 0 | 1 | 1 |

1 | 01 | 508 | 0 | 1 | 0 |

0 | 01 | 507 | 0 | 1 | 1 |

Based on these specifications we can start designing a block diagram. Let us first represent the interface:

Our circuit has 4 inputs (including the clock) and 4 outputs. The next step consists in deciding how many registers and combinatorial blocks we will use and what their roles will be. For this simple example we will dedicate one combinatorial block to the computation of the next value of the counter, the carry out and the borrow out. Another combinatorial block will be used to compute the next value of the parity out. The current values of the counter, the carry out and the borrow out will be stored in a register while the current value of the parity out will be stored in a separate register. The result is shown on the figure below:

Checking that the block diagram complies with our 10 design rules is quickly done:

- Our external interface is properly represented by the large surrounding rectangle.
- Our 2 combinatorial blocks (round) and our 2 registers (square) are clearly separated.
- We use only rising edge triggered registers.
- We use only one clock.
- We have 4 internal arrows (signals), 4 input arrows (input ports) and 4 output arrows (output ports).
- None of our arrows has several origins. Three have several destinations (
`clock`

,`ncnt`

and`do`

). - None of our 4 input arrows is an output of our internal blocks.
- Three of our output arrows have exactly one origin and one destination. But
`do`

has 2 destinations: the outside and one of our combinatorial blocks. This violates rule number 8 and must be fixed by inserting a new combinatorial block if we want to comply with VHDL versions prior 2008:

- We have now exactly 5 internal signals (
`cnt`

,`nco`

,`nbo`

,`ncnt`

and`npo`

). - There is only one cycle in the diagram, formed by
`cnt`

and`ncnt`

. There is a square block in the cycle.

Translating our block diagram in VHDL is straightforward. The current value of the counter ranges from 0 to 511, so we will use a 9-bits `bit_vector`

signal to represent it. The only subtlety comes from the need to perform bitwise (like computing the parity) and arithmetic operations on the same data. The standard `numeric_bit`

package of library `ieee`

solves this: it declares an `unsigned`

type with exactly the same declaration as `bit_vector`

and overloads the arithmetic operators such that they take any mixture of `unsigned`

and integers. In order to compute the carry out and the borrow out we will use a 10-bits `unsigned`

temporary value.

The library declarations and the entity:

```
library ieee;
use ieee.numeric_bit.all;
entity cooley is
port(
clock: in bit;
up: in bit;
down: in bit;
di: in bit_vector(8 downto 0);
co: out bit;
bo: out bit;
po: out bit;
do: out bit_vector(8 downto 0)
);
end entity cooley;
```

The skeleton of the architecture is:

```
architecture arc1 of cooley is
signal cnt: unsigned(8 downto 0);
signal ncnt: unsigned(8 downto 0);
signal nco: bit;
signal nbo: bit;
signal npo: bit;
begin
<...processes...>
end architecture arc1;
```

Each of our 5 blocks is modeled as a process. The synchronous processes corresponding to our two registers are very easy to code. We simply use the pattern proposed in the Coding example. The register that stores the parity out flag, for instance, is coded:

```
poreg: process(clock)
begin
if rising_edge(clock) then
po <= npo;
end if;
end process poreg;
```

and the other register that stores `co`

, `bo`

and `cnt`

:

```
cobocntreg: process(clock)
begin
if rising_edge(clock) then
co <= nco;
bo <= nbo;
cnt <= ncnt;
end if;
end process cobocntreg;
```

The renaming combinatorial process is also very simple:

```
rename: process(cnt)
begin
do <= (others => '0');
do <= bit_vector(cnt);
end process rename;
```

The parity computation can use a variable and a simple loop:

```
parity: process(ncnt)
variable tmp: bit;
begin
tmp := '0';
npo <= '0';
for i in 0 to 8 loop
tmp := tmp xor ncnt(i);
end loop;
npo <= not tmp;
end process parity;
```

The last combinatorial process is the most complex of all but strictly applying the proposed translation method makes it easy too:

```
u3d5: process(up, down, di, cnt)
variable tmp: unsigned(9 downto 0);
begin
tmp := (others => '0');
nco <= '0';
nbo <= '0';
ncnt <= (others => '0');
if up = '0' and down = '0' then
ncnt <= unsigned(di);
elsif up = '1' and down = '1' then
ncnt <= cnt;
elsif up = '1' and down = '0' then
tmp := ('0' & cnt) + 3;
ncnt <= tmp(8 downto 0);
nco <= tmp(9);
elsif up = '0' and down = '1' then
tmp := ('0' & cnt) - 5;
ncnt <= tmp(8 downto 0);
nbo <= tmp(9);
end if;
end process u3d5;
```

Note that the two synchronous processes could also be merged and that one of our combinatorial processes can be simplified in a simple concurrent signal assignment. The complete code, with library and packages declarations, and with the proposed simplifications is as follows:

```
library ieee;
use ieee.numeric_bit.all;
entity cooley is
port(
clock: in bit;
up: in bit;
down: in bit;
di: in bit_vector(8 downto 0);
co: out bit;
bo: out bit;
po: out bit;
do: out bit_vector(8 downto 0)
);
end entity cooley;
architecture arc2 of cooley is
signal cnt: unsigned(8 downto 0);
signal ncnt: unsigned(8 downto 0);
signal nco: bit;
signal nbo: bit;
signal npo: bit;
begin
reg: process(clock)
begin
if rising_edge(clock) then
co <= nco;
bo <= nbo;
po <= npo;
cnt <= ncnt;
end if;
end process reg;
do <= bit_vector(cnt);
parity: process(ncnt)
variable tmp: bit;
begin
tmp := '0';
npo <= '0';
for i in 0 to 8 loop
tmp := tmp xor ncnt(i);
end loop;
npo <= not tmp;
end process parity;
u3d5: process(up, down, di, cnt)
variable tmp: unsigned(9 downto 0);
begin
tmp := (others => '0');
nco <= '0';
nbo <= '0';
ncnt <= (others => '0');
if up = '0' and down = '0' then
ncnt <= unsigned(di);
elsif up = '1' and down = '1' then
ncnt <= cnt;
elsif up = '1' and down = '0' then
tmp := ('0' & cnt) + 3;
ncnt <= tmp(8 downto 0);
nco <= tmp(9);
elsif up = '0' and down = '1' then
tmp := ('0' & cnt) - 5;
ncnt <= tmp(8 downto 0);
nbo <= tmp(9);
end if;
end process u3d5;
end architecture arc2;
```

The proposed method is simple and safe but it relies on several constraints that can be relaxed.

Experienced designers can skip the drawing of a block diagram for simple designs. But they still think hardware first. They draw in their head instead of on a sheet of paper but they somehow continue drawing.

There are circumstances where asynchronous resets (or sets) can improve the quality of a design. The proposed method supports only synchronous resets (that is resets that are taken into account on rising edges of the clock):

```
process(clock)
begin
if rising_edge(clock) then
if reset = '1' then
o <= reset_value_for_o;
else
o <= i;
end if;
end if;
end process;
```

The version with asynchronous reset modifies our template by adding the reset signal in the sensitivity list and by giving it the highest priority:

```
process(clock, reset)
begin
if reset = '1' then
o <= reset_value_for_o;
elsif rising_edge(clock) then
o <= i;
end if;
end process;
```

We already used this in the final version of our example. Merging several synchronous processes, if they all have the same clock, is trivial. Merging several combinatorial processes in one is also trivial and is just a simple reorganization of the block diagram.

We can also merge some combinatorial processes with synchronous processes. But in order to do this we must go back to our block diagram and add an eleventh rule:

- Group several round blocks and at least one square block by drawing an enclosure around them. Also enclose the arrows that can be. Do not let an arrow cross the boundary of the enclosure if it does not come or go from/to outside the enclosure. Once this is done, look at all the output arrows of the enclosure. If any of them comes from a round block of the enclosure or is also an input of the enclosure, we cannot merge these processes in a synchronous process. Else we can.

In our counter example, for instance, we could not group the two processes in the red enclosure of the following figure:

because `ncnt`

is an output of the enclosure and its origin is a round (combinatorial) block. But we could group:

The internal signal `npo`

would become useless and the resulting process would be:

```
poreg: process(clock)
variable tmp: bit;
begin
if rising_edge(clock) then
tmp := '0';
for i in 0 to 8 loop
tmp := tmp xor ncnt(i);
end loop;
po <= not tmp;
end if;
end process poreg;
```

which could also be merged with the other synchronous process:

```
reg: process(clock)
variable tmp: bit;
begin
if rising_edge(clock) then
co <= nco;
bo <= nbo;
cnt <= ncnt;
tmp := '0';
for i in 0 to 8 loop
tmp := tmp xor ncnt(i);
end loop;
po <= not tmp;
end if;
end process reg;
```

The grouping could even be:

Leading to the much simpler architecture:

```
architecture arc5 of cooley is
signal cnt: unsigned(8 downto 0);
begin
process(clock)
variable ncnt: unsigned(9 downto 0);
variable tmp: bit;
begin
if rising_edge(clock) then
ncnt := '0' & cnt;
co <= '0';
bo <= '0';
if up = '0' and down = '0' then
ncnt := unsigned('0' & di);
elsif up = '1' and down = '0' then
ncnt := ncnt + 3;
co <= ncnt(9);
elsif up = '0' and down = '1' then
ncnt := ncnt - 5;
bo <= ncnt(9);
end if;
tmp := '0';
for i in 0 to 8 loop
tmp := tmp xor ncnt(i);
end loop;
po <= not tmp;
cnt <= ncnt(8 downto 0);
end if;
end process;
do <= bit_vector(cnt);
end architecture arc5;
```

with two processes (the concurrent signal assignment of `do`

is a shorthand for the equivalent process). The solution with only one process is left as an exercise. Beware, it raises interesting and subtle questions.

Level-triggered latches, falling clock edges, multiple clocks (and resynchronizers between clock domains), multiple drivers for the same signal, etc. are not evil. They are sometimes useful. But learning how to use them and how to avoid the associated pitfalls goes far beyond this short introduction to digital hardware design with VHDL.

VHDL 2008 introduced several modifications that we can use to further simplify our code. In this example we can benefit from 2 modifications:

- output ports can be read, we do not need the
`cnt`

signal any more, - the unary
`xor`

operator can be used to compute the parity.

The VHDL 2008 code could be:

```
library ieee;
use ieee.numeric_bit.all;
entity cooley is
port(
clock: in bit;
up: in bit;
down: in bit;
di: in bit_vector(8 downto 0);
co: out bit;
bo: out bit;
po: out bit;
do: out bit_vector(8 downto 0)
);
end entity cooley;
architecture arc6 of cooley is
begin
process(clock)
variable ncnt: unsigned(9 downto 0);
begin
if rising_edge(clock) then
ncnt := unsigned('0' & do);
co <= '0';
bo <= '0';
if up = '0' and down = '0' then
ncnt := unsigned('0' & di);
elsif up = '1' and down = '0' then
ncnt := ncnt + 3;
co <= ncnt(9);
elsif up = '0' and down = '1' then
ncnt := ncnt - 5;
bo <= ncnt(9);
end if;
po <= not (xor ncnt(8 downto 0));
do <= bit_vector(ncnt(8 downto 0));
end if;
end process;
end architecture arc6;
```