But you can do that in Verilog...
No. You can’t!
The decision to introduce new language constructs is often challenged, and with good reason. We have not chosen to do so lightly; we're doing so out of necessity. Many new languages have come about to improve syntax. They all fail to gain traction. TL-X is adding semantic value. It is all but impossible in RTL to get the retiming properties offered in TL-X, which are of fundamental importance in today’s high-speed designs. While we're at it, we're taking the opportunity to provide better syntax, and eliminate 30 years of legacy baggage, and there's significant value in doing so. But we would not be here just for those reasons.
In this post, I’ll focus on SystemVerilog, though similar arguments can be made for VHDL. I’ll explain SystemVerilog’s legacy, describe today’s best coding practices, and show how one would best code SystemVerilog to achieve the same benefits as TL-Verilog. The awkwardness and limitations of doing so justify the choice to introduce new constructs.
The Legacy of Verilog
Despite the fact that Verilog is referred to as a "Hardware Description Language," it is nothing of the sort. Verilog is an "Event-Based Simulator Description Language". Always blocks are a unit of simulation. The simulator schedules these blocks for execution. Why on earth are logic designers managing the simulation?? They shouldn’t be.
Furthermore, always blocks use sequential semantics. Why are we expressing hardware, which is fundamentally parallel, using sequential semantics?? We shouldn’t be. Not to draw too hard a line: there are times when sequential semantics enable a simpler expression of logic, and allowing tools to de-serialize the logic can provide value. But it also introduces a level of separation between the expression of the logic and its implementation, which makes the implementation difficult to manipulate and control.
Hardware designers should be designing hardware, not a simulator. So, the first value proposition of new semantics is to eliminate these “simulation blocks” and to make sequential semantics the exception, not the rule. If a simulator wants to perform event-based simulation, let tools turn the hardware model into simulation blocks for scheduling. And let it do so as makes the most sense for the particular simulator and hardware platform on which the simulation is to be run, not according to logic partitioning in the source code.
Best Practices with SystemVerilog
Let's say in one clock cycle you want to do the following:
a = b + c;
d = a - 1;
and capture d in a flip-flop.
As straightforward as this logic is, there are numerous ways to implement it in Verilog, all of which have nuances. One must consider blocking vs. non-blocking assignments, which are the source of endless confusion, frustration, blog posts, papers, specs, etc. "Nonblocking Assignments in Verilog Synthesis, Coding Styles That Kill!" is a 24-page "Best Paper" from SNUG 2000 that explains what you need to be aware of as a Verilog designer. One must also be aware of the 18 rules for where you can and can’t use reg vs. wire, described in "Verilog: wire vs. reg". Finally, maintaining correct sensitivity lists for always blocks is a nightmare (see “SystemVerilog Insights: always @* and always_comb are not the same !!”). Mistakes in sensitivity lists and in the use of blocking vs. non-blocking assignments affect simulation, can hide functional bugs, and can lead to very costly post-silicon bugs.
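To make the blocking vs. non-blocking hazard concrete, here is a minimal sketch of the classic two-flop race. The module and signal names (race_blocking, no_race_nonblocking, q1, q2) are made up for illustration and do not come from any of the papers mentioned above.

```systemverilog
// Hypothetical example: two clocked blocks using blocking assignments.
// The simulator may run the two always blocks in either order on a clock
// edge, so q2 can capture either the old or the new value of q1.
module race_blocking (
  input  logic clk,
  input  logic d,
  output logic q2
);
  logic q1;
  always @(posedge clk) q1 = d;
  always @(posedge clk) q2 = q1;
endmodule

// Same logic with non-blocking assignments. Right-hand sides are sampled
// before any left-hand side is updated, so q2 always captures the old q1
// and the design behaves as a two-stage shift register.
module no_race_nonblocking (
  input  logic clk,
  input  logic d,
  output logic q2
);
  logic q1;
  always_ff @(posedge clk) q1 <= d;
  always_ff @(posedge clk) q2 <= q1;
endmodule
```

Both modules compile, and a synthesis tool will typically build two flops from either one. The difference only shows up in simulation, which is exactly what makes this kind of bug so easy to miss.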
Fortunately, SystemVerilog addresses these problems, though it must carry forward its legacy baggage. Although many environments, especially FPGA synthesis tools, have limited support for SystemVerilog, the features that address these headaches are supported by most, but not all, popular tools. Of course, without support in all tools, your code is less portable with these constructs. Though there is interest in TL-Verilog for its clean language constructs, whose output can be delivered to tools that don’t support clean constructs, this is merely a transient issue and was not the motivation for new constructs. I will focus on SystemVerilog coding practices assuming better support is coming soon. A good summary of recent FPGA synthesis tool support is here: "Can My Synthesis Compiler Do That?"
Without describing in detail the myriad bad things you could do, or that you could get stuck with in inherited code, generally-accepted best coding practices in SystemVerilog would have the above logic implemented as:
logic [7:0] a, b, c, next_d, d;
always_comb begin
   a = b + c;
   next_d = a - 8'b1;
end
always_ff @(posedge clk) d <= next_d;
Several considerations lead to the code style choices above:
Using logic avoids the headaches with reg and wire.
Using always_comb blocks avoids the need to manage sensitivity lists.
Using always_comb with blocking (“=”) assignments for combinational logic and always_ff with non-blocking (“<=”) assignments for sequential elements avoids non-deterministic simulation.
Avoiding combinational logic inside always_ff ensures that the pre-flopped version of a signal (next_d) is available to other logic.
Pipelined Design in SystemVerilog
The main selling point of TL-Verilog is the fluidity with which logic can be retimed. Pipelines provide the construct for specifying and safely adjusting the timing of logic across pipeline stages.
SystemVerilog has no specific support for pipelines. If you need to retime the calculation for d to the next pipestage, you’d end up with:
logic [7:0] a, a2, b, c, d;
always_comb a = b + c;
always_ff @(posedge clk) a2 <= a;
always_comb d = a2 - 8'b1;
This requires substantial error-prone change, and the resulting code does not clearly reflect the timing of signals -- d is in a later pipeline stage than b. Real-world code is littered with code like this that is difficult to modify without introducing bugs because the nature of the timing is unclear.
Many organizations have adopted coding practices that encourage or require the use of pipeline names and pipeline stages in signal names (and perhaps other things like clock domain and edge sensitivity). This results in:
logic [7:0] a_p1, b_p1, c_p1, d_p1, d_p2;
always_comb begin
   a_p1 = b_p1 + c_p1;
   d_p1 = a_p1 - 8'b1;
end
always_ff @(posedge clk) d_p2 <= d_p1;
This clarifies the pipestages, but it gets a bit cumbersome to look at, and requires even more change to retime:
logic [7:0] a_p1, a_p2, b_p1, c_p1, d_p2;
always_comb a_p1 = b_p1 + c_p1;
always_ff @(posedge clk) a_p2 <= a_p1;
always_comb d_p2 = a_p2 - 8'b1;
This is a significant change from the original code because the stage suffixes must be updated appropriately on every signal. We can mimic TL-Verilog like this:
logic [7:0] a_p [2:1], b_p [1:1], c_p [1:1], d_p [2:2];
`define s 1
always_comb a_p[`s] = b_p[`s] + c_p[`s];
always_ff @(posedge clk) a_p[`s+1] <= a_p[`s];
`undef s
`define s 2
always_comb d_p[`s] = a_p[`s] - 8'b1;
Now the assignment of d could be moved between stage 1 and 2 without change… but the signal declaration ranges must be changed. It’s possible to use superset ranges, though there are considerations with unused signals. We also still have to manually manage the flip-flops. This can be addressed by burying all the signals in a structure representing the transaction in the pipeline and staging this whole structure through its pipeline. I’ve seen it done. It works. But I’ve run out of energy to do it myself, here. It provides the nice retiming properties, but you can only imagine what it looks like. For large pipelines it’s actually not horrid, but there’s overhead to establish each pipeline that makes this approach completely impractical.
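For a rough idea of what the structure-based approach might look like, here is a minimal sketch for the two-stage example. It is not taken from any real codebase: the type name trans_t, the variables stage1, stage2_in and stage2, and the module wrapper are all hypothetical, and a production version would need parameterized stage counts and more.

```systemverilog
module struct_pipe_sketch (
  input  logic       clk,
  input  logic [7:0] b,
  input  logic [7:0] c,
  output logic [7:0] d
);
  // Hypothetical transaction type: every signal that lives in the pipeline.
  typedef struct packed {
    logic [7:0] a;
    logic [7:0] b;
    logic [7:0] c;
    logic [7:0] d;
  } trans_t;

  trans_t stage1;     // transaction as seen at the end of stage 1
  trans_t stage2_in;  // flopped copy of stage1
  trans_t stage2;     // transaction as seen at the end of stage 2

  // Stage 1 logic.
  always_comb begin
    stage1   = '0;          // fields not yet computed default to zero
    stage1.b = b;
    stage1.c = c;
    stage1.a = stage1.b + stage1.c;
  end

  // One flop stage for the whole transaction, whether or not
  // every field is consumed downstream.
  always_ff @(posedge clk) stage2_in <= stage1;

  // Stage 2 logic.
  always_comb begin
    stage2   = stage2_in;   // carry every field forward
    stage2.d = stage2.a - 8'b1;
  end

  assign d = stage2.d;
endmodule
```

In this sketch, retiming d into stage 1 means moving the one assignment into the stage 1 block and renaming stage2 to stage1 on that line; the flop and the output assignment stay as they are. The price is the boilerplate around it, plus flopping every field at every stage and counting on synthesis to remove the ones nobody reads.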
Bottom line… No, you cannot do that in SystemVerilog!