RTL Coding in FPGA

Module designers shall have a detailed view of the design, down to the function/major component level, for near-accurate estimates. At the end of this phase, the exact FPGA part to be used shall be finalized from the chosen family. The following are critical aspects to consider during the RTL coding phase:

1. Logic delay: Though it may be adequate to keep logic delay at around 50% of the timing budget (leaving the remainder for routing), it is desirable to keep the logic delay of high-speed paths lower than that, say at 20-30%. FPGAs usually have abundant resources such as flip-flops (normally one flip-flop per look-up table), RAMs, and multipliers. Wherever it does not affect throughput, additional pipeline stages can be introduced judiciously, keeping routing congestion in mind.

2. Device mapping efficiency: The RTL code shall enable the best FPGA mapping by exploiting the device architecture. For example, if a 4:1 MUX is coded as a single entity, it maps well into one slice using 2 LUTs and an F5 MUX. If the 4:1 MUX is instead built with a pipeline register after the 2:1 MUX stage, it cannot be mapped to the F5 MUX and an additional slice is needed. As another example, a long register-based shift register can be mapped to the SRL configuration of a LUT, provided the registers do not need a reset.

3. Fan-out: Though synthesis tools can perform automatic fan-out control, manual control is needed, especially for signals interfacing to hard macros. These macros are often black boxes, and the tools treat every load in the same manner.

4. Vendor-specific structures and instantiations: Create a hierarchy around them to keep the freedom to migrate from one technology to another.

5. Macro interface: All inputs and outputs of macros shall be registered, because the macros sit at fixed locations on the device.

6. Gated clocks: Avoid gated clocks and use clock enables instead.

7. Critical logic: Place critical logic in a separate hierarchy.

8. Critical paths: Make sure critical paths do not cross the hierarchy boundary of the block, by registering all of its outputs.

9. Tri-state buffers: For low-speed paths, it is desirable to use tri-state buffers to save logic cells.

10. Unused hard macros: Unused RAMs can be used as register sets or to implement state machines coded as look-up tables. This also avoids large multiplexers in the read path. Unused multipliers can likewise be used as long shifters.

11. False and multi-cycle paths: False and multi-cycle paths shall not be pipelined. They shall be identified during design and passed on to the synthesis tool.

12. Trial synthesis and P&R: Each module-level designer shall perform individual module-level synthesis and P&R of the design with the given floorplan, and optimize the RTL code while it is being developed. If the IO requirement of a module exceeds the physical IOs of the device, dummy logic can be added to demultiplex/multiplex few pins to more pins and/or more pins to few pins, using shift-register structures and/or OR-gate structures, as shown in Figure 2. Also as shown in that figure, insert additional flip-flops on the interfaces between the selected module and other modules, leaving the actual IO interfaces unchanged. This eliminates skewed timing results caused by the dummy logic and connections. Black-box timing information shall also be used during synthesis to avoid skewed timing results.

13. Module-level floorplanning: Within the given floorplan area, it is often desirable to do sub-module-level floorplanning. Often only the critical parts of the design need to be floorplanned at this level. The timing-critical sub-modules being floorplanned must also be synthesized as individual compiles, which prevents hierarchy loss (as shown in Figure 3) and thereby inefficient placement.

14. Logic compression: Though maximum packing of unrelated logic is preferred from an area standpoint (for example, using COMPRESSION in the Xilinx flow), it has an adverse impact on timing. The packing level for unrelated logic shall therefore be set based on the timing criticality of each sub-module.

15. IO allocation: The IO assignment for each module shall follow the pin sequence of the IO ring on the die rather than the pin sequence on the package.
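
The logic delay guideline in item 1 can be turned into a quick check on numbers taken from a timing report. The following is a minimal Python sketch, not part of any vendor flow. The function names (`logic_delay_share`, `needs_pipelining`), the path names, and the delay figures are all hypothetical. It also assumes that the percentages in item 1 are measured against the clock period, which the text does not state explicitly. It uses 50% as the general limit and 30% (the upper end of the 20-30% range) for paths marked as high speed.

```python
# Hypothetical helper: flag paths whose logic delay exceeds the guideline.
# Assumption: the 50% and 20-30% figures are fractions of the clock period.

GENERAL_LIMIT = 0.50     # "around 50%" guideline for ordinary paths
HIGH_SPEED_LIMIT = 0.30  # upper end of the "20-30%" guideline for high-speed paths


def logic_delay_share(logic_delay_ns, clock_period_ns):
    """Return the logic delay as a fraction of the clock period."""
    if clock_period_ns <= 0:
        raise ValueError("clock period must be positive")
    return logic_delay_ns / clock_period_ns


def needs_pipelining(logic_delay_ns, clock_period_ns, high_speed=False):
    """Return True if the path exceeds its guideline limit.

    Such a path is a candidate for an additional pipeline stage.
    """
    limit = HIGH_SPEED_LIMIT if high_speed else GENERAL_LIMIT
    return logic_delay_share(logic_delay_ns, clock_period_ns) > limit


# Made-up figures for a 200 MHz clock (5 ns period):
# (path name, logic delay in ns, is it a high-speed path?)
paths = [
    ("ctrl_fsm", 2.0, False),   # 40%: within the 50% guideline
    ("dsp_accum", 2.0, True),   # 40%: above the 30% high-speed guideline
    ("crc_chain", 3.0, False),  # 60%: above the 50% guideline
]

flagged = [name for name, delay, fast in paths
           if needs_pipelining(delay, 5.0, fast)]
print(flagged)  # ['dsp_accum', 'crc_chain']
```

A flagged path is only a candidate. Whether an extra register stage is acceptable still depends on throughput and routing congestion, as item 1 notes.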

 
