Tutorial 3: Investigating Xilinx FPGA Flow with Torc – Manual Control

In this tutorial we’ll continue on the path of investigating the FPGA implementation flow from Xilinx. Here, we’ll take an aside to go through what exactly happens when using ISE. Then we’ll present the actual tools that are being called from behind the scenes and explain what is being generated and how to manipulate the flow process.

Overview of FPGA design process flow

When you create a new project in ISE, you’re presented with a design flow that encompasses three steps: Synthesize, Implement, and Generate Programming File.

[Figure: ISEoptions]

In the synthesis stage, the hardware description language (HDL) design is analyzed and converted into what’s called a ‘netlist’. This representation is composed of generic circuit components (flip-flops, AND gates, OR gates, etc.), and the connections between them are called ‘nets’. At this point, the design isn’t fixed to any particular FPGA device; in fact, the same netlist could in principle be used to build an application-specific integrated circuit (ASIC) or custom chip.

The implement stage is where all the real work happens. If we expand this step in ISE, we are presented with a set of subtasks:

[Figure: ISEoptions2]

Notice that the implement step is actually composed of four substeps: Translate, Map, Place, and Route. In the translate step, the design is converted from a generic netlist to a Xilinx-specific netlist. In this version, all of the circuit components are specific to Xilinx devices: flip-flops (FFs), look-up tables (LUTs), multiplexers (MUXs), etc. But at this stage the design is still generic across Xilinx FPGAs, not customized for the particular device that we want to use.

The mapping step converts the Xilinx-specific component netlist to one customized for the particular components available in the device that we’re targeting. Depending on the device that we choose, this may involve 4-input, 5-input, or 6-input LUTs.

Placement then takes this device-specific netlist and specifies exactly where on the particular device each component will be placed. As simple as this sounds, think of it as a HUGE puzzle with millions of pieces to be connected together. Or more realistically, consider finding a seating arrangement for all of your friends and relatives at your wedding or birthday party where you try to seat everyone so that everyone will have a good time. If your aunt really dislikes your friend, then they shouldn’t sit near each other. But you want to place your grandma next to your aunt. Except in the case of FPGA placement instead of seating people around a table, you have to place them in an X-Y grid. Starting to sound more difficult than you initially thought, huh.
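To make the puzzle a little more concrete, here is a toy sketch of how one might score a placement: add up the Manhattan distance of every connection, so that a lower total means the connected components sit closer together. This is only an illustration and not how the Xilinx tools actually score placements; the three components (A, B, C), their two nets (A-B and B-C), and the coordinates are all made up.

```shell
#!/bin/sh
# Toy placement cost: total Manhattan length of two-pin nets on an X-Y grid.
# Illustration only; not the cost function used by the Xilinx tools.

# manhattan X1 Y1 X2 Y2 -> prints |X1-X2| + |Y1-Y2|
manhattan() {
    dx=$(( $1 - $3 ))
    dy=$(( $2 - $4 ))
    # ${var#-} strips a leading minus sign, which gives the absolute value
    echo $(( ${dx#-} + ${dy#-} ))
}

# wirelength AX AY BX BY CX CY -> total length of the nets A-B and B-C
wirelength() {
    ab=$(manhattan "$1" "$2" "$3" "$4")
    bc=$(manhattan "$3" "$4" "$5" "$6")
    echo $(( ab + bc ))
}

# Two candidate placements of the same three components
echo "spread out: $(wirelength 0 0 5 5 0 4)"
echo "clustered:  $(wirelength 0 0 1 0 1 1)"
```

The clustered placement scores 2 against 16 for the spread-out one. A real placer has to juggle this kind of trade-off for every net in the design at once, while also respecting which grid sites can actually hold which kind of component.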

Lastly, once the components are placed on the device, the ‘nets’ (the connections, or signals, between components) have to be routed. FPGAs have an immense amount of on-chip routing resources, including local wires, long wires, global wires, switch boxes, etc. First and foremost, we want all of the components connected together correctly. But ideally, we also want them connected with the lowest-latency connections so that the design can run at a fast clock frequency.

At this point, the design is completely ready to program onto the device. The generate programming file step simply takes the design and puts it into a representation that can be used to program the FPGA device. This representation is a series of bits and instructions that will configure the device: set up the equations in the LUTs, configure the routes and switches, etc.
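As a rough illustration of what “setting up the equation in a LUT” means, a 2-input LUT can be thought of as a stored 4-bit truth table: the two inputs select which bit comes out. The sketch below models that idea for the AND function used in our example design. The lut2 helper and the AND_INIT name are made up for illustration, and this says nothing about how the bits are actually laid out in a bitstream.

```shell
#!/bin/sh
# Model a 2-input look-up table as a 4-bit truth table.

# lut2 INIT I1 I0 -> prints the output bit of the LUT.
# INIT is the truth table; bit number (2*I1 + I0) holds the output value.
lut2() {
    echo $(( ($1 >> (2 * $2 + $3)) & 1 ))
}

AND_INIT=8   # binary 1000: only entry 3 (I1=1, I0=1) outputs a 1

# Walk through all four input combinations
for i1 in 0 1; do
    for i0 in 0 1; do
        echo "$i1 AND $i0 = $(lut2 "$AND_INIT" "$i1" "$i0")"
    done
done
```

Changing the stored value changes the logic function without touching any wiring. For example, a table of 6 (binary 0110) would behave as XOR.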

Synthesis

If we click the Synthesize option and perform synthesis in ISE, you’ll notice that the console displays the exact command that is used to call the executable. For the example AND design that we previously evaluated in Tutorials #1 & #2, here’s the output after running synthesis:


Started : "Synthesize - XST".
Running xst...
Command Line: xst -intstyle ise -ifn "/import/home/skalicky/Projects/redsharc/summer14/torc/ISE/and/top.xst" -ofn "/import/home/skalicky/Projects/redsharc/summer14/torc/ISE/and/top.syr"
Reading design: top.prj

=========================================================================
* HDL Parsing *
=========================================================================
Parsing VHDL file "/import/home/skalicky/Projects/redsharc/summer14/torc/ISE/and/top_clk_div.vhd" into library work
Parsing entity .
Parsing architecture of entity .

=========================================================================
* HDL Elaboration *
=========================================================================

Elaborating entity (architecture ) from library .

=========================================================================
* HDL Synthesis *
=========================================================================

Synthesizing Unit .
Related source file is "/import/home/skalicky/Projects/redsharc/summer14/torc/ISE/and/top_clk_div.vhd".
Summary:
no macro.
Unit synthesized.

=========================================================================
HDL Synthesis Report

Found no macro
=========================================================================

=========================================================================
* Advanced HDL Synthesis *
=========================================================================

=========================================================================
Advanced HDL Synthesis Report

Found no macro
=========================================================================

=========================================================================
* Low Level Synthesis *
=========================================================================

Optimizing unit …

Mapping all equations…
Building and optimizing final netlist …
Found area constraint ratio of 100 (+ 5) on block top, actual ratio is 0.

Final Macro Processing …

=========================================================================
Final Register Report

Found no macro
=========================================================================

=========================================================================
* Partition Report *
=========================================================================

Partition Implementation Status
——————————-

No Partitions were found in this design.

——————————-

=========================================================================
* Design Summary *
=========================================================================

Clock Information:
——————
No clock signals found in this design

Asynchronous Control Signals Information:
—————————————-
No asynchronous control signals found in this design

Timing Summary:
—————
Speed Grade: -2

Minimum period: No path found
Minimum input arrival time before clock: No path found
Maximum output required time after clock: No path found
Maximum combinational path delay: 0.787ns

=========================================================================

Process "Synthesize - XST" completed successfully

The important part of the output is the “Command Line:” entry near the top, which shows the exact command that was executed. The top.xst file contains the set of commands that the synthesis tool (Xilinx Synthesis Technology, or xst) reads when executing. If you look in your ISE directory, you’ll find this file, top.xst. Opening it reveals all of the project settings that you configured when creating the ISE project. Here’s a copy of my top.xst file:


set -tmpdir "xst/projnav.tmp"
set -xsthdpdir "xst"
run
-ifn top.prj
-ofn top
-ofmt NGC
-p xc7k325t-2-ffg900
-top top
-opt_mode Speed
-opt_level 1
-power NO
-iuc NO
-keep_hierarchy No
-netlist_hierarchy As_Optimized
-rtlview Yes
-glob_opt AllClockNets
-read_cores YES
-write_timing_constraints NO
-cross_clock_analysis NO
-hierarchy_separator /
-bus_delimiter <>
-case Maintain
-slice_utilization_ratio 100
-bram_utilization_ratio 100
-dsp_utilization_ratio 100
-lc Auto
-reduce_control_sets Auto
-fsm_extract YES -fsm_encoding Auto
-safe_implementation No
-fsm_style LUT
-ram_extract Yes
-ram_style Auto
-rom_extract Yes
-shreg_extract YES
-rom_style Auto
-auto_bram_packing NO
-resource_sharing YES
-async_to_sync NO
-shreg_min_size 2
-use_dsp48 Auto
-iobuf YES
-max_fanout 100000
-bufg 32
-register_duplication YES
-register_balancing No
-optimize_primitives NO
-use_clock_enable Auto
-use_sync_set Auto
-use_sync_reset Auto
-iob Auto
-equivalent_register_removal YES
-slice_utilization_ratio_maxmargin 5

Most importantly, the -ifn flag indicates which project file xst should read to find the HDL to synthesize. In this case, it’s the top.prj file, which also exists in your ISE directory. Here’s a copy of mine:

vhdl work "top_clk_div.vhd"

Yes, that’s right, there’s only one line in the file. This file is used to keep track of the HDL sources that make up your system design. Since our design is so simple, it only has the single .vhd file. Knowing this, we can easily create our own .prj file (or just as easily copy this one) and use it to run xst ourselves from the command line. We’ll also need a script file to pass into xst with the -ifn flag. At a minimum, this .xst file must contain:

run
-ifn top.prj
-ofn top
-ofmt NGC
-p xc7k325t-2-ffg900
-top top

The run command is the main synthesis command. It allows you to run synthesis in its entirety, beginning with the parsing of the Hardware Description Language (HDL) files, and ending with the generation of the final netlist. [citation] After the run command, you can provide any number of options, one per line, each beginning with a dash ‘-’. The first one refers to the project file we discussed before. The second specifies the name of the output file, and the third is the format of the output file (NGC, the Xilinx-specific netlist format). The fourth is the part number of the device that we want to use. This tells xst which hard components (like DSP blocks or multipliers) are available, so it can use them rather than clustering individual gates together to represent these larger functional blocks. Lastly, the -top flag lets us specify the name of the top-level module in the design (which also happens to be called “top”). Given both the .prj and .xst files, we can now call xst from the command line using the following syntax (assuming that all of the files are in the same folder):

xst -intstyle ise -ifn top.xst

The “-intstyle ise” flag controls the style and amount of console output that xst produces. After running the command, you should see a top.ngc file in the directory. This is the synthesized netlist.
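Putting the pieces together, here is a minimal sketch of a shell script that writes the two input files described above and then calls xst. It assumes top_clk_div.vhd sits in the current directory, and it only invokes xst if the tool is actually found on your PATH; otherwise it just leaves the two files behind for inspection.

```shell
#!/bin/sh
# Write the xst input files by hand, then run synthesis if xst is installed.

# Project file: one line per HDL source, same format as the ISE-generated one.
printf 'vhdl work "%s"\n' top_clk_div.vhd > top.prj

# Minimal synthesis script: the run command followed by one option per line.
cat > top.xst <<'EOF'
run
-ifn top.prj
-ofn top
-ofmt NGC
-p xc7k325t-2-ffg900
-top top
EOF

# Only call the tool when it is actually available.
if command -v xst >/dev/null 2>&1; then
    xst -intstyle ise -ifn top.xst
else
    echo "xst not found on PATH; wrote top.prj and top.xst only"
fi
```

For a design with more sources, the same printf line could be repeated once per HDL file so that each one gets its own entry in top.prj.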

Translate

To translate the design to Xilinx primitives, we need to provide the user constraints file (UCF) to the native generic database builder (ngdbuild) tool. To execute this step use the following command:

ngdbuild -intstyle ise -dd _ngo -uc k7_base.ucf -p xc7k325t-ffg900-2 top.ngc top.ngd

As with xst, the ngdbuild command is followed by options. The -intstyle option is the same as before, and -p gives the target part. The -dd option specifies the destination directory for temporary/intermediate files. By default, ngdbuild looks for a UCF with the same name as the netlist, but we can specify another file name using the -uc option. The last two entries specify the input netlist (top.ngc) and the output file (top.ngd).

Map

To map the design to the specific primitives available for the chosen device, we use the xilinx mapper tool (map). To execute this step use the following command:

map -intstyle ise -p xc7k325t-ffg900-2 -w -o top_map.ncd top.ngd top.pcf

The map command takes some of the same options, so we’ll skip the ones we’ve already covered. The -w option allows the tool to overwrite any existing output files (from a previous mapping run). The -o option specifies the name of the output file. If we don’t specify a name, map names the output after the input file (top.ngd would produce top.ncd); naming it top_map.ncd keeps the mapped design distinct from the placed and routed top.ncd produced in the next step. The last argument, top.pcf, is the physical constraints file that map writes and that place and route reads later.

Note: For all devices after Spartan 3 & Virtex 4 (meaning Spartan 6, Virtex 5/6, and all 7-series devices), at least an initial version of the placement occurs during mapping. There is no way to direct the map tool not to perform this step; it’s now part of the normal flow, designed to achieve better utilization and performance.

Place & Route

Previously we discussed each of these steps separately. However, placing components affects how the connections can be routed and vice versa, so the two are normally performed together by a single tool, par. The following command performs placement and routing on the design, reading the mapped top_map.ncd and writing the result to top.ncd:

par -w -intstyle ise top_map.ncd top.ncd top.pcf

Bitstream Generation

The last step is to convert the final placed and routed .ncd file into the stream of bits necessary to program the FPGA device. The bitstream generation tool (bitgen) is run as follows:

bitgen -intstyle ise -w top.ncd

After running this command you should find a top.bit file in the directory. This is the file that will be used to program the actual device.
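The four commands from Translate through Bitstream Generation can be chained into one script. The sketch below is one possible way to do that: it prints each command before running it and stops at the first failure. The step and run_flow helpers and the DRY_RUN, PART, and UCF variables are names made up for this example. By default the script only prints the commands; set DRY_RUN=0 to actually execute them, which assumes the ISE tools are on your PATH and that top.ngc and k7_base.ucf are in the current directory.

```shell
#!/bin/sh
# Run the post-synthesis flow step by step, stopping at the first failure.
# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 to execute the tools.
DRY_RUN=${DRY_RUN:-1}
PART=xc7k325t-ffg900-2
UCF=k7_base.ucf

# step CMD ARGS... -> print the command, and run it unless in dry-run mode
step() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@" || { echo "step failed: $1" >&2; exit 1; }
    fi
}

run_flow() {
    # Translate: netlist + constraints -> top.ngd
    step ngdbuild -intstyle ise -dd _ngo -uc "$UCF" -p "$PART" top.ngc top.ngd
    # Map: top.ngd -> top_map.ncd (plus top.pcf)
    step map -intstyle ise -p "$PART" -w -o top_map.ncd top.ngd top.pcf
    # Place & Route: top_map.ncd -> top.ncd
    step par -w -intstyle ise top_map.ncd top.ncd top.pcf
    # Bitstream generation: top.ncd -> top.bit
    step bitgen -intstyle ise -w top.ncd
}

run_flow
```

Stopping at the first failure matters here because each tool consumes the file written by the one before it, so carrying on after an error would only produce confusing follow-on messages.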

Summary

In total, we’ve gone through the entire development flow for Xilinx FPGA devices. We’ve investigated the actual tools that are executed at each step in the flow, and how to manipulate them at the command line level outside of ISE. The figure below shows the overall flow and the tools used at each step.

[Figure: XilinxFlow]
