Circuit Steering Logic Extraction (PDF)

Extracting Steering Logic
Duo Liu
The project I am working on is to extract steering logic from HLS. I will use
GAUT to do the initial scheduling with respect to resource and latency
constraints and use TDS to improve it. With the optimized cdfg file, the final
VHDL will be generated using GAUT. Then the expected DOT file will be
transformed from the steering logic description which is extracted from the
achieved VHDL file. The conversion process is totally automatic, which takes
in a VHDL file (shall be generated by GAUT or it may not work so well) as
input and produces exactly the expected DOT description of steering logic as
output.
Below is the concrete example of the implementation flow.
In this example I choose the design of FIR8, it is adapted from an example,
FIR16, in the test folder in GAUT just for simplicity. The codes are shown as
follows in Fig.1.
From the codes we can see that the input data is given to echantillon, and
output =
temporaire = echantillon*Coeffs[7] + Valeurs[1]*Coeffs[6] +
Valeurs[2]*Coeffs[5] + Valeurs[3]*Coeffs[4] + … + Valeurs[7]*Coeffs[0]
We also get the information that Valeurs[7]= Valeurs[6]= Valeurs[5]=…=
Valeurs[1]=echantillon=data_in,
thus
the
output
temporaire
=
data_in*( Coeffs[7] + Coeffs[6] + Coeffs[5] + … + Coeffs[0] ).
Generation of CDFG File and Optimization Using TDS
First of all, we will use the Compiler block of GAUT. The compiler will extract
all potential parallelism from the codes at the basis of ensuring the
correctness of the algorithm syntactically and semantically. After this initial
checking step, we can get the preliminary DFG generated by GAUT by
compiling the codes. The fir8.cdfg file is shown in Fig.2.
1
Fig.1 Example Codes
Fig.2 fir8.cdfg File
The dark yellow circles represent Coeffs integer array and intermediate
values, the bright yellow circle which at the most right is output, the green
one in the middle is input data and the red ones are Valeurs variables. The
blue boxes represent operations, addition and multiplication in this case.
2
This DFG can now be used as the input file to TDS, which provides behavioral
transformations of specific designs for a particular objective. The
transformations manipulated by TDS are based on graph-based canonical
representation, Taylor Expansion Diagram (TED), and focus on the
transformation of the DFG network.
First of all, I copied the fir8. cdfg file from the folder in GAUT and pasted it
into TDS’s folder. Then, I ran TDS in terminal and used command “read
--no_dff fir8.cdfg” to read the file as a netlist without discovering the DFFs.
Through command “show -n” we can observe the netlist:
Fig.3 Netlist of fir8.cdfg in TDS
Then I transformed the file into TED formation, as shown in Fig.4. In the subsequent steps I reordered the variables (as few variables in this case, the influence of reordering is not significant, at least as I see), and linearized and
balanced the TED function. In the initial try I didn’t replace the multipliers
with shifter, so only utilized multipliers, adders and subtractors. After transforming from ted to dfg using command “ted2dfg”, I balanced the dfg thus
acquired the minimum-latency dfg, finally the dfg was transformed to netlist
by running the command “dfg2ntl” and the cdfg file was generated by running “write” command. The cdfg with minimum-latency was then copied to
GAUT, the DFG graphs in TDS (Fig.5) and GAUT (Fig.6) are as follows.
In this process, I wasted a lot of time here for trying to find a correct combi nation of commands and a right flow to optimize TED and DFG by trial and
error, although the result is still not the most satisfactory one, through many
times of attempts I understand that the optimal result needs the combination
of optimization both on TED level and DFG level, what we did in homework 1
is just on the surface. I gained the experience that the conversions among
TED, DFG and NTL at the right time are also very important, and that’s why
at the very beginning I always could not acquire a usable CDFG file for GAUT.
3
Fig.4 TED Formation in TDS
Fig.5 Minimum-Latency DFG Graph in TDS
Fig.6 Minimum-Latency DFG Graph in GAUT
4
After achieving the first graph successfully, I then implemented the fir8 function with only one adder, one multiplier and one subtractor. The process is
similar but the latency is much higher while the area is decreased to the low est. The DFG graphs in TDS (Fig.7) and GAUT (Fig.8) are as follows.
Fig. 7 Minimum-Area DFG Graph in TDS
Fig.8 Minimum-Area DFG Graph in GAUT
When all these are done, I decided to implement this circuit with shifters. To
achieve this goal, some modifications are done with TDS commands, especially adding 2 important commands: “shifter” and “remapshift”. The DFG
graphs in TDS (Fig.9) and GAUT (Fig.10) are shown as follows.
5
Fig.9 DFG Graph Using Shifters in TDS
Fig.10 DFG Graph Using Shifters in GAUT
Generation of Gantt and VHDL
The next step is to use the synthesis block in GAUT. The purpose of this context is to synthesize under constraints of rate the formerly analyzed circuit
carrying out the calculation in the algorithm. In this step, we can specify the
cadency, which equals sampling rate, of the design, the memory and I/O constraints (which are less useful in this case because the goal network is the
steering logic), and also the allocation and scheduling strategy.
Step 1: the CDFG file which is generated aiming at achieving the minimum
latency is synthesized. In this method no shifter is used. We choose the “operator optimization” option; other selected options of the configurations are
illustrated in Table.1.
6
allocation strategy
global_lb: the number of resources given by the global allocation is considered as a lower-bound value
scheduling
strategy
ASAP promise the least latency
Register allocation
MWBM
vhdl output
Fsm_sig (at the very first I planned to use Structural +
FUS_LEA4 as the output, because it utilizes a modified leftedge algorithm which reduces the number of registers as
well as the cost of MUXs. However, through running the
synthesis several times, I found that the operator resources and latency needed for Fsm_sig and Structural +
FUS_LEA4 are exactly the same. As Structural + FUS_LEA4
always collapses when using large cadency, I choose
fsm_sig as the output.)
Table.1 Configuration of Minimum-Latency Synthesis Using GAUT
As for which register allocation method to use, we can compare by running
the same cdfg file (I choose the one with minimum latency) with different
register allocation methods (other options remain unchanged), a table can
be drew as Table.2.
MWBM
MLEA
Left Edge
None
Adder No.
2
2
2
2
Substractor No.
2
2
2
2
Multiplier No.
8
8
8
8
Total Operator
Area
696
696
696
696
Simple Register
No.
14
10
10
0
2 to 1 MUX No.
96
176
256
96
Latency
60
60
60
60
Flip-Flop No.
224
160
160
368
7
Table.2 Comparison of Different Register Allocation Methods
From table above we can observe that without any allocation algorithm, a
great number of flip-flop are needed, and the middle 2 algorithms need a
large number of MUX, MWBM also needs many flip-flops, but less MUXs, and
the cost of MUX is apparently higher than flip-flop, hence I choose MWBM
strategy.
Through further investigation, we can find the fact that as the ASAP is used
the minimum latency=60 is fixed, but the number of steering logic components differs due to different cadency settings. The details can be observed
from the Table.3.
As for allocation strategies, only distributed_lb, distributed_reuse_lb and
global_lb can be used and the scheduling results are exactly the same given
other parameters are configured the same way. Through comparison we set
the cadency to 70 ns.
Cadency(n Latency(ns
s)
)
+
*
-
Area_oprtr Regs
s
FFs
MUX2_1 Databus
50
60
2
8
2
696
14
224
96
1
60
60
2
8
2
696
12
192
96
2
70
60
2
8
2
696
12
192
96
1
80
60
2
8
2
696
12
192
96
1
90
…
…
…
…
…
…
…
…
...
Table.3 Steering Logic Components Comparison Using Different Cadencies
The Gantt graph for minimum latency is as follows:
8
Fig.11 Gantt Graph for Minimum Latency
Step 2: the DFG with minimum area will be synthesized. I tried using only 1
adder, 1 multiplier, 1 substrctor. As the latency is 180, the cadency must be
no less than 180. And the ASAP algorithm cannot be used. The total operator
area is 99. Moreover, we choose the “operator optimization” option. Other
configurations are as follows.
Allocation strategy
manual
Scheduling
strategy
ASAP
Register allocation
MEBM
Vhdl output
Fsm_sig
Table.4 Configuration For Minimum Area Synthesis
The resulting steering logic composition is as follows.
Cadency(n Latency(ns +
*
- Area_oprtr Regs
s)
)
180
180
1
1
1
99
4
FFs
64
Mux2_1 DataBus
256
Table.5 Steering Logic Components Under Minimum Area
9
1
However this is not the minimum area can be achieved because in the
process of TED optimization the TDS recognizes the adder and subtractor as
different operators. Thus when the optimized cdfg file transferred back to
GAUT, the synthesis block will use different operators for scheduling which
increases the total area. In comparison, the original cdfg file which is not op timized by TDS, however, can achieve an even smaller area by reusing the
operator that is used in former time steps in a complete scheduling process.
According to the configuration in Table.6 we can acquire the corresponding
steering logic components in Table.7.
Allocation strategy
automatic
Scheduling strategy
Default/no_pipeline/no_more_stage
Register allocation
MWBM
Vhdl output
Fsm_sig
Table.6 Another Configuration For Minimum Area Synthesis
Cadency(n Latency(ns +
s)
)
180
180
1
*
-
1
0
Area_oprtr Regs
91
6
FFs Mux2_1 Datanus
96
64
2
Table.7 Final Steering Logic Components After Minimum Area Synthesis
The Gantt view is illustrated below.
Fig.13 Gantt View for Final Minimum Area Synthesis
The VHDL files of all the syntheses above are achieved, as the length is a little verbose now, I don’t list them here.
Extracting Steering Logic From VHDL File and Convert It to
DOT Language
In this part, I will first extract the steering logic from hardware description,
VHDL file generated by GAUT and then convert the extracted VHDL-format
10
logic to a visible file format, the DOT file. Dot file can be read as a picture using Graphviz package which is pre-installed in UNIX operating system. All I
need to do is changing the VHDL description into DOT description which has
its own syntax, grammar and semantic meaning.
The language I use to implement the automatic-converting program is
Python due to its simplicity in grammar and sentence pattern. The whole program can be divided into 7 parts in order, each of which has its own function.
We start from the first part.
1. The First Extraction
The first extraction will extract the case bodies from the original VHDL file.
Part of the extracted file is shown below.
when S0 => -- time 0
when S1 => -- time 10
-- outputs of operation op0
reg_2 <= comp_1_mul_op_o;
-- static memory read from buses
reg_1 <= BUS_DONNEES_1_fir8;
reg_3 <= BUS_DONNEES_2_fir8;
when S2 => -- time 20
when S3 => -- time 30
-- outputs of operation op1
reg_5 <= comp_1_mul_op_o;
-- static memory read from buses
reg_1 <= BUS_DONNEES_1_fir8;
reg_3 <= BUS_DONNEES_2_fir8;
when S4 => -- time 40
when S5 => -- time 50
-- outputs of operation op2
reg_8 <= comp_1_mul_op_o;
-- static memory read from buses
reg_1 <= BUS_DONNEES_1_fir8;
reg_3 <= BUS_DONNEES_2_fir8;
when S6 => -- time 60
-- outputs of operation op7
reg_11 <= comp_0_add_op_o;
when S7 => -- time 70
-- outputs of operation op3
reg_5 <= comp_1_mul_op_o;
In the above lines, “reg_n” (n ∈ Z+) represents one register,
“comp_1_mul_op_o” represents the output of multiplier 1. “ comp_1_mul_op_a”
and “comp_1_mul_op_b” are two inputs of the multiplier operator. The reason
why extract the case body is because all the steering logic information are
contained here. For example, looking at the top of the above VHDL sentences, at S1 state, the output of the area should be sent to reg_2. The VHDL
11
file is easy to understand, but when extracting these sentences, attention
shall be paid to Python language. In the sentence “pattern =
re.compile('case \S+ is(.*?)\s*end case', re.DOTALL)”, “re” refers that
the regular expression library is used, “'case \S+ is(.*?)\s*end case'” is the
part of the whole VHDL file that we want to extract which is the case bodies.
“re.DOTALL” means the search will go through all VHDL file (or else the
search will terminate at the end of each line). “reg = pattern.findall(data)
” returns a list of strings, by “text = '\n'.join(reg)” the list is transformed to
consecutive strings which are exactly what we want to extract in the first
step.
2. Second Extraction
In this step, I further extract the file we get in the first step. After this step
what I get is pure data transfer relationship without state information. (We
still need state information to help assure the data transfer occurs at the
right time, this will be handled in the following steps) part of the achieved
sentences is as follows:
reg_11 <= comp_0_add_op_o;
reg_5 <= comp_1_mul_op_o;
reg_1 <= BUS_DONNEES_1_fir8;
reg_3 <= BUS_DONNEES_2_fir8;
reg_11 <= comp_0_add_op_o;
reg_11 <= comp_0_add_op_o;
reg_8 <= comp_1_mul_op_o;
reg_11 <= comp_0_add_op_o;
reg_1 <= BUS_DONNEES_1_fir8;
comp_1_mul_op_a <= reg_1;
comp_1_mul_op_b <= reg_0;
comp_1_mul_op_a <= reg_1;
comp_1_mul_op_b <= reg_0;
comp_1_mul_op_a <= reg_1;
comp_1_mul_op_b <= reg_3;
comp_1_mul_op_a <= reg_1;
The method I used to do the extraction is finding out all sentences that contains the key word “reg”. The reason why I don't use this means in the first
step is because apart from the case bodies there are many other sentences
in the original VHDL file that contain word “reg”, many useless information
would be extracted in the meanwhile.
3. Convert The Extracted Contents to DOT Sentence Pattern
In DOT program, the relationship between 2 nodes is expressed as “A -> B;”
which means a directed edge from node A to node B. Part of converted sentences is as follows:
12
comp_1_mul_op_o -> reg_8 ;
comp_0_add_op_o -> reg_11 ;
BUS_DONNEES_1_fir8 -> reg_1 ;
reg_1 -> comp_1_mul_op_a ;
reg_0 -> comp_1_mul_op_b ;
How I achieve this effect is simple, just switch the positions of names at different sides of sign “<=”, then change the sign “<=” to “->”.
4. Delete Redundant Lines
The reason of adding this part is clear, because in the original VHDL file, the
relationship
between
two
names,
for
example,
“reg_1
->
comp_1_mul_op_a ;”, may exist more than once, for this relationship may be
used in different time state. Then according to the method I provided in the
former steps, redundant lines may be extracted unnecessarily which may
make the coming steps difficult to be implemented. The simplified steering
logic is quite simple:
comp_1_mul_op_o -> reg_2 ;
BUS_DONNEES_1_fir8 -> reg_1 ;
BUS_DONNEES_2_fir8 -> reg_3 ;
comp_1_mul_op_o -> reg_5 ;
comp_1_mul_op_o -> reg_8 ;
comp_0_add_op_o -> reg_11 ;
reg_1 -> comp_1_mul_op_a ;
reg_0 -> comp_1_mul_op_b ;
reg_3 -> comp_1_mul_op_b ;
reg_8 -> comp_0_add_op_a ;
reg_5 -> comp_0_add_op_b ;
reg_11 -> comp_0_add_op_a ;
reg_2 -> comp_0_add_op_b ;
reg_11 -> comp_0_add_op_b ;
reg_11 -> BUS_DONNEES_2_fir8 ;
Above are all relationships that are needed to draw the steering logic for a
minimum area scheme.
5. Construct MUX Structures and Place MUXs into Graph (1 st and 3rd
levels are registers and operators)
This step may be the trickiest one. First I need to know how many MUXs are
needed. Through observing the result in Step 4, actually, each relationship
requires one MUX (except the last one in which data is transferred to BUS).
According to the experience from homework 1 in Synthesis and Verification
of Digital Systems, the contents at the left of sign “->” can stay as they are
because those are indeed where data come from, however, the contents at
the right side need to be replaced with MUX. But things are not as easy as
just replacing them with corresponding MUX, for the reason that there are
13
data in and out of MUX, together with control signals, if simply adding MUX
to the picture, the structure of the whole picture will be in a mess and make
the picture hard to read and to understand. Through studying the relevant
contents in DOT tutorial, I solve this problem by establishing a structure for
each MUX. In DOT language, a structure is a rectangle whose area can be
split into several subparts with literal notations. One example is as follows:
Fig.14 A Record Structure Example
I split one MUX structure into 3 subparts: input, output and control signals.
For a typical steering logic, the MUX for register receives input data from operators or BUS and forwards data to corresponding register.
Fig.15 Register MUX
The MUX for operator receives data from registers and forwards data to corresponding operator input (there are two inputs for an operator so two MUXs
for one operator).
Fig.16 Operator MUX
Both of two kinds of registers have control signals connected. By consulting
with Jiajun who handle logic extraction part, we reach an agreement, assign
3 control signals to each MUX. I only need to know which signals control
which MUX, I don't need to ascertain how the signals control the MUX. This
resolves the state identification problem we stated in Step 2.
The corresponding DOT language to describe a MUX structure is like:
“struct_MUX_reg_2[shape=record,color=red,label="b1,b2,b3|
{<up>MUX_reg_2|<down>out}"];”. The process of choosing control sig-
14
nal is also very interesting. We scan the file in Step 4 from top to bottom, for
each part by the right side of sign “->” in one line that never existed before,
I assign 3 control signals for its corresponding MUX in sequence. Thus Jiajun
and me can reach the agreement on which signals control which MUX. The
connection for one MUX is as follows:
Fig.17 Connection of One MUX
From Fig.17 we can see the data flow which is related with Reg_5 (in the middle of the first level) clearly. For each MUX we can see one line goes from the
bottom of itself to the top, this is done in case there is no new data into MUX
and meanwhile it can keep original value, when this happens, the control signal for it is “000”.
6. Add DOT subgraphs for operators
One operator has 3 ports, 2 for inputs and one for output. I make use of the
subgraph format of DOT language and construct one subgraph for each operator. An example for the subgraph description is like “subgraph
cluster_comp_1_mul_op{comp_1_mul_op_a->comp_1_mul_op_o;
15
comp_1_mul_op_b->comp_1_mul_op_o;}” and the corresponding graph is as
follows:
Fig.18 Operator Subgraph
7. Rearrange DOT Graph
The steering logic graph achieved by the above steps still have some problems, one of them is that the layout is not very methodical. The registers
sometimes appear at the first level, sometimes the fourth, the same problem
exists with other components, like operators and MUXs. To avoid this ambiguity, I use DOT method “rank” to arrange the position of different components in steering logic. The concrete working method is :
“{rank=same;reg_1;reg_0;reg_8;reg_5;reg_11;}
{rank=sink;struct_MUX_reg_2;struct_MUX_reg_1;struct_MUX_reg_3;struct_M
UX_reg_5;struct_MUX_reg_8;struct_MUX_reg_11;}”
In this way, MUXs are spread in the second and fourth level.
8. Reduce Repeated Lines
When constructing MUX structure and subgraphs, redundant lines may be
created because every line is scanned in the process and some names may
appear in parts of different lines for multiple times.
The final DOT graph generated for the minimum area scheme is as follows:
16
Fig.19 The Final Dot Graph for Minimum Area
UP to now, the flow seems to reach the end, but there are some flaws here.
Firstly, this program is not so easy to be transplanted from one system to another. It is written totally according to the VHDL file that is generated by
GAUT Synthesis, by totally, I mean even the whitespaces and names given to
different components in the file play an important role in determine whether
the program will work or not. Secondly, the layout of the final picture still has
room to improve, for example, the operators take too much room in the
graph, a trapezoid-shaped MUX may make sense much better. But these are
related with DOT language functions which are not so hard to solve.
17