FPGA Implementation of a Convolution Function (Part 4): HLS for the Function Interface

小鱼儿 2022-04-08 10:38

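**Background:** The IP core has been written and verified, but its interface still needs to go through HLS.

**Goal:** Run HLS on the interface of the convolution IP core. The weights, input, and output are passed as DRAM addresses and the data is transferred over the AXI master (m\_axi) interface. The neural network parameters are passed over the axi-lite protocol.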

**References:**

Background on accessing DDR3 from an IP core [https://blog.csdn.net/weixin\_36474809/article/details/81018040][https_blog.csdn.net_weixin_36474809_article_details_81018040]

PS-PL communication over AXI-Lite [https://blog.csdn.net/weixin\_36474809/article/details/81206660][https_blog.csdn.net_weixin_36474809_article_details_81206660]

FPGA Practical Tutorial (5): Accessing DDR from the PS via MIG [https://blog.csdn.net/weixin\_36474809/article/details/80997945\#%E4%BA%94%E3%80%81SDK][https_blog.csdn.net_weixin_36474809_article_details_80997945_E4_BA_94_E3_80_81SDK]

Walkthrough of the C program for accessing DDR3 from the ARM via MIG [https://blog.csdn.net/weixin\_36474809/article/details/81012267][https_blog.csdn.net_weixin_36474809_article_details_81012267]

FPGA Practical Tutorial (7): Accessing DDR from an IP core [https://blog.csdn.net/weixin\_36474809/article/details/84942607][https_blog.csdn.net_weixin_36474809_article_details_84942607]

UG1037 (v4.0), July 15, 2017, AXI Reference Guide

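**Contents**

Part 1: Reference interfaces

1.1 axi-lite

1.2 m\_axi

Part 2: Adding the directives

2.1 Parameters to be passed (for reference)

2.2 Passing parameters into the IP core (for reference)

2.3 Adding the volatile qualifier

2.4 Changing the function arguments

2.5 The final interface pragmas

Part 3: Running HLS

Part 4: A return value is required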

--------------------

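# Part 1: Reference interfaces #

The original interface takes structs as input. Each struct holds both network parameters and pointers into DRAM, which makes the interface hard to put through HLS, so the DRAM pointers and the network parameters need to be passed to the convolution separately.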

## 1.1 axi-lite ##

    void AxiLiteTest(int * tenNum, int * oneNum, int * outNum)
    {
    #pragma HLS INTERFACE s_axilite port=outNum
    #pragma HLS INTERFACE s_axilite port=oneNum
    #pragma HLS INTERFACE s_axilite port=tenNum

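Applying s\_axilite directly is enough. port names the argument that gets the AXI-Lite interface, and bundle names a group: every port that uses the same bundle name ends up in that one group.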

    int migTester(int size, volatile int *migPtr ,int totalNumDDR){
    #pragma HLS INTERFACE s_axilite port=totalNumDDR
    #pragma HLS INTERFACE s_axilite port=return
    #pragma HLS INTERFACE m_axi depth=512 port=migPtr offset=slave
    #pragma HLS INTERFACE s_axilite port=size

    unsigned int memDDR3Tester(unsigned int start, unsigned int size,
    		unsigned int mode, unsigned int data, 
    		volatile unsigned int *memPtr, unsigned int *expectedVal, 
    		unsigned int *failedAddr, unsigned int *numErrors)
    {
    #pragma HLS INTERFACE s_axilite port=numErrors bundle=CRTL_BUS
    #pragma HLS INTERFACE s_axilite port=failedAddr bundle=CRTL_BUS
    #pragma HLS INTERFACE s_axilite port=expectedVal bundle=CRTL_BUS
    #pragma HLS INTERFACE s_axilite port=start bundle=CRTL_BUS
    #pragma HLS INTERFACE m_axi depth=512 port=memPtr offset=slave
    #pragma HLS INTERFACE s_axilite port=data bundle=CRTL_BUS
    #pragma HLS INTERFACE s_axilite port=mode bundle=CRTL_BUS
    #pragma HLS INTERFACE s_axilite port=size bundle=CRTL_BUS
    #pragma HLS INTERFACE s_axilite port=return bundle=CRTL_BUS

    void fpga_top(layer_t layer, data_t *SHARED_DRAM, unsigned int weights_offset,
                  weightaddr_t num_weights, unsigned int input_offset) {
    #pragma HLS INTERFACE m_axi depth = DRAM_DEPTH port = SHARED_DRAM offset = \
        slave bundle = memorybus register
    #pragma HLS INTERFACE s_axilite port = layer bundle = axilite  register
    #pragma HLS INTERFACE s_axilite port = num_weights bundle = axilite  register
    #pragma HLS INTERFACE s_axilite port = weights_offset bundle = axilite  register
    #pragma HLS INTERFACE s_axilite port = input_offset bundle = axilite  register
    #pragma HLS INTERFACE s_axilite port = return bundle = axilite  register

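We will not dig into the register option for now; the AXI interface documentation needs to be consulted later for the details (UG1037 (v4.0), July 15, 2017).

So when implementing the convolution, the axi-lite interface needs the following settings:

*  INTERFACE s\_axilite
 *  port: set to the corresponding function argument
 *  bundle: ports with the same name belong to the same group
 *  the register option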
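The settings listed above can be sketched as follows. This is a minimal illustration, not code from the original project: the function name scale\_add and its arguments are hypothetical. A regular C++ compiler ignores the HLS pragmas, so the function can still be checked in software.

```cpp
// Hypothetical example: all names here are illustrative only.
// Scalars mapped to s_axilite become registers that the PS can write.
int scale_add(int a, int b, int scale) {
#pragma HLS INTERFACE s_axilite register port=a bundle=axilite
#pragma HLS INTERFACE s_axilite register port=b bundle=axilite
#pragma HLS INTERFACE s_axilite register port=scale bundle=axilite
#pragma HLS INTERFACE s_axilite register port=return bundle=axilite
    // The same bundle name on every port groups them into one interface.
    return (a + b) * scale;
}
```

Using one bundle name for every scalar keeps all parameters in a single register map, which is the pattern the zynqNet fpga\_top example above follows.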

## 1.2 m\_axi ##

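This interface lets the IP core communicate with DRAM over the AXI protocol. The m prefix means the IP core is the master and drives the accesses to DDR.

Below is the pragma produced when we added the directive ourselves in HLS.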

    #pragma HLS INTERFACE m_axi depth=512 port=weightIn->pdata offset=slave bundle=memorybus

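We were not sure what depth means. In zynqNet it is set to a fairly large value: const int DRAM\_DEPTH = 5932576;. In Vivado HLS, depth tells C/RTL co-simulation how many elements may be accessed through the port; it is not a bus width.

offset=slave means the pointer's offset (base) address has to be set, through a register in the AXI-Lite slave interface.

bundle names a group of signals: ports with the same bundle name share one AXI port.

So the directives needed for m\_axi are:

*  INTERFACE m\_axi
 *  port = the corresponding function argument
 *  depth: we first guessed communication bit width; it is the number of elements used for C/RTL co-simulation
 *  offset=slave
 *  bundle=memorybus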
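The m\_axi directives above can be sketched on a small function. This is an illustration under assumptions: copy\_scaled, its arguments, and the depth value of 512 are hypothetical and not taken from the convolution core.

```cpp
// Hypothetical example: names and the depth value are illustrative only.
// The IP core is the AXI master. Because of offset=slave, the PS writes
// the base addresses of src and dst into AXI-Lite registers.
int copy_scaled(volatile float *src, volatile float *dst, int n, float k) {
#pragma HLS INTERFACE m_axi depth=512 port=src offset=slave bundle=memorybus
#pragma HLS INTERFACE m_axi depth=512 port=dst offset=slave bundle=memorybus
#pragma HLS INTERFACE s_axilite port=n bundle=axilite
#pragma HLS INTERFACE s_axilite port=k bundle=axilite
#pragma HLS INTERFACE s_axilite port=return bundle=axilite
    for (int i = 0; i < n; ++i) {
        dst[i] = src[i] * k;  // each access becomes an AXI read or write
    }
    return n;  // number of elements processed
}
```

Both pointers share the memorybus bundle here, so they would go through one AXI master port, as in the pragmas used later for the convolution.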

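# Part 2: Adding the directives #

## 2.1 Parameters to be passed (for reference) ##

This step was later abandoned because it runs into the nested-pointer (pointer-to-pointer) problem.

Inside the function, the parameters that need to be passed with the axi-lite directive are: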

    // current variables for the loops
    int cur_channel_out,cur_channel_in,cur_row_out,cur_col_out;
    int filter_col,filter_row;
    //network parameters
    int stride = weightIn->stride;
    int kernelSize=weightIn->kernelSize,kernelSize_2D=weightIn->kernelSize*weightIn->kernelSize;//kernel
    
    
    //DRAM location offset variable
    int output_loc,weight_pre_loc,input_pre_loc,weight_loc,input_loc;
    //DRAM three variable pointer
    float* weight_ptr=weightIn->pdata;float *input_ptr=pboxIn->pdata;float *output_ptr=outpBox->pdata;
    
    
    layer_setup:{
    	MemoryController::setLayerConfig(weightIn,pboxIn,outpBox);
    	ImageCache::setLayerConfig(weightIn,pboxIn);
    	WeightsCache::setLayerConfig(weightIn);
    };

The structs involved:

    struct Weight
    {
        mydataFmt *pdata;
        mydataFmt *pbias;
        int out_ChannelNum;
        int in_ChannelNum;
        int kernelSize;
        int stride;
        int leftPad;
        int rightPad;
    };
    
    struct pBox
    {
    	mydataFmt *pdata;
    	int width;
    	int height;
    	int channel;
    };

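To simplify the later implementation, we pass all of the parameters into the FPGA over the axilite protocol in one go.

## 2.2 Passing parameters into the IP core (for reference) ##

This step also runs into the nested-pointer problem and was later abandoned.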

    //----------------convolution in FPGA-----------------------------------
    void convolution_3x3(const Weight *weightIn, const pBox *pboxIn, pBox *outpBox){
    //axilite interface	
    #pragma HLS INTERFACE s_axilite register port=weightIn->out_ChannelNum bundle=axilite
    #pragma HLS INTERFACE s_axilite register port=weightIn->in_ChannelNum bundle=axilite
    #pragma HLS INTERFACE s_axilite register port=weightIn->kernelSize bundle=axilite
    #pragma HLS INTERFACE s_axilite register port=weightIn->stride bundle=axilite
    #pragma HLS INTERFACE s_axilite register port=weightIn->leftPad bundle=axilite
    #pragma HLS INTERFACE s_axilite register port=weightIn->rightPad bundle=axilite //weight
    #pragma HLS INTERFACE s_axilite register port=pboxIn->width bundle=axilite
    #pragma HLS INTERFACE s_axilite register port=pboxIn->height bundle=axilite
    #pragma HLS INTERFACE s_axilite register port=pboxIn->channel bundle=axilite  //pboxIn
    #pragma HLS INTERFACE s_axilite register port=outpBox->width bundle=axilite
    #pragma HLS INTERFACE s_axilite register port=outpBox->height bundle=axilite
    #pragma HLS INTERFACE s_axilite register port=outpBox->channel bundle=axilite  //outpBox
    //m_axi interface
    #pragma HLS INTERFACE m_axi depth=512 port=weightIn->pdata offset=slave bundle=memorybus
    #pragma HLS INTERFACE m_axi depth=512 port=pboxIn->pdata offset=slave bundle=memorybus
    #pragma HLS INTERFACE m_axi depth=512 port=outpBox->pdata offset=slave bundle=memorybus

The pragmas are written following the patterns shown above.

## 2.3 Adding the volatile qualifier ##

[https://baike.baidu.com/item/volatile/10606957?fr=aladdin][https_baike.baidu.com_item_volatile_10606957_fr_aladdin]

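volatile is a type qualifier in C. It tells the compiler that every access to the qualified object must actually be performed, so each read fetches the value directly instead of reusing an earlier copy.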
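A minimal sketch of what the qualifier changes, using a hypothetical helper read\_twice that is not part of the project:

```cpp
// Hypothetical example: read_twice is illustrative only.
// Because p points to a volatile int, the compiler must perform both
// loads; it may not fold them into a single read of *p.
int read_twice(volatile int *p) {
    int first = *p;   // first load from memory
    int second = *p;  // second load, not reused from 'first'
    return first + second;
}
```

For memory that another master can change, such as DDR shared between the PS and the PL, this guarantee is what makes each read reflect the current contents.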

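Accesses to DDR require the volatile qualifier on the variable. We first tried without it and still got the same two errors:

*  ERROR: \[SYNCHK 200-11\] src/fpgaAcc.cpp:259: Argument 'weightIn.pdata' of function 'convolution\_3x3' (src/fpgaAcc.cpp:45) has an unsynthesizable type (possible cause(s): pointer to pointer or global pointer).
 *  That is, weightIn.pdata has a type that HLS cannot synthesize, such as a pointer to a pointer or a global pointer.
 *  ERROR: \[SYNCHK 200-61\] src/fpgaAcc.cpp:174: unsupported memory access on variable 'weightIn.pdata' which is (or contains) an array with unknown size at compile time.
 *  That is, weightIn.pdata is (or contains) an array whose size is unknown at compile time.

So we add the volatile qualifier in order to specify the corresponding interface type.

**Where to add it:** During the change the compiler reports a large number of errors; fix them one by one, following the compiler messages. Most of the fixes are added explicit type casts.

*  In pBox.h, the data members of the Weight and pBox structs need to become volatile float
 *  In network.cpp and network.h, the input parameters of the addbias and prelu functions, and of all other functions
 *  In mtcnn.cpp, the calls related to memset, memcpy, and fread
 *  In initconvandfc and initprelu
 *  In fpgaAcc, a very large number of changes are needed.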

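## 2.4 Changing the function arguments ##

The arguments are pointers to structs, which is relatively complex. HLS experiments showed that these structs are hard for HLS to compile, so the arguments of this function need to be changed.

A difficulty of implementing a neural network on an FPGA is that one small change ripples through everything: changing one variable means changing every related variable as well.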

    void convolution_3x3(int inHight,int inWidth,int inChanNum,int outHight,int outWidth,int OutChanNum,
    			int stride,
    			volatile float *weight_ptr,volatile float *input_ptr,volatile float *output_ptr)

The change was first made successfully in fpga.cpp, and then the HLS testbench was updated and passed:

    //conv in PL
    	convolution_3x3(featureIn.height, featureIn.width ,featureIn.channel,
    						 conv_PL_out.height,conv_PL_out.width,conv_PL_out.channel,
    						 weightIn.stride,
    						 weightIn.pdata, featureIn.pdata,conv_PL_out.pdata);

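Next the code in mtcnn.cpp was changed, and it passed there as well. Every conv3\*3 call needs to be replaced with this function.

All functions that involve a 3\*3 convolution are changed to this form.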

    convolution_3x3(this->pooling1_out->height,this->pooling1_out->width,this->pooling1_out->channel,
    					this->conv2_out->height,this->conv2_out->width,this->conv2_out->channel,
    					this->conv2_wb->stride,
    					this->conv2_wb->pdata,this->pooling1_out->pdata,this->conv2_out->pdata);

After these extensive changes, the function was integrated back into the original program and ran successfully.
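The refactoring in this section can be sketched in isolation: the structs stay on the caller's side, and a thin wrapper unpacks them into scalars and raw pointers. The sketch below rests on assumptions: mydataFmt is taken to be float, and sum\_flat and sum\_box are hypothetical stand-ins (the body just sums the elements and is not the real convolution).

```cpp
typedef float mydataFmt;  // assumption: matches the float pointers used above

// Copy of the pBox struct from this article.
struct pBox {
    mydataFmt *pdata;
    int width;
    int height;
    int channel;
};

// Hypothetical flattened kernel: scalars plus one raw pointer, the same
// argument shape as the new convolution_3x3. The body is a stand-in
// (it sums all elements), not the real convolution.
float sum_flat(int height, int width, int channel, volatile float *data) {
    float acc = 0.0f;
    for (int i = 0; i < height * width * channel; ++i) {
        acc += data[i];
    }
    return acc;
}

// Hypothetical caller-side wrapper: the struct is unpacked here, so it
// never reaches the HLS top-level function.
float sum_box(const pBox *box) {
    return sum_flat(box->height, box->width, box->channel, box->pdata);
}
```

With this split, only the flattened function has to satisfy the HLS restrictions on argument types, while the rest of the program keeps using the structs.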

## 2.5 The final interface pragmas ##

    //----------------convolution in FPGA-----------------------------------
    void convolution_3x3(int inHight,int inWidth,int inChanNum,int outHight,int outWidth,int OutChanNum,
    					 int stride,
    					 volatile float *weight_ptr,volatile float *input_ptr,volatile float *output_ptr){
    #pragma HLS INTERFACE s_axilite register port=inHight bundle=axilite
    #pragma HLS INTERFACE s_axilite register port=inWidth bundle=axilite
    #pragma HLS INTERFACE s_axilite register port=inChanNum bundle=axilite
    #pragma HLS INTERFACE s_axilite register port=outHight bundle=axilite
    #pragma HLS INTERFACE s_axilite register port=outWidth bundle=axilite
    #pragma HLS INTERFACE s_axilite register port=OutChanNum bundle=axilite
    #pragma HLS INTERFACE s_axilite register port=stride bundle=axilite
    #pragma HLS INTERFACE m_axi depth=DRAM_DEPTH port=weight_ptr offset=slave bundle=memorybus
    #pragma HLS INTERFACE m_axi depth=DRAM_DEPTH port=input_ptr offset=slave bundle=memorybus
    #pragma HLS INTERFACE m_axi depth=DRAM_DEPTH port=output_ptr offset=slave bundle=memorybus

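The parameters are passed in directly over the s\_axilite protocol with the register option and bundle set to axilite; the three DRAM pointers use m\_axi with bundle set to memorybus.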

# Part 3: Running HLS #

The program passes its test inside the main mtcnn program.

It then passes in the HLS testbench.

The interface passes as well.

    Starting C synthesis ...
    /mnt/workspace/Xilinx/Vivado/2017.4/bin/vivado_hls /home/osrc/Desktop/document/conv_Core/HLS_Conv/conv3x3_IPcore/solution1/csynth.tcl
    INFO: [HLS 200-10] Running '/mnt/workspace/Xilinx/Vivado/2017.4/bin/unwrapped/lnx64.o/vivado_hls'
    INFO: [HLS 200-10] For user 'osrc' on host 'osrc-virtual-machine' (Linux_x86_64 version 4.13.0-32-generic) on Tue Dec 11 16:53:16 CST 2018
    INFO: [HLS 200-10] On os Ubuntu 16.04.3 LTS
    INFO: [HLS 200-10] In directory '/home/osrc/Desktop/document/conv_Core/HLS_Conv'
    INFO: [HLS 200-10] Opening project '/home/osrc/Desktop/document/conv_Core/HLS_Conv/conv3x3_IPcore'.
    INFO: [HLS 200-10] Adding design file 'src/fpgaAcc.cpp' to the project
    INFO: [HLS 200-10] Adding design file 'src/fpgaAcc.hpp' to the project
    INFO: [HLS 200-10] Adding design file 'src/pBox.cpp' to the project
    INFO: [HLS 200-10] Adding design file 'src/pBox.h' to the project
    INFO: [HLS 200-10] Adding test bench file 'src/test_convBench.cpp' to the project
    INFO: [HLS 200-10] Opening solution '/home/osrc/Desktop/document/conv_Core/HLS_Conv/conv3x3_IPcore/solution1'.
    INFO: [SYN 201-201] Setting up clock 'default' with a period of 10ns.
    INFO: [HLS 200-10] Setting target device to 'xc7z035ffg676-2'
    INFO: [HLS 200-10] Analyzing design file 'src/pBox.cpp' ...
    INFO: [HLS 200-10] Analyzing design file 'src/fpgaAcc.cpp' ...
    INFO: [HLS 200-10] Validating synthesis directives ...
    INFO: [HLS 200-111] Finished Checking Pragmas Time (s): cpu = 00:00:42 ; elapsed = 00:01:18 . Memory (MB): peak = 361.637 ; gain = 13.375 ; free physical = 337 ; free virtual = 32673
    INFO: [HLS 200-111] Finished Linking Time (s): cpu = 00:00:44 ; elapsed = 00:01:20 . Memory (MB): peak = 361.637 ; gain = 13.375 ; free physical = 335 ; free virtual = 32673
    INFO: [HLS 200-10] Starting code transformations ...
    INFO: [XFORM 203-603] Inlining function 'MemoryController::setLayerConfig' into 'convolution_3x3' (src/fpgaAcc.cpp:77).
    INFO: [XFORM 203-603] Inlining function 'ImageCache::setLayerConfig' into 'convolution_3x3' (src/fpgaAcc.cpp:78).
    INFO: [XFORM 203-603] Inlining function 'WeightsCache::setLayerConfig' into 'convolution_3x3' (src/fpgaAcc.cpp:79).
    INFO: [XFORM 203-603] Inlining function 'WeightsCache::get_WBRAM_addr' into 'WeightsCache::get_9_weights_to_buffer' (src/fpgaAcc.cpp:307).
    INFO: [XFORM 203-603] Inlining function 'WeightsCache::get_WBRAM_addr' into 'WeightsCache::load_WBRAM_from_DRAM' (src/fpgaAcc.cpp:284).
    INFO: [XFORM 203-603] Inlining function 'MemoryController::load_weight_2_reg' into 'WeightsCache::load_WBRAM_from_DRAM' (src/fpgaAcc.cpp:291).
    INFO: [XFORM 203-603] Inlining function 'WeightsCache::load_WBRAM_from_DRAM' into 'convolution_3x3' (src/fpgaAcc.cpp:83).
    INFO: [XFORM 203-603] Inlining function 'MemoryController::setPixelLoadRowOffset' into 'convolution_3x3' (src/fpgaAcc.cpp:94).
    INFO: [XFORM 203-603] Inlining function 'MemoryController::setPixelLoadRowOffset' into 'convolution_3x3' (src/fpgaAcc.cpp:87).
    INFO: [XFORM 203-603] Inlining function 'MemoryController::setPixelLoadRowOffset' into 'convolution_3x3' (src/fpgaAcc.cpp:85).
    INFO: [XFORM 203-603] Inlining function 'MemoryController::setPixelLoadOffset' into 'ImageCache::loadRowDRAM_2_IBRAM' (src/fpgaAcc.cpp:330).
    INFO: [XFORM 203-603] Inlining function 'MemoryController::loadInputChannelPixel' into 'ImageCache::loadPixelDRAM_2_IBRAM' (src/fpgaAcc.cpp:339).
    INFO: [XFORM 203-603] Inlining function 'ImageCache::loadPixelDRAM_2_IBRAM' into 'ImageCache::loadRowDRAM_2_IBRAM' (src/fpgaAcc.cpp:331).
    INFO: [XFORM 203-603] Inlining function 'ImageCache::loadRowDRAM_2_IBRAM' into 'convolution_3x3' (src/fpgaAcc.cpp:95).
    INFO: [XFORM 203-603] Inlining function 'ImageCache::loadRowDRAM_2_IBRAM' into 'convolution_3x3' (src/fpgaAcc.cpp:88).
    INFO: [XFORM 203-603] Inlining function 'ImageCache::loadRowDRAM_2_IBRAM' into 'convolution_3x3' (src/fpgaAcc.cpp:86).
    INFO: [XFORM 203-603] Inlining function 'MemoryController::setPixelOutOffset' into 'convolution_3x3' (src/fpgaAcc.cpp:99).
    INFO: [XFORM 203-603] Inlining function 'ImageCache::calcu_IBRAM_row_offset' into 'ProcessingElement::loadPixel_buffer' (src/fpgaAcc.cpp:209).
    INFO: [XFORM 203-603] Inlining function 'ImageCache::get_IBRAM_Pixel' into 'ProcessingElement::loadPixel_buffer' (src/fpgaAcc.cpp:213).
    INFO: [XFORM 203-603] Inlining function 'ProcessingElement::loadPixel_buffer' into 'ProcessingElement::processInputChannel' (src/fpgaAcc.cpp:230).
    INFO: [XFORM 203-603] Inlining function 'WeightsCache::get_9_weights_to_buffer' into 'ProcessingElement::processAll_channelOut' (src/fpgaAcc.cpp:247).
    INFO: [XFORM 203-603] Inlining function 'ProcessingElement::macc2d' into 'ProcessingElement::processAll_channelOut' (src/fpgaAcc.cpp:249).
    INFO: [XFORM 203-603] Inlining function 'OutputCache::setOutChannel' into 'OutputCache::accumulateChannel' (src/fpgaAcc.cpp:384).
    INFO: [XFORM 203-603] Inlining function 'OutputCache::setOutChannel' into 'ProcessingElement::processAll_channelOut' (src/fpgaAcc.cpp:252).
    INFO: [XFORM 203-603] Inlining function 'OutputCache::getOutChannel' into 'OutputCache::accumulateChannel' (src/fpgaAcc.cpp:382).
    INFO: [XFORM 203-603] Inlining function 'OutputCache::accumulateChannel' into 'ProcessingElement::processAll_channelOut' (src/fpgaAcc.cpp:254).
    INFO: [XFORM 203-603] Inlining function 'MemoryController::writeBackOutputChannel' into 'convolution_3x3' (src/fpgaAcc.cpp:109).
    INFO: [HLS 200-111] Finished Standard Transforms Time (s): cpu = 00:00:45 ; elapsed = 00:01:22 . Memory (MB): peak = 361.922 ; gain = 13.660 ; free physical = 324 ; free virtual = 32664
    INFO: [HLS 200-10] Checking synthesizability ...
    INFO: [XFORM 203-602] Inlining function 'ImageCache::writeNextChannelPixel_2_IBRAM' into 'convolution_3x3' (src/fpgaAcc.cpp:340->src/fpgaAcc.cpp:331->src/fpgaAcc.cpp:86) automatically.
    INFO: [HLS 200-111] Finished Checking Synthesizability Time (s): cpu = 00:00:46 ; elapsed = 00:01:22 . Memory (MB): peak = 361.922 ; gain = 13.660 ; free physical = 320 ; free virtual = 32661
    INFO: [XFORM 203-502] Unrolling all sub-loops inside loop 'L_CH_OUT' (src/fpgaAcc.cpp:241) in function 'ProcessingElement::processAll_channelOut' for pipelining.
    INFO: [XFORM 203-501] Unrolling loop 'L_CH_OUT' (src/fpgaAcc.cpp:241) in function 'ProcessingElement::processAll_channelOut' partially with a factor of 8.
    INFO: [XFORM 203-501] Unrolling loop 'Loop-1.1' (src/fpgaAcc.cpp:308) in function 'ProcessingElement::processAll_channelOut' completely.
    INFO: [XFORM 203-501] Unrolling loop 'L_MACC_multiply' (src/fpgaAcc.cpp:190) in function 'ProcessingElement::processAll_channelOut' completely.
    INFO: [XFORM 203-501] Unrolling loop 'L_MACC_accumulate' (src/fpgaAcc.cpp:195) in function 'ProcessingElement::processAll_channelOut' completely.
    INFO: [XFORM 203-101] Partitioning array 'pixel_buffer' (src/fpgaAcc.cpp:228) in dimension 1 completely.
    INFO: [XFORM 203-101] Partitioning array 'weights_local' (src/fpgaAcc.cpp:244) in dimension 1 completely.
    INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM'  in dimension 1 completely.
    INFO: [XFORM 203-101] Partitioning array 'multresult' (src/fpgaAcc.cpp:187) in dimension 1 completely.
    INFO: [XFORM 203-101] Partitioning array 'OutputCache::OBRAM'  in dimension 1 with a cyclic factor 8.
    INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.0'  in dimension 2 completely.
    INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.1'  in dimension 2 completely.
    INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.2'  in dimension 2 completely.
    INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.3'  in dimension 2 completely.
    INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.4'  in dimension 2 completely.
    INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.5'  in dimension 2 completely.
    INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.6'  in dimension 2 completely.
    INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.7'  in dimension 2 completely.
    INFO: [XFORM 203-602] Inlining function 'ImageCache::writeNextChannelPixel_2_IBRAM' into 'convolution_3x3' (src/fpgaAcc.cpp:340->src/fpgaAcc.cpp:331->src/fpgaAcc.cpp:86) automatically.
    INFO: [XFORM 203-622] Instantiating function 'ProcessingElement::processInputChannel'(src/fpgaAcc.cpp:221) to 'ProcessingElement::processInputChannel.0' at call site (src/fpgaAcc.cpp:103) by setting 'cur_ci' to 'cur_channel_in'.
    INFO: [XFORM 203-721] Changing loop 'Loop_load_pixel_2_PE_row_loop_proc' (src/fpgaAcc.cpp:207) to a process function for dataflow in function 'ProcessingElement::processInputChannel.0'.
    INFO: [XFORM 203-712] Applying dataflow to function 'ProcessingElement::processInputChannel.0' (src/fpgaAcc.cpp:224:1), detected/extracted 2 process function(s): 
    	 'ProcessingElement::processInputChannel.0_Loop_load_pixel_2_PE_row_loop_proc5'
    	 'ProcessingElement::processAll_channelOut'.
    INFO: [HLS 200-111] Finished Pre-synthesis Time (s): cpu = 00:00:49 ; elapsed = 00:01:25 . Memory (MB): peak = 489.633 ; gain = 141.371 ; free physical = 291 ; free virtual = 32635
    WARNING: [XFORM 203-542] Cannot flatten a loop nest 'Loop-1.1' (src/fpgaAcc.cpp:283:18) in function 'convolution_3x3' : the outer loop is not a perfect loop.
    WARNING: [XFORM 203-542] Cannot flatten a loop nest 'Loop-1' (src/fpgaAcc.cpp:280:18) in function 'convolution_3x3' : the outer loop is not a perfect loop.
    WARNING: [XFORM 203-542] Cannot flatten a loop nest 'L_DRAM_PRELOADROW_X' (src/fpgaAcc.cpp:329:77) in function 'convolution_3x3' : the outer loop is not a perfect loop.
    WARNING: [XFORM 203-542] Cannot flatten a loop nest 'L_DRAM_PRELOADROW_X' (src/fpgaAcc.cpp:329:77) in function 'convolution_3x3' : the outer loop is not a perfect loop.
    WARNING: [XFORM 203-542] Cannot flatten a loop nest 'L_DRAM_PRELOADROW_X' (src/fpgaAcc.cpp:329:77) in function 'convolution_3x3' : the outer loop is not a perfect loop.
    WARNING: [XFORM 203-542] Cannot flatten a loop nest 'Loop-4.1' (src/fpgaAcc.cpp:93:3) in function 'convolution_3x3' : the outer loop is not a perfect loop.
    WARNING: [XFORM 203-542] Cannot flatten a loop nest 'Loop-4' (src/fpgaAcc.cpp:91:6) in function 'convolution_3x3' : more than one sub loop.
    WARNING: [XFORM 203-631] Renaming function 'ProcessingElement::processInputChannel.0_Loop_load_pixel_2_PE_row_loop_proc5' to 'processInputChannel.' (src/fpgaAcc.cpp:207:3)
    WARNING: [XFORM 203-631] Renaming function 'ProcessingElement::processInputChannel.0' to 'processInputChannel..1' (src/fpgaAcc.cpp:226:1)
    WARNING: [XFORM 203-631] Renaming function 'ProcessingElement::processAll_channelOut' to 'processAll_channelOu' (src/fpgaAcc.cpp:192:43)
    INFO: [XFORM 203-811] Inferring bus burst read of variable length on port 'memorybus' (src/fpgaAcc.cpp:178:15).
    WARNING: [XFORM 203-562] Loop 'L_CH_OUT' (src/fpgaAcc.cpp:241) in function 'processAll_channelOu' has unknown bound because it has multiple exiting blocks.
    WARNING: [XFORM 203-713] Function 'processInputChannel..1' (src/fpgaAcc.cpp:226:1) failed dataflow checking:  A dataflow region cannot be instantiated from with a pipelined loop  (src/fpgaAcc.cpp:226:1). Ignoring pipeline directive to allow the dataflow directive to take precedence. This behavior can be disabled by using 'config_compile -disable_dataflow_pipeline_check'.
    Instruction does not dominate all uses!
      %tmp_57 = add i32 %WeightsCache_inChan_1, %tmp_56
      %memorybus_addr_rd_re = call i1 @_ssdm_op_ReadReq.m_axi.floatP(float* %memorybus_addr, i32 %tmp_57), !dbg !1031
    Broken module found, compilation aborted!
    Stack dump:
    0.	Running pass 'Function Pass Manager' on module '/home/osrc/Desktop/document/conv_Core/HLS_Conv/conv3x3_IPcore/solution1/.autopilot/db/a.o.2.bc'.
    1.	Running pass 'Module Verifier' on function '@convolution_3x3'
    /mnt/workspace/Xilinx/Vivado/2017.4/bin/loader: line 194: 13582 Aborted                 (core dumped) "$RDI_PROG" "$@"
    Finished C synthesis.

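There are still other errors (the synthesis run above aborts with "Broken module found"), but the interface issues have been debugged and resolved. HLS of the interface on the IP core side is complete.

# Part 4: A return value is required #

While testing on the FPGA we found a bug: the function must be given a return value, otherwise there is no way to tell whether the IP core has finished.

So the convolution needs a return value. Only then are driver functions like the following generated: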

    while (!XMigtester_IsDone(&XMigtesterCore));
    result=XMigtester_Get_return(&XMigtesterCore);

So we add a return value to the convolution.
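A minimal sketch of such a function, assuming a hypothetical fill\_buffer core (not the convolution itself). As far as we understand Vivado HLS, it is the s\_axilite pragma on port=return that places the block-level control signals (start, done, idle) in the AXI-Lite register map, and the pragmas listed in section 2.5 do not include it. A non-void return type additionally gives the generated driver a Get\_return function.

```cpp
// Hypothetical example: fill_buffer and its arguments are illustrative only.
// Returning a status value and mapping 'return' to s_axilite lets the PS
// poll for completion and then read the result.
int fill_buffer(volatile float *out, int n, float value) {
#pragma HLS INTERFACE m_axi depth=512 port=out offset=slave bundle=memorybus
#pragma HLS INTERFACE s_axilite port=n bundle=axilite
#pragma HLS INTERFACE s_axilite port=value bundle=axilite
#pragma HLS INTERFACE s_axilite port=return bundle=axilite
    if (n < 0) {
        return -1;  // error code for an invalid length
    }
    for (int i = 0; i < n; ++i) {
        out[i] = value;
    }
    return 0;  // 0 signals success to the PS
}
```

On the PS side the returned status would then be read after the IsDone poll, in the same way as the XMigtester calls shown above.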

[https_blog.csdn.net_weixin_36474809_article_details_81018040]: https://blog.csdn.net/weixin_36474809/article/details/81018040
[https_blog.csdn.net_weixin_36474809_article_details_81206660]: https://blog.csdn.net/weixin_36474809/article/details/81206660
[https_blog.csdn.net_weixin_36474809_article_details_80997945_E4_BA_94_E3_80_81SDK]: https://blog.csdn.net/weixin_36474809/article/details/80997945#%E4%BA%94%E3%80%81SDK
[https_blog.csdn.net_weixin_36474809_article_details_81012267]: https://blog.csdn.net/weixin_36474809/article/details/81012267
[https_blog.csdn.net_weixin_36474809_article_details_84942607]: https://blog.csdn.net/weixin_36474809/article/details/84942607
[https_baike.baidu.com_item_volatile_10606957_fr_aladdin]: https://baike.baidu.com/item/volatile/10606957?fr=aladdin