### Time Series Forecasting Series, Part 4 ###

叁歲伎倆 2022-11-28 15:09

##### Main Content #####

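This post walks through an implementation of the dual-stage attention model, which applies attention along two dimensions: space (across the input series) and time. For the details of the model itself, see the paper below.  
Link: https://pan.baidu.com/s/1UePtoCnYcKXbb7eTwqvNiw  
Extraction code: j1c6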

The focus here is on the code, so the paper's content is not explained at length. Fully understanding the paper takes several readings.

This is probably the hardest part of the time series forecasting series, so expect to spend some time reading both the code and the paper.

##### Core Ideas of the Paper #####

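1.  Spatial attention: each stock's daily data contains several factors (opening price, closing price, high, low, ...). Our earlier experiments treated every factor as influencing the prediction equally, but in practice their influence differs, so attention is applied to these factors to give each one its own weight.  
    Stocks also influence one another. For example, to predict the closing price of the CSI 300 index, you can use the closing prices of some of its constituent stocks as driving series and assign each constituent its own weight (since each constituent affects the index differently).
2.  Temporal attention: as mentioned in Part 3, there are T time steps of information, but each step should influence the final prediction differently, so attention is applied here as well.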
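
As a rough illustration of the two ideas above, the NumPy sketch below applies a softmax-normalized weight vector first across the n driving series at one time step, then across the T time steps of the encoder output. The scores are random placeholders rather than learned values, and the names `softmax`, `spatial_scores`, and `temporal_scores` are introduced only for this example.

```python
import numpy as np

def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
T, n, m = 10, 67, 20                      # time steps, driving series, hidden size (illustrative)

# Spatial attention: one weight per driving series at a single time step.
x_t = rng.normal(size=n)                  # closing prices of n stocks at step t
spatial_scores = rng.normal(size=n)       # placeholder for learned scores
alpha_t = softmax(spatial_scores)         # weights over the n series, sum to 1
x_t_weighted = alpha_t * x_t              # each series scaled by its weight

# Temporal attention: one weight per time step of the encoder output.
h_all = rng.normal(size=(T, m))           # placeholder encoder hidden states
temporal_scores = rng.normal(size=T)      # placeholder for learned scores
beta = softmax(temporal_scores)           # weights over the T steps, sum to 1
context = beta @ h_all                    # weighted sum of hidden states, shape (m,)

print(x_t_weighted.shape, context.shape)
```

In the actual model the scores come from small learned networks conditioned on the LSTM state, as the encoder and decoder sections show.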

##### Dataset #####

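This experiment predicts the closing price of the CSI 300 index. The file tg.csv stores its closing prices; the date for each row has been lost, but each row is one day of historical data, and the data is guaranteed to be correct.  
The file hs300.csv stores the historical closing prices of 67 constituent stocks, one day per row, with rows guaranteed to align with those in tg.csv. Both files can be used directly.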

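In other words, this post does not consider factors such as the high, the low, or trading volume. The purpose of the spatial attention here is to assign different weights to different constituent stocks.  
Link: https://pan.baidu.com/s/1qrr5PxWVNedukv4r6Nx6Kw  
Extraction code: je8h

Link: https://pan.baidu.com/s/1ydKUSQFL072j0mG4Z1V3GA  
Extraction code: c6rq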
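
One way to turn the two files into training samples is a sliding window over the aligned rows. The sketch below uses small synthetic arrays in place of the real CSV files. The function name `make_windows`, the window length, and the exact split into inputs and label are assumptions made for illustration; they are not taken from the original code.

```python
import numpy as np

def make_windows(driving, target, T):
    """Build sliding-window samples (hypothetical helper).

    driving: (num_days, n) driving series, e.g. constituent closing prices
    target:  (num_days,)   target series, e.g. the index closing price
    Returns X (N, T, n), Y_hist (N, T-1, 1) and y_next (N,).
    Sample i covers days [i, i+T): all T rows of driving data, the first
    T-1 target values as history, and the target on the last day as label.
    """
    num_days = driving.shape[0]
    N = num_days - T + 1
    X = np.stack([driving[i:i + T] for i in range(N)])
    Y_hist = np.stack([target[i:i + T - 1] for i in range(N)])[..., None]
    y_next = target[T - 1:]
    return X, Y_hist, y_next

driving = np.arange(40, dtype=float).reshape(20, 2)   # 20 days, 2 synthetic series
target = np.arange(20, dtype=float)                   # 20 days of synthetic target values
X, Y_hist, y_next = make_windows(driving, target, T=5)
print(X.shape, Y_hist.shape, y_next.shape)
```

With the real data, `driving` would hold the 67 columns of hs300.csv and `target` the single column of tg.csv.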

##### Data Preprocessing #####

As in the previous posts, the data only needs to be standardized.

    import numpy as np

    # Standardize each column to zero mean and unit variance (modifies float_data in place)
    def normal(float_data):
        print(float_data.shape)
        mean = float_data.mean(axis=0)
        float_data -= mean
        std = float_data.std(axis=0)
        float_data /= std
        return float_data

    # Read a CSV file with one header row into a 2-D float array
    def get_data(data_path):
        with open(data_path) as f:   # the file is closed automatically
            data = f.read()
        lines = data.split('\n')
        header = lines[0].split(',')
        lines = lines[1:]
        float_data = np.zeros((len(lines), len(header)))
        for i, line in enumerate(lines):
            # The last line is skipped, as in the original code, so its row stays all zeros
            if i == len(lines) - 1:
                break
            fields = line.split(',')
            # Rows containing 'None' are skipped and also stay all zeros
            if 'None' in fields:
                continue
            float_data[i] = [float(x) for x in fields]
        return float_data
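
Because the model is trained on standardized values, its predictions have to be mapped back to the price scale before they can be read as prices. Below is a minimal sketch, assuming the column means and standard deviations are kept after standardization. The helper names `standardize` and `destandardize` are hypothetical and not part of the original code; unlike `normal` above, they do not modify their input in place.

```python
import numpy as np

def standardize(data):
    """Return a z-scored copy of data plus the column means and stds."""
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    return (data - mean) / std, mean, std

def destandardize(scaled, mean, std):
    """Invert standardize: map z-scores back to the original scale."""
    return scaled * std + mean

prices = np.array([[3000.0, 10.0],
                   [3100.0, 12.0],
                   [3200.0, 14.0]])
scaled, mean, std = standardize(prices)
restored = destandardize(scaled, mean, std)
print(scaled.mean(axis=0), scaled.std(axis=0))
```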

##### Encoder #####

Spatial attention is implemented in the encoder, in two steps: first compute attention\_weight, then apply the weights.

    from keras.layers import Concatenate, RepeatVector, Add, Activation  # or tensorflow.keras.layers, depending on the setup

    def one_encoder_attention_step(h_prev, s_prev, X):
        '''
        :param h_prev: previous hidden state of the encoder LSTM
        :param s_prev: previous cell state (s in the paper)
        :param X: (None, T, n); one input sequence of T steps, each of dimension n
        :return: attention weights for x_t, n numbers that sum to 1
        '''
        # en_densor_We, Dense_no_bias, My_Transpose and T are assumed to be defined in the full code linked at the end of the post
        concat = Concatenate()([h_prev, s_prev])        # (None, 2m), joined along the last axis: m + m = 2m
        result1 = en_densor_We(concat)                  # (None, T), dense layer, i.e. We * [h_prev; s_prev]
        result1 = RepeatVector(X.shape[2],)(result1)    # (None, n, T), repeated n times (RepeatVector needs 2-D input)
        X_temp = My_Transpose(axis=(0, 2, 1))(X)        # (None, n, T); Permute((2, 1))(X) would also work
        result2 = Dense_no_bias(T)(X_temp)              # (None, n, T), i.e. Ue (T, T) applied to each series
        result3 = Add()([result1, result2])             # (None, n, T), We * [h_prev; s_prev] + Ue * X
        result4 = Activation(activation='tanh')(result3)   # (None, n, T)

        result5 = Dense_no_bias(1)(result4)             # (None, n, 1), Ve collapses each length-T vector to a score
        result5 = My_Transpose(axis=(0, 2, 1))(result5)    # (None, 1, n)
        print('result5 ', result5)
        alphas = Activation(activation='softmax')(result5)   # softmax over the n series

        return alphas

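The weights above are computed additively rather than with a dot product. The paper does not use dot-product scoring, presumably to keep the scores from becoming too large or too small.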
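
To make the additive scoring concrete, here is a NumPy sketch of a single encoder attention step for one sample, following the structure of the code above: each of the n series is scored as v^T tanh(W[h; s] + U x^k), and the scores are then passed through a softmax across the series. All weights are random placeholders, and the names `W_e`, `U_e`, and `v_e` are chosen for this example only.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n, m = 10, 67, 20                 # illustrative sizes

h_prev = rng.normal(size=m)          # previous hidden state
s_prev = rng.normal(size=m)          # previous cell state
X = rng.normal(size=(T, n))          # one sample: T steps, n series

W_e = rng.normal(size=(T, 2 * m))    # maps [h; s] to a length-T vector
U_e = rng.normal(size=(T, T))        # maps each series (length T) to length T
v_e = rng.normal(size=T)             # collapses each length-T vector to one score

state_term = W_e @ np.concatenate([h_prev, s_prev])   # (T,)
series_term = X.T @ U_e.T                              # (n, T), row k is U_e @ X[:, k]
scores = np.tanh(state_term + series_term) @ v_e       # (n,), state term broadcast over the rows

exp = np.exp(scores - scores.max())
alphas = exp / exp.sum()                               # (n,), sums to 1
print(alphas.shape, alphas.sum())
```

Because tanh is bounded in [-1, 1], each score is bounded in magnitude by the sum of the absolute values of `v_e`, which is consistent with the point about keeping the scores in a moderate range.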

    from keras.layers import Lambda, Reshape, Multiply  # or tensorflow.keras.layers, depending on the setup
    from keras import backend as K

    def encoder_attention(T, X, s0, h0):
        # en_LSTM_cell is assumed to be an LSTM layer with return_state=True from the full code
        s = s0
        h = h0
        attention_weight_t = None   # accumulates the weights of all time steps
        for t in range(T):
            # despite its name, context holds the attention weights for step t
            context = one_encoder_attention_step(h, s, X)   # (None, 1, n)
            # bind t as a default argument so each Lambda keeps its own time step
            x = Lambda(lambda x, t=t: x[:, t, :])(X)        # take the data of time step t
            x = Reshape((1, x.shape[1]))(x)                 # reshape to (None, 1, n)
            # note: the LSTM here consumes the raw x_t; the weighted input is returned at the end
            h, _, s = en_LSTM_cell(x, initial_state=[h, s])
            if t != 0:
                # old Keras: Merge(mode='concat', concat_axis=1)([attention_weight_t, context])
                # newer Keras: Concatenate(axis=1)([attention_weight_t, context])
                if t == T - 1:
                    # the last concatenation is named, presumably so the weights can be looked up by layer name
                    attention_weight_t = Lambda(lambda x: K.concatenate([x[0], x[1]], axis=1), name='attention_weight_local')([attention_weight_t, context])
                else:
                    attention_weight_t = Lambda(lambda x: K.concatenate([x[0], x[1]], axis=1))([attention_weight_t, context])
            else:
                attention_weight_t = context

        X_ = Multiply()([attention_weight_t, X])   # (None, T, n), each dimension weighted
        print('return X:', X_)
        return X_

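The meaning of the code above is straightforward: each sample has T time steps, and each step holds the closing prices of the 67 constituent stocks, so the result is a T×67 weight matrix, which is then multiplied element-wise with the original data X (T×67).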

It helps to work through the code alongside the formulas in the paper, paying particular attention to the dimensions of each matrix.
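
For the dimension analysis, the following NumPy sketch mirrors how the per-step weights are collected and applied, this time with a batch axis. The function `step_weights` is a made-up stand-in for one attention step and returns random softmax weights, so only the shapes are meaningful.

```python
import numpy as np

rng = np.random.default_rng(2)
B, T, n = 4, 10, 67                              # batch size, time steps, series

X = rng.normal(size=(B, T, n))                   # a batch of input windows

def step_weights():
    """Placeholder for one attention step: returns weights of shape (B, 1, n)."""
    scores = rng.normal(size=(B, 1, n))
    exp = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

# Collect one (B, 1, n) weight tensor per time step, then join them along the time axis.
weights = np.concatenate([step_weights() for _ in range(T)], axis=1)   # (B, T, n)

X_weighted = weights * X                         # element-wise product, like Multiply()
print(weights.shape, X_weighted.shape)
```

Each row of `weights` along the last axis sums to 1, so every time step distributes a total weight of 1 across the 67 series.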

##### Decoder #####

The decoder both reads in the CSI 300 closing prices and computes weights over the T time steps of information coming from the encoder.

    from keras.layers import Concatenate, RepeatVector, Add, Activation, Reshape, Dot  # or tensorflow.keras.layers

    def one_decoder_attention_step(h_de_prev, s_de_prev, h_en_all, t):
        '''
        :param h_de_prev: previous decoder hidden state
        :param s_de_prev: previous decoder cell state
        :param h_en_all: (None, T, m); encoder hidden states, m is the hidden size, T the number of time steps
        :param t: index of the current decoder step
        :return: context vector, the sum of the T encoder hidden states weighted by T attention weights that sum to 1
        '''
        # de_densor_We, Dense_no_bias, T and m are assumed to be defined in the full code linked at the end of the post
        concat = Concatenate()([h_de_prev, s_de_prev])      # (None, 2p)
        result1 = de_densor_We(concat)                      # (None, m)
        result1 = RepeatVector(T)(result1)                  # (None, T, m)
        result2 = Dense_no_bias(m)(h_en_all)                # (None, T, m); Dense(m)(h_en_all) also works
        result3 = Add()([result1, result2])                 # (None, T, m)
        result4 = Activation(activation='tanh')(result3)    # (None, T, m)
        result5 = Dense_no_bias(1)(result4)                 # (None, T, 1), one score per time step
        result5 = Reshape((1, result5.shape[1]))(result5)   # (None, 1, T), so softmax runs over the T steps
        if t == T - 2:
            beta = Activation(activation='softmax', name='attention_weight_time')(result5)
        else:
            beta = Activation(activation='softmax')(result5)
        beta = Reshape((beta.shape[2], 1))(beta)            # (None, T, 1)
        context = Dot(axes=1)([beta, h_en_all])             # (None, 1, m), the T encoder states merged by weight
        return context
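
The same step can be written for a single sample in NumPy, which makes it easier to see that the `Dot(axes=1)` call is just a weighted sum of the encoder hidden states. This is a sketch with random placeholder weights; the names `W_d`, `U_d`, and `v_d` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
T, m, p = 10, 20, 20                  # time steps, encoder and decoder hidden sizes

h_en_all = rng.normal(size=(T, m))    # encoder hidden states for one sample
d_prev = rng.normal(size=p)           # previous decoder hidden state
s_prev = rng.normal(size=p)           # previous decoder cell state

W_d = rng.normal(size=(m, 2 * p))     # maps [d; s] to length m
U_d = rng.normal(size=(m, m))         # maps each encoder state to length m
v_d = rng.normal(size=m)              # collapses to one score per time step

state_term = W_d @ np.concatenate([d_prev, s_prev])        # (m,)
scores = np.tanh(state_term + h_en_all @ U_d.T) @ v_d       # (T,)

exp = np.exp(scores - scores.max())
beta = exp / exp.sum()                                      # (T,), sums to 1
context = beta @ h_en_all                                   # (m,), weighted sum of the T states
print(beta.shape, context.shape)
```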

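    from keras.layers import Lambda, Reshape, Concatenate, Dense  # or tensorflow.keras.layers, depending on the setup

    def decoder_attention(T, h_en_all, Y, s0, h0):
        # de_LSTM_cell is assumed to be an LSTM layer with return_state=True from the full code
        s = s0
        h = h0
        for t in range(T - 1):
            # bind t as a default argument so each Lambda keeps its own time step
            y_prev = Lambda(lambda y, t=t: y[:, t, :])(Y)    # Y is the input target series
            y_prev = Reshape((1, y_prev.shape[1]))(y_prev)   # (None, 1, 1); this line was changed from the earlier version
            context = one_decoder_attention_step(h, s, h_en_all, t)   # (None, 1, m), here (None, 1, 20)
            y_prev = Concatenate(axis=2)([y_prev, context])  # (None, 1, m + 1), here (None, 1, 21)
            y_prev = Dense(1)(y_prev)                        # (None, 1, 1), w * [y; c]
            h, _, s = de_LSTM_cell(y_prev, initial_state=[h, s])
            print('h:', h)
            print('s:', s)

        context = one_decoder_attention_step(h, s, h_en_all, T - 1)   # c_T, (None, 1, m)
        return h, context   # h is the last hidden state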
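
Inside the loop, the previous target value and the context vector are concatenated and projected back to a single number before entering the decoder LSTM. The NumPy sketch below shows that one projection for a single sample; `w` and `b` are hypothetical stand-ins for the kernel and bias of the `Dense(1)` layer and hold placeholder values.

```python
import numpy as np

rng = np.random.default_rng(4)
m = 20                                       # encoder hidden size (illustrative)

y_prev = np.array([0.5])                     # previous (standardized) target value
context = rng.normal(size=m)                 # context vector from temporal attention

w = rng.normal(size=m + 1)                   # placeholder for the Dense(1) kernel
b = 0.0                                      # placeholder for the Dense(1) bias

joined = np.concatenate([y_prev, context])   # (m + 1,), i.e. [y; c]
y_tilde = w @ joined + b                     # scalar fed to the decoder LSTM
print(joined.shape, float(y_tilde))
```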

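Overall, I was impressed by this paper at first, but once I really understood it, I came to think it was probably written mainly to pad out a publication list.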

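This post is somewhat brief, because much of the analysis is hard to write down and is best understood with the paper in hand.  
If you find any problems later, leave a comment.  
The complete code is provided below in two formats; the .ipynb file also shows the output of each run.  
Link: https://pan.baidu.com/s/1QxSclUxTW1T0FRoo5Rz3hg  
Extraction code: hlpn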