Time Series Model ARIMA

ゝ一世哀愁。 2022-12-28 05:30

The **ARIMA model** (**A**uto**r**egressive **I**ntegrated **M**oving **A**verage model; in Chinese 差分整合移动平均自回归模型, also called 整合移动平均自回归模型) is one of the methods for [time series][Link 1] forecasting and analysis. In ARIMA(p, d, q), AR stands for "autoregressive" and p is the number of autoregressive terms; MA stands for "moving average" and q is the number of moving-average terms; d is the number of times (the order) the series is differenced to make it stationary. The word "differencing" does not appear in the English name of ARIMA (the "Integrated" part refers to it), yet it is a key step.

### The statsmodels.tsa.arima\_model package has a ready-made ARIMA model, so we only need to pass in p, d and q.

Note that statsmodels 0.13 and later removed `statsmodels.tsa.arima_model`; the same class is now imported from `statsmodels.tsa.arima.model`, and its `fit()` no longer takes `disp`. The code below uses the newer import.

### The data is the traffic inflow and outflow of New York City (one txt file for inflow, one for outflow). A full year of hourly data (365 days \* 24 hours) is known, and I want to forecast it with ARIMA and compute MAE and RMSE to evaluate the forecast accuracy.

My data\_prepare() function concatenates the data from the two files. Normally reading a single file is enough. It returns one large matrix.

The first 66% of the dataset is used as the training set and the last 34% as the test set, which is compared against the predictions.

The res() function takes the test matrix and the prediction matrix and computes MAE and RMSE to evaluate the forecast accuracy.

The three arrays in main were meant for a grid search over p, d and q to find the best values. Every configuration refits the model once per test hour for each of the 188 columns, so it ran far too slowly on my computer, and in the end I simply used (0, 1, 0).

```python
import warnings

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
# statsmodels >= 0.13: ARIMA lives in statsmodels.tsa.arima.model
from statsmodels.tsa.arima.model import ARIMA


def data_prepare():
    """Load the two comma-separated files and join them column-wise."""
    matrix1 = np.loadtxt("tensor_year_hour_lease.txt", delimiter=",", dtype=int)
    matrix2 = np.loadtxt("tensor_year_hour_return.txt", delimiter=",", dtype=int)
    # Concatenated result has shape (8760, 188)
    return np.hstack((matrix1, matrix2))


def evaluate_arima_model(X, arima_order):
    """Walk-forward validation: refit on the known history, forecast one step."""
    # First 66% of the series is the training set, last 34% the test set
    train_size = int(len(X) * 0.66)
    train, test = X[:train_size], X[train_size:]
    history = list(train)
    predictions = []
    for t in range(len(test)):
        model_fit = ARIMA(history, order=arima_order).fit()
        yhat = model_fit.forecast()[0]  # one-step-ahead forecast
        predictions.append(yhat)
        history.append(test[t])  # reveal the true value before the next step
    # Return a column vector so per-column results can be stacked with hstack
    return np.array(predictions).reshape(-1, 1)


def res(test, predictions):
    """Return (MAE, RMSE) between the test matrix and the prediction matrix."""
    mae = mean_absolute_error(test, predictions)
    mse = mean_squared_error(test, predictions)
    rmse = mse ** 0.5
    return mae, rmse


def evaluate_models(p_values, d_values, q_values):
    matrix = data_prepare()
    train_size = int(len(matrix) * 0.66)
    test = matrix[train_size:]
    best_mae, best_rmse, best_cfg = float("inf"), float("inf"), None
    for p in p_values:
        for d in d_values:
            for q in q_values:
                order = (p, d, q)
                pre = np.zeros((len(test), 0))
                for i in range(matrix.shape[1]):  # 188 columns
                    dataset = matrix[:, i].astype('float32')
                    predictions = evaluate_arima_model(dataset, order)
                    # Forecast each column separately, then stack into a matrix
                    pre = np.hstack((pre, predictions))
                    print("p, d, q, i:", p, d, q, i)
                mae, rmse = res(test, pre)
                if mae < best_mae:
                    best_mae, best_rmse, best_cfg = mae, rmse, order
                print('ARIMA%s MAE=%.3f RMSE=%.3f' % (order, mae, rmse))
    print('Best ARIMA%s MAE=%.3f RMSE=%.3f' % (best_cfg, best_mae, best_rmse))


if __name__ == '__main__':
    # Grid originally intended for the search; to run only (0, 1, 0),
    # use p_values = [0], d_values = [1], q_values = [0].
    p_values = [0, 1, 2, 4, 6, 8, 10]
    d_values = range(0, 3)
    q_values = range(1, 3)
    warnings.filterwarnings("ignore")
    evaluate_models(p_values, d_values, q_values)
```

[Link 1]: https://baike.baidu.com/item/%E6%97%B6%E9%97%B4%E5%BA%8F%E5%88%97
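To get a feel for what the d parameter and the (0, 1, 0) order mean, here is a minimal NumPy-only sketch. It assumes an ARIMA(0, 1, 0) model without a drift term, whose one-step forecast is simply the last observed value. The function names `difference`, `naive_walk_forward` and `mae_rmse` and the toy series are made up for illustration; they are not part of statsmodels or of the data used above.

```python
import numpy as np


def difference(series, d=1):
    """Apply d rounds of first-order differencing: y'_t = y_t - y_{t-1}."""
    out = np.asarray(series, dtype=float)
    for _ in range(d):
        out = np.diff(out)
    return out


def naive_walk_forward(series, train_fraction=0.66):
    """One-step walk-forward forecast that repeats the last observed value.

    Assumption: this mirrors ARIMA(0, 1, 0) with no drift term.
    """
    series = np.asarray(series, dtype=float)
    train_size = int(len(series) * train_fraction)
    test = series[train_size:]
    # The value observed just before each test point is its forecast.
    predictions = series[train_size - 1:-1]
    return test, predictions


def mae_rmse(actual, predicted):
    """Mean absolute error and root mean squared error."""
    err = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(err))), float(np.sqrt(np.mean(err ** 2)))


# Hypothetical toy series with a linear trend (not the real traffic data).
toy = np.array([3, 5, 7, 9, 11, 13, 15, 17, 19, 21])
print(difference(toy, d=1))  # constant differences: the trend is gone
test, pred = naive_walk_forward(toy)
print(mae_rmse(test, pred))
```

Such a baseline runs instantly, so it can serve as a sanity check before committing to the much slower refit-per-step loop: a candidate (p, d, q) is only worth the extra time if it beats these MAE and RMSE values.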