Machine Learning ---- The KNN Algorithm
KNN is arguably the simplest algorithm in machine learning. Its core steps are as follows.
For each sample in the dataset whose class is unknown:
1. Compute the distance between the current point and every point in the dataset with known class labels
2. Sort the distances in ascending order
3. Select the K points closest to the current point
4. Determine the frequency of each class among those K points
5. Return the most frequent class among those K points as the predicted class of the current point
Here is a record of the core code.
import operator
import numpy as np

# KNN algorithm implementation
def classify0(inX, dataSet, labels, k):  # test vector, training matrix, labels, value of k
    dataSetSize = dataSet.shape[0]
    # tile() repeats the test vector along axis 0 so its shape matches the training data
    diffMat = np.tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat**2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances**0.5              # Euclidean distances
    sortedDistIndicies = distances.argsort()  # indices that sort the distances, ascending
    classCount = {}
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]  # labels of the k nearest neighbors
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1  # count votes in a dict
    # sort the dict entries by value, descending (reverse=True);
    # items() replaces Python 2's iteritems()
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]
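The classifier above can be exercised on a tiny hand-made dataset; the four points and their labels below are invented purely for illustration:

```python
import operator
import numpy as np

def classify0(inX, dataSet, labels, k):
    # distance from the test vector to every training point
    dataSetSize = dataSet.shape[0]
    diffMat = np.tile(inX, (dataSetSize, 1)) - dataSet
    distances = (diffMat**2).sum(axis=1)**0.5
    sortedDistIndicies = distances.argsort()
    # majority vote among the k nearest neighbors
    classCount = {}
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

# four training points, two classes
group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(classify0([0.0, 0.0], group, labels, 3))  # → B
print(classify0([1.0, 1.2], group, labels, 3))  # → A
```

A point near the origin gets two 'B' votes out of three neighbors, so it is classified as 'B'; a point near (1, 1) gets two 'A' votes and is classified as 'A'.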
# load the dataset
def file2matrix(filename):
    fr = open(filename)
    numberOfLines = len(fr.readlines())       # get the number of lines in the file
    returnMat = np.zeros((numberOfLines, 3))  # prepare the matrix to return
    classLabelVector = []                     # prepare the list of labels to return
    fr.seek(0)                                # rewind instead of reopening the file
    index = 0
    for line in fr.readlines():
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index, :] = listFromLine[0:3]
        classLabelVector.append(int(listFromLine[-1]))
        index += 1
    fr.close()
    return returnMat, classLabelVector
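A quick way to check file2matrix is to write a small tab-separated file first and parse it back; the three sample rows below are invented (three numeric features plus an integer class label per line):

```python
import os
import tempfile
import numpy as np

def file2matrix(filename):
    fr = open(filename)
    numberOfLines = len(fr.readlines())       # count lines to size the matrix
    returnMat = np.zeros((numberOfLines, 3))
    classLabelVector = []
    fr.seek(0)                                # rewind and parse
    for index, line in enumerate(fr.readlines()):
        listFromLine = line.strip().split('\t')
        returnMat[index, :] = listFromLine[0:3]          # first 3 columns: features
        classLabelVector.append(int(listFromLine[-1]))   # last column: class label
    fr.close()
    return returnMat, classLabelVector

# three made-up samples, tab-separated: 3 features + a class label
rows = ["40920\t8.3\t0.95\t3", "14488\t7.1\t1.67\t2", "26052\t1.4\t0.80\t1"]
path = os.path.join(tempfile.mkdtemp(), "dating.txt")
with open(path, "w") as f:
    f.write("\n".join(rows))

mat, labels = file2matrix(path)
print(mat.shape)  # → (3, 3)
print(labels)     # → [3, 2, 1]
```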
# min-max normalization
# newValue = (oldValue - min) / (max - min)
def autoNorm(dataSet):
    minVals = dataSet.min(0)  # column-wise minimum of the matrix
    maxVals = dataSet.max(0)  # column-wise maximum
    ranges = maxVals - minVals
    m = dataSet.shape[0]
    normDataSet = dataSet - np.tile(minVals, (m, 1))
    normDataSet = normDataSet / np.tile(ranges, (m, 1))  # element-wise divide
    return normDataSet, ranges, minVals
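To see the effect of the normalization, apply it to a tiny matrix where one feature's scale dwarfs the other's (the values are invented); afterwards each column spans exactly [0, 1], so both features contribute equally to the distance:

```python
import numpy as np

def autoNorm(dataSet):
    # rescale each column to [0, 1] via (x - min) / (max - min)
    minVals = dataSet.min(0)
    maxVals = dataSet.max(0)
    ranges = maxVals - minVals
    m = dataSet.shape[0]
    normDataSet = dataSet - np.tile(minVals, (m, 1))
    normDataSet = normDataSet / np.tile(ranges, (m, 1))
    return normDataSet, ranges, minVals

data = np.array([[1000.0, 2.0],
                 [2000.0, 4.0],
                 [3000.0, 6.0]])
norm, ranges, minVals = autoNorm(data)
print(norm)
# → [[0.  0. ]
#    [0.5 0.5]
#    [1.  1. ]]
```

Without this step, the first feature's range (2000) would dominate the Euclidean distance and the second feature (range 4) would be effectively ignored.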