1.利用Logistic regression 进行分类的主要思想

根据现有数据对分类边界线建立回归公式,即寻找最佳拟合参数集,然后进行分类。

2.利用梯度下降找出最佳拟合参数

3.代码实现

 # -*- coding: utf-8 -*-
"""
Created on Tue Mar 28 21:35:25 2017 @author: MyHome
"""
import numpy as np
from random import uniform
'''定义sigmoid函数'''
def sigmoid(inX):
return 1.0 /(1.0 +np.exp(-inX)) '''使用随机梯度下降更新权重,并返回最终值'''
def StocGradientDescent(dataMatrix,classLabels,numIter = 600):
m,n = dataMatrix.shape
#print m,n
weights = np.ones(n)
for j in xrange(numIter):
dataIndex = range(m) for i in xrange(m): alpha = 4 / (1.0+j+i) + 0.01
randIndex = int(uniform(0,len(dataIndex)))
h = sigmoid(sum(dataMatrix[randIndex]*weights))
gradient = (h - classLabels[randIndex])*dataMatrix[randIndex]
weights = weights - alpha*gradient
del(dataIndex[randIndex]) return weights '''创建分类器'''
def classifyVector(inX,weights):
prob = sigmoid(sum(inX*weights))
if prob > 0.5:
return 1.0
else:
return 0.0 '''测试'''
def Test(): frTrain = open("horseColicTraining.txt")
frTest = open("horseColicTest.txt")
trainingSet = []
trainingLabel = []
for line in frTrain.readlines():
currLine = line.strip().split("\t")
lineArr = []
for i in range(21):
lineArr.append(float(currLine[i]))
trainingSet.append(lineArr)
trainingLabel.append(float(currLine[21]))
trainWeights = StocGradientDescent(np.array(trainingSet),trainingLabel)
errorCount = 0.0
numTestVec = 0.0
for line in frTest.readlines():
numTestVec += 1.0
currLine = line.strip().split("\t")
lineArr = []
for i in range(21):
lineArr.append(float(currLine[i]))
if int(classifyVector(np.array(lineArr),trainWeights)) != int(currLine[21]):
errorCount += 1
errorRate = (float(errorCount)/numTestVec)
print "the error rate of this test is:%f"%errorRate
return errorRate '''调用Test()10次求平均值'''
def multiTest():
numTest = 10
errorSum = 0.0
for k in range(numTest):
errorSum += Test()
print "after %d iterations the average errror rate is:\
%f"%(numTest,errorSum/float(numTest)) if __name__ == "__main__":
multiTest()

结果:

the error rate of this test is:0.522388
the error rate of this test is:0.328358

the error rate of this test is:0.313433

the error rate of this test is:0.358209

the error rate of this test is:0.298507

the error rate of this test is:0.343284

the error rate of this test is:0.283582

the error rate of this test is:0.313433

the error rate of this test is:0.343284

the error rate of this test is:0.358209

after 10 iterations the average errror rate is:        0.346269

4.总结

Logistic regression is finding best-fit parameters to a nonlinear function called the sigmoid.

Methods of optimization can be used to find the best-fit parameters. Among the

optimization algorithms, one of the most common algorithms is gradient descent. Gradient

desent can be simplified with stochastic gradient descent.

Stochastic gradient descent can do as well as gradient descent using far fewer computing

resources. In addition, stochastic gradient descent is an online algorithm; it can

update what it has learned as new data comes in rather than reloading all of the data

as in batch processing.

One major problem in machine learning is how to deal with missing values in the

data. There’s no blanket answer to this question. It really depends on what you’re

doing with the data. There are a number of solutions, and each solution has its own

advantages and disadvantages.

Logistic Regression 用于预测马是否生病的更多相关文章

  1. Logistic回归应用-预测马的死亡率

    Logistic回归应用-预测马的死亡率 本文所有代码均来自<机器学习实战>,数据也是 本例中的数据有以下几个特征: 部分指标比较主观.难以很好的定量测量,例如马的疼痛级别 数据集中有30 ...

  2. matlab(8) Regularized logistic regression : 不同的λ(0,1,10,100)值对regularization的影响,对应不同的decision boundary\ 预测新的值和计算模型的精度predict.m

    不同的λ(0,1,10,100)值对regularization的影响\ 预测新的值和计算模型的精度 %% ============= Part 2: Regularization and Accur ...

  3. Machine Learning - 第3周(Logistic Regression、Regularization)

    Logistic regression is a method for classifying data into discrete outcomes. For example, we might u ...

  4. Coursera公开课笔记: 斯坦福大学机器学习第六课“逻辑回归(Logistic Regression)” 清晰讲解logistic-good!!!!!!

    原文:http://52opencourse.com/125/coursera%E5%85%AC%E5%BC%80%E8%AF%BE%E7%AC%94%E8%AE%B0-%E6%96%AF%E5%9D ...

  5. 机器学习理论基础学习3.3--- Linear classification 线性分类之logistic regression(基于经验风险最小化)

    一.逻辑回归是什么? 1.逻辑回归 逻辑回归假设数据服从伯努利分布,通过极大化似然函数的方法,运用梯度下降来求解参数,来达到将数据二分类的目的. logistic回归也称为逻辑回归,与线性回归这样输出 ...

  6. SparkMLlib之 logistic regression源码分析

    最近在研究机器学习,使用的工具是spark,本文是针对spar最新的源码Spark1.6.0的MLlib中的logistic regression, linear regression进行源码分析,其 ...

  7. Logistic Regression Vs Decision Trees Vs SVM: Part I

    Classification is one of the major problems that we solve while working on standard business problem ...

  8. Logistic Regression逻辑回归

    参考自: http://blog.sina.com.cn/s/blog_74cf26810100ypzf.html http://blog.sina.com.cn/s/blog_64ecfc2f010 ...

  9. 在opencv3中实现机器学习之:利用逻辑斯谛回归(logistic regression)分类

    logistic regression,注意这个单词logistic ,并不是逻辑(logic)的意思,音译过来应该是逻辑斯谛回归,或者直接叫logistic回归,并不是什么逻辑回归.大部分人都叫成逻 ...

随机推荐

  1. ACM学习历程—51NOD 1770数数字(循环节)

    http://www.51nod.com/onlineJudge/questionCode.html#!problemId=1770 这是这次BSG白山极客挑战赛的A题.由于数字全部相同,乘上b必然会 ...

  2. php中 curl模拟post发送json并接收json(转)

    本地模拟请求服务器数据,请求数据格式为json,服务器返回数据也是json. 由于需求特殊性, 如同步客户端的批量数据至云端, 提交至服务器的数据可能是多维数组数据了.  这时需要将此数据以一定的数据 ...

  3. 找到最大或最小的N个元素

    问题: 想在某个集合中找到最大或最小的N个元素 解决方案: heapq 模块中有两个函数  nlargest() 和 nsmallest()  它们正是我们需要的.例如: import heapq n ...

  4. <trim>: prefix+prefixOverrides+suffix+suffixOverrides

    <trim prefix="where" prefixOverrides="where" suffixOverrides="and"& ...

  5. Linux 中断下半部

    为什么使用中断下半部? 中断执行的原则是要以最快的速度执行完,而且期间不能延时和休眠! 可是现实中,中断中可能没办法很快的处理完需要做的事,或者必须用到延时和休眠,因此引入了中断下半部. 中断中处理紧 ...

  6. 找回mysql root用户的密码

    1.停掉mysql服务ps -ef | grep mysqldkill -9 mysql进程的pid2.vi /etc/my.cnf找到[mysqld]在下面添加一行skip-grant-tables ...

  7. jQuery选择器this通过onclick传入方法以及Jquery中的this与$(this)初探,this传处变量等

    起初以为this和$(this)就是一模子刻出来.但是我在阅读时,和coding时发现,总不是一回事. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML ...

  8. PL/SQL 训练08--触发器

    --什么是触发器呢?--一触即发,某个事件发生时,执行的程序块?--数据库触发器是一个当数据库发生某种事件时作为对这个事件的响应而执行的一个被命名的程序单元 --适合场景--对表的修改做验证--数据库 ...

  9. flask系列一之环境搭建包安装

    一,python的安装 (1)python的安装 (2)虚拟环境的配置 参考:http://www.cnblogs.com/bfwbfw/p/7995245.html 1,虚拟环境的建立 (1)使用p ...

  10. Java面向对象-static关键字、静态方法与普通方法、静态成员变量

    Java面向对象-static关键字.静态方法与普通方法 static关键字的基本作用:方便在没有创建对象的情况下来进行调用(方法/变量). 很显然,被static关键字修饰的方法或者变量不需要依赖于 ...