Logistic Regression for Predicting Whether a Horse Is Sick

1. The main idea of classification with logistic regression

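Use the existing data to build a regression formula for the classification boundary, that is, find the best-fit set of parameters, and then use that boundary to classify new samples.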
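Written out in standard notation, the model and the decision rule that classifyVector in section 3 applies look like this. The symbols w, x, z and n are my own labels (n = 21 features in the horse data), not the author's:

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = \mathbf{w}^{\top}\mathbf{x} = \sum_{j=1}^{n} w_j x_j

\hat{y} = \begin{cases} 1, & \sigma(\mathbf{w}^{\top}\mathbf{x}) > 0.5 \\ 0, & \text{otherwise} \end{cases}
\qquad\Longleftrightarrow\qquad \hat{y} = 1 \iff \mathbf{w}^{\top}\mathbf{x} > 0
```

Because σ(0) = 0.5 and σ is increasing, the classification boundary is the hyperplane wᵀx = 0. The code in section 3 adds no separate intercept term, so this hyperplane passes through the origin of the 21-dimensional feature space.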

2. Use gradient descent to find the best-fit parameters
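As a hedged sketch of this step: for logistic regression, the gradient of the average log loss over all m samples is Xᵀ(σ(Xw) − y)/m, so plain (batch) gradient descent repeats w ← w − α·Xᵀ(σ(Xw) − y)/m. The helper name batch_gradient_descent, the step size and the toy data are assumptions for illustration. The StocGradientDescent function in section 3 applies the same kind of update one sample at a time instead of over the whole batch.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def batch_gradient_descent(X, y, alpha=0.1, num_iter=1000):
    """Hypothetical helper: full-batch gradient descent for logistic regression.

    X: (m, n) feature matrix, y: (m,) array of 0/1 labels.
    Every step uses the gradient of the mean log loss over all m samples.
    """
    m, n = X.shape
    weights = np.ones(n)  # same starting point as StocGradientDescent
    for _ in range(num_iter):
        error = sigmoid(X @ weights) - y   # (m,) prediction errors
        gradient = X.T @ error / m         # average gradient over the batch
        weights -= alpha * gradient
    return weights


if __name__ == "__main__":
    # Toy, linearly separable data; the first column is a constant 1 (bias)
    X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
    y = np.array([0.0, 0.0, 1.0, 1.0])
    w = batch_gradient_descent(X, y)
    print(w, (sigmoid(X @ w) > 0.5).astype(int))
```

Each iteration touches every sample, which is what makes the batch version expensive on large data sets and motivates the stochastic variant used below.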

3. Code implementation

 # -*- coding: utf-8 -*-
 """
 Created on Tue Mar 28 21:35:25 2017

 @author: MyHome
 """
 import random

 import numpy as np


 def sigmoid(inX):
     """Sigmoid function: squashes any real input into (0, 1)."""
     return 1.0 / (1.0 + np.exp(-inX))


 def StocGradientDescent(dataMatrix, classLabels, numIter=600):
     """Update the weights with stochastic gradient descent and return the final weights."""
     m, n = dataMatrix.shape
     weights = np.ones(n)
     for j in range(numIter):
         # Rows not yet visited in this pass (sampling without replacement)
         dataIndex = list(range(m))
         for i in range(m):
             # Step size shrinks as training proceeds but never drops below 0.01
             alpha = 4 / (1.0 + j + i) + 0.01
             randIndex = random.randrange(len(dataIndex))
             rowIndex = dataIndex[randIndex]  # map the position to an actual row
             h = sigmoid(np.dot(dataMatrix[rowIndex], weights))
             # Per-sample gradient of the log loss
             gradient = (h - classLabels[rowIndex]) * dataMatrix[rowIndex]
             weights = weights - alpha * gradient
             del dataIndex[randIndex]
     return weights


 def classifyVector(inX, weights):
     """Classifier: return 1.0 if sigmoid(w . x) > 0.5, otherwise 0.0."""
     prob = sigmoid(np.dot(inX, weights))
     return 1.0 if prob > 0.5 else 0.0


 def Test():
     """Train on horseColicTraining.txt and return the error rate on horseColicTest.txt."""
     trainingSet = []
     trainingLabel = []
     with open("horseColicTraining.txt") as frTrain:
         for line in frTrain:
             currLine = line.strip().split("\t")
             # Columns 0-20 are the 21 features; column 21 is the class label
             trainingSet.append([float(v) for v in currLine[:21]])
             trainingLabel.append(float(currLine[21]))
     trainWeights = StocGradientDescent(np.array(trainingSet), trainingLabel)

     errorCount = 0
     numTestVec = 0
     with open("horseColicTest.txt") as frTest:
         for line in frTest:
             numTestVec += 1
             currLine = line.strip().split("\t")
             lineArr = np.array([float(v) for v in currLine[:21]])
             if int(classifyVector(lineArr, trainWeights)) != int(float(currLine[21])):
                 errorCount += 1
     errorRate = float(errorCount) / numTestVec
     print("the error rate of this test is:%f" % errorRate)
     return errorRate


 def multiTest():
     """Call Test() 10 times and report the average error rate."""
     numTest = 10
     errorSum = 0.0
     for k in range(numTest):
         errorSum += Test()
     print("after %d iterations the average error rate is: %f"
           % (numTest, errorSum / float(numTest)))


 if __name__ == "__main__":
     multiTest()
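The inner loop of StocGradientDescent is meant to visit every training row exactly once per pass, in random order, by deleting each visited index from dataIndex; the step size 4/(1+j+i)+0.01 shrinks over time but never falls below 0.01. A hedged NumPy sketch of the same idea draws a fresh random permutation for each pass instead. The function name sgd_permutation, its seed parameter and the use of numpy.random.default_rng are my choices, not the original author's.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sgd_permutation(X, y, num_iter=150, seed=0):
    """Hypothetical variant of StocGradientDescent that shuffles row order each pass."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    weights = np.ones(n)
    for j in range(num_iter):
        # A random permutation visits every row exactly once per pass,
        # which is what deleting from dataIndex achieves in the original loop
        for i, row in enumerate(rng.permutation(m)):
            alpha = 4 / (1.0 + j + i) + 0.01   # same decaying step size
            h = sigmoid(X[row] @ weights)
            weights -= alpha * (h - y[row]) * X[row]
    return weights
```

Passing a fixed seed makes a single run reproducible, which helps when comparing error rates across code changes; the original version gives a different result on every run.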

Results:

the error rate of this test is:0.522388
the error rate of this test is:0.328358
the error rate of this test is:0.313433
the error rate of this test is:0.358209
the error rate of this test is:0.298507
the error rate of this test is:0.343284
the error rate of this test is:0.283582
the error rate of this test is:0.313433
the error rate of this test is:0.343284
the error rate of this test is:0.358209
after 10 iterations the average error rate is: 0.346269
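Single-run error rates swing from about 0.28 to 0.52 because each run samples the training rows in a different random order, which is why multiTest() averages ten runs. A quick check with Python's statistics module reproduces the reported average. The test-set size of 67 is my assumption, inferred from every rate being a multiple of 1/67 (for example 35/67 ≈ 0.522388).

```python
import statistics

# Error rates copied from the ten runs reported above
rates = [0.522388, 0.328358, 0.313433, 0.358209, 0.298507,
         0.343284, 0.283582, 0.313433, 0.343284, 0.358209]

# Assumed test-set size: each rate times 67 is (almost exactly) an integer
num_test = 67
errors = [round(r * num_test) for r in rates]

print("misclassified per run:", errors)   # → [35, 22, 21, 24, 20, 23, 19, 21, 23, 24]
print("mean error rate: %f" % statistics.mean(rates))   # → 0.346269
print("sample std dev:  %f" % statistics.stdev(rates))
```

The spread between runs is large compared with the mean, so a single run says little about how good the model is.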

4. Summary

Logistic regression finds the best-fit parameters of a nonlinear function called the sigmoid. Optimization methods can be used to find these best-fit parameters, and one of the most common optimization algorithms is gradient descent. Gradient descent can be simplified into stochastic gradient descent.

Stochastic gradient descent can do as well as gradient descent while using far fewer computing resources. In addition, stochastic gradient descent is an online algorithm: it can update what it has learned as new data comes in, rather than reloading all of the data as in batch processing.
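To illustrate the online property, here is a minimal sketch of a learner that takes one stochastic gradient step per incoming example and never stores past data. The class name OnlineLogisticRegression, the fixed step size and the toy stream are hypothetical.

```python
import numpy as np


class OnlineLogisticRegression:
    """Hypothetical online learner: one SGD step per incoming example."""

    def __init__(self, n_features, alpha=0.1):
        self.weights = np.zeros(n_features)
        self.alpha = alpha

    def predict_proba(self, x):
        return 1.0 / (1.0 + np.exp(-np.dot(x, self.weights)))

    def update(self, x, y):
        # Same per-sample gradient as StocGradientDescent: (h - y) * x
        x = np.asarray(x, dtype=float)
        h = self.predict_proba(x)
        self.weights -= self.alpha * (h - y) * x


model = OnlineLogisticRegression(n_features=2)
# Examples arrive one at a time; the model never needs the full data set in memory
stream = [([1.0, 2.0], 1), ([1.0, -2.0], 0), ([1.0, 1.5], 1), ([1.0, -1.0], 0)] * 50
for x, y in stream:
    model.update(x, y)
print(model.predict_proba([1.0, 2.0]), model.predict_proba([1.0, -2.0]))
```

A batch learner would have to rerun over all 200 examples whenever a new one arrives; here each new example costs a single update.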

One major problem in machine learning is how to deal with missing values in the data. There's no blanket answer to this question; it really depends on what you're doing with the data. There are a number of solutions, and each has its own advantages and disadvantages.
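The summary does not say how this data set handles missing values. One simple option, shown here as an assumption rather than the author's method, is to replace each missing entry with 0: under the update used in StocGradientDescent, the gradient component for feature j is (h − y)·x_j, so a filled-in 0 leaves w_j unchanged for that sample. The helper names fill_missing and sgd_step and the "?" missing marker are hypothetical.

```python
import numpy as np


def fill_missing(rows, missing="?", fill_value=0.0):
    """Hypothetical helper: parse string rows, replacing the missing marker with fill_value."""
    return np.array([[fill_value if v == missing else float(v) for v in row]
                     for row in rows])


def sgd_step(weights, x, y, alpha=0.1):
    """One stochastic gradient step, same update rule as StocGradientDescent."""
    h = 1.0 / (1.0 + np.exp(-np.dot(x, weights)))
    return weights - alpha * (h - y) * x


X = fill_missing([["2", "?", "38.5"], ["1", "1", "?"]])
w = np.ones(3)
w_new = sgd_step(w, X[0], y=0.0)
# Weights of observed features move; the weight of the filled-in feature does not
print(X[0], w_new)
```

Whether 0 is a sensible fill value depends on the feature: it is neutral for this update rule, but it can distort features whose real values are far from 0, which is exactly the kind of trade-off the summary alludes to.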
