Using Logistic Regression to Predict Whether a Horse Is Sick
1. The main idea of classification with logistic regression
Based on the existing data, fit a regression formula for the decision boundary, i.e. search for the best-fit set of parameters, and then use that formula to classify new samples.
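Concretely, for a feature vector x the model takes a weighted sum of the inputs and passes it through the sigmoid, and thresholding the result at 0.5 gives the class label; this is exactly what sigmoid() and classifyVector() in the code below compute:

z = w0*x0 + w1*x1 + ... + wn*xn,    sigma(z) = 1 / (1 + exp(-z)),    predict 1 if sigma(z) > 0.5, otherwise 0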
2. Finding the best-fit parameters with gradient descent
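Instead of evaluating the gradient over the whole training set at every step, the implementation below adjusts the weights one randomly chosen sample at a time. For a sample with features x and label y, the update applied inside StocGradientDescent() is

w  <-  w - alpha * (sigma(w·x) - y) * x

with step size alpha = 4 / (1.0 + j + i) + 0.01, which shrinks as the pass index j and the sample counter i grow, so early updates are large and later updates settle down without ever reaching zero.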
3. Implementation
# -*- coding: utf-8 -*-
"""
Created on Tue Mar 28 21:35:25 2017

@author: MyHome
"""
import numpy as np
from random import uniform


def sigmoid(inX):
    """The sigmoid function."""
    return 1.0 / (1.0 + np.exp(-inX))


def StocGradientDescent(dataMatrix, classLabels, numIter=600):
    """Update the weights with stochastic gradient descent and return the final values."""
    m, n = dataMatrix.shape
    weights = np.ones(n)
    for j in range(numIter):
        dataIndex = list(range(m))  # samples not yet visited in this pass
        for i in range(m):
            # step size decreases with each update but never reaches zero
            alpha = 4 / (1.0 + j + i) + 0.01
            # pick a random sample from those not yet visited in this pass
            randIndex = int(uniform(0, len(dataIndex)))
            sampleIndex = dataIndex[randIndex]
            h = sigmoid(sum(dataMatrix[sampleIndex] * weights))
            gradient = (h - classLabels[sampleIndex]) * dataMatrix[sampleIndex]
            weights = weights - alpha * gradient
            del dataIndex[randIndex]
    return weights


def classifyVector(inX, weights):
    """Classify a feature vector: 1.0 if the predicted probability exceeds 0.5, else 0.0."""
    prob = sigmoid(sum(inX * weights))
    if prob > 0.5:
        return 1.0
    else:
        return 0.0


def Test():
    """Train on horseColicTraining.txt, evaluate on horseColicTest.txt, and return the error rate."""
    frTrain = open("horseColicTraining.txt")
    frTest = open("horseColicTest.txt")
    trainingSet = []
    trainingLabel = []
    for line in frTrain.readlines():
        currLine = line.strip().split("\t")
        lineArr = []
        for i in range(21):
            lineArr.append(float(currLine[i]))
        trainingSet.append(lineArr)
        trainingLabel.append(float(currLine[21]))
    trainWeights = StocGradientDescent(np.array(trainingSet), trainingLabel)
    errorCount = 0.0
    numTestVec = 0.0
    for line in frTest.readlines():
        numTestVec += 1.0
        currLine = line.strip().split("\t")
        lineArr = []
        for i in range(21):
            lineArr.append(float(currLine[i]))
        if int(classifyVector(np.array(lineArr), trainWeights)) != int(currLine[21]):
            errorCount += 1
    errorRate = float(errorCount) / numTestVec
    print("the error rate of this test is:%f" % errorRate)
    return errorRate


def multiTest():
    """Call Test() 10 times and report the average error rate."""
    numTest = 10
    errorSum = 0.0
    for k in range(numTest):
        errorSum += Test()
    print("after %d iterations the average error rate is: %f" % (numTest, errorSum / float(numTest)))


if __name__ == "__main__":
    multiTest()
Results:
the error rate of this test is:0.522388
the error rate of this test is:0.328358
the error rate of this test is:0.313433
the error rate of this test is:0.358209
the error rate of this test is:0.298507
the error rate of this test is:0.343284
the error rate of this test is:0.283582
the error rate of this test is:0.313433
the error rate of this test is:0.343284
the error rate of this test is:0.358209
after 10 iterations the average error rate is: 0.346269
4. Summary
Logistic regression is finding best-fit parameters to a nonlinear function called the sigmoid.
Methods of optimization can be used to find the best-fit parameters. Among the optimization
algorithms, one of the most common is gradient descent. Gradient descent can be simplified
with stochastic gradient descent.
Stochastic gradient descent can do as well as gradient descent using far fewer computing
resources. In addition, stochastic gradient descent is an online algorithm; it can
update what it has learned as new data comes in rather than reloading all of the data
as in batch processing.
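As a minimal sketch of that online behaviour, a single incoming case can be folded into the already-trained weights with the same per-sample update used above; the helper name updateOnline and the fixed step size are illustrative choices, not part of the program above, and sigmoid() refers to the function defined in the implementation section.

def updateOnline(weights, features, label, alpha=0.01):
    """Fold one new (features, label) pair into existing weights using the
    same per-sample logistic-regression update as StocGradientDescent()."""
    h = sigmoid(sum(features * weights))  # predicted probability for the new case
    return weights - alpha * (h - label) * features

# usage sketch: keep refining the trained weights as new cases arrive
# trainWeights = updateOnline(trainWeights, np.array(newCaseFeatures), newCaseLabel)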
One major problem in machine learning is how to deal with missing values in the
data. There’s no blanket answer to this question. It really depends on what you’re
doing with the data. There are a number of solutions, and each solution has its own
advantages and disadvantages.