一元回归_ols参数解读(推荐AAA)

python机器学习-乳腺癌细胞挖掘（博主亲自录制视频）

https://study.163.com/course/introduction.htm?courseId=1005269003&utm_campaign=commission&utm_source=cp-400000000398149&utm_medium=share

项目合作QQ：231469242

多重共线性测试需要改进

文件夹需要两个包

python3.0 anaconda

normality_check.py 正太检验

# -*- coding: utf-8 -*-

'''

Author：Toby

QQ：231469242，all right reversed,no commercial use

normality_check.py

正态性检验脚本

'''

import scipy

from scipy.stats import f

import numpy as np

import matplotlib.pyplot as plt

import scipy.stats as stats

# additional packages

from statsmodels.stats.diagnostic import lillifors

#正态分布测试

def check_normality(testData):

    #20<样本数<50用normal test算法检验正态分布性

    if 20<len(testData) <50:

       p_value= stats.normaltest(testData)[1]

       if p_value<0.05:

           print("use normaltest")

           print ("data are not normal distributed")

           return  False

       else:

           print("use normaltest")

           print ("data are normal distributed")

           return True

    #样本数小于50用Shapiro-Wilk算法检验正态分布性

    if len(testData) <50:

       p_value= stats.shapiro(testData)[1]

       if p_value<0.05:

           print ("use shapiro:")

           print ("data are not normal distributed")

           return  False

       else:

           print ("use shapiro:")

           print ("data are normal distributed")

           return True

    if 300>=len(testData) >=50:

       p_value= lillifors(testData)[1]

       if p_value<0.05:

           print ("use lillifors:")

           print ("data are not normal distributed")

           return  False

       else:

           print ("use lillifors:")

           print ("data are normal distributed")

           return True

    if len(testData) >300:

       p_value= stats.kstest(testData,'norm')[1]

       if p_value<0.05:

           print ("use kstest:")

           print ("data are not normal distributed")

           return  False

       else:

           print ("use kstest:")

           print ("data are normal distributed")

           return True

#对所有样本组进行正态性检验

def NormalTest(list_groups):

    for group in list_groups:

        #正态性检验

        status=check_normality(group)

        if status==False :

            return False

    return True

Rsquare_multimode.py 多种模型计算R平方

加入了线性显著检测和ｒ相关系数显著检测，多重共线性，自相关，残差正太检验等等

# -*- coding: utf-8 -*-

#斯皮尔曼等级相关（Spearman’s correlation coefficient for ranked data）

import math,pylab,scipy

import numpy as np

import scipy.stats as stats

from scipy.stats import t

from scipy.stats import f

import pandas as pd

import matplotlib.pyplot as plt

from statsmodels.stats.diagnostic import lillifors

import normality_check

import statsmodels.formula.api as sm

x=[4.03,3.76,3.77,3.34,3.47,2.92,3.20,2.71,3.53,4.51]

y=[6.47,6.13,6.19,4.89,5.63,4.52,5.89,4.79,5.27,6.08]

list_group=[x,y]

sample=len(x)

#显著性

a=0.05

#数据可视化

plt.plot(x,y,'ro')

#斯皮尔曼等级相关，非参数检验

def Spearmanr(x,y):

    print("use spearmanr,Nonparametric tests")

    #样本不一致时，发出警告

    if len(x)!=len(y):

        print ("warming,the samples are not equal!")

    r,p=stats.spearmanr(x,y)

    print("spearman r**2:",r**2)

    print("spearman p:",p)

    if sample<500 and p>0.05:

        print("when sample < 500，p has no mean（>0.05）")

        print("when sample > 500，p has mean")

#皮尔森 ，参数检验

def Pearsonr(x,y):

    print("use Pearson,parametric tests")

    r,p=stats.pearsonr(x,y)

    print("pearson r**2:",r**2)

    print("pearson p:",p)

    if sample<30:

        print("when sample <30,pearson has no mean")

#皮尔森 ，参数检验,带有详细参数

def Pearsonr_details(x,y,xLabel,yLabel,formula):

    n=len(x)

    df=n-2

    data=pd.DataFrame({yLabel:y,xLabel:x})

    result = sm.ols(formula, data).fit()

    print(result.summary()) 

    #模型F分布显著性分析

    print('\n')

    print("linear relation Significant test:...................................")

    #如果F检验的P值<0.05，拒绝H0，x和y无显著关系，H1成立，x和y有显著关系

    if result.f_pvalue<0.05:

        print ("P value of f test<0.05,the linear relation is right.")

    #R的显著检验

    print('\n')

    print("R significant test:...................................")

    r_square=result.rsquared

    r=math.sqrt(r_square)

    t_score=r*math.sqrt(n-2)/(math.sqrt(1-r**2))

    t_std=t.isf(a/2,df)

    if t_score<-t_std or t_score>t_std:

        print ("R is significant according to its sample size")

    else:

        print ("R is not significant")

    #残差分析

    print('\n')

    print("residual error analysis:...................................")

    states=normality_check.check_normality(result.resid)

    if states==True:

        print("the residual error are normal distributed")

    else:

        print("the residual error are not normal distributed")

    #残差偏态和峰态

    Skew = stats.skew(result.resid, bias=True)

    Kurtosis = stats.kurtosis(result.resid, fisher=False,bias=True)

    if round(Skew,1)==0:

        print("residual errors normality Skew:in middle,perfect match")

    elif  round(Skew,1)>0:

        print("residual errors normality Skew:close right")

    elif  round(Skew,1)<0:

        print("residual errors normality Skew:close left")

    if round(Kurtosis,1)==3:

        print("residual errors normality Kurtosis:in middle,perfect match")

    elif  round(Kurtosis,1)>3:

        print("residual errors normality Kurtosis:more peak")

    elif  round(Kurtosis,1)<3:

        print("residual errors normality Kurtosis:more flat")    

    #自相关分析autocorrelation

    print('\n')

    print("autocorrelation test:...................................")

    DW = np.sum( np.diff( result.resid.values )**2.0 )/ result.ssr

    if round(DW,1)==2:

        print("Durbin-Watson close to 2,there is no autocorrelation.OLS model works well")    

    #共线性检查

    print('\n')

    print("multicollinearity test:")

    conditionNumber=result.condition_number

    if conditionNumber>30:

        print("conditionNumber>30,multicollinearity exists")

    else:

        print("conditionNumber<=30,multicollinearity not exists")

    #绘制残差图，用于方差齐性检验

    Draw_residual(list(result.resid))

'''

result.rsquared

Out[28]: 0.61510660055413524

''' 

#kendalltau非参数检验

def Kendalltau(x,y):

    print("use kendalltau,Nonparametric tests")

    r,p=stats.kendalltau(x,y)

    print("kendalltau r**2:",r**2)

    print("kendalltau p:",p)

#选择模型

def R_mode(x,y,xLabel,yLabel,formula):

    #正态性检验

    Normal_result=normality_check.NormalTest(list_group)

    print ("normality result:",Normal_result)

    if len(list_group)>2:

        Kendalltau(x,y)

    if Normal_result==False:

        Spearmanr(x,y)

        Kendalltau(x,y)

    if Normal_result==True:

        Pearsonr_details(x,y,xLabel,yLabel,formula)

#调整的R方

def Adjust_Rsquare(r_square,n,k):

    adjust_rSquare=1-((1-r_square)*(n-1)*1.0/(n-k-1))

    return adjust_rSquare

'''

n=len(x)

n=10

k=1

 r_square=0.615

 Adjust_Rsquare(r_square,n,k)

Out[11]: 0.566875

'''    

#绘图

def Plot(x,y,yLabel,xLabel,Title):

    plt.plot(x,y,'ro')

    plt.ylabel(yLabel)

    plt.xlabel(xLabel)

    plt.title(Title)

    plt.show()

#绘图参数

yLabel='Alcohol'

xLabel='Tobacco'

Title='Sales in Several UK Regions'

Plot(x,y,yLabel,xLabel,Title)

formula='Alcohol ~ Tobacco'    

#绘制残点图

def Draw_residual(residual_list):

    x=[i for i in range(1,len(residual_list)+1)]

    y=residual_list

    pylab.plot(x,y,'ro')

    pylab.title("draw residual to check wrong number")

    # Pad margins so that markers don't get clipped by the axes,让点不与坐标轴重合

    pylab.margins(0.3)

    #绘制网格

    pylab.grid(True)

    pylab.show()

R_mode(x,y,xLabel,yLabel,formula)

'''

result.fittedvalues表示预测的y值阵列

result.fittedvalues

Out[42]:

0    6.094983

1    5.823391

2    5.833450

3    5.400915

4    5.531682

5    4.978439

6    5.260090

7    4.767201

8    5.592035

9    6.577813

dtype: float64

#计算残差的偏态

S = stats.skew(result.resid, bias=True)

Out[44]: -0.013678125910039975

K = stats.kurtosis(result.resid, fisher=False,bias=True)

K

Out[47]: 1.5271300905736027

'''

result.params 得到两个参数：x的系数和截距

截距

result.params[0]

x系数

result.params[1]

dubin watson解读

--残差是否符合正太分布

D.W统计量是用来检验残差分布是否为正态分布的，因为用OLS进行回归估计是假设模型残差服从正态分布的，因此，如果残差不服从正态分布，那么，模型将是有偏的，也就是说模型的解释能力是不强的。
D.W统计量在2左右说明残差是服从正态分布的，若偏离2太远，那么你所构建的模型

的解释能力就要受影响了。

jarque-bera解读
----样本是否符合正太分布

JB统计量全称叫Jarque-Bera统计量，是用来检验一组样本是否能够认为来自正态总体的一种方法，它依据OLS残差，对大样本进行检验（或称为渐进检验）。

首先计算偏度系数S(对概率密度函数对称性的度量)：

及峰度系数K(对概率密度函数的“胖瘦”的度量)：

对于正态分布变量，偏度为零，峰度为3.

Jarque和Bera建立了如下检验统计量——JB统计量：

其中，n为样本容量，S为偏度，K为峰度。

在正态分布的假设下，JB统计量渐进地服从自由度为2的卡方分布, JBasy~χ2(2)。

若变量服从正态分布，则S为零，K为3，因而JB统计量的值为零；如果变量不是正态变量，则JB统计量将为一个逐渐增大值。

如果JB统计量值较大，比如为11，则可以计算出卡方值大于11的概率为0.004，这个概率过小，因此不能认为样本来自正态分布。反之，成立。

Jarque-Bera的P值接近于0，表明显著性高，数据服从正态分布。

Omnibus解读

Omnibus统计量的P值都接近于0，自变量的作用显著。

Omnibus tests are a kind of statistical test. They test whether the explained variance in a set of data is significantly greater than the unexplained variance, overall. One example is the F-test in the analysis of variance. There can be legitimate significant effects within a model even if the omnibus test is not significant. For instance, in a model with two independent variables, if only one variable exerts a significant effect on the dependent variable and the other does not, then the omnibus test may be non-significant. This fact does not affect the conclusions that may be drawn from the one significant variable. In order to test effects within an omnibus test, researchers often use contrasts.

https://en.wikipedia.org/wiki/Omnibus_test

python信用评分卡建模（附代码，博主录制）

https://study.163.com/course/introduction.htm?courseId=1005214003&utm_campaign=commission&utm_source=cp-400000000398149&utm_medium=share

一元回归_ols参数解读(推荐AAA)的更多相关文章

一元回归1_基础（python代码实现）
python机器学习-乳腺癌细胞挖掘(博主亲自录制视频) https://study.163.com/course/introduction.htm?courseId=1005269003&u ...
机器学习（2）：简单线性回归 | 一元回归 | 损失计算 | MSE
前文再续书接上一回,机器学习的主要目的,是根据特征进行预测.预测到的信息,叫标签. 从特征映射出标签的诸多算法中,有一个简单的算法,叫简单线性回归.本文介绍简单线性回归的概念. (1)什么是简单线性回 ...
main(int argc, char **argv)参数解读
main(int argc, char **argv)参数解读编译生成了test.exe ,然后在控制台下相应的目录下输入:test 1 2 3 4 argc就是一个输入了多少个参数,包括te ...
Python_sklearn机器学习库学习笔记（一）_一元回归
一.引入相关库 %matplotlib inline import matplotlib.pyplot as plt from matplotlib.font_manager import FontP ...
sklearn_随机森林random forest原理_乳腺癌分类器建模(推荐AAA)
sklearn实战-乳腺癌细胞数据挖掘(博主亲自录制视频) https://study.163.com/course/introduction.htm?courseId=1005269003& ...
支持向量机SVM原理_python sklearn建模乳腺癌细胞分类器（推荐AAA）
项目合作联系QQ:231469242 sklearn实战-乳腺癌细胞数据挖掘(博主亲自录制视频) https://study.163.com/course/introduction.htm?cours ...
因子分析factor analysis_spss运用_python建模(推荐AAA)
sklearn实战-乳腺癌细胞数据挖掘(博主亲自录制视频) https://study.163.com/course/introduction.htm?courseId=1005269003& ...
决策树decision tree原理介绍_python sklearn建模_乳腺癌细胞分类器（推荐AAA）
sklearn实战-乳腺癌细胞数据挖掘(博主亲自录制视频) https://study.163.com/course/introduction.htm?courseId=1005269003& ...
Java8 JVM参数解读
附录:https://www.liangzl.com/get-article-detail-134315.html 摘要: 我们知道java虚拟机启动时会带有很多的启动参数,Java命令本身就是一个多 ...

随机推荐

asp.net mvc同一个view展示多个不同列表思路
asp.net mvc一个模型一个view容易展示,可是遇到像首页那样,要同时调用好几个不同表的内容一小部分展示时,该怎么是好呢? 下边根据我的测试,用的是mvc access数据测试先建立一个强类 ...
回归Qt——写在Qt5.10发布之日
今天偶然看到一条关于Qt5.10发布的消息,发现Qt经历了诺基亚风波之后发展得依然良好,感到很欣慰.回头看上次关注Qt技术还是2011年,那时候用Qt4.7做一个小项目,对于一个写Win32界面和MF ...
hibernate.hbm.xml文件配置入门小结（1）
在Hibernate中,各表的映射文件xxx.hbm.xml可以通过工具生成,例如在使用MyEclipse开发时,它提供了自动生成映射文件的工具. hibernate.hbm.xml文件的基本结构如下 ...
字符串拆分和拼接（含list拼接）---基于python
最近得一超长字符串如下: l=“5245474953544552207369703a3137322e3136312e31302e323232205349502f322e300d0a5669613a20 ...
php作用域限定符
双冒号::被认为是作用域限定操作符,用来指定类中不同的作用域级别.::左边表示的是作用域,右边表示的是访问的成员. 系统定义了两个作用域,self和parent.self表示当前类的作用域,在类之外的 ...
一个整数N中的1的个数
设计思想: 通过大量数据分解找规律 abcd 从d开始若d=0则count(1的个数)=左边的abc *d的位值(1.10.100..) 若等欲1则count=左边的abc*d的位值(1.10.100 ...
常用IDE插件
Visual Studio 常用 Refactoring Essentials:代码重构分析 Roslynator:代码重构 CodeMaid:代码格式化 Github Extension for V ...
java一些知识
类名前只有两种修饰符:不写(即default,但不能把default写上去)或public.默认不写则此类只能被同一包下的类调用以生成相应的实例.但若是public,则可以被不同包下的类调用以生成其实 ...
新手必备！11个强大的 Visual Studio 调试技巧
简介调试是软件开发周期中很重要的一部分.它具有挑战性,同时也很让人疑惑和烦恼.总的来说,对于稍大一点的程序,调试是不可避免的.最近几年,调试工具的发展让很多调试任务变的越来越简单和省时. 这篇文章总 ...
【PHP】session失效时间
最近用到php中session时,忽然发现php中的session有点让人头疼啊,要设置一个严格的特定时间内过期的session还真不太容易!后来在网上查询时,发现这个问题还真是有点普遍,网上也有关于 ...

一元回归_ols参数解读(推荐AAA)

https://study.163.com/course/introduction.htm?courseId=1005269003&utm_campaign=commission&utm_source=cp-400000000398149&utm_medium=share

python信用评分卡建模（附代码，博主录制）

一元回归_ols参数解读(推荐AAA)的更多相关文章

随机推荐

热门专题