The train_test_split function in sklearn.model_selection and its parameters

train_test_split is the scikit-learn function for splitting a dataset: it divides the original data into two parts, a training set and a test set.

from sklearn.model_selection import train_test_split

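1. The function's signature and docstring in the source code (taken from an older scikit-learn release, as the note about version 0.21 shows):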

def train_test_split(*arrays, **options):
    """Split arrays or matrices into random train and test subsets

    Quick utility that wraps input validation and
    ``next(ShuffleSplit().split(X, y))`` and application to input data
    into a single call for splitting (and optionally subsampling) data in a
    oneliner.

    Read more in the :ref:`User Guide <cross_validation>`.

    Parameters
    ----------
    *arrays : sequence of indexables with same length / shape[0]
        Allowed inputs are lists, numpy arrays, scipy-sparse
        matrices or pandas dataframes.

    test_size : float, int, None, optional
        If float, should be between 0.0 and 1.0 and represent the proportion
        of the dataset to include in the test split. If int, represents the
        absolute number of test samples. If None, the value is set to the
        complement of the train size. By default, the value is set to 0.25.
        The default will change in version 0.21. It will remain 0.25 only
        if ``train_size`` is unspecified, otherwise it will complement
        the specified ``train_size``.

    train_size : float, int, or None, default None
        If float, should be between 0.0 and 1.0 and represent the
        proportion of the dataset to include in the train split. If
        int, represents the absolute number of train samples. If None,
        the value is automatically set to the complement of the test size.

    random_state : int, RandomState instance or None, optional (default=None)
        If int, random_state is the seed used by the random number generator;
        If RandomState instance, random_state is the random number generator;
        If None, the random number generator is the RandomState instance used
        by `np.random`.

    shuffle : boolean, optional (default=True)
        Whether or not to shuffle the data before splitting. If shuffle=False
        then stratify must be None.

    stratify : array-like or None (default is None)
        If not None, data is split in a stratified fashion, using this as
        the class labels.

    Returns
    -------
    splitting : list, length=2 * len(arrays)
        List containing train-test split of inputs.

        .. versionadded:: 0.16
            If the input is sparse, the output will be a
            ``scipy.sparse.csr_matrix``. Else, output type is the same as the
            input type.
    """
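
The docstring's remark that the function wraps ``next(ShuffleSplit().split(X, y))`` can be checked with a small sketch. The toy arrays, the seed 0 and the 0.25 test fraction are made up for illustration, and the comparison assumes a scikit-learn release in which the non-stratified path still delegates to ShuffleSplit as the docstring describes.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit, train_test_split

# Toy data, made up for illustration: 20 samples with 2 features each.
X = np.arange(40).reshape(20, 2)
y = np.arange(20)

# One-line split with an explicit test fraction and seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The same split taken by hand from the first ShuffleSplit iteration.
splitter = ShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, y))

# Compare the convenience function against manual indexing.
same_train = np.array_equal(X_train, X[train_idx])
same_test = np.array_equal(y_test, y[test_idx])
print(same_train, same_test)
```

If both flags print as True, train_test_split did nothing more here than pick the indices from ShuffleSplit and apply them to every input array.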

2. Parameters

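train_size: size of the training set

    float: a value between 0 and 1, the proportion of the data assigned to the training set

    int: the absolute number of training samples

    None: set automatically to the complement of the test set, i.e. the original dataset minus the test set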

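test_size: size of the test set; it defaults to 0.25 when train_size is not specified either

    float: a value between 0 and 1, the proportion of the data assigned to the test set

    int: the absolute number of test samples

    None: set automatically to the complement of the training set, i.e. the original dataset minus the training set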

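random_state: the seed of the random number generator; passing a fixed int makes the split reproducible from run to run

shuffle: whether to shuffle the data before splitting, True or False, default True; with shuffle=False, stratify must be None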

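stratify: whether to split the dataset so that class proportions are preserved. For example, if the original dataset has class A : class B = 75% : 25%, the training set and the test set will each have an A : B ratio of (approximately) 75% : 25%. This is useful when the classes are heavily imbalanced. The usual form is stratify=y, i.e. stratifying on the dataset's labels y.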
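
A short sketch of how these parameters behave in practice. The data is synthetic and made up for illustration (100 samples, 75 of class 0 and 25 of class 1, mirroring the 75% : 25% example above); the variable names are arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, made up for illustration: 75 samples of class 0, 25 of class 1.
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 75 + [1] * 25)

# float test_size: a fraction of the data (20% of 100 samples).
Xa_tr, Xa_te, ya_tr, ya_te = train_test_split(X, y, test_size=0.2, random_state=0)
print(len(Xa_tr), len(Xa_te))  # 80 20

# int test_size: an absolute number of test samples.
Xb_tr, Xb_te, yb_tr, yb_te = train_test_split(X, y, test_size=30, random_state=0)
print(len(Xb_tr), len(Xb_te))  # 70 30

# shuffle=False: the order is kept, so the test set is the tail of the data.
Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(X, y, test_size=0.2, shuffle=False)
print(Xc_te[0].tolist())  # [160, 161]

# stratify=y: both parts keep the 75:25 class ratio.
Xd_tr, Xd_te, yd_tr, yd_te = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
train_counts = np.bincount(yd_tr).tolist()
test_counts = np.bincount(yd_te).tolist()
print(train_counts, test_counts)  # [60, 20] [15, 5]
```

Without stratify, the class counts in the two parts depend on the shuffle and generally drift away from the original ratio, which matters most when one class is rare.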

3. A typical call looks like this:

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, random_state=14, stratify=y)
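
For a version that runs as-is, the call needs concrete X and y. The sketch below assumes the Iris dataset bundled with scikit-learn purely as a convenient stand-in; any feature matrix and label vector of equal length would do. It also repeats the call to show that a fixed random_state reproduces the same split.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Iris is only a stand-in dataset here: 150 samples, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# 75% of the samples go to training, stratified on the labels.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, random_state=14, stratify=y
)
print(X_train.shape, X_test.shape)  # (112, 4) (38, 4)

# Repeating the call with the same seed yields the same split.
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, train_size=0.75, random_state=14, stratify=y
)
reproducible = np.array_equal(X_train, X_train2) and np.array_equal(y_test, y_test2)
print(reproducible)  # True
```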

References:

https://blog.csdn.net/liuxiao214/article/details/79019901

https://blog.csdn.net/qq_38410428/article/details/94054920
