SciTech-Mathematics-Probability+Statistics-Descriptive Statistics I + II (using Python) and Data Visualization
Learn Stats for Python: Descriptive Statistics I
Learn Stats for Python: Descriptive Statistics II + Data Visualization
BY IVÁN PALOMARES CARRASCOSA, POSTED ON AUGUST 28, 2024
In today's world, pervaded by data and AI-driven technologies and solutions, mastering their foundations is a dependable route to unlocking powerful insights from data and making effective, reliable data-driven decisions.
One such family of foundational notions comes from statistics. Given Python's versatility and its popularity in data analysis and AI applications, learning stats with the aid of the Python programming language is an ideal way to learn statistical concepts and put them into practice at the same time!
This comprehensive tutorial series, consisting of five parts, curates and links together these "learn stats for Python" tutorials, providing you with a strong foundational learning pathway in both programming and statistics. Each tutorial is designed to be short, straight to the point, and easy to digest.
Descriptive Statistics I
Part I of the series focuses on tutorials to get started with the most essential pillar of statistics: descriptive statistics. Descriptive statistics encompasses the tools and techniques used to summarize and describe the main characteristics of a dataset: its distribution, variability, central tendency, and so on.
1. Data Preparation Essentials
The first group of tutorials we curated for you focuses on the essential preliminary steps needed before conducting any statistical analysis (even the most basic ones). These steps include cleaning, normalizing, and transforming your initial data to ensure consistent and accurate analysis results thereafter. Data preparation is essential as it sets the foundation for any subsequent statistical computations. These selected tutorials illustrate how to use Python to perform some of the most frequent data preparation steps:
Normalize data in Python
Remove outliers in Python
Perform data binning in Python
Transform data in Python
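As a rough illustration of the kind of steps these tutorials cover, the sketch below removes outliers with the 1.5 × IQR rule, applies min-max normalization and a log transform, and bins the values into equal-width intervals. The sample values, the variable names, and the choice of NumPy are assumptions made for this example; they are not taken from the linked tutorials.

```python
import numpy as np

# Hypothetical sample with one obvious outlier (100.0).
data = np.array([2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 100.0])

# Outlier removal: keep values within 1.5 * IQR of the quartiles.
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
mask = (data >= q1 - 1.5 * iqr) & (data <= q3 + 1.5 * iqr)
cleaned = data[mask]

# Min-max normalization: rescale the cleaned values to the range [0, 1].
normalized = (cleaned - cleaned.min()) / (cleaned.max() - cleaned.min())

# Transformation: natural log (valid here because all values are positive).
log_transformed = np.log(cleaned)

# Binning: three equal-width bins; np.digitize returns a bin index per value.
edges = np.linspace(cleaned.min(), cleaned.max(), 4)
bin_index = np.digitize(cleaned, edges[1:-1])

print(cleaned)
print(normalized.round(3))
print(bin_index)
```

The 1.5 × IQR threshold is a common convention rather than a fixed rule; other cut-offs or a z-score criterion may suit other datasets better.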
2. Descriptive Statistics Fundamentals
Now it's time to plunge into pure stats.
The following group of tutorials covers the central notions of descriptive statistics, that is, summarizing and describing the main characteristics of your (previously prepared) data: mean, median, variability, skewness, percentiles, and more.
It is important to understand these essentials to uncover the "appearance" of your data: the first step toward interpreting and communicating insightful patterns underlying them.
These two tutorials showcase the calculation of mean, median, and mode, using two different Python libraries:
How to calculate mean, median, and mode with numpy
How to calculate mean, median, and mode in pandas
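For orientation, here is a minimal NumPy sketch of the three measures of central tendency. The sample values are hypothetical. NumPy has no dedicated mode function, so the mode is derived here by counting unique values, which is one possible approach among several.

```python
import numpy as np

# Hypothetical sample values.
values = np.array([3, 7, 7, 2, 9, 7, 4, 2])

mean = np.mean(values)      # arithmetic average
median = np.median(values)  # middle value of the sorted data

# Mode: the most frequent value, found by counting unique values.
uniques, counts = np.unique(values, return_counts=True)
mode = uniques[np.argmax(counts)]

print(mean, median, mode)
```

When several values tie for the highest count, `np.argmax` returns only the smallest of them; in pandas, `Series.mode()` returns all tied modes.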
Meanwhile, other basic statistical properties are covered in these tutorials:
Calculate Sample & Population Variance in Python
Calculate the Standard Deviation of a List in Python
Calculate Skewness & Kurtosis in Python
Calculate Percentiles in Python
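The difference between sample and population variance, and the idea behind skewness and kurtosis, can be sketched with NumPy as follows. The data are hypothetical, and the skewness and excess kurtosis are computed directly as standardized moments of the population (no bias correction), which is an assumption of this example rather than the only definition in use.

```python
import numpy as np

# Hypothetical sample values.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

pop_var = np.var(x)             # population variance: divide by n (ddof=0)
sample_var = np.var(x, ddof=1)  # sample variance: divide by n - 1
pop_std = np.std(x)             # population standard deviation

# Quartiles, using NumPy's default linear interpolation.
p25, p50, p75 = np.percentile(x, [25, 50, 75])

# Skewness and excess kurtosis as third and fourth standardized moments.
z = (x - x.mean()) / pop_std
skewness = np.mean(z ** 3)
excess_kurtosis = np.mean(z ** 4) - 3.0

print(pop_var, sample_var, pop_std)
print(p25, p50, p75)
print(skewness, excess_kurtosis)
```

A positive skewness, as in this sample, indicates a longer right tail. Library routines such as `scipy.stats.skew` and `scipy.stats.kurtosis` offer the same quantities with optional bias correction.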
3. Frequencies and Distributions
After learning to calculate the most common statistics used to describe the characteristics of your data, the next natural step is to learn mechanisms to analyze the distribution of the data, be it across categories or through intervals. Accordingly, the next few tutorials will teach you how to build frequency tables, calculate relative frequencies from absolute frequencies, and work with contingency tables to summarize relationships between categorical variables, among others.
Create Frequency Tables in Python
Calculate Relative Frequency in Python
Create a Contingency Table in Python
Calculate Expected Value in Python
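The ideas in this group can be sketched with the standard library alone. The survey records, the probabilities, and all variable names below are hypothetical; the linked tutorials may well rely on pandas instead (for example `pandas.crosstab` for contingency tables).

```python
from collections import Counter

# Hypothetical survey records: (team, preferred language).
records = [
    ("A", "Python"), ("A", "Python"), ("A", "Python"), ("A", "R"),
    ("B", "Python"), ("B", "Python"), ("B", "R"), ("B", "R"),
]

# Frequency table: absolute count per category.
lang_counts = Counter(lang for _, lang in records)

# Relative frequencies: absolute counts divided by the total.
total = sum(lang_counts.values())
rel_freq = {lang: n / total for lang, n in lang_counts.items()}

# Contingency table: counts for each (team, language) combination.
contingency = Counter(records)

# Expected value of a discrete variable: sum of value * probability.
outcomes = [0, 1, 2, 3]
probs = [0.1, 0.2, 0.3, 0.4]
expected_value = sum(v * p for v, p in zip(outcomes, probs))

print(dict(lang_counts))
print(rel_freq)
print(contingency[("A", "Python")], contingency[("B", "R")])
print(round(expected_value, 6))
```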
4. Correlation and Covariance Metrics
To conclude Part I of this tutorial series, let's put together some tutorials aimed at exploring correlation and covariance measures.
These are key statistical metrics for analyzing the relationships between variables in your data. The tutorials below illustrate how to calculate and interpret several types of correlation, perform correlation tests, and build correlation and covariance matrices to uncover hidden connections between parts of your data. These tools are part of the foundations behind predictive modeling, AI, and machine learning systems, and with good reason: making predictions and inferences intelligently entails discovering the hidden relationships in our data.
Calculate Correlation in Python
Calculate Spearman Rank Correlation in Python
Perform a Correlation Test in Python
Create a Correlation Matrix in Python
Create a Covariance Matrix in Python
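A small NumPy sketch of these measures follows. The two variables are hypothetical, and the Spearman coefficient is computed here as the Pearson correlation of the ranks, using a simple ranking helper (`ranks`, a name made up for this example) that assumes there are no tied values.

```python
import numpy as np

# Hypothetical paired observations.
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
score = np.array([52.0, 55.0, 61.0, 70.0, 72.0])

# Pearson correlation: off-diagonal entry of the 2x2 correlation matrix.
pearson_r = np.corrcoef(hours, score)[0, 1]

# Covariance matrix (np.cov uses the sample estimate, ddof=1, by default).
cov_matrix = np.cov(hours, score)

def ranks(a):
    """Rank positions (0-based) of the values in a; assumes no ties."""
    return np.argsort(np.argsort(a)).astype(float)

# Spearman rank correlation: Pearson correlation applied to the ranks.
spearman_rho = np.corrcoef(ranks(hours), ranks(score))[0, 1]

print(round(pearson_r, 4))
print(cov_matrix)
print(round(spearman_rho, 4))
```

Because `score` increases whenever `hours` does, the Spearman coefficient equals 1 even though the relationship is not perfectly linear. For a significance test, `scipy.stats.pearsonr` returns a p-value alongside the coefficient.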
Coming Up Next
In the next post in this series, we will wrap up with additional and advanced descriptive statistics topics, and move on to statistical data visualization tools.
Descriptive Statistics II
Part II of the series continues introducing more important concepts from descriptive statistics, namely similarity and distance measures, and provides a brief exploration of some advanced and applied topics, such as clustering, predominantly from an exploratory data analysis viewpoint. After this, we move on to Python tutorials covering data visualization techniques.
1. Similarity and Distance Measures
Methods to quantify the similarity or dissimilarity between data points, samples, or populations are crucial in a variety of statistical analysis methodologies and machine learning techniques: clustering, pattern recognition, classification, etc.
The following tutorials teach you how to use Python to apply metrics that indicate how close or far apart your data points are. Mastering these metrics is key to being able to compare datasets, group similar data points together in a coherent manner, or detect anomalies or data points that significantly deviate from the rest.
Calculate Euclidean distance in Python
Calculate Manhattan distance in Python
Calculate Jaccard similarity in Python
Calculate Mahalanobis distance in Python
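The four measures named above can be sketched in a few lines of NumPy and plain Python. All vectors, sets, and sample points below are hypothetical values chosen so that the results are easy to check by hand.

```python
import numpy as np

# Hypothetical vectors.
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

# Euclidean distance: straight-line distance, sqrt(9 + 16 + 0) = 5.
euclidean = np.linalg.norm(a - b)

# Manhattan distance: sum of absolute differences, 3 + 4 + 0 = 7.
manhattan = np.abs(a - b).sum()

# Jaccard similarity: size of the intersection over size of the union.
set_a, set_b = {1, 2, 3, 4}, {3, 4, 5, 6}
jaccard = len(set_a & set_b) / len(set_a | set_b)

# Mahalanobis distance of a point from the mean of a sample,
# scaled by the inverse of the sample covariance matrix.
sample = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 4.0], [2.0, 4.0]])
point = np.array([3.0, 2.0])
diff = point - sample.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(sample, rowvar=False))
mahalanobis = np.sqrt(diff @ inv_cov @ diff)

print(euclidean, manhattan, round(jaccard, 4), round(mahalanobis, 4))
```

Unlike the Euclidean distance, the Mahalanobis distance accounts for the spread and correlation of the sample, which is why it is often used for anomaly detection. `scipy.spatial.distance` provides ready-made versions of these metrics.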
2. Advanced and Applied Topics
Now that you have learned the foundations of descriptive statistics, and before moving on to the next topic in this journey (data visualization), it’s the perfect time to take a glimpse at some more complex statistics-based techniques and practical applications. This way, you’ll gain some insight into specialized methodologies commonly used in data science and analytics. In part V of this series, we will focus on more advanced predictive techniques such as prediction and forecasting. But for now, let’s cover some tutorials aimed at guiding you through clustering data, univariate and bivariate analysis, and multi-dimensional scaling. These are common methods for solving real-world challenges requiring some statistical rigor.
Perform K-Means Clustering in Python
Use the Elbow method in Python
Perform Multidimensional Scaling in Python
Perform Univariate Analysis in Python
Perform Bivariate Analysis in Python
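To give a feel for how k-means clustering works, here is a minimal sketch of the standard assign-then-update loop (Lloyd's algorithm) in plain NumPy. The function name `kmeans`, the toy points, and the hand-picked initial centroids are all hypothetical and chosen to make the result deterministic; a real analysis would more likely use a library implementation such as scikit-learn's `KMeans`.

```python
import numpy as np

def kmeans(points, init_centroids, n_iter=10):
    """Minimal k-means: alternate nearest-centroid assignment and mean update."""
    centroids = init_centroids.astype(float).copy()
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it.
        for j in range(len(centroids)):
            members = points[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    # Inertia: within-cluster sum of squared distances to the centroids.
    inertia = ((points - centroids[labels]) ** 2).sum()
    return labels, centroids, inertia

# Hypothetical 2-D data forming two well-separated groups.
points = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
                   [8.0, 8.0], [9.0, 8.0], [8.0, 9.0]])

labels, centroids, inertia = kmeans(points, init_centroids=points[[0, 3]])
print(labels)
print(centroids.round(3))
print(round(inertia, 4))
```

The inertia returned here is the quantity that the elbow method plots against the number of clusters: it always decreases as clusters are added, and the "elbow" marks the point where further clusters bring little improvement.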
Data Visualization
1. Basic Data Visualizations
Visualizing data is a valuable way of gaining further insight into what the data looks like and discovering the key patterns and trends it exhibits. Choosing the right visualization or chart type depends heavily on the nature of your data and on which of its properties you want to display. The following tutorials cover some foundational plotting techniques deemed essential for visualizing data distributions and relationships clearly and effectively. They combine the use of several well-known Python libraries for data visualization, such as seaborn and matplotlib, as well as pandas for handling data structures.
Create barplots in seaborn
Create a stacked bar chart in Pandas
Create a Histogram from a Pandas Series
Create a Relative Frequency Histogram in Matplotlib
Create a Pie Chart in Seaborn
Create a Scatter Plot from a Pandas DataFrame
Create Heatmaps in Seaborn
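As one example of what sits behind such charts, the sketch below computes the bar heights of a relative frequency histogram with NumPy. The data, the number of bins, and the range are hypothetical choices for this illustration; the plotting call itself is left out so that the snippet runs without a graphics backend.

```python
import numpy as np

# Hypothetical observations.
data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 9])

# Absolute counts per bin, using four equal-width bins over [1, 9].
counts, edges = np.histogram(data, bins=4, range=(1, 9))

# Relative frequencies: each count divided by the number of observations.
rel_freq = counts / counts.sum()

print(edges)
print(counts)
print(rel_freq)
```

These heights could then be drawn with matplotlib, for instance via `plt.bar(edges[:-1], rel_freq, width=np.diff(edges), align="edge")`, so that the bars sum to 1 instead of showing raw counts.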
2. Advanced Data Visualization
To go one step further in creating powerful and insightful data visualizations, try exploring these Python tutorials that showcase the creation of more specialized and complex plots. Some of these plots are commonly used in machine learning modeling and model evaluation, particularly for predictive solutions like classifiers and regression models. These plots can be an interesting discovery for those who have already mastered the basics of data visualization.
Create a Pareto chart in Python
Create a Bell curve in Python
Perform a Correlation Test in Python
Plot a ROC Curve in Python
Plot a Logistic Regression curve in Python
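For readers curious about what a ROC curve actually plots, the following sketch computes its points by hand: the true positive rate and false positive rate at each decision threshold, plus the area under the curve by the trapezoidal rule. The labels and scores are hypothetical, and this is only an illustration of the idea; in practice a library routine such as `sklearn.metrics.roc_curve` would typically be used.

```python
import numpy as np

# Hypothetical true labels and classifier scores.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

positives = y_true == 1
negatives = ~positives

# Start at (0, 0), then lower the threshold one distinct score at a time.
fpr, tpr = [0.0], [0.0]
for threshold in np.unique(y_score)[::-1]:
    predicted_pos = y_score >= threshold
    tpr.append((predicted_pos & positives).sum() / positives.sum())
    fpr.append((predicted_pos & negatives).sum() / negatives.sum())

fpr, tpr = np.array(fpr), np.array(tpr)

# Area under the curve via the trapezoidal rule.
auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)

print(fpr)
print(tpr)
print(auc)
```

Plotting `tpr` against `fpr` gives the ROC curve; an area of 0.5 corresponds to random guessing and 1.0 to a perfect ranking of positives above negatives.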
Coming Up Next
In the next post in this series, you'll be able to learn how to deal with probabilities and probability distributions in Python, and perform a variety of data sampling techniques.