Learn Stats for Python: Descriptive Statistics I

Learn Stats for Python: Descriptive Statistics II + Data Visualization

BY IVÁN PALOMARES CARRASCOSA | POSTED ON AUGUST 28, 2024

In today's world, pervaded by data and AI-driven technologies and solutions, mastering their foundations is a gateway to unlocking powerful insights from data and making effective, reliable data-driven decisions.

One such family of foundational notions comes from statistics. Because Python is versatile, capable, and popular in data analysis and AI applications, learning stats with its aid is an ideal way to learn statistical concepts and put them into practice at the same time.

This comprehensive tutorial series, consisting of five parts, curates and links together these "learn stats for Python" tutorials, providing you with a strong foundational learning pathway in both programming and statistics. Each tutorial is designed to be short, straight to the point, and easy to digest.

Descriptive Statistics I

Part I of the series focuses on tutorials to get started with the most essential pillar of statistics: descriptive statistics. Descriptive statistics encompass tools and techniques to summarize and describe the main characteristics of a dataset, such as its distribution, variability, and central tendency.

1. Data Preparation Essentials

The first group of tutorials we curated for you focuses on the essential preliminary steps needed before conducting any statistical analysis (even the most basic ones). These steps include cleaning, normalizing, and transforming your initial data to ensure consistent and accurate analysis results thereafter. Data preparation is essential as it sets the foundation for any subsequent statistical computations. These selected tutorials illustrate how to use Python to perform some of the most frequent data preparation steps:

normalize data in Python

remove outliers in Python

perform data binning in Python

transform data in Python
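As a rough illustration of the four preparation steps listed above, here is a minimal sketch using NumPy. The sample values, the 1.5 × IQR outlier rule, the choice of three equal-width bins, and the log transform are all assumptions made for this example; the linked tutorials may use different data and techniques.

```python
import numpy as np

# Hypothetical sample: seven typical values plus one obvious outlier (95.0).
data = np.array([12.0, 15.0, 14.0, 10.0, 18.0, 13.0, 16.0, 95.0])

# 1. Remove outliers with the common 1.5 * IQR rule (an assumed choice).
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
mask = (data >= q1 - 1.5 * iqr) & (data <= q3 + 1.5 * iqr)
cleaned = data[mask]

# 2. Normalize to the [0, 1] range (min-max normalization).
normalized = (cleaned - cleaned.min()) / (cleaned.max() - cleaned.min())

# 3. Bin into three equal-width intervals; bin_index holds 0, 1, or 2.
edges = np.linspace(cleaned.min(), cleaned.max(), 4)
bin_index = np.digitize(cleaned, edges[1:-1])

# 4. Transform with a natural logarithm (valid because all values are positive).
log_transformed = np.log(cleaned)

print(cleaned)
print(normalized)
print(bin_index)
```

Removing outliers before normalizing matters here: with 95.0 still present, min-max scaling would squeeze all the typical values into a narrow band near zero.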

2. Descriptive Statistics Fundamentals

Now it's time to plunge into pure stats.

The following group of tutorials covers the central notions of descriptive statistics, that is, summarizing and describing the main characteristics of your (previously prepared) data: mean, median, variability, skewness, percentiles, and more.

It is important to understand these essentials to uncover the "appearance" of your data: the first step toward interpreting and communicating insightful patterns underlying them.

These two tutorials showcase the calculation of mean, median, and mode, using two different Python libraries:

How to calculate mean, median, and mode with numpy

How to calculate mean, median, and mode in pandas
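For a quick feel of these three measures, the sketch below computes them with NumPy on a made-up sample. NumPy has no dedicated mode function, so the mode is derived here by counting occurrences with np.unique; that is one possible approach, not necessarily the one used in the tutorials.

```python
import numpy as np

# Hypothetical sample of eight observations.
values = np.array([2, 3, 3, 5, 7, 3, 8, 5])

mean = np.mean(values)      # arithmetic average
median = np.median(values)  # middle value of the sorted data

# Mode: the value with the highest count (the smallest one if there is a tie).
uniques, counts = np.unique(values, return_counts=True)
mode = uniques[np.argmax(counts)]

print(mean, median, mode)
```

In pandas, the equivalent Series methods are mean(), median(), and mode(); note that mode() returns a Series because a sample can have several modes.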

Meanwhile, other basic statistical properties are covered in these tutorials:

Calculate Sample & Population Variance in Python

Calculate the Standard Deviation of a List in Python

Calculate Skewness & Kurtosis in Python

Calculate Percentiles in Python
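The properties named in the list above can be sketched with NumPy as follows. The sample is invented, and skewness and kurtosis are computed from standardized moments in their population (biased) form, which is an assumption of this example; dedicated libraries such as SciPy also offer bias-corrected variants.

```python
import numpy as np

# Hypothetical sample.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

pop_var = np.var(x)             # population variance: divides by n
sample_var = np.var(x, ddof=1)  # sample variance: divides by n - 1
pop_std = np.std(x)             # population standard deviation

# Percentiles (NumPy interpolates linearly between data points by default).
p25, p50, p75 = np.percentile(x, [25, 50, 75])

# Moment-based skewness and excess kurtosis from standardized values.
z = (x - x.mean()) / pop_std
skewness = np.mean(z ** 3)
excess_kurtosis = np.mean(z ** 4) - 3.0

print(pop_var, sample_var, pop_std)
print(p25, p50, p75)
print(skewness, excess_kurtosis)
```

The ddof argument is the detail most worth remembering: leaving it at its default of 0 gives the population formula, while ddof=1 gives the sample formula.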

3. Frequencies and Distributions

After learning to calculate the most common statistics used to describe the characteristics of your data, the next natural step is to learn mechanisms to analyze the distribution of the data, be it across categories or through intervals. Accordingly, the next few tutorials will teach you how to build frequency tables, calculate relative frequencies from absolute frequencies, and work with contingency tables to summarize relationships between categorical variables, among others.

Create Frequency Tables in Python

Calculate Relative Frequency in Python

Create a Contingency Table in Python

Calculate Expected Value in Python
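To make these ideas concrete, here is a small sketch with pandas and NumPy. The DataFrame, its column names, and the probability distribution are all hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical categorical data: match results per team.
df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B", "C"],
    "result": ["win", "loss", "win", "win", "loss", "win"],
})

# Frequency table: absolute count of each category.
freq = df["team"].value_counts()

# Relative frequencies: counts divided by the total, so they sum to 1.
rel_freq = df["team"].value_counts(normalize=True)

# Contingency table: counts for every (team, result) combination.
contingency = pd.crosstab(df["team"], df["result"])

# Expected value of a discrete random variable: sum of outcome * probability.
outcomes = np.array([0, 1, 2, 3])
probs = np.array([0.1, 0.3, 0.4, 0.2])
expected_value = np.sum(outcomes * probs)

print(freq)
print(rel_freq)
print(contingency)
print(expected_value)
```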

4. Correlation and Covariance Metrics

To conclude Part I of this tutorial series, let's put together some tutorials aimed at exploring correlation and covariance measures.

These are key statistical metrics to analyze and explore the relationship between variables in your data. The tutorials below illustrate how to calculate and interpret several types of correlations, perform correlation tests, and build correlation and covariance matrices for uncovering hidden connections between parts of your data. These tools are among the foundations of predictive modeling, AI, and machine learning systems, and for good reason: making predictions and inferences intelligently entails discovering the hidden relationships in our data.

Calculate Correlation in Python

Calculate Spearman Rank Correlation in Python

Perform a Correlation Test in Python

Create a Correlation Matrix in Python

Create a Covariance Matrix in Python
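The sketch below shows one way to compute these measures with NumPy alone. The two variables are invented, and the Spearman coefficient is obtained as the Pearson correlation of the ranks using a small helper (ranks, a hypothetical name) that is only valid when the data contain no ties.

```python
import numpy as np

# Hypothetical paired observations: study hours and exam scores.
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
score = np.array([52.0, 55.0, 61.0, 70.0, 72.0, 90.0])

# Pearson correlation: strength of the linear relationship.
pearson_r = np.corrcoef(hours, score)[0, 1]

def ranks(a):
    """Rank values from 1 to n. Assumes there are no tied values."""
    order = np.argsort(a)
    r = np.empty(len(a), dtype=float)
    r[order] = np.arange(1, len(a) + 1)
    return r

# Spearman rank correlation: Pearson correlation computed on the ranks.
spearman_rho = np.corrcoef(ranks(hours), ranks(score))[0, 1]

# Correlation and covariance matrices (each row of `data` is one variable).
data = np.vstack([hours, score])
corr_matrix = np.corrcoef(data)
cov_matrix = np.cov(data)  # sample covariance (divides by n - 1)

print(pearson_r, spearman_rho)
print(corr_matrix)
print(cov_matrix)
```

Because the scores rise with every extra hour, the Spearman coefficient is exactly 1 even though the relationship is not perfectly linear. A significance test (a p-value for the correlation) is outside the scope of this sketch.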

Coming Up Next

In the next post in this series, we will wrap up with additional and advanced descriptive statistics topics, and move on to statistical data visualization tools.

Descriptive Statistics II

Part II of the series continues introducing more important concepts from descriptive statistics, namely similarity and distance measures, and provides a brief exploration of some advanced and applied topics, predominantly under an exploratory data analysis viewpoint, such as clustering. After this, we move on to Python tutorials covering data visualization techniques.

1. Similarity and Distance Measures

Methods to quantify the similarity or dissimilarity between data points, samples, or populations are crucial in a variety of statistical analysis methodologies and machine learning techniques, such as clustering, pattern recognition, and classification.

The following tutorials teach you how to use Python to apply metrics that indicate how close or far apart your data points are. Mastering these metrics is key to being able to compare datasets, group similar data points together in a coherent manner, or detect anomalies or data points that significantly deviate from the rest.

calculate Euclidean distance in Python

calculate Manhattan distance in Python

calculate Jaccard similarity in Python

calculate Mahalanobis distance in Python
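The four measures above can be sketched in a few lines of NumPy and plain Python. All vectors, sets, and the small 2D sample are made up for illustration; the Mahalanobis distance is measured here from the first observation to the sample mean, which is just one common use.

```python
import numpy as np

# Two hypothetical points in 3D space.
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

euclidean = np.linalg.norm(a - b)  # straight-line distance
manhattan = np.sum(np.abs(a - b))  # sum of absolute coordinate differences

# Jaccard similarity between two sets: |intersection| / |union|.
set_a = {"apple", "banana", "cherry"}
set_b = {"banana", "cherry", "date", "fig"}
jaccard = len(set_a & set_b) / len(set_a | set_b)

# Mahalanobis distance: accounts for the scale and correlation of the variables
# by using the inverse covariance matrix of the sample.
sample = np.array([[2.0, 3.0], [4.0, 7.0], [6.0, 5.0], [8.0, 9.0], [10.0, 6.0]])
center = sample.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(sample, rowvar=False))
diff = sample[0] - center
mahalanobis = np.sqrt(diff @ inv_cov @ diff)

print(euclidean, manhattan, jaccard, mahalanobis)
```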

2. Advanced and Applied Topics

Having learned the foundations of descriptive statistics, and before moving on to the next topic in this journey (data visualization), it's the perfect time to take a glimpse at some more complex statistics-based techniques and practical applications. This way, you'll gain some insight into specialized methodologies commonly used in data science and analytics. In Part V of this series, we will focus on more advanced predictive solutions such as prediction and forecasting. But for now, let's cover some tutorials aimed at guiding you through clustering data, univariate and bivariate analysis, and multidimensional scaling. These are common methods for solving real-world challenges requiring some statistical rigor.

Perform K-Means Clustering in Python

Use the Elbow method in Python

Perform Multidimensional Scaling in Python

Perform Univariate Analysis in Python

Perform Bivariate Analysis in Python
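To show the idea behind clustering and the elbow method, here is a minimal from-scratch sketch of k-means (Lloyd's algorithm) in NumPy. It is not the implementation used in the linked tutorials, which may rely on a library such as scikit-learn; the function names assign and kmeans, the random initialization, and the sample points are all assumptions of this example.

```python
import numpy as np

def assign(points, centroids):
    """Return the index of the nearest centroid for every point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, n_iter=100, seed=0):
    """Minimal k-means: returns labels, centroids, and inertia."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centroids)
        # Move each centroid to the mean of its points (keep it if it has none).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    labels = assign(points, centroids)
    # Inertia: sum of squared distances from each point to its centroid.
    inertia = float(np.sum((points - centroids[labels]) ** 2))
    return labels, centroids, inertia

# Hypothetical data: two well-separated groups of three points each.
points = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
                   [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]])

labels, centroids, inertia = kmeans(points, k=2)
print(labels, centroids, inertia)

# Elbow method: inspect how inertia drops as k grows and look for the "bend".
inertias = [kmeans(points, k)[2] for k in range(1, 5)]
print(inertias)
```

On this toy data the inertia falls sharply from k=1 to k=2 and only marginally afterwards, which is the "elbow" suggesting two clusters.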

Data Visualization

1. Basic Data Visualizations

Visualizing data is a valuable way of understanding what the data look like and discovering the key patterns and trends they exhibit. Choosing the right visualization or chart type heavily depends on the nature of your data and what properties of the data you want to display. The following tutorials cover some foundational plotting techniques deemed essential for visualizing data distributions and relationships clearly and effectively. They combine the use of several well-known Python libraries for data visualization, such as seaborn and matplotlib, as well as pandas for handling data structures.

Create barplots in seaborn

Create a stacked bar chart in Pandas

Create a Histogram from a Pandas Series

Create a Relative Frequency Histogram in Matplotlib

Create a Pie Chart in Seaborn

Create a Scatter Plot from a Pandas DataFrame

Create Heatmaps in Seaborn
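As a small taste of these plot types, the sketch below draws a relative frequency histogram and a bar chart side by side with matplotlib. The data are randomly generated or made up, the variable names are hypothetical, and the figure is rendered to an in-memory buffer with a non-interactive backend so the script runs anywhere; pass a filename to savefig instead to write an image file.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no display required
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical numeric data: 200 simulated heights in centimeters.
rng = np.random.default_rng(42)
heights = pd.Series(rng.normal(loc=170, scale=8, size=200), name="height_cm")

# Hypothetical categorical counts.
counts = pd.Series({"A": 12, "B": 7, "C": 15})

fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Relative frequency histogram: equal weights of 1/n make bar heights sum to 1.
weights = np.full(len(heights), 1 / len(heights))
bar_heights, bin_edges, _ = ax_hist.hist(
    heights, bins=10, weights=weights, edgecolor="black"
)
ax_hist.set_xlabel("Height (cm)")
ax_hist.set_ylabel("Relative frequency")
ax_hist.set_title("Relative frequency histogram")

# Bar chart of category counts.
ax_bar.bar(list(counts.index), counts.values)
ax_bar.set_xlabel("Category")
ax_bar.set_ylabel("Count")
ax_bar.set_title("Bar chart")

fig.tight_layout()
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```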

2. Advanced Data Visualization

To go one step further in creating powerful and insightful data visualizations, try exploring these Python tutorials that showcase the creation of more specialized and complex plots. Some of these plots are commonly used in machine learning modeling and model evaluation, namely for predictive solutions like classifiers and regression models. These plots can be an interesting discovery for those who have already mastered the basics of data visualization.

create a Pareto chart in Python

create a Bell curve in Python

Perform a Correlation Test in Python

Plot a ROC Curve in Python

Plot a Logistic Regression curve in Python

Coming Up Next

In the next post in this series, you'll be able to learn how to deal with probabilities and probability distributions in Python, and perform a variety of data sampling techniques.

