MAT022 Foundations of Statistics and Data Science
Summative Assessment 2019/20
Summative assessment for the module is by means of a single report on your statistical analysis
of data related to the decathlon, a combined event in athletics where an athlete’s performance in
ten track-and-field events is determined by a points system.
This form of assessment has been chosen because, as professional statisticians and data scientists,
you will often be asked to investigate a data set and report on whether it contains anything useful
or interesting. The assessment will also help you to prepare for writing your MSc dissertation in
the summer.
Assessment type   Weight   Max. length   Format   Deadline
Report            100%     10 pages      PDF      Friday 10 January 2020
Your report will be assessed according to how well you are able to
• analyse the data set (40%),
• interpret the results of your analysis (30%), and
• present the results of your analysis and interpretation of the data set (30%).
You are free to use any statistical software package to conduct your analysis (e.g. R or SPSS), and
any word processing software to prepare your report (e.g. LaTeX or Microsoft Word).
1 The data
You are asked to write a report on data related to the decathlon. This is a combined event in
athletics where an athlete’s performance in ten track-and-field events is determined based on a
points system. The winner of the competition is the athlete who has the most points after all ten
events have been completed.
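For orientation, the points for each event are computed from the measured performance by a formula of the following general form. This is a sketch based on the published World Athletics combined-events scoring tables, not on anything stated in this brief; the symbols P, A, B and C are notation introduced here, and the event-specific constants A, B and C should be taken from the official tables.

```latex
\[
\text{points} =
\begin{cases}
\lfloor A\,(B - P)^{C} \rfloor & \text{track events ($P$ = time in seconds)},\\[4pt]
\lfloor A\,(P - B)^{C} \rfloor & \text{field events ($P$ = height or distance)}.
\end{cases}
\]
```

In the official tables, jumps are measured in centimetres and throws in metres, so jump performances recorded in metres would need converting before the formula is applied.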
The basic decathlon data set records the performance of elite decathletes over the period from
1986 to 2006, and has been widely studied. The set consists of 7968 observations on 24 variables
as shown in Table 1, and is available on Learning Central as a .csv file.
The basic data set is also included with the GDAdata package in R and can be loaded as follows.
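> install.packages("GDAdata")            # one-off installation from CRAN
> data(Decathlon, package = "GDAdata")   # load the data frame Decathlon
> summary(Decathlon)                     # summary statistics for each variable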
Variable        Description
Totalpoints     Total points achieved over all 10 events
DecathleteName  Decathlete’s name
Nationality     Decathlete’s nationality
m100            Time for the 100 metres (secs)
Longjump        Distance jumped (metres)
Shotput         Distance putting the shot (metres)
Highjump        Height jumped (metres)
m400            Time for the 400 metres (secs)
m110hurdles     Time for the 110 metres hurdles (secs)
Discus          Distance throwing the discus (metres)
Polevault       Height achieved (metres)
Javelin         Distance throwing the javelin (metres)
m1500           Time for the 1500 metres (secs)
yearEvent       Year of performance
P100m           Points for performance in 100 metres
Plj             Points for performance in long jump
Psp             Points for performance in putting the shot
Phj             Points for performance in high jump
P400m           Points for performance in 400 metres
P110h           Points for performance in 110 metres hurdles
Ppv             Points for performance in pole vault
Pdt             Points for performance in discus
Pjt             Points for performance in javelin
P1500           Points for performance in 1500 metres
Table 1: The basic decathlon data set
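Since Table 1 describes Totalpoints as the total over all ten events, one simple sanity check is to compare it with the sum of the ten points columns. The sketch below assumes the column names are exactly those listed in Table 1; find_total_mismatches is a hypothetical helper (not part of GDAdata), and the toy data frame contains made-up values, not real performances.

```r
# Names of the ten per-event points columns, as listed in Table 1.
point_cols <- c("P100m", "Plj", "Psp", "Phj", "P400m",
                "P110h", "Ppv", "Pdt", "Pjt", "P1500")

# Return the rows whose recorded total differs from the sum of event points.
# Hypothetical helper for illustration; not part of the GDAdata package.
find_total_mismatches <- function(df, cols = point_cols) {
  recomputed <- rowSums(df[, cols, drop = FALSE])
  # which() drops comparisons that are NA because of missing values
  df[which(df$Totalpoints != recomputed), , drop = FALSE]
}

# Toy input with made-up values (NOT real performances): the second row has
# a deliberately wrong total so that the check has something to flag.
toy <- data.frame(
  Totalpoints = c(8000, 8100),
  P100m = c(900, 900), Plj = c(900, 900), Psp = c(700, 700),
  Phj = c(800, 800), P400m = c(850, 850), P110h = c(900, 900),
  Ppv = c(850, 850), Pdt = c(700, 700), Pjt = c(700, 700),
  P1500 = c(700, 700)
)

mismatches <- find_total_mismatches(toy)
print(nrow(mismatches))  # number of rows flagged as inconsistent
```

With the real data loaded, the same call on the Decathlon data frame would list any observations where the recorded total and the event points disagree, provided the column names match Table 1.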
To expand your analysis of the decathlon you are encouraged to find additional sources of data,
making sure that the provenance of each source is evaluated and discussed in your report. You
are also encouraged to explore additional statistical methods that have not been discussed in the
lectures and notebooks, making sure that you provide a brief description of these methods along
with references to the relevant literature. You can nevertheless base your study entirely on the
data set provided, which has plenty of scope for you to produce an excellent report.
2 The report
The ability to write clearly and concisely is an important professional competence. To encourage
writing that is brief and to the point, your reports are limited to a maximum of 10 pages. It
is often far more difficult to express yourself in 100 words than in 1000 words, especially when
you have a lot to say, so please do not underestimate the challenge posed by this restriction. The
modest page limit will also encourage you to be selective in the results you choose to present.
A suggested structure for your report is shown in Table 2. Note that the title page, abstract, table
of contents and list of references will not contribute towards the page count.
Section                    Suggested length
Title                      1 page
Abstract                   100 words
Table of contents          –
1. Introduction            1/2 page
2. Background              1/2 – 1 page
3. (descriptive analysis)  1 – 2 pages
4. (inferential analysis)  2 – 3 pages
5. (inferential analysis)  2 – 3 pages
6. Conclusion              1/2 page
References                 –
Appendices                 2 pages max.
Table 2: Report structure
• The title page should contain the title of your report, your name and student number, and
the date on which your report was completed.
• The abstract should contain a short summary of the report and its main conclusions.
• The table of contents should list the number and title of each section against the number
of the page on which the section begins.
• The introduction should consist of a few short paragraphs, describing the purpose of the
report and providing a brief outline of its contents.
• The background section should include a brief review of any relevant literature, and provide
a context for the work presented in the report.
• The report should contain a relatively short section on a descriptive analysis of the data,
with a title chosen to reflect what the section contains.
• The main part of the report should consist of one or more sections on an inferential analysis
of the data. Here you should formulate hypotheses, conduct statistical tests and discuss the
results of these tests. The titles of these sections should reflect what the sections contain.
• The conclusion should consist of a few short paragraphs, providing a summary of the report
and a brief outline of some ideas for future work.
• Any references should be typeset using the Harvard referencing style.
• The report may contain a single appendix for large figures and tables.
3 Assessment criteria
Detailed assessment criteria are shown in Table 3.
4 Guidelines for writing reports
The golden rule when writing is to always think of the reader. For scientific reports, readers
will typically want to read something interesting and to learn something in the process.
What do we mean by interesting?
Not interesting: The average exam marks of statistics and data science students.
Quite interesting: The average marks of male students, the average marks of female students, and
the results of a test of whether any difference is statistically significant.
Very interesting: The average marks of male students, the average marks of female students, a
statistical test of whether any difference is significant, and some speculation
about why there is a significant difference, or alternatively why there is not.
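The "quite interesting" level above can be sketched in R as follows. The marks are simulated purely for illustration (they are not real student data), the data frame name marks and its column names are assumptions, and Welch's two-sample t-test is one reasonable choice of test rather than the only one.

```r
set.seed(1)  # fixed seed so that the simulated example is reproducible

# Simulated exam marks for two groups of 40 students (illustrative only).
marks <- data.frame(
  sex  = rep(c("female", "male"), each = 40),
  mark = c(rnorm(40, mean = 62, sd = 10), rnorm(40, mean = 58, sd = 10))
)

# Average mark within each group.
group_means <- tapply(marks$mark, marks$sex, mean)

# Two-sample t-test of equal means; t.test() uses Welch's version by default,
# which does not assume equal variances in the two groups.
test <- t.test(mark ~ sex, data = marks)

print(group_means)
print(test$p.value)
```

Reporting the two means together with the p-value reaches the second level; the third level needs the further step of suggesting, in suitably qualified language, why a difference might or might not exist.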
Distinction (70–100)
  Analysis (40%): Hypotheses are interesting and original. Methods are appropriate and applied
    carefully and precisely. An interesting descriptive analysis is included and reported correctly.
  Discussion (30%): Inferences are valid and supported by evidence. Original and interesting
    conclusions are articulated. There is some shrewd speculation about possible causal factors.
  Presentation (30%): A high standard of writing is maintained throughout. The narrative is clear,
    coherent, eloquent and refined. Figures and tables are used creatively.
Merit (60–69)
  Analysis (40%): Hypotheses are formulated correctly. Methods are appropriate and applied
    correctly. A moderately interesting descriptive analysis is included and reported correctly.
  Discussion (30%): Inferences are valid and supported by evidence. Interesting conclusions are
    articulated. There is some speculation about possible causal factors.
  Presentation (30%): A good standard of writing is maintained throughout. The narrative is clear
    and coherent. Figures and tables are used to illustrate the narrative.
Pass (50–59)
  Analysis (40%): Hypotheses are formulated correctly. Methods are applied correctly for the most
    part. A descriptive analysis is included and reported correctly.
  Discussion (30%): Inferences are mostly valid and supported by some evidence. Some relatively
    interesting conclusions are articulated.
  Presentation (30%): An acceptable standard of writing is maintained throughout. The narrative is
    lacklustre and sometimes unclear. Figures and tables do not always illustrate the narrative.
Fail (0–49)
  Analysis (40%): The analysis is bland and almost entirely descriptive.
  Discussion (30%): Inferences are invalid or not supported by evidence. There is little of any
    interest.
  Presentation (30%): The report is poorly written. The narrative is disjointed and hard to follow.
Table 3: Assessment criteria
Audience. The target audience for your report is this year’s cohort of students on the Foundations
of Statistics and Data Science module, so you can assume that your readers are familiar with the
methods and terminology established within the lectures and notebooks. If you choose to use
methods that have not been covered in lectures, you must ensure that any new terms are properly
defined, and references to the relevant literature included.
Analysis. The reader should be satisfied that you have performed your analysis correctly, and in
particular that you have verified the conditions that are necessary to apply the various methods.
Introduce each method with a brief summary of its main features, but do not discuss technical
details at length; instead, consider providing the interested reader with references to the
relevant literature.
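As an illustration of verifying conditions, the sketch below checks two assumptions that are often examined before comparing two group means: approximate normality within each group and similar variances. The data are simulated, the variable names are assumptions, and the Shapiro-Wilk and F tests are common choices rather than required ones; a large p-value means only that there is no evidence against the assumption, not that the assumption holds.

```r
set.seed(2)  # fixed seed so that the simulated example is reproducible

# Two simulated samples standing in for the groups being compared.
group_a <- rnorm(40, mean = 62, sd = 10)
group_b <- rnorm(40, mean = 58, sd = 10)

# Shapiro-Wilk test of normality, applied to each group separately.
normality_a <- shapiro.test(group_a)
normality_b <- shapiro.test(group_b)

# F test of the null hypothesis that the two variances are equal.
variances <- var.test(group_a, group_b)

print(c(a = normality_a$p.value, b = normality_b$p.value))
print(variances$p.value)
```

In a report, checks of this kind are usually summarised in a sentence or two, perhaps supported by a normal Q-Q plot, rather than presented at length.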
Navigation. Do not assume that the reader will read the report from start to finish, as one might
read a novel. Reports should be made easy to navigate using numbered sections and subsections
together with cross-referencing. Once you have written a first draft, it will need careful editing
before it becomes a coherent and polished report. This stage always takes longer than you think!
Scientific writing. For scientific reports we aim for a style of writing that is clear and concise.
Make sure that sentences are unambiguous and that a good standard of writing is maintained
throughout the report.
• Sections should not start abruptly with the subject matter, but rather with an introductory
sentence or short paragraph. Sections should also end with a concluding sentence or short
paragraph.
• All figures and tables must be numbered and have captions. Figures or tables that are not
mentioned at least once in the text should not be included.
• A qualified statement is one that expresses some level of uncertainty about its own accuracy,
and should always be used when drawing conclusions from the results of a statistical analysis,
and especially when speculating about possible causal factors. Common phrases that indicate
qualified statements include “This suggests that ...”, “It appears that ...”, “We might conclude
that ...”, “There is some evidence to indicate ...” and so on.
• Be careful with florid turns of phrase, whatever their merit as literature. Academic reports
are primarily a way of communicating information, and care must be taken to accommodate
readers from diverse backgrounds, including non-native English speakers. Reports should
use everyday words and simple grammatical structures as far as possible.
Plagiarism
The basic decathlon data set has been widely studied and you will find plenty of material online
about this data. Plagiarism is presenting other people’s work or ideas as your own, by
incorporating them into your work without full acknowledgement. The need to acknowledge others’
work applies not only to text, but also to computer code, figures, tables etc. You must also attribute
text, data, or other resources downloaded from websites. Following submission your report will be
analysed by the Turnitin software, and any report in which plagiarism is detected will receive a
mark of zero.
Please submit your report via Learning Central on or before Friday 10 January 2020.