Coursera course Text Retrieval and Search Engines: Week 2 Overview
Week 2
On this page:
- Instructional Activities
- Time
- Goals and Objectives
- Key Phrases/Concepts
- Guiding Questions
- Readings and Resources
- Video Lectures
- Tips for Success
- Getting and Giving Help
Instructional Activities
Below is a list of the activities and assignments available to you this week. See the How to Pass the Class page to know which assignments pertain to the badge or badges you are pursuing. Click on the name of each activity for more detailed instructions.
| Relevant Badges | Activity | Due Date* | Estimated Time Required |
|---|---|---|---|
| | Week 2 Video Lectures | Sunday, April 5 (Suggested) | 3 hours |
| | Programming Assignment Part 1 | Sunday, April 5 | 2-3 hours |
| | Week 2 Quiz | Sunday, April 19 | ~0.5 hours |
* All deadlines are at 11:55 PM Central Time (time zone conversion) unless otherwise noted.
Time
This module will last 7 days and should take approximately 6 hours of dedicated time to complete, with its readings and assignments.
Goals and Objectives
After you actively engage in the learning experiences in this module, you should be able to:
- Explain what an inverted index is and how to construct one for a large set of text documents that does not fit into memory.
- Explain how variable-length encoding can be used to compress integers and how unary coding and gamma-coding work.
- Explain how scoring of documents in response to a query can be done quickly by using an inverted index.
- Explain what Zipf’s law is.
- Explain what the Cranfield evaluation methodology is and how it works for evaluating a text retrieval system.
- Explain how to evaluate a set of retrieved documents and how to compute precision, recall, and F1.
- Explain how to evaluate a ranked list of documents.
- Explain how to compute and plot a precision-recall curve.
- Explain how to compute average precision and mean average precision (MAP).
- Explain how to evaluate a ranked list with multi-level relevance judgments.
- Explain how to compute normalized discounted cumulative gain.
- Explain why it is important to perform a statistical significance test.
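As a small illustration of the set-based and ranked-list measures named in the objectives above, the following Python sketch computes precision, recall, F1, and average precision. The function names and the toy document IDs are assumptions made for this example and do not come from the course materials. Average precision is divided here by the total number of relevant documents for the query, so relevant documents that are never retrieved contribute zero.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based measures for an unranked set of retrieved documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # F1 is the harmonic mean of precision and recall (0 when there are no hits).
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def average_precision(ranked, relevant):
    """Average of the precision values at each rank where a relevant document appears."""
    relevant = set(relevant)
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Arithmetic mean of average precision over (ranked, relevant) pairs, one per query."""
    scores = [average_precision(ranked, relevant) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Toy example (hypothetical data): d5 is relevant but never retrieved.
ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d5"}
p, r, f1 = precision_recall_f1(ranked, relevant)
ap = average_precision(ranked, relevant)
```

In this toy run two of the four retrieved documents are relevant, so precision is 2/4 and recall is 2/3; the relevant documents sit at ranks 1 and 3, giving an average precision of (1/1 + 2/3) / 3.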
Key Phrases/Concepts
Keep your eyes open for the following key terms or phrases as you complete the readings and interact with the lectures. These topics will help you better understand the content in this module.
- Inverted index; postings
- Binary coding; unary coding; gamma-coding; d-gap
- Zipf’s law
- Cranfield evaluation methodology
- Precision; recall
- Average precision; mean average precision (MAP); geometric mean average precision (gMAP)
- Reciprocal rank; mean reciprocal rank
- F-measure
- Normalized discounted cumulative gain (nDCG)
- Statistical significance test
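To make the coding terms above more concrete, here is a minimal Python sketch of unary coding, gamma-coding, and d-gaps. It assumes one common convention: an integer x ≥ 1 is written in unary as x − 1 one-bits followed by a zero, and its gamma code is the unary code of 1 + ⌊log₂ x⌋ followed by x − 2^⌊log₂ x⌋ written in ⌊log₂ x⌋ bits. Other texts use slightly different conventions, and the function names are illustrative only.

```python
def unary(x):
    """Unary code of x >= 1: (x - 1) one-bits followed by a single zero."""
    if x < 1:
        raise ValueError("unary coding is defined for integers >= 1")
    return "1" * (x - 1) + "0"


def gamma(x):
    """Gamma code of x >= 1: unary(1 + floor(log2 x)) plus the offset in floor(log2 x) bits."""
    if x < 1:
        raise ValueError("gamma coding is defined for integers >= 1")
    k = x.bit_length() - 1          # floor(log2 x)
    offset = x - (1 << k)           # x - 2^k, always fits in k bits
    offset_bits = format(offset, "b").zfill(k) if k else ""
    return unary(k + 1) + offset_bits


def d_gaps(doc_ids):
    """Replace sorted document IDs by the differences between consecutive IDs."""
    gaps, prev = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


# Hypothetical postings list: small gaps compress into short gamma codes.
postings = [3, 7, 8, 20]
encoded = [gamma(g) for g in d_gaps(postings)]
```

Storing gaps rather than raw document IDs keeps the integers small, which is what makes a variable-length code such as gamma-coding pay off.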
Guiding Questions
Develop your answers to the following guiding questions while completing the readings and working on assignments throughout the week.
- What is the typical architecture of a text retrieval system?
- What is an inverted index?
- Why is it desirable to compress an inverted index?
- How can we create an inverted index when the collection of documents does not fit into memory?
- How can we leverage an inverted index to score documents quickly?
- Why is evaluation so critical for research and application development in text retrieval?
- How does Cranfield evaluation methodology work?
- How do we evaluate a set of retrieved documents?
- How do you compute precision, recall, and F1?
- How do we evaluate a ranked list of search results?
- How do you compute average precision? How do you compute mean average precision (MAP) and geometric mean average precision (gMAP)?
- What is mean reciprocal rank?
- Why is MAP more appropriate than precision at k documents when comparing two retrieval methods?
- Why is precision at k documents more meaningful than average precision from a user’s perspective?
- How can we evaluate a ranked list of search results using multi-level relevance judgments?
- How do you compute normalized discounted cumulative gain (nDCG)?
- Why is normalization necessary in nDCG? Does MAP need a similar normalization?
- Why is it important to perform a statistical significance test when we compare the retrieval accuracies of two search engine systems?
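For the nDCG questions above, the following Python sketch may help when working through an answer. It assumes one common form of the discount, in which the gain at rank 1 is not discounted and the gain at rank i ≥ 2 is divided by log₂ i; other formulations divide by log₂(i + 1). It also assumes the ideal ranking is built from all judged documents for the query, not only the retrieved ones. The function names and gain values are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: rank 1 undiscounted, rank i >= 2 divided by log2(i)."""
    return sum(
        gain if rank == 1 else gain / math.log2(rank)
        for rank, gain in enumerate(gains, start=1)
    )


def ndcg(ranked_gains, all_judged_gains, k=None):
    """DCG of the top-k results divided by the DCG of the ideal top-k ranking."""
    k = len(ranked_gains) if k is None else k
    ideal = sorted(all_judged_gains, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(ranked_gains[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical relevance levels (0 = not relevant, 3 = highly relevant).
system_ranking = [3, 2, 1]
judged_for_query = [3, 3, 2, 1]   # includes a highly relevant document the system missed
score = ndcg(system_ranking, judged_for_query)
```

Dividing by the ideal DCG puts every query on a 0 to 1 scale, so queries with many highly relevant documents do not dominate the average across queries.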
Readings and Resources
The following readings are optional:
- Mark Sanderson. "Test Collection Based Evaluation of Information Retrieval Systems." Foundations and Trends in Information Retrieval 4(4): 247-375 (2010).
- Ian H. Witten, Alistair Moffat, and Timothy C. Bell. Managing Gigabytes: Compressing and Indexing Documents and Images, Second Edition. Morgan Kaufmann, 1999.
Video Lectures
| Video Lecture | Duration | Video Download |
|---|---|---|
| 2.1 Implementation of TR Systems | 00:21:27 | 28.3 MB |
| 2.2 System Implementation: Inverted Index Construction | 00:18:21 | 24.4 MB |
| 2.3 System Implementation: Fast Search | 00:17:11 | 23.0 MB |
| 2.4 Evaluation of TR Systems | 00:10:10 | 14.1 MB |
| 2.5 Evaluation of TR Systems: Basic Measures | 00:12:54 | 17.3 MB |
| 2.6 Evaluation of TR Systems: Evaluating a Ranked List - Part 1 | 00:15:51 | 20.5 MB |
| 2.6 Evaluation of TR Systems: Evaluating a Ranked List - Part 2 | 00:10:01 | 13.8 MB |
| 2.7 Evaluation of TR Systems: Multi-Level Judgements | 00:10:48 | 14.3 MB |
| 2.8 Evaluation of TR Systems: Practical Issues | 00:15:14 | 20.8 MB |
Tips for Success
To do well this week, I recommend that you do the following:
- Review the video lectures a number of times to gain a solid understanding of the key questions and concepts introduced this week.
- When possible, provide tips and suggestions to your peers in this class. As a learning community, we can help each other learn and grow. One way of doing this is by helping to address the questions that your peers pose. By engaging with each other, we’ll all learn better.
- It’s always a good idea to refer to this week's video lectures and readings and to reference them in your responses. When appropriate, critique the information presented.
- Take notes while you read the materials and watch the lectures for this week. By taking notes, you are interacting with the material and will find that it is easier to remember and to understand. With your notes, you’ll also find that it’s easier to complete your assignments. So, go ahead, do yourself a favor; take some notes!
Getting and Giving Help
You can get/give help via the following means:
- Use the Learner Help Center to find information regarding specific technical problems. For example, technical problems would include error messages, difficulty submitting assignments, or problems with video playback. You can access the Help Center by clicking on the Help Center link at the top right of any course page. If you cannot find an answer in the documentation, you can also report your problem to the Coursera staff by clicking on the Contact Us! link available on each topic's page within the Learner Help Center.
- Use the Content Issues forum to report errors in lecture video content, assignment questions and answers, assignment grading, text and links on course pages, or the content of other course materials. University of Illinois staff and Community TAs will monitor this forum and respond to issues.
As a reminder, the instructor is not able to answer emails sent directly to his account. Rather, all questions should be reported as described above.
from: https://class.coursera.org/textretrieval-001/wiki/Week2Overview