Coursera course "Text Retrieval and Search Engines": Week 1 Overview
Week 1 Overview
On this page:
- Instructional Activities
- Time
- Goals and Objectives
- Key Phrases/Concepts
- Guiding Questions
- Readings and Resources
- Video Lectures
- Tips for Success
- Getting and Giving Help
Instructional Activities
Below is a list of the activities and assignments available to you this week. Click on the name of each activity for more detailed instructions.
| Activity | Due Date* | Estimated Time Required |
|---|---|---|
| Week 1 Video Lectures | Sunday, March 29 (Suggested) | 3 hours |
| Programming Assignments Overview | Sunday, March 29 (Suggested) | ~1 hour |
| Week 1 Quiz | Sunday, April 19 | ~0.5 hour |
* All deadlines are at 11:55 PM Central Time (time zone conversion) unless otherwise noted.
Time
This module will last 7 days and should take approximately 5 hours of dedicated time to complete, including its readings and assignments.
Goals and Objectives
After you actively engage in the learning experiences in this module, you should be able to:
- Explain some basic concepts in natural language processing and text information access.
- Explain why text retrieval is often defined as a ranking problem.
- Explain how the vector space retrieval model works.
- Explain what TF-IDF weighting is and why TF transformation and document length normalization are necessary for the design of an effective ranking function.
Key Phrases/Concepts
Keep your eyes open for the following key terms or phrases as you complete the readings and interact with the lectures. These topics will help you better understand the content in this module.
- Part-of-speech tagging; syntactic analysis; semantic analysis; ambiguity
- “Bag of words” representation
- Push, pull, querying, browsing
- Probability Ranking Principle
- Relevance
- Vector Space Model
- Term Frequency (TF)
- Document Frequency (DF); Inverse Document Frequency (IDF)
- TF Transformation
- Pivoted length normalization
- Dot product
- BM25
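Two of the key phrases above, the "bag of words" representation and the dot product, can be illustrated with a minimal sketch. This is my own toy example, not code from the course: documents are represented as term-count vectors (word order discarded), and a document is scored by the dot product of its vector with the query's.

```python
from collections import Counter

def bag_of_words(text):
    """Represent text as a multiset of lowercased terms; word order is discarded."""
    return Counter(text.lower().split())

def dot_product(query_vec, doc_vec):
    """Score a document by the dot product of the query and document count vectors."""
    return sum(count * doc_vec[term] for term, count in query_vec.items())

query = bag_of_words("news about presidential campaign")
d1 = bag_of_words("news about organic food campaign")
d2 = bag_of_words("news of presidential campaign presidential candidate")

print(dot_product(query, d1))  # 3: matches "news", "about", "campaign"
print(dot_product(query, d2))  # 4: "news", "campaign", and "presidential" twice
```

Note that with raw counts every matched occurrence adds equally, which is exactly the weakness that TF-IDF weighting, TF transformation, and length normalization (covered in lectures 1.6 through 1.9) are designed to fix.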
Guiding Questions
Develop your answers to the following guiding questions while watching the video lectures throughout the week.
- What does a computer have to do in order to understand a natural language sentence?
- What is ambiguity?
- Why is natural language processing (NLP) difficult for computers?
- What is the bag-of-words representation? Why do modern search engines use this simple representation of text?
- What are the two modes of text information access? Which mode does a Web search engine such as Google support?
- When is browsing more useful than querying to help a user find relevant information?
- Why is a text retrieval task defined as a ranking task?
- What is a retrieval model?
- What are the two assumptions made by the Probability Ranking Principle?
- What is the Vector Space Retrieval Model? How does it work?
- How do we define the dimensions of the Vector Space Model?
- What are some different ways to place a document as a vector in the vector space?
- What is Term Frequency (TF)?
- What is TF Transformation?
- What is Document Frequency (DF)?
- What is Inverse Document Frequency (IDF)?
- What is TF-IDF Weighting?
- Why do we need to penalize long documents in text retrieval?
- What is pivoted document length normalization?
- What are the main ideas behind the retrieval function BM25?
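The last few questions above (TF transformation, IDF, pivoted length normalization, BM25) fit together in one ranking function. The sketch below is a simplified BM25-style scorer of my own, not the course's reference implementation; the whitespace tokenization, smoothed IDF variant, and parameter values (k = 1.2, b = 0.75) are illustrative assumptions.

```python
import math
from collections import Counter

docs = [
    "news about presidential campaign",
    "news about organic food campaign",
    "news of presidential campaign presidential candidate",
]
tokenized = [d.lower().split() for d in docs]  # assumed: naive whitespace tokenization
N = len(tokenized)
avgdl = sum(len(d) for d in tokenized) / N     # average document length

def idf(term):
    """Inverse document frequency (smoothed variant): rarer terms weigh more."""
    df = sum(term in d for d in tokenized)     # document frequency
    return math.log((N + 1) / (df + 1))

def bm25_score(query, doc, k=1.2, b=0.75):
    """Simplified BM25: sublinear TF transformation + pivoted length normalization."""
    tf = Counter(doc)
    score = 0.0
    for term in query.lower().split():
        f = tf[term]
        # pivoted document length normalization: longer-than-average docs are penalized
        norm = 1 - b + b * len(doc) / avgdl
        # TF transformation: score grows sublinearly in f, bounded by (k + 1) * idf
        score += idf(term) * (f * (k + 1)) / (f + k * norm)
    return score

# The third document matches both query terms and ranks highest.
for d in tokenized:
    print(round(bm25_score("presidential candidate", d), 3))
```

Two ideas to notice: the TF term `f * (k + 1) / (f + k * norm)` saturates as `f` grows, so repeating a word many times yields diminishing returns; and `norm` implements the pivot, since a document of exactly average length gets `norm = 1` regardless of `b`.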
Readings and Resources
The following readings are optional:
- N. J. Belkin and W. B. Croft. "Information filtering and information retrieval: Two sides of the same coin?" Commun. ACM 35, 12 (Dec. 1992): 29-38.
- A. Singhal, C. Buckley, and M. Mitra. "Pivoted document length normalization." In Proceedings of ACM SIGIR 1996.
Video Lectures
| Video Lecture | Duration | Video Download |
|---|---|---|
| 1.1 Natural Language Processing | 00:21:05 | 35.5 MB |
| 1.2 Text Access | 00:09:24 | 12.8 MB |
| 1.3 Text Retrieval Problem | 00:26:18 | 36.7 MB |
| 1.4 Overview of Text Retrieval Methods | 00:10:10 | 13.7 MB |
| 1.5 Vector Space Model: Basic Idea | 00:09:44 | 13.0 MB |
| 1.6 Vector Space Model: Instantiation | 00:17:30 | 23.1 MB |
| 1.7 Vector Space Model: Improved Instantiation | 00:16:52 | 22.1 MB |
| 1.8 TF Transformation | 00:18:56 | 12.7 MB |
| 1.9 Doc Length Normalization | 00:18:56 | 25.6 MB |

(The original page also linked lecture notes, transcripts, SRT caption files, and forum threads for each lecture; those links are not preserved here.)
Tips for Success
To do well this week, I recommend that you do the following:
- Review the video lectures a number of times to gain a solid understanding of the key questions and concepts introduced this week.
- When possible, provide tips and suggestions to your peers in this class. As a learning community, we can help each other learn and grow. One way of doing this is by helping to address the questions that your peers pose. By engaging with each other, we’ll all learn better.
- It’s always a good idea to refer to the video lectures and reference them in your responses. When appropriate, critique the information presented.
- Take notes while you watch the lectures for this week. By taking notes, you are interacting with the material and will find that it is easier to remember and to understand. With your notes, you’ll also find that it’s easier to complete your assignments. So, go ahead, do yourself a favor; take some notes!
Getting and Giving Help
You can get/give help via the following means:
- Use the Learner Help Center to find information regarding specific technical problems. For example, technical problems would include error messages, difficulty submitting assignments, or problems with video playback. You can access the Help Center by clicking on the Help Center link at the top right of any course page. If you cannot find an answer in the documentation, you can also report your problem to the Coursera staff by clicking on the Contact Us! link available on each topic's page within the Learner Help Center.
- Use the Content Issues forum to report errors in lecture video content, assignment questions and answers, assignment grading, text and links on course pages, or the content of other course materials. University of Illinois staff and Community TAs will monitor this forum and respond to issues.
As a reminder, the instructor is not able to answer emails sent directly to his account. Rather, all questions should be reported as described above.
from: https://class.coursera.org/textretrieval-001/wiki/Week1Overview