Week 4 Overview

Instructional Activities

Below is a list of the activities and assignments available to you this week. See the How to Pass the Class page to find out which assignments apply to the badge or badges you are pursuing. Each activity's own page has more detailed instructions.

Activity                   Due Date*                      Estimated Time Required
Week 4 Video Lectures      Sunday, April 19 (suggested)   3 hours
Programming Assignment 2   Sunday, April 26               2–3 hours
Week 4 Quiz                Sunday, April 19               ~0.5 hours

* All deadlines are at 11:55 PM Central Time unless otherwise noted.

Time

This module lasts 7 days and should take approximately 6 hours of dedicated time to complete, including its readings and assignments.

Goals and Objectives

After you actively engage in the learning experiences in this module, you should be able to:

  • Explain some of the main challenges in creating a web search engine.
  • Explain what a web crawler is and what factors have to be considered when designing a web crawler.
  • Explain the basic idea of the Google File System (GFS).
  • Explain the basic idea of MapReduce and how we can use it to build an inverted index in parallel.
  • Explain how links on the web can be leveraged to improve search results.
  • Explain how the PageRank and HITS algorithms work.
  • Explain the basic idea of using machine learning to combine multiple features for ranking documents (aka learning to rank).
  • Explain how we can extend a retrieval system to perform content-based information filtering (recommendation).
  • Explain how we can use a linear utility function to evaluate an information filtering system.
  • Explain the basic idea of collaborative filtering.
  • Explain how the memory-based collaborative filtering algorithm works.
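As a concrete illustration of the MapReduce objective above, the following Python sketch simulates the map, shuffle, and reduce phases of inverted-index construction in a single process. It is a minimal sketch, not how a real MapReduce framework is invoked: the function names `map_fn`, `reduce_fn`, and `build_inverted_index` and the two sample documents are hypothetical, and the "shuffle" is just an in-memory dictionary standing in for the framework's grouping by key.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical names and sample data; a single-process stand-in for a MapReduce job.

def map_fn(doc_id: str, text: str) -> Iterator[Tuple[str, Tuple[str, int]]]:
    """Map: emit (term, (doc_id, count)) for every distinct term in one document."""
    counts: Dict[str, int] = defaultdict(int)
    for term in text.lower().split():
        counts[term] += 1
    for term, tf in counts.items():
        yield term, (doc_id, tf)

def reduce_fn(term: str, postings: List[Tuple[str, int]]) -> Tuple[str, List[Tuple[str, int]]]:
    """Reduce: turn all (doc_id, count) pairs for one term into a sorted postings list."""
    return term, sorted(postings)

def build_inverted_index(docs: Dict[str, str]) -> Dict[str, List[Tuple[str, int]]]:
    # Map phase, followed by a simulated shuffle that groups values by key (term).
    grouped: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for doc_id, text in docs.items():
        for term, value in map_fn(doc_id, text):
            grouped[term].append(value)
    # Reduce phase: on a cluster, each term group could go to a different reducer.
    return dict(reduce_fn(term, values) for term, values in sorted(grouped.items()))

docs = {"d1": "java resource java class", "d2": "java travel resource"}
index = build_inverted_index(docs)
print(index["java"])  # [('d1', 2), ('d2', 1)]
```

In an actual deployment, each mapper would process a subset of the documents and the framework would route all values for the same term to the same reducer, which is what lets the index be built in parallel.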

Key Phrases/Concepts

Keep your eyes open for the following key terms or phrases as you complete the readings and interact with the lectures. These topics will help you better understand the content in this module.

  • Scalability, efficiency
  • Spam
  • Crawler, focused crawling, incremental crawling
  • Google File System (GFS)
  • MapReduce
  • Link analysis, anchor text
  • PageRank, HITS
  • Learning to rank, features, logistic regression
  • Content-based filtering
  • Collaborative filtering
  • Beta-gamma threshold learning
  • Linear utility
  • User profile
  • Exploration-exploitation tradeoff
  • Memory-based collaborative filtering
  • Cold start
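To make "linear utility" concrete: a linear utility scores a filtering system by crediting each relevant document it delivers and penalizing each non-relevant document it delivers. The sketch below assumes coefficients of 3 and 2 purely for illustration; the function name `linear_utility` and the sample judgments are hypothetical.

```python
from typing import Iterable

def linear_utility(delivered_relevance: Iterable[bool],
                   gain: float = 3.0, penalty: float = 2.0) -> float:
    """Linear utility = gain * (#relevant delivered) - penalty * (#non-relevant delivered).

    Only delivered documents count; the default coefficients are illustrative
    assumptions, not values prescribed by the course.
    """
    labels = list(delivered_relevance)
    good = sum(1 for rel in labels if rel)
    bad = len(labels) - good
    return gain * good - penalty * bad

# Hypothetical relevance judgments for five delivered documents: 3 relevant, 2 not.
print(linear_utility([True, False, True, True, False]))  # 5.0
```

Raising `penalty` relative to `gain` favors a more conservative system that delivers fewer documents, while raising `gain` favors a system willing to deliver more borderline ones.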

Guiding Questions

Develop your answers to the following guiding questions while completing the readings and working on assignments throughout the week.

  • What are some of the general challenges in building a web search engine?
  • What is a crawler? How can we implement a simple crawler?
  • What is focused crawling? What is incremental crawling?
  • What kind of pages should have a higher priority for recrawling in incremental crawling?
  • What can we do if the inverted index doesn’t fit on any single machine?
  • What’s the basic idea of the Google File System (GFS)?
  • How does MapReduce work? What are the two key functions that a programmer needs to implement when programming with a MapReduce framework?
  • How can we use MapReduce to build an inverted index in parallel?
  • What is anchor text? Why is it useful for improving search accuracy?
  • What is a hub page? What is an authority page?
  • What kind of web pages tend to receive high scores from PageRank?
  • How can we interpret PageRank from the perspective of a random surfer “walking” on the web?
  • How exactly do you compute PageRank scores?
  • How does the HITS algorithm work?
  • What’s the basic idea of learning to rank?
  • How can logistic regression be used to combine multiple features for improving ranking accuracy of a search engine?
  • What is content-based information filtering?
  • How can we use a linear utility function to evaluate a filtering system? How should we set the coefficients in such a linear utility function?
  • How can we extend a retrieval system to perform content-based information filtering?
  • What is the exploration-exploitation tradeoff?
  • How does the beta-gamma threshold learning algorithm work?
  • What is the basic idea of collaborative filtering?
  • How does the memory-based collaborative filtering algorithm work?
  • What is the “cold start” problem in collaborative filtering?
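One way to approach "How exactly do you compute PageRank scores?" is power iteration on the random-surfer model: start from a uniform distribution and repeatedly redistribute each page's score along its out-links, mixed with a random jump to any page. The Python sketch below is a minimal illustration under stated assumptions: the jump probability `alpha = 0.15`, the treatment of pages without out-links as linking to every page, the function name `pagerank`, and the example graph are choices made here, not taken from the course.

```python
from typing import Dict, List

def pagerank(links: Dict[str, List[str]], alpha: float = 0.15,
             tol: float = 1e-10, max_iter: int = 1000) -> Dict[str, float]:
    """Power iteration for PageRank under the random-surfer model.

    links maps each page to the pages it links to. At each step the surfer
    jumps to a uniformly random page with probability alpha, and otherwise
    follows a uniformly random out-link of the current page.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(max_iter):
        # Random-jump contribution: every page receives alpha / n.
        new_rank = {p: alpha / n for p in pages}
        for p in pages:
            # Assumption: a page with no out-links is treated as linking to every page.
            targets = links.get(p) or pages
            share = (1 - alpha) * rank[p] / len(targets)
            for q in targets:
                new_rank[q] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:  # stop once the scores no longer change noticeably
            break
    return rank

# Hypothetical three-page web: a and b both link to c; c links back to a.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(max(ranks, key=ranks.get))  # c
```

Here page `c` receives links from both `a` and `b`, so it ends up with the highest score, while `b`, which nobody links to, keeps only its random-jump share.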

Readings and Resources

All the readings are available online.

    1. For web search, read chapters 19, 20, and 21 of the following book:
      Introduction to Information Retrieval, by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schuetze, Cambridge University Press, 2007.
    2. For beta-gamma threshold learning, read the following paper:
      Threshold Calibration in CLARIT Adaptive Filtering, by ChengXiang Zhai, Peter Jansen, Emilia Stoica, Norbert Grot, and David A. Evans, Proceedings of TREC 1998.
    3. For content-based filtering in general and memory-based collaborative filtering, read chapters 3 and 4 of the following book:
      Recommender Systems Handbook, edited by Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor, Springer, 2011.
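The memory-based collaborative filtering approach covered in the reading above predicts a user's rating of an item from the ratings of similar users. The sketch below is one common user-based variant, assumed here rather than taken from the course: similarity is the Pearson correlation over co-rated items, and the prediction is the user's mean rating plus a similarity-weighted average of the other users' mean-centered ratings. All function names, users, items, and ratings are hypothetical.

```python
from math import sqrt
from typing import Dict

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}

def user_mean(r: Dict[str, float]) -> float:
    """Average of one user's ratings (assumes at least one rating)."""
    return sum(r.values()) / len(r)

def pearson(ra: Dict[str, float], rb: Dict[str, float]) -> float:
    """Pearson correlation over the items both users rated; 0.0 if undefined."""
    common = set(ra) & set(rb)
    if len(common) < 2:
        return 0.0
    ma = sum(ra[i] for i in common) / len(common)
    mb = sum(rb[i] for i in common) / len(common)
    num = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
    den = sqrt(sum((ra[i] - ma) ** 2 for i in common)) * sqrt(
        sum((rb[i] - mb) ** 2 for i in common))
    return num / den if den else 0.0

def predict(ratings: Ratings, user: str, item: str) -> float:
    """User's mean rating plus a similarity-weighted average of other users'
    mean-centered ratings of the item (hypothetical helper)."""
    base = user_mean(ratings[user])
    num, norm = 0.0, 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue  # only other users who rated the item contribute
        w = pearson(ratings[user], r)
        num += w * (r[item] - user_mean(r))
        norm += abs(w)
    # Fall back to the user's own mean when no neighbor carries any weight.
    return base + num / norm if norm else base

# Hypothetical ratings on a 1-5 scale.
ratings: Ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m2": 2, "m3": 3, "m4": 5},
    "carol": {"m1": 1, "m2": 5, "m3": 2, "m4": 1},
}
print(predict(ratings, "alice", "m4"))
```

In this toy data, `bob`'s ratings track `alice`'s exactly and `carol`'s run opposite, so the prediction for `m4` comes out above `alice`'s average: `bob` rated it above his mean and `carol` rated it below hers. When no other user has rated the item, the sketch falls back to the user's own mean, and a user with no ratings at all cannot be handled, which is the cold-start problem in miniature.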

Video Lectures

Each lecture is listed with its duration and the size of its video download.

  • 4.1. Web Search: Introduction & Web Crawler (00:11:05, 15.4 MB)
  • 4.2. Web Search: Web Indexing (00:17:19, 23.8 MB)
  • 4.3. Web Search: Link Analysis – Part 1 (00:09:16, 12.4 MB)
  • 4.3. Web Search: Link Analysis – Part 2 (00:17:30, 24.4 MB)
  • 4.3. Web Search: Link Analysis – Part 3 (00:05:59, 8.1 MB)
  • 4.4. Web Search: Learning to Rank – Part 1 (00:05:54, 8.8 MB)
  • 4.4. Web Search: Learning to Rank – Part 2 (00:10:23, 14.3 MB)
  • 4.4. Web Search: Learning to Rank – Part 3 (00:04:58, 7.3 MB)
  • 4.5. Web Search: Future of Web Search (00:13:09, 18.1 MB)
  • 4.6. Recommender Systems: Content-Based Filtering – Part 1 (00:12:55, 17.4 MB)
  • 4.6. Recommender Systems: Content-Based Filtering – Part 2 (00:10:42, 14.5 MB)
  • 4.7. Recommender Systems: Collaborative Filtering – Part 1 (00:06:20, 8.8 MB)
  • 4.7. Recommender Systems: Collaborative Filtering – Part 2 (00:12:09, 16.7 MB)
  • 4.7. Recommender Systems: Collaborative Filtering – Part 3 (00:04:45, 7.1 MB)
  • 4.8. Course Summary (00:09:48, 13.9 MB)

Tips for Success

To do well this week, I recommend that you do the following:

  • Review the video lectures a number of times to gain a solid understanding of the key questions and concepts introduced this week.
  • When possible, provide tips and suggestions to your peers in this class. As a learning community, we can help each other learn and grow. One way of doing this is by helping to address the questions that your peers pose. By engaging with each other, we’ll all learn better.
  • When you respond to your peers, it’s a good idea to refer to this week’s video lectures and chapter readings. When appropriate, critique the information presented.
  • Take notes while you read the materials and watch the lectures for this week. By taking notes, you are interacting with the material and will find that it is easier to remember and to understand. With your notes, you’ll also find that it’s easier to complete your assignments. So, go ahead, do yourself a favor; take some notes!

Getting and Giving Help

You can get or give help in the following ways:

  • Use the Learner Help Center to find information about specific technical problems, such as error messages, difficulty submitting assignments, or problems with video playback. You can access the Help Center by clicking the Help link at the top right of any course page. If you cannot find an answer in the documentation, you can also report your problem to the Coursera staff by clicking the Contact Us! link available on each topic’s page within the Learner Help Center.
  • Use the Content Issues forum to report errors in lecture video content, assignment questions and answers, assignment grading, text and links on course pages, or the content of other course materials. University of Illinois staff and Community TAs will monitor this forum and respond to issues.

As a reminder, the instructor is not able to answer emails sent directly to his account. Instead, please submit all questions through the channels described above.

from: https://class.coursera.org/textretrieval-001/wiki/Week4Overview
