http://www.html5rocks.com/en/tutorials/internals/howbrowserswork/#Parser_Lexer_combination

Grammars

Parsing is based on the syntax rules the document obeys, that is, the language or format it was written in. Every format you can parse must have a deterministic grammar consisting of a vocabulary and syntax rules. Such a grammar is called a context-free grammar. Human languages are not of this kind and therefore cannot be parsed with conventional parsing techniques.

Parser–Lexer combination

Parsing can be separated into two sub-processes: lexical analysis and syntax analysis.

Lexical analysis is the process of breaking the input into tokens. Tokens are the language's vocabulary: the collection of valid building blocks. In a human language, the vocabulary consists of all the words that appear in the dictionary for that language.

Syntax analysis is the application of the language's syntax rules.

Parsers usually divide the work between two components: the lexer (sometimes called the tokenizer), which is responsible for breaking the input into valid tokens, and the parser, which is responsible for constructing the parse tree by analyzing the document structure according to the language's syntax rules.
The lexer knows how to strip irrelevant characters such as whitespace and line breaks.

Figure: from source document to parse trees
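
The lexer's job can be sketched in a few lines of Python. This is only an illustration, not how any browser implements it: the token names (INTEGER, PLUS, MINUS, SKIP) and the tiny arithmetic vocabulary are assumptions made up for the example.

```python
import re

# Hypothetical vocabulary for a tiny arithmetic language (an assumption
# for this sketch; a real format defines its own tokens).
TOKEN_SPEC = [
    ("INTEGER", r"\d+"),
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
    ("SKIP", r"\s+"),  # whitespace and line breaks, dropped by the lexer
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Break text into (kind, value) tokens, stripping irrelevant characters."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            # The character is not part of the vocabulary.
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling `tokenize("2 + 3\n- 1")` yields five tokens with the spaces and the line break removed. Any character outside the assumed vocabulary raises a `SyntaxError`.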

The parsing process is iterative. The parser usually asks the lexer for a new token and tries to match the token with one of the syntax rules. If a rule is matched, a node corresponding to the token is added to the parse tree and the parser asks for another token.

If no rule matches, the parser stores the token internally and keeps asking for tokens until it finds a rule that matches all the internally stored tokens. If no such rule is found, the parser raises an exception, which means the document was not valid and contained syntax errors.
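
The loop described above might look roughly like the following Python sketch. The grammar, the `(kind, value)` token shape, and the tuple-based tree are all assumptions chosen for illustration; real parsers are generated from much larger grammars.

```python
def parse(tokens):
    """Build a parse tree from an iterable of (kind, value) tokens.

    Assumed grammar (illustrative only):
        expression := INTEGER
                    | expression (PLUS | MINUS) INTEGER
    """
    tree = None   # parse tree built so far
    pending = []  # tokens stored internally until a rule matches all of them
    for token in tokens:  # each iteration "asks the lexer" for one token
        pending.append(token)
        kinds = [kind for kind, _ in pending]
        if tree is None and kinds == ["INTEGER"]:
            # Rule matched: the first integer becomes the root node.
            tree = ("int", int(pending[0][1]))
            pending = []
        elif (tree is not None and len(kinds) == 2
              and kinds[0] in ("PLUS", "MINUS") and kinds[1] == "INTEGER"):
            # Rule matched all stored tokens: add an operator node.
            tree = (pending[0][1], tree, ("int", int(pending[1][1])))
            pending = []
        elif tree is not None and kinds in (["PLUS"], ["MINUS"]):
            # No rule matches yet: keep the token and ask for another one.
            continue
        else:
            raise SyntaxError(f"unexpected token {token!r}")
    if pending or tree is None:
        raise SyntaxError("unexpected end of input")
    return tree
```

Given the tokens for `2 + 3 - 1`, the operator token is held back until the following integer arrives, at which point a rule matches both stored tokens and a node is added. Input such as `2 + +` never matches a rule, so the sketch raises `SyntaxError`.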
