构建工具是如何用 node 操作 html/js/css/md 文件的

从本质上来说，html/js/css/md ... 源代码文件都是文本文件，文本文件的内容都是字符串，对文本文件的操作其实就是对字符串的操作。

操作源代码的方式又主要分成两种：

当作字符串，进行增、删、改等操作
按照某种语法、规则，把字符串读取成一个对象，然后对这个对象进行操作，最后导出新的字符串

1. 操作 `html` 文件

html 的语法比较简单，并且一般操作 html 都是插入、替换、模板引擎渲染等在字符串上的操作，所以使用第一种方式的比较多。

比如：

html-loader
html-webpack-plugin
html-minifier
handlebars 模板引擎
pug 模板引擎
ejs 模板引擎

一般以第二种方式来操作 html 的都是将 html 文本解析成 dom 树对象，然后进行 dom 操作，最后再导出成新的代码文本。

比如：

以 `cheerio` 为例，操作 `html` 文本：

cheerio 能够加载一个 html 文本，实例化一个类 jQuery 对象，然后使用 jQuery 的 api 像操作 dom 一样操作这段文本，最后导出新的 html 文本。



const cheerio = require('cheerio');

const $ = cheerio.load('&lt;h2 class="title"&gt;Hello world&lt;/h2&gt;'); // 加载一个 html 文本

$('h2.title').text('Hello there!');

$('h2').addClass('welcome');

$.html(); // 导出新的 html 文本

//=&gt; &lt;h2 class="title welcome"&gt;Hello there!&lt;/h2&gt;

以 `jsdom` 为例，操作 `html` 文本：

jsdom 是用 js 将一个 html 文本解析为一个 dom 对象，并实现了一系列 web 标准，特别是 WHATWG 组织制定的 DOM 和 HTML 标准。



const jsdom = require("jsdom");

const { JSDOM } = jsdom;

const dom = new JSDOM(`&lt;!DOCTYPE html&gt;&lt;p&gt;Hello world&lt;/p&gt;`);

console.log(dom.window.document.querySelector("p").textContent); // "Hello world"

2. 操作 `js` 文件

因为 js 语法比较复杂，仅仅是如字符串一样进行增删改，只能做一些小的操作，意义不大。所以，一般操作 js 文件都是采用的第二种方式。

在第二种方式中，一般是工具将 js 文本解析成抽象语法树（AST，Abstract Syntax Tree，抽象语法树），然后对这棵语法树以面向对象的方式做增删改等操作，最后再导出成新的代码文本。

生成抽象语法树的工具主要有：

Acorn: 比如 webpack、rollup、UglifyJS 等工具底层都是使用的 acorn 抽象语法树解析器
babel-parser: babel 转码工具底层使用的抽象语法树解析器

以 `acorn` 为例，将 `1 + 1` 片段进行解析：



const acorn = require('acorn');

const tree = acorn.parse('1 + 1');



// tree 的 json 化表示

{

  type: 'Program',

  start: 0,

  end: 5,

  body: [{

    type: 'ExpressionStatement',

    start: 0,

    end: 5,

    expression: {

      type: 'BinaryExpression',

      start: 0,

      end: 5,

      left: { type: 'Literal', start: 0, end: 1, value: 1, raw: '1' },

      operator: '+',

      right: { type: 'Literal', start: 4, end: 5, value: 1, raw: '1' }

    }

  }],

  sourceType: 'script'

}

以 `babel-parser` 为例，将 `1 + 1` 片段进行解析：



const parser = require('@babel/parser');

const tree = parser.parse('1 + 1');



// tree 的 json 化表示

{

  type: 'File',

  start: 0,

  end: 5,

  loc: {

    start: { line: 1, column: 0 },

    end: { line: 1, column: 5 }

  },

  program: {

    type: 'Program',

    start: 0,

    end: 5,

    loc: {

      start: { line: 1, column: 0 },

      end: { line: 1, column: 5 }

    },

    sourceType: 'script',

    interpreter: null,

    body: [{

      type: 'ExpressionStatement',

      start: 0,

      end: 5,

      loc: {

        start: { line: 1, column: 0 },

        end: { line: 1, column: 5 }

      },

      expression: {

        type: 'BinaryExpression',

        start: 0,

        end: 5,

        loc: {

          start: { line: 1, column: 0 },

          end: { line: 1, column: 5 }

        },

        left: {

          type: 'NumericLiteral',

          start: 0,

          end: 1,

          loc: {

            start: { line: 1, column: 0 },

            end: { line: 1, column: 5 }

          },

          extra: { rawValue: 1, raw: '1' },

          value: 1

        },

        operator: '+',

        right: {

          type: 'NumericLiteral',

          start: 4,

          end: 5,

          loc: {

            start: { line: 1, column: 0 },

            end: { line: 1, column: 5 }

          },

          extra: { rawValue: 1, raw: '1' },

          value: 1

        }

      }

    }],

    directives: []

  },

  comments: []

}

3. 操作 `css` 文件

css 的语法比 html 要复杂一些，一些简单的操作如插入、替换，可以用直接以字符串的方式操作，但如果是压缩、auto prefix、css-modules 等复杂的功能时，就需要用第二种方式操作 css 了。

在第二种方式中，一般也是将 css 文本解析成一棵抽象语法树，然后进行操作。

比如：

postcss: 比如 css-loader、autoprefixer、cssnano 等的底层都是使用的 postcss 来解析
rework、reworkcss: 抽象语法树解析器
csstree: 比如 csso 的底层就是使用 csstree 来解析

以 `postcss` 为例，操作 `css` 文本：



const autoprefixer = require('autoprefixer');

const postcss = require('postcss');

const precss = require('precss');

const css = `

.hello {

  display: flex;

  color: red;

  backgroundColor: #ffffff;

}

`;

postcss([precss, autoprefixer({browsers: ['last 2 versions', '&gt; 5%']})])

  .process(css)

  .then(result =&gt; {

    console.log(result.css);

  });

输出的文本：



.hello {

  display: -webkit-box;

  display: -ms-flexbox;

  display: flex;

  color: red;

  backgroundColor: #ffffff;

}

以 `rework` 为例，操作 `css` 文本：



const css = require('css');

const ast = css.parse('body { font-size: 12px; }');

console.log(css.stringify(ast));

输出的文本：



body {

  font-size: 12px;

}

4. 操作 `markdown/md` 文件

一般来说，操作 markdown 文本的目的有两个：

作为编辑器编辑 markdown 文本，或作为渲染器渲染 markdown 文本为 html 文本
从 markdown 文本中读取信息、校验嵌入的源代码、优化格式等

所以，尽管 markdown 的语法也很简单，但一般并不会直接去使用字符串的方式去操作 markdown 文本，一般都是使用的第二种方式。

比如：

markdown-it: 作为编辑器或渲染器的好手
remark: 构建抽象语法树进行操作的好手

以 `markdown-it` 为例，操作 `markdown` 文本：



const md = require('markdown-it')();

const result = md.render('# markdown-it rulezz!');

console.log(result);

输出的文本：



&lt;h1&gt;markdown-it rulezz!&lt;/h1&gt;

以 `remark` 为例，操作 `markdown` 文本：



const remark = require('remark')

const recommended = require('remark-preset-lint-recommended')

const html = require('remark-html')

const report = require('vfile-reporter')

remark()

  .use(recommended)

  .use(html)

  .process('## Hello world!', function(err, file) {

    console.error(report(err || file))

    console.log(String(file))

  })

校验错误提示：



1:1  warning  Missing newline character at end of file  final-newline  remark-lint

⚠ 1 warning

输出的文本：



&lt;h2&gt;Hello world!&lt;/h2&gt;

后续

更多博客，查看 https://github.com/senntyou/blogs

作者：深予之 (@senntyou)