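TFHpple is a third-party library for parsing HTML data. In my experience it is reasonably capable, but the project must be configured before the library can be used.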

  

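  Configuration

1. Add libxml2.tbd to the project (Link Binary With Libraries)

2. Set the build path: add $(SDKROOT)/usr/include/libxml2 to the Header Search Paths build setting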

  Usage

The following page is used as a worked example:

http://so.gushiwen.org/guwen/book_2.aspx

1. Create a TFHpple object, where data is the data returned by the website

TFHpple *htmlParser = [[TFHpple alloc] initWithHTMLData:data];

2. Use the searchWithXPathQuery: method to extract the useful data. For the XPath syntax itself, consult an XPath reference.

NSArray *elements = [htmlParser searchWithXPathQuery:@"//div[@class='shileft']/div[@class='bookcont']"];

This gives us the data for the Analects (论语).
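Steps 1 and 2 can be put together roughly as follows. This is only a sketch: the helper name FetchBookElements is made up for illustration, and the synchronous dataWithContentsOfURL: download is an assumption chosen for brevity (the original does not say how data is obtained). A real app would more likely use NSURLSession, and because the URL is plain HTTP an iOS app would also need an App Transport Security exception.

```objectivec
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: downloads the example page and returns the matched elements.
static NSArray *FetchBookElements(void) {
    NSURL *url = [NSURL URLWithString:@"http://so.gushiwen.org/guwen/book_2.aspx"];

    // Synchronous download, used here only to keep the sketch short.
    NSData *data = [NSData dataWithContentsOfURL:url];
    if (data == nil) {
        return @[];  // Download failed: nothing to parse.
    }

    // Step 1: create the parser from the returned data.
    TFHpple *htmlParser = [[TFHpple alloc] initWithHTMLData:data];

    // Step 2: run the XPath query from the text.
    NSArray *elements = [htmlParser searchWithXPathQuery:@"//div[@class='shileft']/div[@class='bookcont']"];
    return elements ?: @[];
}
```

Returning an empty array on failure lets the caller loop over the result without a nil check.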

3. Retrieve and inspect the elements

TFHppleElement *element = [elements objectAtIndex:i];

A TFHppleElement object exposes a number of properties. Each is described briefly below.

1. raw

@property (nonatomic, copy, readonly) NSString *raw;

raw is the element's data including the HTML markup

<div class="bookcont">
<ul>
<span><a href="/guwen/bookv_19.aspx">学而篇</a></span>
<span><a href="/guwen/bookv_20.aspx">为政篇</a></span>
<span><a href="/guwen/bookv_21.aspx">八佾篇</a></span>
<span><a href="/guwen/bookv_22.aspx">里仁篇</a></span>
<span><a href="/guwen/bookv_23.aspx">公冶长篇</a></span>
<span><a href="/guwen/bookv_24.aspx">雍也篇</a></span>
<span><a href="/guwen/bookv_25.aspx">述而篇</a></span>
<span><a href="/guwen/bookv_26.aspx">泰伯篇</a></span>
<span><a href="/guwen/bookv_27.aspx">子罕篇</a></span>
<span><a href="/guwen/bookv_28.aspx">乡党篇</a></span>
<span><a href="/guwen/bookv_29.aspx">先进篇</a></span>
<span><a href="/guwen/bookv_30.aspx">颜渊篇</a></span>
<span><a href="/guwen/bookv_31.aspx">子路篇</a></span>
<span><a href="/guwen/bookv_32.aspx">宪问篇</a></span>
<span><a href="/guwen/bookv_33.aspx">卫灵公篇</a></span>
<span><a href="/guwen/bookv_34.aspx">季氏篇</a></span>
<span><a href="/guwen/bookv_35.aspx">阳货篇</a></span>
<span><a href="/guwen/bookv_36.aspx">微子篇</a></span>
<span><a href="/guwen/bookv_37.aspx">子张篇</a></span>
<span><a href="/guwen/bookv_38.aspx">尧曰篇</a></span>
</ul>
</div>

Output of raw

2. content is the text content of the page, without the HTML markup

学而篇

              为政篇

              八佾篇

              里仁篇

              公冶长篇

              雍也篇

              述而篇

              泰伯篇

              子罕篇

              乡党篇

              先进篇

              颜渊篇

              子路篇

              宪问篇

              卫灵公篇

              季氏篇

              阳货篇

              微子篇

              子张篇

              尧曰篇

Output of content

3. tagName is the HTML tag name

For this element the output is simply div

4. attributes holds the element's attributes

class = bookcont;

5. children holds the child nodes

(
"{\n nodeContent = \"
\\n \";\n nodeName = text;\n}",
"{\n nodeChildArray = (\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_19.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5b66\\U800c\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5b66\\U800c\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_19.aspx\\\">\\U5b66\\U800c\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5b66\\U800c\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_19.aspx\\\">\\U5b66\\U800c\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_20.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U4e3a\\U653f\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U4e3a\\U653f\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_20.aspx\\\">\\U4e3a\\U653f\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U4e3a\\U653f\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_20.aspx\\\">\\U4e3a\\U653f\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_21.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U516b\\U4f7e\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U516b\\U4f7e\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_21.aspx\\\">\\U516b\\U4f7e\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U516b\\U4f7e\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_21.aspx\\\">\\U516b\\U4f7e\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_22.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U91cc\\U4ec1\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U91cc\\U4ec1\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_22.aspx\\\">\\U91cc\\U4ec1\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U91cc\\U4ec1\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_22.aspx\\\">\\U91cc\\U4ec1\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_23.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U516c\\U51b6\\U957f\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U516c\\U51b6\\U957f\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_23.aspx\\\">\\U516c\\U51b6\\U957f\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U516c\\U51b6\\U957f\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_23.aspx\\\">\\U516c\\U51b6\\U957f\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_24.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U96cd\\U4e5f\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U96cd\\U4e5f\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_24.aspx\\\">\\U96cd\\U4e5f\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U96cd\\U4e5f\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_24.aspx\\\">\\U96cd\\U4e5f\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_25.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U8ff0\\U800c\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U8ff0\\U800c\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_25.aspx\\\">\\U8ff0\\U800c\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U8ff0\\U800c\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_25.aspx\\\">\\U8ff0\\U800c\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_26.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U6cf0\\U4f2f\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U6cf0\\U4f2f\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_26.aspx\\\">\\U6cf0\\U4f2f\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U6cf0\\U4f2f\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_26.aspx\\\">\\U6cf0\\U4f2f\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_27.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5b50\\U7f55\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5b50\\U7f55\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_27.aspx\\\">\\U5b50\\U7f55\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5b50\\U7f55\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_27.aspx\\\">\\U5b50\\U7f55\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_28.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U4e61\\U515a\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U4e61\\U515a\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_28.aspx\\\">\\U4e61\\U515a\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U4e61\\U515a\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_28.aspx\\\">\\U4e61\\U515a\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_29.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5148\\U8fdb\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5148\\U8fdb\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_29.aspx\\\">\\U5148\\U8fdb\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5148\\U8fdb\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_29.aspx\\\">\\U5148\\U8fdb\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_30.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U989c\\U6e0a\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U989c\\U6e0a\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_30.aspx\\\">\\U989c\\U6e0a\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U989c\\U6e0a\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_30.aspx\\\">\\U989c\\U6e0a\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_31.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5b50\\U8def\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5b50\\U8def\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_31.aspx\\\">\\U5b50\\U8def\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5b50\\U8def\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_31.aspx\\\">\\U5b50\\U8def\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_32.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5baa\\U95ee\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5baa\\U95ee\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_32.aspx\\\">\\U5baa\\U95ee\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5baa\\U95ee\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_32.aspx\\\">\\U5baa\\U95ee\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_33.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U536b\\U7075\\U516c\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U536b\\U7075\\U516c\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_33.aspx\\\">\\U536b\\U7075\\U516c\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U536b\\U7075\\U516c\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_33.aspx\\\">\\U536b\\U7075\\U516c\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_34.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5b63\\U6c0f\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5b63\\U6c0f\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_34.aspx\\\">\\U5b63\\U6c0f\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5b63\\U6c0f\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_34.aspx\\\">\\U5b63\\U6c0f\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_35.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U9633\\U8d27\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U9633\\U8d27\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_35.aspx\\\">\\U9633\\U8d27\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U9633\\U8d27\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_35.aspx\\\">\\U9633\\U8d27\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_36.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5fae\\U5b50\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5fae\\U5b50\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_36.aspx\\\">\\U5fae\\U5b50\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5fae\\U5b50\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_36.aspx\\\">\\U5fae\\U5b50\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_37.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5b50\\U5f20\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5b50\\U5f20\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_37.aspx\\\">\\U5b50\\U5f20\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5b50\\U5f20\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_37.aspx\\\">\\U5b50\\U5f20\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_38.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5c27\\U66f0\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5c27\\U66f0\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_38.aspx\\\">\\U5c27\\U66f0\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5c27\\U66f0\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_38.aspx\\\">\\U5c27\\U66f0\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n }\n );\n nodeContent = \"
\\n
\\n \\U5b66\\U800c\\U7bc7
\\n
\\n \\U4e3a\\U653f\\U7bc7
\\n
\\n \\U516b\\U4f7e\\U7bc7
\\n
\\n \\U91cc\\U4ec1\\U7bc7
\\n
\\n \\U516c\\U51b6\\U957f\\U7bc7
\\n
\\n \\U96cd\\U4e5f\\U7bc7
\\n
\\n \\U8ff0\\U800c\\U7bc7
\\n
\\n \\U6cf0\\U4f2f\\U7bc7
\\n
\\n \\U5b50\\U7f55\\U7bc7
\\n
\\n \\U4e61\\U515a\\U7bc7
\\n
\\n \\U5148\\U8fdb\\U7bc7
\\n
\\n \\U989c\\U6e0a\\U7bc7
\\n
\\n \\U5b50\\U8def\\U7bc7
\\n
\\n \\U5baa\\U95ee\\U7bc7
\\n
\\n \\U536b\\U7075\\U516c\\U7bc7
\\n
\\n \\U5b63\\U6c0f\\U7bc7
\\n
\\n \\U9633\\U8d27\\U7bc7
\\n
\\n \\U5fae\\U5b50\\U7bc7
\\n
\\n \\U5b50\\U5f20\\U7bc7
\\n
\\n \\U5c27\\U66f0\\U7bc7
\\n
\\n \";\n nodeName = ul;\n raw = \"<ul> \\n \\n <span><a href=\\\"/guwen/bookv_19.aspx\\\">\\U5b66\\U800c\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_20.aspx\\\">\\U4e3a\\U653f\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_21.aspx\\\">\\U516b\\U4f7e\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_22.aspx\\\">\\U91cc\\U4ec1\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_23.aspx\\\">\\U516c\\U51b6\\U957f\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_24.aspx\\\">\\U96cd\\U4e5f\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_25.aspx\\\">\\U8ff0\\U800c\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_26.aspx\\\">\\U6cf0\\U4f2f\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_27.aspx\\\">\\U5b50\\U7f55\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_28.aspx\\\">\\U4e61\\U515a\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_29.aspx\\\">\\U5148\\U8fdb\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_30.aspx\\\">\\U989c\\U6e0a\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_31.aspx\\\">\\U5b50\\U8def\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_32.aspx\\\">\\U5baa\\U95ee\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_33.aspx\\\">\\U536b\\U7075\\U516c\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_34.aspx\\\">\\U5b63\\U6c0f\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_35.aspx\\\">\\U9633\\U8d27\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_36.aspx\\\">\\U5fae\\U5b50\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_37.aspx\\\">\\U5b50\\U5f20\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_38.aspx\\\">\\U5c27\\U66f0\\U7bc7</a></span> \\n \\n </ul>\";\n}",
"{\n nodeContent = \"
\\n \";\n nodeName = text;\n}"
)

Output of children

6. firstChild is the first child node

{
nodeContent = "
\n ";
nodeName = text;
}
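To reproduce dumps like the ones above, the six properties can be logged for every matched element. The sketch below assumes elements is the array returned by searchWithXPathQuery:. The helper name LogElementProperties is hypothetical and is not part of TFHpple.

```objectivec
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: prints the six properties discussed above for each element.
static void LogElementProperties(NSArray *elements) {
    for (TFHppleElement *element in elements) {
        NSLog(@"raw        = %@", element.raw);         // HTML markup included
        NSLog(@"content    = %@", element.content);     // text only
        NSLog(@"tagName    = %@", element.tagName);     // e.g. div
        NSLog(@"attributes = %@", element.attributes);  // e.g. class = bookcont
        NSLog(@"children   = %@", element.children);    // child nodes
        NSLog(@"firstChild = %@", element.firstChild);  // first child node
    }
}
```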

All of the properties above relate to the HTML markup. In practice we mostly use the content property and then process the resulting NSString object.
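One way to process the content string is to split it into lines and discard the surrounding whitespace, since the content output shown earlier contains blank lines and leading spaces. The sketch below uses only Foundation. The helper name TitlesFromContent is an assumption made for illustration.

```objectivec
#import <Foundation/Foundation.h>

// Hypothetical helper: splits a content string into trimmed, non-empty lines.
static NSArray<NSString *> *TitlesFromContent(NSString *content) {
    NSMutableArray<NSString *> *titles = [NSMutableArray array];
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSArray<NSString *> *lines =
        [content componentsSeparatedByCharactersInSet:[NSCharacterSet newlineCharacterSet]];

    for (NSString *line in lines) {
        // Strip the indentation and skip lines that are empty afterwards.
        NSString *trimmed = [line stringByTrimmingCharactersInSet:whitespace];
        if (trimmed.length > 0) {
            [titles addObject:trimmed];
        }
    }
    return titles;
}
```

Called as TitlesFromContent(element.content), this would yield one string per chapter title.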

In this way we obtain the data and process it into the form we want. TFHppleElement is an important class, but its detailed usage is not covered here.
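As a small taste of what TFHppleElement offers beyond content, the sketch below queries the a elements directly and reads each link's text and href attribute. This is not the approach used in the walkthrough above. The XPath expression and the helper name ChapterLinks are assumptions chosen to match the markup shown in the raw output.

```objectivec
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: maps each chapter title to its relative link.
static NSDictionary<NSString *, NSString *> *ChapterLinks(TFHpple *htmlParser) {
    NSMutableDictionary<NSString *, NSString *> *links = [NSMutableDictionary dictionary];

    // Select every <a> nested anywhere inside the bookcont div.
    NSArray *anchors = [htmlParser searchWithXPathQuery:@"//div[@class='bookcont']//a"];

    for (TFHppleElement *anchor in anchors) {
        NSString *title = [anchor text];                 // text of the first text child
        NSString *href = [anchor objectForKey:@"href"];  // attribute lookup
        if (title.length > 0 && href.length > 0) {
            links[title] = href;
        }
    }
    return links;
}
```

The hrefs on this page are relative, so they would still need to be resolved against the site's base URL before being requested.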
