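TFHpple is a third-party library for parsing HTML data. In my experience it works reasonably well, but the project has to be configured before the library can be used.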

  

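  Configuration

1. Link libxml2.tbd to the target (Build Phases > Link Binary With Libraries).

2. Set the compile path: add $(SDKROOT)/usr/include/libxml2 to Header Search Paths in Build Settings.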

  Usage

The following page is used as an example:

http://so.gushiwen.org/guwen/book_2.aspx

1. Create a TFHpple object; data is the data returned by the website.

TFHpple *htmlParser = [[TFHpple alloc] initWithHTMLData:data];

2. Call searchWithXPathQuery: to extract the useful data. XPath itself is not covered here; look it up online for the details.

NSArray *temp1 = [htmlParser searchWithXPathQuery:@"//div[@class='shileft']/div[@class='bookcont']"];

This gives us the data for the Analects (论语).
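Steps 1 and 2 can be put together roughly as follows. This is only a sketch: it assumes TFHpple and libxml2 are already configured as described above, the helper name BookContainersFromHTMLData is made up for illustration, and data is whatever the request for the page returned (the network request itself is left out).

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper (not part of TFHpple): returns the
// <div class="bookcont"> elements found in the downloaded page.
static NSArray<TFHppleElement *> *BookContainersFromHTMLData(NSData *data) {
    // Nothing to parse if the request returned no data.
    if (data.length == 0) {
        return @[];
    }

    // Step 1: build the parser from the raw HTML bytes.
    TFHpple *htmlParser = [[TFHpple alloc] initWithHTMLData:data];

    // Step 2: run the XPath query used in this article.
    NSArray *matches = [htmlParser searchWithXPathQuery:
        @"//div[@class='shileft']/div[@class='bookcont']"];

    // Normalize a nil result to an empty array so callers can iterate safely.
    return matches ?: @[];
}
```

An empty array here means either that the page was not downloaded or that the XPath query no longer matches the page's markup.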

3. Get and analyze the elements

TFHppleElement *element = [temp1 objectAtIndex:i]; // temp1 is the array returned in step 2
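To visit every matched element instead of a single index, a loop along these lines could be used. It is a sketch only: the helper name SummariesOfElements is hypothetical, and it reads just the tagName and attributes properties.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper (not part of TFHpple): builds a one-line summary,
// such as <div class="bookcont">, for every element in the array.
static NSArray<NSString *> *SummariesOfElements(NSArray<TFHppleElement *> *elements) {
    NSMutableArray<NSString *> *summaries = [NSMutableArray array];
    for (TFHppleElement *element in elements) {
        // attributes is a dictionary of attribute name to value;
        // fall back to an empty string when there is no class attribute.
        NSString *cssClass = element.attributes[@"class"] ?: @"";
        [summaries addObject:[NSString stringWithFormat:@"<%@ class=\"%@\">",
                              element.tagName, cssClass]];
    }
    return summaries;
}
```

Fast enumeration avoids the manual index i and does nothing when the query returned no matches.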

A TFHppleElement object has a number of properties. Each one is briefly described below.

1. raw

@property (nonatomic, copy, readonly) NSString *raw;

raw is the page data including the HTML markup.

<div class="bookcont">
<ul>
<span><a href="/guwen/bookv_19.aspx">学而篇</a></span>
<span><a href="/guwen/bookv_20.aspx">为政篇</a></span>
<span><a href="/guwen/bookv_21.aspx">八佾篇</a></span>
<span><a href="/guwen/bookv_22.aspx">里仁篇</a></span>
<span><a href="/guwen/bookv_23.aspx">公冶长篇</a></span>
<span><a href="/guwen/bookv_24.aspx">雍也篇</a></span>
<span><a href="/guwen/bookv_25.aspx">述而篇</a></span>
<span><a href="/guwen/bookv_26.aspx">泰伯篇</a></span>
<span><a href="/guwen/bookv_27.aspx">子罕篇</a></span>
<span><a href="/guwen/bookv_28.aspx">乡党篇</a></span>
<span><a href="/guwen/bookv_29.aspx">先进篇</a></span>
<span><a href="/guwen/bookv_30.aspx">颜渊篇</a></span>
<span><a href="/guwen/bookv_31.aspx">子路篇</a></span>
<span><a href="/guwen/bookv_32.aspx">宪问篇</a></span>
<span><a href="/guwen/bookv_33.aspx">卫灵公篇</a></span>
<span><a href="/guwen/bookv_34.aspx">季氏篇</a></span>
<span><a href="/guwen/bookv_35.aspx">阳货篇</a></span>
<span><a href="/guwen/bookv_36.aspx">微子篇</a></span>
<span><a href="/guwen/bookv_37.aspx">子张篇</a></span>
<span><a href="/guwen/bookv_38.aspx">尧曰篇</a></span>
</ul>
</div>

The raw data

2. content is the actual page data, without the HTML markup.

学而篇

              为政篇

              八佾篇

              里仁篇

              公冶长篇

              雍也篇

              述而篇

              泰伯篇

              子罕篇

              乡党篇

              先进篇

              颜渊篇

              子路篇

              宪问篇

              卫灵公篇

              季氏篇

              阳货篇

              微子篇

              子张篇

              尧曰篇

The content data

3. tagName is the HTML tag name.

For this element the output is just div.

4. attributes holds the element's attributes.

class = bookcont;

5. children holds the child nodes.

(
"{\n nodeContent = \"
\\n \";\n nodeName = text;\n}",
"{\n nodeChildArray = (\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_19.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5b66\\U800c\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5b66\\U800c\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_19.aspx\\\">\\U5b66\\U800c\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5b66\\U800c\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_19.aspx\\\">\\U5b66\\U800c\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_20.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U4e3a\\U653f\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U4e3a\\U653f\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_20.aspx\\\">\\U4e3a\\U653f\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U4e3a\\U653f\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_20.aspx\\\">\\U4e3a\\U653f\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_21.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U516b\\U4f7e\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U516b\\U4f7e\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_21.aspx\\\">\\U516b\\U4f7e\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U516b\\U4f7e\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_21.aspx\\\">\\U516b\\U4f7e\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_22.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U91cc\\U4ec1\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U91cc\\U4ec1\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_22.aspx\\\">\\U91cc\\U4ec1\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U91cc\\U4ec1\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_22.aspx\\\">\\U91cc\\U4ec1\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_23.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U516c\\U51b6\\U957f\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U516c\\U51b6\\U957f\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_23.aspx\\\">\\U516c\\U51b6\\U957f\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U516c\\U51b6\\U957f\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_23.aspx\\\">\\U516c\\U51b6\\U957f\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_24.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U96cd\\U4e5f\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U96cd\\U4e5f\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_24.aspx\\\">\\U96cd\\U4e5f\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U96cd\\U4e5f\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_24.aspx\\\">\\U96cd\\U4e5f\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_25.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U8ff0\\U800c\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U8ff0\\U800c\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_25.aspx\\\">\\U8ff0\\U800c\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U8ff0\\U800c\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_25.aspx\\\">\\U8ff0\\U800c\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_26.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U6cf0\\U4f2f\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U6cf0\\U4f2f\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_26.aspx\\\">\\U6cf0\\U4f2f\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U6cf0\\U4f2f\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_26.aspx\\\">\\U6cf0\\U4f2f\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_27.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5b50\\U7f55\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5b50\\U7f55\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_27.aspx\\\">\\U5b50\\U7f55\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5b50\\U7f55\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_27.aspx\\\">\\U5b50\\U7f55\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_28.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U4e61\\U515a\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U4e61\\U515a\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_28.aspx\\\">\\U4e61\\U515a\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U4e61\\U515a\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_28.aspx\\\">\\U4e61\\U515a\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_29.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5148\\U8fdb\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5148\\U8fdb\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_29.aspx\\\">\\U5148\\U8fdb\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5148\\U8fdb\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_29.aspx\\\">\\U5148\\U8fdb\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_30.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U989c\\U6e0a\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U989c\\U6e0a\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_30.aspx\\\">\\U989c\\U6e0a\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U989c\\U6e0a\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_30.aspx\\\">\\U989c\\U6e0a\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_31.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5b50\\U8def\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5b50\\U8def\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_31.aspx\\\">\\U5b50\\U8def\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5b50\\U8def\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_31.aspx\\\">\\U5b50\\U8def\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_32.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5baa\\U95ee\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5baa\\U95ee\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_32.aspx\\\">\\U5baa\\U95ee\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5baa\\U95ee\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_32.aspx\\\">\\U5baa\\U95ee\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_33.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U536b\\U7075\\U516c\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U536b\\U7075\\U516c\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_33.aspx\\\">\\U536b\\U7075\\U516c\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U536b\\U7075\\U516c\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_33.aspx\\\">\\U536b\\U7075\\U516c\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_34.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5b63\\U6c0f\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5b63\\U6c0f\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_34.aspx\\\">\\U5b63\\U6c0f\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5b63\\U6c0f\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_34.aspx\\\">\\U5b63\\U6c0f\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_35.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U9633\\U8d27\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U9633\\U8d27\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_35.aspx\\\">\\U9633\\U8d27\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U9633\\U8d27\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_35.aspx\\\">\\U9633\\U8d27\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_36.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5fae\\U5b50\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5fae\\U5b50\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_36.aspx\\\">\\U5fae\\U5b50\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5fae\\U5b50\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_36.aspx\\\">\\U5fae\\U5b50\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_37.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5b50\\U5f20\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5b50\\U5f20\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_37.aspx\\\">\\U5b50\\U5f20\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5b50\\U5f20\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_37.aspx\\\">\\U5b50\\U5f20\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_38.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5c27\\U66f0\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5c27\\U66f0\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_38.aspx\\\">\\U5c27\\U66f0\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5c27\\U66f0\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_38.aspx\\\">\\U5c27\\U66f0\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n }\n );\n nodeContent = \"
\\n
\\n \\U5b66\\U800c\\U7bc7
\\n
\\n \\U4e3a\\U653f\\U7bc7
\\n
\\n \\U516b\\U4f7e\\U7bc7
\\n
\\n \\U91cc\\U4ec1\\U7bc7
\\n
\\n \\U516c\\U51b6\\U957f\\U7bc7
\\n
\\n \\U96cd\\U4e5f\\U7bc7
\\n
\\n \\U8ff0\\U800c\\U7bc7
\\n
\\n \\U6cf0\\U4f2f\\U7bc7
\\n
\\n \\U5b50\\U7f55\\U7bc7
\\n
\\n \\U4e61\\U515a\\U7bc7
\\n
\\n \\U5148\\U8fdb\\U7bc7
\\n
\\n \\U989c\\U6e0a\\U7bc7
\\n
\\n \\U5b50\\U8def\\U7bc7
\\n
\\n \\U5baa\\U95ee\\U7bc7
\\n
\\n \\U536b\\U7075\\U516c\\U7bc7
\\n
\\n \\U5b63\\U6c0f\\U7bc7
\\n
\\n \\U9633\\U8d27\\U7bc7
\\n
\\n \\U5fae\\U5b50\\U7bc7
\\n
\\n \\U5b50\\U5f20\\U7bc7
\\n
\\n \\U5c27\\U66f0\\U7bc7
\\n
\\n \";\n nodeName = ul;\n raw = \"<ul> \\n \\n <span><a href=\\\"/guwen/bookv_19.aspx\\\">\\U5b66\\U800c\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_20.aspx\\\">\\U4e3a\\U653f\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_21.aspx\\\">\\U516b\\U4f7e\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_22.aspx\\\">\\U91cc\\U4ec1\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_23.aspx\\\">\\U516c\\U51b6\\U957f\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_24.aspx\\\">\\U96cd\\U4e5f\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_25.aspx\\\">\\U8ff0\\U800c\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_26.aspx\\\">\\U6cf0\\U4f2f\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_27.aspx\\\">\\U5b50\\U7f55\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_28.aspx\\\">\\U4e61\\U515a\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_29.aspx\\\">\\U5148\\U8fdb\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_30.aspx\\\">\\U989c\\U6e0a\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_31.aspx\\\">\\U5b50\\U8def\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_32.aspx\\\">\\U5baa\\U95ee\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_33.aspx\\\">\\U536b\\U7075\\U516c\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_34.aspx\\\">\\U5b63\\U6c0f\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_35.aspx\\\">\\U9633\\U8d27\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_36.aspx\\\">\\U5fae\\U5b50\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_37.aspx\\\">\\U5b50\\U5f20\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_38.aspx\\\">\\U5c27\\U66f0\\U7bc7</a></span> \\n \\n </ul>\";\n}",
"{\n nodeContent = \"
\\n \";\n nodeName = text;\n}"
)

children

6. firstChild is the first child node.

{
nodeContent = "
\n ";
nodeName = text;
}

The properties above all relate to the HTML markup. In practice we usually read the content property and then process the resulting NSString object.
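As the content output above shows, the string still contains line breaks and indentation around each title. One possible way to clean it up, using only Foundation, is sketched below; the helper name TitlesFromContent is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits a content string into its non-empty lines,
// with the surrounding whitespace removed from each line.
static NSArray<NSString *> *TitlesFromContent(NSString *content) {
    NSMutableArray<NSString *> *titles = [NSMutableArray array];
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];

    // Split on any newline character (handles \n and \r).
    NSArray<NSString *> *lines =
        [content componentsSeparatedByCharactersInSet:[NSCharacterSet newlineCharacterSet]];

    for (NSString *line in lines) {
        NSString *trimmed = [line stringByTrimmingCharactersInSet:whitespace];
        // Skip the blank lines that sit between the titles.
        if (trimmed.length > 0) {
            [titles addObject:trimmed];
        }
    }
    return titles;
}
```

Calling TitlesFromContent(element.content) on the element from step 3 should then give one array entry per chapter title.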

That is how we obtain the data and process it into the form we want. TFHppleElement is an important class, but its detailed usage is not covered here.
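As one illustration of working with TFHppleElement beyond content, the sketch below collects each chapter title together with its link. It makes several assumptions: the helper name ChapterLinksFromHTMLData and the dictionary keys title and href are invented for this example, the XPath query targets the a elements directly instead of the enclosing div, and the page markup is the one shown in the raw output above.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper (not part of TFHpple): returns one dictionary per
// chapter link, in document order, with the keys @"title" and @"href".
static NSArray<NSDictionary<NSString *, NSString *> *> *ChapterLinksFromHTMLData(NSData *data) {
    NSMutableArray<NSDictionary<NSString *, NSString *> *> *links = [NSMutableArray array];
    if (data.length == 0) {
        return links;
    }

    TFHpple *htmlParser = [[TFHpple alloc] initWithHTMLData:data];
    // Select every <a> element nested anywhere inside the bookcont div.
    NSArray *anchors = [htmlParser searchWithXPathQuery:@"//div[@class='bookcont']//a"];

    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    for (TFHppleElement *anchor in anchors) {
        // content is the link text; attributes holds href.
        NSString *title = [anchor.content stringByTrimmingCharactersInSet:whitespace];
        NSString *href = anchor.attributes[@"href"];
        // Skip anchors that lack either a title or a link.
        if (title.length > 0 && href.length > 0) {
            [links addObject:@{ @"title": title, @"href": href }];
        }
    }
    return links;
}
```

The href values are relative paths such as /guwen/bookv_19.aspx, so they still have to be resolved against the site's base URL before they can be requested.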
