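[Recommended] An Objective-C library for parsing HTML data (scraping web pages)
TFHpple is a third-party library for parsing HTML data. I find it reasonably capable, although the project has to be configured before the library can be used.
Configuration
1. Link libxml2.tbd
2. Set the header search path (for libxml2 this is usually ${SDKROOT}/usr/include/libxml2, added under Header Search Paths)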

Usage
The steps are illustrated with an example, using this page:
http://so.gushiwen.org/guwen/book_2.aspx
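1. Create a TFHpple object, where data is the data returned by the website
TFHpple *htmlParser = [[TFHpple alloc] initWithHTMLData:data];
2. Call searchWithXPathQuery: to pull out the useful data (look up XPath syntax separately if it is unfamiliar)
NSArray *elements = [htmlParser searchWithXPathQuery:@"//div[@class='shileft']/div[@class='bookcont']"];
This gives us the data for the Analects (论语).
3. Get and inspect an element, where i is an index into elements
TFHppleElement *element = [elements objectAtIndex:i];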
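Steps 1 to 3 can be put together roughly as follows. This is only a sketch: the function name LogBookContents is made up for illustration, and it assumes data already holds the downloaded HTML and that the TFHpple headers are on the header search path.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper (not part of TFHpple): runs the XPath query from the
// text against already-downloaded HTML and logs every match.
static void LogBookContents(NSData *data) {
    TFHpple *htmlParser = [[TFHpple alloc] initWithHTMLData:data];
    NSArray *elements = [htmlParser searchWithXPathQuery:
        @"//div[@class='shileft']/div[@class='bookcont']"];

    // Each match is a TFHppleElement; walk the array by index as in step 3.
    for (NSUInteger i = 0; i < elements.count; i++) {
        TFHppleElement *element = [elements objectAtIndex:i];
        NSLog(@"match %lu: <%@>", (unsigned long)i, element.tagName);
    }
}
```

If the query matches nothing, the loop body simply never runs, so it is worth checking elements.count before indexing into the array directly.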
A TFHppleElement object has many properties. Here is a brief look at each of them.
1. raw
@property (nonatomic, copy, readonly) NSString *raw;
raw is the element's source, including the HTML markup:
<div class="bookcont">
<ul>
<span><a href="/guwen/bookv_19.aspx">学而篇</a></span>
<span><a href="/guwen/bookv_20.aspx">为政篇</a></span>
<span><a href="/guwen/bookv_21.aspx">八佾篇</a></span>
<span><a href="/guwen/bookv_22.aspx">里仁篇</a></span>
<span><a href="/guwen/bookv_23.aspx">公冶长篇</a></span>
<span><a href="/guwen/bookv_24.aspx">雍也篇</a></span>
<span><a href="/guwen/bookv_25.aspx">述而篇</a></span>
<span><a href="/guwen/bookv_26.aspx">泰伯篇</a></span>
<span><a href="/guwen/bookv_27.aspx">子罕篇</a></span>
<span><a href="/guwen/bookv_28.aspx">乡党篇</a></span>
<span><a href="/guwen/bookv_29.aspx">先进篇</a></span>
<span><a href="/guwen/bookv_30.aspx">颜渊篇</a></span>
<span><a href="/guwen/bookv_31.aspx">子路篇</a></span>
<span><a href="/guwen/bookv_32.aspx">宪问篇</a></span>
<span><a href="/guwen/bookv_33.aspx">卫灵公篇</a></span>
<span><a href="/guwen/bookv_34.aspx">季氏篇</a></span>
<span><a href="/guwen/bookv_35.aspx">阳货篇</a></span>
<span><a href="/guwen/bookv_36.aspx">微子篇</a></span>
<span><a href="/guwen/bookv_37.aspx">子张篇</a></span>
<span><a href="/guwen/bookv_38.aspx">尧曰篇</a></span>
</ul>
</div>
raw output
2. content is the element's text, without the HTML markup
学而篇
为政篇
八佾篇
里仁篇
公冶长篇
雍也篇
述而篇
泰伯篇
子罕篇
乡党篇
先进篇
颜渊篇
子路篇
宪问篇
卫灵公篇
季氏篇
阳货篇
微子篇
子张篇
尧曰篇
content output
3. tagName is the HTML tag name
The output is just div
4. attributes holds the element's attributes
class = bookcont;
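Attributes are often what you actually want from a page, for example the link targets. The sketch below collects the href of every a element under the bookcont div. The function name ChapterLinks and the shortened XPath are assumptions made for illustration; objectForKey: is the TFHppleElement method that looks a name up in the attributes dictionary.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper (not part of TFHpple): returns the href attribute of
// every <a> element found under <div class="bookcont">.
static NSArray<NSString *> *ChapterLinks(NSData *data) {
    TFHpple *parser = [[TFHpple alloc] initWithHTMLData:data];
    NSArray *anchors = [parser searchWithXPathQuery:@"//div[@class='bookcont']//a"];

    NSMutableArray<NSString *> *links = [NSMutableArray array];
    for (TFHppleElement *anchor in anchors) {
        // Equivalent to anchor.attributes[@"href"]; nil if the attribute is missing.
        NSString *href = [anchor objectForKey:@"href"];
        if (href != nil) {
            [links addObject:href];
        }
    }
    return links;
}
```

On the example page the values are site-relative paths such as /guwen/bookv_19.aspx, so they still need to be resolved against the site's base URL before they can be requested.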
5. children holds the child nodes
(
"{\n nodeContent = \"
\\n \";\n nodeName = text;\n}",
"{\n nodeChildArray = (\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_19.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5b66\\U800c\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5b66\\U800c\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_19.aspx\\\">\\U5b66\\U800c\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5b66\\U800c\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_19.aspx\\\">\\U5b66\\U800c\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_20.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U4e3a\\U653f\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U4e3a\\U653f\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_20.aspx\\\">\\U4e3a\\U653f\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U4e3a\\U653f\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_20.aspx\\\">\\U4e3a\\U653f\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_21.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U516b\\U4f7e\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U516b\\U4f7e\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_21.aspx\\\">\\U516b\\U4f7e\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U516b\\U4f7e\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_21.aspx\\\">\\U516b\\U4f7e\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_22.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U91cc\\U4ec1\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U91cc\\U4ec1\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_22.aspx\\\">\\U91cc\\U4ec1\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U91cc\\U4ec1\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_22.aspx\\\">\\U91cc\\U4ec1\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_23.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U516c\\U51b6\\U957f\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U516c\\U51b6\\U957f\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_23.aspx\\\">\\U516c\\U51b6\\U957f\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U516c\\U51b6\\U957f\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_23.aspx\\\">\\U516c\\U51b6\\U957f\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_24.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U96cd\\U4e5f\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U96cd\\U4e5f\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_24.aspx\\\">\\U96cd\\U4e5f\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U96cd\\U4e5f\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_24.aspx\\\">\\U96cd\\U4e5f\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_25.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U8ff0\\U800c\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U8ff0\\U800c\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_25.aspx\\\">\\U8ff0\\U800c\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U8ff0\\U800c\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_25.aspx\\\">\\U8ff0\\U800c\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_26.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U6cf0\\U4f2f\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U6cf0\\U4f2f\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_26.aspx\\\">\\U6cf0\\U4f2f\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U6cf0\\U4f2f\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_26.aspx\\\">\\U6cf0\\U4f2f\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_27.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5b50\\U7f55\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5b50\\U7f55\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_27.aspx\\\">\\U5b50\\U7f55\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5b50\\U7f55\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_27.aspx\\\">\\U5b50\\U7f55\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_28.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U4e61\\U515a\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U4e61\\U515a\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_28.aspx\\\">\\U4e61\\U515a\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U4e61\\U515a\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_28.aspx\\\">\\U4e61\\U515a\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_29.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5148\\U8fdb\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5148\\U8fdb\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_29.aspx\\\">\\U5148\\U8fdb\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5148\\U8fdb\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_29.aspx\\\">\\U5148\\U8fdb\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_30.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U989c\\U6e0a\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U989c\\U6e0a\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_30.aspx\\\">\\U989c\\U6e0a\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U989c\\U6e0a\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_30.aspx\\\">\\U989c\\U6e0a\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_31.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5b50\\U8def\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5b50\\U8def\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_31.aspx\\\">\\U5b50\\U8def\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5b50\\U8def\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_31.aspx\\\">\\U5b50\\U8def\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_32.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5baa\\U95ee\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5baa\\U95ee\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_32.aspx\\\">\\U5baa\\U95ee\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5baa\\U95ee\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_32.aspx\\\">\\U5baa\\U95ee\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_33.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U536b\\U7075\\U516c\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U536b\\U7075\\U516c\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_33.aspx\\\">\\U536b\\U7075\\U516c\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U536b\\U7075\\U516c\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_33.aspx\\\">\\U536b\\U7075\\U516c\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_34.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5b63\\U6c0f\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5b63\\U6c0f\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_34.aspx\\\">\\U5b63\\U6c0f\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5b63\\U6c0f\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_34.aspx\\\">\\U5b63\\U6c0f\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_35.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U9633\\U8d27\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U9633\\U8d27\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_35.aspx\\\">\\U9633\\U8d27\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U9633\\U8d27\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_35.aspx\\\">\\U9633\\U8d27\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_36.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5fae\\U5b50\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5fae\\U5b50\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_36.aspx\\\">\\U5fae\\U5b50\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5fae\\U5b50\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_36.aspx\\\">\\U5fae\\U5b50\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_37.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5b50\\U5f20\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5b50\\U5f20\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_37.aspx\\\">\\U5b50\\U5f20\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5b50\\U5f20\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_37.aspx\\\">\\U5b50\\U5f20\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_38.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5c27\\U66f0\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5c27\\U66f0\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_38.aspx\\\">\\U5c27\\U66f0\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5c27\\U66f0\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_38.aspx\\\">\\U5c27\\U66f0\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n }\n );\n nodeContent = \"
\\n
\\n \\U5b66\\U800c\\U7bc7
\\n
\\n \\U4e3a\\U653f\\U7bc7
\\n
\\n \\U516b\\U4f7e\\U7bc7
\\n
\\n \\U91cc\\U4ec1\\U7bc7
\\n
\\n \\U516c\\U51b6\\U957f\\U7bc7
\\n
\\n \\U96cd\\U4e5f\\U7bc7
\\n
\\n \\U8ff0\\U800c\\U7bc7
\\n
\\n \\U6cf0\\U4f2f\\U7bc7
\\n
\\n \\U5b50\\U7f55\\U7bc7
\\n
\\n \\U4e61\\U515a\\U7bc7
\\n
\\n \\U5148\\U8fdb\\U7bc7
\\n
\\n \\U989c\\U6e0a\\U7bc7
\\n
\\n \\U5b50\\U8def\\U7bc7
\\n
\\n \\U5baa\\U95ee\\U7bc7
\\n
\\n \\U536b\\U7075\\U516c\\U7bc7
\\n
\\n \\U5b63\\U6c0f\\U7bc7
\\n
\\n \\U9633\\U8d27\\U7bc7
\\n
\\n \\U5fae\\U5b50\\U7bc7
\\n
\\n \\U5b50\\U5f20\\U7bc7
\\n
\\n \\U5c27\\U66f0\\U7bc7
\\n
\\n \";\n nodeName = ul;\n raw = \"<ul> \\n \\n <span><a href=\\\"/guwen/bookv_19.aspx\\\">\\U5b66\\U800c\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_20.aspx\\\">\\U4e3a\\U653f\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_21.aspx\\\">\\U516b\\U4f7e\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_22.aspx\\\">\\U91cc\\U4ec1\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_23.aspx\\\">\\U516c\\U51b6\\U957f\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_24.aspx\\\">\\U96cd\\U4e5f\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_25.aspx\\\">\\U8ff0\\U800c\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_26.aspx\\\">\\U6cf0\\U4f2f\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_27.aspx\\\">\\U5b50\\U7f55\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_28.aspx\\\">\\U4e61\\U515a\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_29.aspx\\\">\\U5148\\U8fdb\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_30.aspx\\\">\\U989c\\U6e0a\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_31.aspx\\\">\\U5b50\\U8def\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_32.aspx\\\">\\U5baa\\U95ee\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_33.aspx\\\">\\U536b\\U7075\\U516c\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_34.aspx\\\">\\U5b63\\U6c0f\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_35.aspx\\\">\\U9633\\U8d27\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_36.aspx\\\">\\U5fae\\U5b50\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_37.aspx\\\">\\U5b50\\U5f20\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_38.aspx\\\">\\U5c27\\U66f0\\U7bc7</a></span> \\n \\n </ul>\";\n}",
"{\n nodeContent = \"
\\n \";\n nodeName = text;\n}"
)
children output
6. firstChild is the first child node
{
nodeContent = "
\n ";
nodeName = text;
}
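As the two outputs above show, the whitespace between tags appears as text nodes, so firstChild here is a text node rather than the ul. A minimal sketch of walking children while skipping those text nodes might look like this; LogElementChildren is a made-up name, and isTextNode is the TFHppleElement method used to detect them.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper (not part of TFHpple): logs the tag name of each
// non-text child of the given element.
static void LogElementChildren(TFHppleElement *element) {
    for (TFHppleElement *child in element.children) {
        if ([child isTextNode]) {
            continue; // skip text nodes such as the whitespace between tags
        }
        NSLog(@"<%@> with %lu children",
              child.tagName, (unsigned long)child.children.count);
    }
}
```

For the bookcont div in this example, only the ul child would get past the text-node check.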
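The properties above all relate to the HTML markup. In practice we mostly use the content property and then process the resulting NSString object.
That gives us the data in the form we want. TFHppleElement is an important class, but its detailed usage is not covered here.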
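One possible way to process the content string is to split it into lines and drop the whitespace, which turns the content output shown earlier into an array of chapter titles. This is a sketch using only Foundation; the helper name TitlesFromContent is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of TFHpple): splits an element's content
// string into non-empty, whitespace-trimmed lines.
static NSArray<NSString *> *TitlesFromContent(NSString *content) {
    NSMutableArray<NSString *> *titles = [NSMutableArray array];
    NSCharacterSet *newlines = [NSCharacterSet newlineCharacterSet];
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];

    for (NSString *line in [content componentsSeparatedByCharactersInSet:newlines]) {
        NSString *trimmed = [line stringByTrimmingCharactersInSet:whitespace];
        if (trimmed.length > 0) {
            [titles addObject:trimmed]; // keep only lines that contain text
        }
    }
    return titles;
}
```

Calling TitlesFromContent(element.content) on the bookcont element would then give one entry per chapter, assuming each title sits on its own line as in the output above.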