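[Recommended] An Objective-C library for parsing HTML data (scraping web page data)
TFHpple is a third-party library for parsing HTML data. I find it reasonably capable, but the project must be configured before it can be used.
Configuration
1. Link libxml2.tbd
2. Set the header search path so the compiler can find the libxml2 headers (typically by adding $(SDKROOT)/usr/include/libxml2 to Header Search Paths)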

Usage
The steps are illustrated with an example that uses this page:
http://so.gushiwen.org/guwen/book_2.aspx
1. Create a TFHpple object; data is the data returned by the website
TFHpple *htmlParser = [[TFHpple alloc] initWithHTMLData:data];
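2. Use the searchWithXPathQuery: method to extract the useful data; look up an XPath reference for the query syntax
NSArray *elements = [htmlParser searchWithXPathQuery:@"//div[@class='shileft']/div[@class='bookcont']"];
This gives us the data for the Analects (论语)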
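Steps 1 and 2 can be combined into a minimal sketch. It assumes the hpple sources are already in the project and importable as TFHpple.h, and that the page is downloaded with NSURLSession. The helper name FetchBookContainers is hypothetical and not part of TFHpple.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h" // assumption: hpple has already been added to the project

// Hypothetical helper: downloads the page and runs the XPath query from step 2.
// The completion block is called on NSURLSession's background queue.
static void FetchBookContainers(void (^completion)(NSArray *elements)) {
    NSURL *url = [NSURL URLWithString:@"http://so.gushiwen.org/guwen/book_2.aspx"];
    NSURLSessionDataTask *task =
        [[NSURLSession sharedSession] dataTaskWithURL:url
                                    completionHandler:^(NSData *data,
                                                        NSURLResponse *response,
                                                        NSError *error) {
            if (error != nil || data == nil) {
                completion(@[]); // report an empty result on network failure
                return;
            }
            TFHpple *htmlParser = [[TFHpple alloc] initWithHTMLData:data];
            NSArray *elements = [htmlParser searchWithXPathQuery:
                @"//div[@class='shileft']/div[@class='bookcont']"];
            completion(elements ?: @[]);
        }];
    [task resume];
}
```

Because the URL uses plain http, App Transport Security may block the request unless the app's Info.plist permits it.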
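3. Get and inspect an element
TFHppleElement *element = [elements objectAtIndex:i]; // i is an index into the result array
A TFHppleElement object exposes many properties; each is described briefly below
1.
@property (nonatomic, copy, readonly) NSString *raw;
raw is the page data, including the HTML markup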
<div class="bookcont">
<ul>
<span><a href="/guwen/bookv_19.aspx">学而篇</a></span>
<span><a href="/guwen/bookv_20.aspx">为政篇</a></span>
<span><a href="/guwen/bookv_21.aspx">八佾篇</a></span>
<span><a href="/guwen/bookv_22.aspx">里仁篇</a></span>
<span><a href="/guwen/bookv_23.aspx">公冶长篇</a></span>
<span><a href="/guwen/bookv_24.aspx">雍也篇</a></span>
<span><a href="/guwen/bookv_25.aspx">述而篇</a></span>
<span><a href="/guwen/bookv_26.aspx">泰伯篇</a></span>
<span><a href="/guwen/bookv_27.aspx">子罕篇</a></span>
<span><a href="/guwen/bookv_28.aspx">乡党篇</a></span>
<span><a href="/guwen/bookv_29.aspx">先进篇</a></span>
<span><a href="/guwen/bookv_30.aspx">颜渊篇</a></span>
<span><a href="/guwen/bookv_31.aspx">子路篇</a></span>
<span><a href="/guwen/bookv_32.aspx">宪问篇</a></span>
<span><a href="/guwen/bookv_33.aspx">卫灵公篇</a></span>
<span><a href="/guwen/bookv_34.aspx">季氏篇</a></span>
<span><a href="/guwen/bookv_35.aspx">阳货篇</a></span>
<span><a href="/guwen/bookv_36.aspx">微子篇</a></span>
<span><a href="/guwen/bookv_37.aspx">子张篇</a></span>
<span><a href="/guwen/bookv_38.aspx">尧曰篇</a></span>
</ul>
</div>
raw output
2. content is the text of the page, without the HTML markup
学而篇
为政篇
八佾篇
里仁篇
公冶长篇
雍也篇
述而篇
泰伯篇
子罕篇
乡党篇
先进篇
颜渊篇
子路篇
宪问篇
卫灵公篇
季氏篇
阳货篇
微子篇
子张篇
尧曰篇
content output
3. tagName is the HTML tag name
Here the output is simply div
4. attributes holds the element's attributes
class = bookcont;
5. children holds the child nodes
(
"{\n nodeContent = \"
\\n \";\n nodeName = text;\n}",
"{\n nodeChildArray = (\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_19.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5b66\\U800c\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5b66\\U800c\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_19.aspx\\\">\\U5b66\\U800c\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5b66\\U800c\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_19.aspx\\\">\\U5b66\\U800c\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_20.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U4e3a\\U653f\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U4e3a\\U653f\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_20.aspx\\\">\\U4e3a\\U653f\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U4e3a\\U653f\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_20.aspx\\\">\\U4e3a\\U653f\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_21.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U516b\\U4f7e\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U516b\\U4f7e\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_21.aspx\\\">\\U516b\\U4f7e\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U516b\\U4f7e\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_21.aspx\\\">\\U516b\\U4f7e\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_22.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U91cc\\U4ec1\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U91cc\\U4ec1\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_22.aspx\\\">\\U91cc\\U4ec1\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U91cc\\U4ec1\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_22.aspx\\\">\\U91cc\\U4ec1\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_23.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U516c\\U51b6\\U957f\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U516c\\U51b6\\U957f\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_23.aspx\\\">\\U516c\\U51b6\\U957f\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U516c\\U51b6\\U957f\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_23.aspx\\\">\\U516c\\U51b6\\U957f\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_24.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U96cd\\U4e5f\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U96cd\\U4e5f\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_24.aspx\\\">\\U96cd\\U4e5f\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U96cd\\U4e5f\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_24.aspx\\\">\\U96cd\\U4e5f\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_25.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U8ff0\\U800c\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U8ff0\\U800c\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_25.aspx\\\">\\U8ff0\\U800c\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U8ff0\\U800c\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_25.aspx\\\">\\U8ff0\\U800c\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_26.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U6cf0\\U4f2f\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U6cf0\\U4f2f\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_26.aspx\\\">\\U6cf0\\U4f2f\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U6cf0\\U4f2f\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_26.aspx\\\">\\U6cf0\\U4f2f\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_27.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5b50\\U7f55\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5b50\\U7f55\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_27.aspx\\\">\\U5b50\\U7f55\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5b50\\U7f55\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_27.aspx\\\">\\U5b50\\U7f55\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_28.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U4e61\\U515a\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U4e61\\U515a\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_28.aspx\\\">\\U4e61\\U515a\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U4e61\\U515a\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_28.aspx\\\">\\U4e61\\U515a\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_29.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5148\\U8fdb\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5148\\U8fdb\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_29.aspx\\\">\\U5148\\U8fdb\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5148\\U8fdb\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_29.aspx\\\">\\U5148\\U8fdb\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_30.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U989c\\U6e0a\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U989c\\U6e0a\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_30.aspx\\\">\\U989c\\U6e0a\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U989c\\U6e0a\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_30.aspx\\\">\\U989c\\U6e0a\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_31.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5b50\\U8def\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5b50\\U8def\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_31.aspx\\\">\\U5b50\\U8def\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5b50\\U8def\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_31.aspx\\\">\\U5b50\\U8def\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_32.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5baa\\U95ee\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5baa\\U95ee\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_32.aspx\\\">\\U5baa\\U95ee\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5baa\\U95ee\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_32.aspx\\\">\\U5baa\\U95ee\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_33.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U536b\\U7075\\U516c\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U536b\\U7075\\U516c\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_33.aspx\\\">\\U536b\\U7075\\U516c\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U536b\\U7075\\U516c\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_33.aspx\\\">\\U536b\\U7075\\U516c\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_34.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5b63\\U6c0f\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5b63\\U6c0f\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_34.aspx\\\">\\U5b63\\U6c0f\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5b63\\U6c0f\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_34.aspx\\\">\\U5b63\\U6c0f\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_35.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U9633\\U8d27\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U9633\\U8d27\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_35.aspx\\\">\\U9633\\U8d27\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U9633\\U8d27\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_35.aspx\\\">\\U9633\\U8d27\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_36.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5fae\\U5b50\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5fae\\U5b50\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_36.aspx\\\">\\U5fae\\U5b50\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5fae\\U5b50\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_36.aspx\\\">\\U5fae\\U5b50\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_37.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5b50\\U5f20\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5b50\\U5f20\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_37.aspx\\\">\\U5b50\\U5f20\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5b50\\U5f20\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_37.aspx\\\">\\U5b50\\U5f20\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n },\n {\n nodeChildArray = (\n {\n nodeAttributeArray = (\n {\n attributeName = href;\n nodeContent = \"/guwen/bookv_38.aspx\";\n }\n );\n nodeChildArray = (\n {\n nodeContent = \"\\U5c27\\U66f0\\U7bc7\";\n nodeName = text;\n }\n );\n nodeContent = \"\\U5c27\\U66f0\\U7bc7\";\n nodeName = a;\n raw = \"<a href=\\\"/guwen/bookv_38.aspx\\\">\\U5c27\\U66f0\\U7bc7</a>\";\n }\n );\n nodeContent = \"\\U5c27\\U66f0\\U7bc7\";\n nodeName = span;\n raw = \"<span><a href=\\\"/guwen/bookv_38.aspx\\\">\\U5c27\\U66f0\\U7bc7</a></span>\";\n },\n {\n nodeContent = \"
\\n
\\n \";\n nodeName = text;\n }\n );\n nodeContent = \"
\\n
\\n \\U5b66\\U800c\\U7bc7
\\n
\\n \\U4e3a\\U653f\\U7bc7
\\n
\\n \\U516b\\U4f7e\\U7bc7
\\n
\\n \\U91cc\\U4ec1\\U7bc7
\\n
\\n \\U516c\\U51b6\\U957f\\U7bc7
\\n
\\n \\U96cd\\U4e5f\\U7bc7
\\n
\\n \\U8ff0\\U800c\\U7bc7
\\n
\\n \\U6cf0\\U4f2f\\U7bc7
\\n
\\n \\U5b50\\U7f55\\U7bc7
\\n
\\n \\U4e61\\U515a\\U7bc7
\\n
\\n \\U5148\\U8fdb\\U7bc7
\\n
\\n \\U989c\\U6e0a\\U7bc7
\\n
\\n \\U5b50\\U8def\\U7bc7
\\n
\\n \\U5baa\\U95ee\\U7bc7
\\n
\\n \\U536b\\U7075\\U516c\\U7bc7
\\n
\\n \\U5b63\\U6c0f\\U7bc7
\\n
\\n \\U9633\\U8d27\\U7bc7
\\n
\\n \\U5fae\\U5b50\\U7bc7
\\n
\\n \\U5b50\\U5f20\\U7bc7
\\n
\\n \\U5c27\\U66f0\\U7bc7
\\n
\\n \";\n nodeName = ul;\n raw = \"<ul> \\n \\n <span><a href=\\\"/guwen/bookv_19.aspx\\\">\\U5b66\\U800c\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_20.aspx\\\">\\U4e3a\\U653f\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_21.aspx\\\">\\U516b\\U4f7e\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_22.aspx\\\">\\U91cc\\U4ec1\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_23.aspx\\\">\\U516c\\U51b6\\U957f\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_24.aspx\\\">\\U96cd\\U4e5f\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_25.aspx\\\">\\U8ff0\\U800c\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_26.aspx\\\">\\U6cf0\\U4f2f\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_27.aspx\\\">\\U5b50\\U7f55\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_28.aspx\\\">\\U4e61\\U515a\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_29.aspx\\\">\\U5148\\U8fdb\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_30.aspx\\\">\\U989c\\U6e0a\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_31.aspx\\\">\\U5b50\\U8def\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_32.aspx\\\">\\U5baa\\U95ee\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_33.aspx\\\">\\U536b\\U7075\\U516c\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_34.aspx\\\">\\U5b63\\U6c0f\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_35.aspx\\\">\\U9633\\U8d27\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_36.aspx\\\">\\U5fae\\U5b50\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_37.aspx\\\">\\U5b50\\U5f20\\U7bc7</a></span> \\n \\n <span><a href=\\\"/guwen/bookv_38.aspx\\\">\\U5c27\\U66f0\\U7bc7</a></span> \\n \\n </ul>\";\n}",
"{\n nodeContent = \"
\\n \";\n nodeName = text;\n}"
)
children output
6. firstChild is the first child node
{
nodeContent = "
\n ";
nodeName = text;
}
The properties above all relate to the HTML markup. In practice we mostly use the content property and then process the resulting NSString object.
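As a rough illustration of processing the content string, the sketch below splits it into trimmed, non-empty lines using only Foundation. The helper name TitlesFromContent is hypothetical, and the sketch assumes each chapter title sits on its own line, as in the content output shown earlier.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits a content string into trimmed, non-empty lines.
static NSArray<NSString *> *TitlesFromContent(NSString *content) {
    NSMutableArray<NSString *> *titles = [NSMutableArray array];
    NSCharacterSet *newlines = [NSCharacterSet newlineCharacterSet];
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    // Splitting on the newline set copes with \n, \r and \r\n line endings;
    // the empty pieces this produces are filtered out below.
    for (NSString *line in [content componentsSeparatedByCharactersInSet:newlines]) {
        NSString *trimmed = [line stringByTrimmingCharactersInSet:whitespace];
        if (trimmed.length > 0) {
            [titles addObject:trimmed];
        }
    }
    return [titles copy];
}
```

Calling TitlesFromContent(element.content) on the bookcont element would then give one array entry per chapter title.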
That is how we obtain the data and process it into the form we want. TFHppleElement is an important class, but its detailed use is not covered here.
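For readers who want a starting point with TFHppleElement, here is a minimal sketch that collects each chapter's title and link. It assumes TFHpple.h is importable and that data holds the downloaded page. The helper name ChapterLinks and the dictionary keys title and href are hypothetical choices for this example.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h" // assumption: hpple has already been added to the project

// Hypothetical helper: returns one dictionary per <a> element, in document order.
static NSArray<NSDictionary<NSString *, NSString *> *> *ChapterLinks(NSData *data) {
    TFHpple *htmlParser = [[TFHpple alloc] initWithHTMLData:data];
    // Query the <a> elements directly rather than walking children by hand,
    // which avoids the whitespace-only text nodes seen in the children output.
    NSArray *anchors = [htmlParser searchWithXPathQuery:@"//div[@class='bookcont']//a"];
    NSMutableArray<NSDictionary<NSString *, NSString *> *> *links = [NSMutableArray array];
    for (TFHppleElement *anchor in anchors) {
        NSString *title = [anchor content];             // text without markup
        NSString *href = [anchor objectForKey:@"href"]; // attribute lookup
        if (title.length > 0 && href.length > 0) {
            [links addObject:@{ @"title" : title, @"href" : href }];
        }
    }
    return [links copy];
}
```

The href values on this page are relative paths such as /guwen/bookv_19.aspx, so they would need to be resolved against the site's base URL before being requested.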