[Translation] ASP.NET MVC: converting HTML to PDF with iTextSharp (fixing the img tag problem)

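Preface: A project of mine needed to convert HTML to PDF. Most of it was working, but I found that with iTextSharp (I am on version 5.5.9) the src attribute of an img tag only works with an absolute path.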

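I spent a whole morning searching Baidu and found nothing even remotely relevant. In the end I turned to Google and finally found an answer.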

Original article: http://www.am22tech.com/html-to-pdf/ (may require a proxy to reach from mainland China)

Here is an example I put together afterwards (it uses the second solution): http://files.cnblogs.com/files/zuochengsi-9/HTML%E8%BD%ACPDF.zip

If anything is unclear, see also this post of mine: http://www.cnblogs.com/zuochengsi-9/p/5483808.html

------------------------------------------------------------------

Translation:

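I looked everywhere for a complete example, but none of the ones I found met my needs. My requirement is very simple:

Create a PDF document from an HTML page whose markup contains img tags that use relative paths.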

I found useful information at http://kuujinbo.info/iTextSharp/tableWithImageToPdf.aspx

and at http://hamang.net/2008/08/14/html-to-pdf-in-net/

In the end, the ASP.NET code below solved my problem. I hope it helps you too!

Prerequisites:

Download and copy iTextSharp.dll. I used version 5.1.1.

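Problem and solutions:

The newer iTextSharp library already does a good job of converting HTML to PDF. It has one major flaw, though: image URLs must be absolute.

If you use a relative path, the HTMLWorker class throws an exception.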

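There are two ways around this problem:

1. Implement the IImageProvider interface to pull every image out of the HTML and then "paste" it into the PDF.

However, styling applied to img in the HTML, such as height and width, is not preserved.

2. Parse the HTML and replace the relative URLs with absolute URLs before writing the PDF file.

This approach preserves the height and width set on <img> in the HTML, so it is the better of the two.

I still provide both solutions so that you can choose for yourself.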
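The core of the second solution is resolving a relative src against the URL of the page that contains it. The following is a minimal sketch of that single step, assuming the standard .NET System.Uri class is acceptable here (the listings below concatenate scheme and authority instead). UrlResolver and ToAbsolute are hypothetical names used only for illustration.

```csharp
using System;

public static class UrlResolver
{
    // Resolves src against pageUrl. An src that is already an absolute URL
    // is kept as its own absolute URL rather than being combined with pageUrl.
    public static string ToAbsolute(string pageUrl, string src)
    {
        Uri baseUri = new Uri(pageUrl);        // URL of the page that contains the <img>
        Uri resolved = new Uri(baseUri, src);  // standard relative-reference resolution
        return resolved.AbsoluteUri;
    }
}
```

For a page at http://example.com/blog/post.aspx, an src of /images/a.png resolves to http://example.com/images/a.png, while images/a.png (no leading slash) resolves relative to the page's folder, giving http://example.com/blog/images/a.png.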

Basic setup:

Add a new page to your project:

PostToPDF_AM22.aspx

PostToPDF_AM22.aspx.cs

Method 1:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;
// For converting HTML to PDF - START
using iTextSharp.text;
using iTextSharp.text.html;
using iTextSharp.text.pdf;
using iTextSharp.text.xml;
using iTextSharp.text.html.simpleparser;
using System.IO;
using System.util;
using System.Text.RegularExpressions;
// For converting HTML to PDF - END

public partial class PostToPDF_AM22 : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // Get the HTML code from your database or wherever you have stored it
        // and put it in the HTMLCode variable.
        string HTMLCode = string.Empty;
        ConvertHTMLToPDF(HTMLCode);
    }

    protected void ConvertHTMLToPDF(string HTMLCode)
    {
        // Create the PDF document
        Document doc = new Document(PageSize.A4);
        PdfWriter.GetInstance(doc, new FileStream(Server.MapPath("~") + "/App_Data/HTMLToPDF.pdf",
            FileMode.Create));
        doc.Open();

        // Register the image provider so that the parser asks it for every <img>.
        var interfaceProps = new Dictionary<string, Object>();
        var ih = new ImageHander() { BaseUri = Request.Url.ToString() };
        interfaceProps.Add(HTMLWorker.IMG_PROVIDER, ih);

        // Pass the providers dictionary to the parser so that ImageHander is actually used.
        foreach (IElement element in HTMLWorker.ParseToList(
            new StringReader(HTMLCode), null, interfaceProps))
        {
            doc.Add(element);
        }
        doc.Close();
        Response.End();
    }

    // Handles both relative and absolute image URLs
    public class ImageHander : IImageProvider
    {
        public string BaseUri;

        public iTextSharp.text.Image GetImage(string src,
            IDictionary<string, string> h,
            ChainedProperties cprops,
            IDocListener doc)
        {
            string imgPath;
            bool isAbsolute =
                src.StartsWith("http://", StringComparison.OrdinalIgnoreCase) ||
                src.StartsWith("https://", StringComparison.OrdinalIgnoreCase);
            if (!isAbsolute)
            {
                // Prefix the site root; src is assumed to start with "/".
                imgPath = HttpContext.Current.Request.Url.Scheme + "://" +
                    HttpContext.Current.Request.Url.Authority + src;
            }
            else
            {
                imgPath = src;
            }
            return iTextSharp.text.Image.GetInstance(imgPath);
        }
    }
}

Method 2:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;
// For converting HTML to PDF - START
using iTextSharp.text;
using iTextSharp.text.html;
using iTextSharp.text.pdf;
using iTextSharp.text.xml;
using iTextSharp.text.html.simpleparser;
using System.IO;
using System.util;
using System.Text.RegularExpressions;
// For converting HTML to PDF - END

public partial class PostToPDF_AM22 : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // Get the HTML code from your database or wherever you have stored it
        // and put it in the HTMLCode variable.
        string HTMLCode = string.Empty;
        ConvertHTMLToPDF(HTMLCode);
    }

    protected void ConvertHTMLToPDF(string HTMLCode)
    {
        // Rewrite the src of every image in the content to an absolute URL.
        string tempPostContent = getImage(HTMLCode);
        StringReader reader = new StringReader(tempPostContent);

        // Create the PDF document
        Document doc = new Document(PageSize.A4);
        HTMLWorker parser = new HTMLWorker(doc);
        PdfWriter.GetInstance(doc, new FileStream(Server.MapPath("~") + "/App_Data/HTMLToPDF.pdf",
            FileMode.Create));
        doc.Open();
        try
        {
            // Parse the HTML and write the result into the PDF file.
            parser.Parse(reader);
        }
        catch (Exception ex)
        {
            // Display parser errors in the PDF.
            Paragraph paragraph = new Paragraph("Error!" + ex.Message);
            Chunk text = paragraph.Chunks[0] as Chunk;
            if (text != null)
            {
                text.Font.Color = BaseColor.RED;
            }
            doc.Add(paragraph);
        }
        finally
        {
            doc.Close();
        }
    }

    public string getImage(string input)
    {
        if (input == null)
            return string.Empty;
        string tempInput = input;
        string pattern = @"<img(.|\n)+?>";
        HttpContext context = HttpContext.Current;
        // Change relative image URLs in the HTML code to absolute URLs.
        // RightToLeft yields the last match first, so editing tempInput never
        // shifts the indexes of the matches that are still to be processed.
        foreach (Match m in Regex.Matches(input, pattern, RegexOptions.IgnoreCase | RegexOptions.Multiline |
            RegexOptions.RightToLeft))
        {
            string tempM = m.Value;
            string pattern1 = "src=['\"](.+?)['\"]";
            Regex reImg = new Regex(pattern1, RegexOptions.IgnoreCase | RegexOptions.Multiline);
            Match mImg = reImg.Match(m.Value);
            if (!mImg.Success)
                continue;
            // Group 1 is the URL without the surrounding quotes, in its original case.
            string src = mImg.Groups[1].Value;
            if (src.StartsWith("http://", StringComparison.OrdinalIgnoreCase) ||
                src.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
                continue;
            // Build the new src attribute; src is assumed to start with "/".
            src = "src=\"" + context.Request.Url.Scheme + "://" +
                context.Request.Url.Authority + src + "\"";
            // Replace the src attribute inside the tag,
            // then replace the tag inside the whole HTML code.
            tempM = tempM.Remove(mImg.Index, mImg.Length).Insert(mImg.Index, src);
            tempInput = tempInput.Remove(m.Index, m.Length).Insert(m.Index, tempM);
        }
        return tempInput;
    }

    // Returns the value of the first src attribute in input, or an empty string.
    string getSrc(string input)
    {
        string pattern = "src=['\"](.+?)['\"]";
        Regex reImg = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Multiline);
        Match mImg = reImg.Match(input);
        if (mImg.Success)
        {
            return mImg.Groups[1].Value;
        }
        return string.Empty;
    }
}
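The rewrite done by getImage can also be written more compactly with Regex.Replace and a match evaluator. The block below is a sketch of that alternative and not the article's code: ImgSrcRewriter and MakeAbsolute are hypothetical names, the pattern assumes simple markup with quoted src attributes, and URL resolution is delegated to System.Uri rather than to string concatenation.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ImgSrcRewriter
{
    // Group 1: everything from "<img" up to and including "src=".
    // Group 2: the opening quote. Group 3: the URL. \2 matches the same closing quote.
    private static readonly Regex ImgSrc = new Regex(
        @"(<img\b[^>]*?\bsrc\s*=\s*)(['""])(.*?)\2",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns html with every quoted <img> src resolved against pageUrl.
    public static string MakeAbsolute(string html, string pageUrl)
    {
        if (string.IsNullOrEmpty(html))
            return string.Empty;
        Uri baseUri = new Uri(pageUrl);
        return ImgSrc.Replace(html, delegate(Match m)
        {
            Uri absolute;
            // Leave the tag untouched if the URL cannot be resolved.
            if (!Uri.TryCreate(baseUri, m.Groups[3].Value, out absolute))
                return m.Value;
            string quote = m.Groups[2].Value;
            return m.Groups[1].Value + quote + absolute.AbsoluteUri + quote;
        });
    }
}
```

In the page above it could be called as ImgSrcRewriter.MakeAbsolute(HTMLCode, Request.Url.ToString()) in place of getImage(HTMLCode). Because Regex.Replace builds a new string, there are no indexes to keep in sync and the RightToLeft option is not needed.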

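Notes:

Both solutions share a ConvertHTMLToPDF method, which has requirements on the format of the HTML it receives; see the official iTextSharp site for details.

The result is saved as a PDF file named HTMLToPDF.pdf in the App_Data folder of your web site.

Remember that you need to write the code, in the Page_Load event above, that fetches the HTML from your database or from a file.

Pass that HTML to the conversion function and it will create the PDF file for you.

If you run into any problems, leave a comment and I will do my best to help.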

——————————————————————————————————————

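This is my first translation, so I kept it as literal as I could. Having done it, I no longer feel the resistance to reading English material that I used to. Trying something new really does pay off!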
