Converting Word/PDF/TXT to HTML in Java

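Introduction:

Our company has recently been building an education and training platform with online learning and online exams. I work mainly on the online course module: creating course categories, courses, and courseware, plus online study and the statistics around it. Courseware comes in several types: video, audio, illustrated text, external links, and documents. Document courseware raised a problem: how to display it for study in a web page. The system has to record which courses are studied, by whom, and for how long, so we cannot follow the traditional approach of letting users download the document and read it locally, because that would put the activity outside the system's control. The solution we settled on is to convert each document to an HTML file when it is uploaded, so that it can be read in the browser.

The rest of this post covers converting Word, PDF, and plain-text (TXT) files.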
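Since the conversion happens at upload time, the upload handler first has to decide which converter applies to the file. The following is only a minimal sketch of that decision, assuming the file extension is a reliable indicator; the class name `CoursewareTypes` and the enum `Kind` are hypothetical and do not appear in the project.

```java
import java.util.Locale;

/** Hypothetical helper, not part of the original project. */
final class CoursewareTypes {
    /** The document kinds covered in this post, plus a fallback. */
    enum Kind { WORD_2003, WORD_2007, PDF, TXT, UNSUPPORTED }

    private CoursewareTypes() {}

    /** Picks the conversion route from the file extension, ignoring case. */
    static Kind kindOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Kind.UNSUPPORTED; // no extension at all
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "doc":  return Kind.WORD_2003;
            case "docx": return Kind.WORD_2007;
            case "pdf":  return Kind.PDF;
            case "txt":  return Kind.TXT;
            default:     return Kind.UNSUPPORTED;
        }
    }

    public static void main(String[] args) {
        System.out.println(kindOf("lesson.DOCX")); // WORD_2007
        System.out.println(kindOf("intro.mp4"));   // UNSUPPORTED
    }
}
```

The last dot is used on purpose, so that a name such as `chapter.1.pdf` is still recognized as a PDF.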

Part 1: Converting Word to HTML in Java

1: Add the dependencies

 <dependency>
   <groupId>fr.opensagres.xdocreport</groupId>
   <artifactId>fr.opensagres.xdocreport.document</artifactId>
   <version>1.0.5</version>
 </dependency>
 <dependency>
   <groupId>fr.opensagres.xdocreport</groupId>
   <artifactId>org.apache.poi.xwpf.converter.xhtml</artifactId>
   <version>1.0.5</version>
 </dependency>
 <dependency>
   <groupId>org.apache.poi</groupId>
   <artifactId>poi</artifactId>
   <version>3.12</version>
 </dependency>
 <dependency>
   <groupId>org.apache.poi</groupId>
   <artifactId>poi-scratchpad</artifactId>
   <version>3.12</version>
 </dependency>

2: Demo code

 package com.svse.controller;

 import java.io.File;
 import java.io.FileInputStream;
 import java.io.FileNotFoundException;
 import java.io.FileOutputStream;
 import java.io.IOException;
 import java.io.InputStream;
 import java.io.OutputStream;
 import java.text.SimpleDateFormat;
 import java.util.Date;
 import javax.xml.parsers.DocumentBuilderFactory;
 import javax.xml.parsers.ParserConfigurationException;
 import javax.xml.transform.OutputKeys;
 import javax.xml.transform.Transformer;
 import javax.xml.transform.TransformerException;
 import javax.xml.transform.TransformerFactory;
 import javax.xml.transform.dom.DOMSource;
 import javax.xml.transform.stream.StreamResult;
 import org.apache.poi.hwpf.HWPFDocument;
 import org.apache.poi.hwpf.converter.PicturesManager;
 import org.apache.poi.hwpf.converter.WordToHtmlConverter;
 import org.apache.poi.hwpf.usermodel.PictureType;
 import org.apache.poi.xwpf.converter.core.FileImageExtractor;
 import org.apache.poi.xwpf.converter.core.FileURIResolver;
 import org.apache.poi.xwpf.converter.core.IURIResolver;
 import org.apache.poi.xwpf.converter.core.IXWPFConverter;
 import org.apache.poi.xwpf.converter.xhtml.XHTMLConverter;
 import org.apache.poi.xwpf.converter.xhtml.XHTMLOptions;
 import org.apache.poi.xwpf.usermodel.XWPFDocument;
 import org.w3c.dom.Document;

 /**
  * Converts Word files to HTML
  */

 public class TestWordToHtml {
     // Directory that holds the source documents and receives the generated files
     public static final String STORAGEPATH = "C://works//files//";
     // Host and port used to build the URLs of the generated files
     public static final String IP = "192.168.30.222";
     public static final String PORT = "8010";

     public static void main(String[] args) throws IOException, TransformerException, ParserConfigurationException {
         TestWordToHtml wt = new TestWordToHtml();
         //wt.Word2003ToHtml("甲骨文考证.doc");
         wt.Word2007ToHtml("甲骨文考证.docx");
     }

     /**
      * Converts a Word 2003 (.doc) file to HTML
      * @throws IOException
      * @throws TransformerException
      * @throws ParserConfigurationException
      */
     public void Word2003ToHtml(String fileName) throws IOException, TransformerException, ParserConfigurationException {
         // Pictures embedded in the .doc file are saved to this directory
         final String imagepath = STORAGEPATH + "fileImage/";
         final String strRanString = getRandomNum();
         String filepath = STORAGEPATH;
         // lastIndexOf keeps the whole stem of names such as "a.b.doc"
         String htmlName = fileName.substring(0, fileName.lastIndexOf(".")) + "2003.html";
         final String file = filepath + fileName;
         WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
                 DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
         // Decide where pictures are stored and which URL the HTML uses for them
         wordToHtmlConverter.setPicturesManager(new PicturesManager() {
             public String savePicture(byte[] content, PictureType pictureType, String suggestedName, float widthInches, float heightInches) {
                 File imgPath = new File(imagepath);
                 if (!imgPath.exists()) { // create the picture directory if it is missing
                     imgPath.mkdirs();
                 }
                 File picture = new File(imagepath + strRanString + suggestedName);
                 // try-with-resources closes the stream even if the write fails
                 try (OutputStream os = new FileOutputStream(picture)) {
                     os.write(content);
                 } catch (IOException e) {
                     e.printStackTrace();
                 }
                 return "http://" + IP + ":" + PORT + "//uploadFile/fileImage/" + strRanString + suggestedName;
                 // return imagepath + strRanString + suggestedName;
             }
         });
         // Parse the Word document
         try (InputStream input = new FileInputStream(new File(file))) {
             HWPFDocument wordDocument = new HWPFDocument(input);
             wordToHtmlConverter.processDocument(wordDocument);
         }
         Document htmlDocument = wordToHtmlConverter.getDocument();
         File htmlFile = new File(filepath + strRanString + htmlName);
         // Serialize the DOM tree to the HTML file
         try (OutputStream outStream = new FileOutputStream(htmlFile)) {
             DOMSource domSource = new DOMSource(htmlDocument);
             StreamResult streamResult = new StreamResult(outStream);
             TransformerFactory factory = TransformerFactory.newInstance();
             Transformer serializer = factory.newTransformer();
             serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
             serializer.setOutputProperty(OutputKeys.INDENT, "yes");
             serializer.setOutputProperty(OutputKeys.METHOD, "html");
             serializer.transform(domSource, streamResult);
         }
         System.out.println("Generated HTML file URL: " + "http://" + IP + ":" + PORT + "//uploadFile/" + strRanString + htmlName);
     }

     /**
      * Converts a Word 2007 (.docx) file to HTML
      * @throws IOException
      */
     public void Word2007ToHtml(String fileName) throws IOException {
         final String strRanString = getRandomNum();
         // No trailing separator: the timestamp names the picture folder and prefixes the HTML file name
         String filepath = STORAGEPATH + strRanString;
         String htmlName = fileName.substring(0, fileName.lastIndexOf(".")) + "2007.html";
         File f = new File(STORAGEPATH + fileName);
         if (!f.exists()) {
             System.out.println("Sorry, the file does not exist!");
         } else {
             if (f.getName().toLowerCase().endsWith(".docx")) {
                 try (InputStream in = new FileInputStream(f)) {
                     // 1) Load the Word file into an XWPFDocument
                     XWPFDocument document = new XWPFDocument(in);
                     // 2) Configure the XHTML options: the extractor decides where pictures
                     //    are written, the IURIResolver decides which URL the HTML uses for them
                     File imageFolderFile = new File(filepath);
                     XHTMLOptions options = XHTMLOptions.create();
                     options.setExtractor(new FileImageExtractor(imageFolderFile));
                     options.URIResolver(new IURIResolver() {
                         public String resolve(String uri) {
                             // e.g. http://192.168.30.222:8010//uploadFile/....
                             return "http://" + IP + ":" + PORT + "//uploadFile/" + strRanString + "/" + uri;
                         }
                     });
                     options.setIgnoreStylesIfUnused(false);
                     options.setFragment(true);
                     // 3) Convert the XWPFDocument to XHTML
                     try (OutputStream out = new FileOutputStream(new File(filepath + htmlName))) {
                         IXWPFConverter<XHTMLOptions> converter = XHTMLConverter.getInstance();
                         converter.convert(document, out, options);
                         //XHTMLConverter.getInstance().convert(document, out, options);
                     }
                     System.out.println("HTML URL: " + "http://" + IP + ":" + PORT + "//uploadFile/" + strRanString + htmlName);
                 } catch (Exception e) {
                     e.printStackTrace();
                 }
             } else {
                 System.out.println("Enter only MS Office 2007+ files");
             }
         }
     }

     /**
      * Purpose: generates a timestamp string (yyyyMMddHHmmss)
      * Author: zsq
      * Created: 2019-12-07 14:37:09
      */
     public static String getRandomNum() {
         Date dt = new Date();
         SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMddHHmmss");
         String str = sdf.format(dt);
         return str;
     }
 }
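Two details of the naming scheme in the demo deserve a closer look: the HTML name is derived by cutting the file name at a dot, and every output is prefixed with a timestamp that has one-second resolution. The sketch below shows one possible way to make both steps more robust. The class `OutputNames` and all of its methods are hypothetical and not part of the demo; adding a random suffix is an assumption about what a busy server might need, not something the original code does.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.UUID;

/** Hypothetical helper, not part of the original demo. */
final class OutputNames {
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    private OutputNames() {}

    /** File name without its last extension; names with no dot, or only a leading dot, are returned unchanged. */
    static String stem(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot <= 0 ? fileName : fileName.substring(0, dot);
    }

    /** Mirrors the demo's naming scheme: stem + suffix + ".html". */
    static String htmlName(String fileName, String suffix) {
        return stem(fileName) + suffix + ".html";
    }

    /** Same pattern as getRandomNum(), but for an explicit point in time. */
    static String timestamp(LocalDateTime when) {
        return STAMP.format(when);
    }

    /** Timestamp plus a random part, so two uploads within the same second get different prefixes. */
    static String uniquePrefix() {
        return timestamp(LocalDateTime.now()) + "-" + UUID.randomUUID().toString().substring(0, 8);
    }

    public static void main(String[] args) {
        System.out.println(htmlName("report.v2.docx", "2007")); // report.v22007.html
        System.out.println(uniquePrefix());
    }
}
```

Cutting at the first dot instead would turn `report.v2.docx` into `report2007.html`, which could collide with the output of a different file named `report.docx`.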

Part 2: Converting PDF to HTML in Java

1: Add the dependencies

 <dependency>
   <groupId>net.sf.cssbox</groupId>
   <artifactId>pdf2dom</artifactId>
   <version>1.7</version>
 </dependency>
 <dependency>
   <groupId>org.apache.pdfbox</groupId>
   <artifactId>pdfbox</artifactId>
   <version>2.0.12</version>
 </dependency>
 <dependency>
   <groupId>org.apache.pdfbox</groupId>
   <artifactId>pdfbox-tools</artifactId>
   <version>2.0.12</version>
 </dependency>

2: Demo code

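 import java.io.BufferedWriter;
 import java.io.File;
 import java.io.FileOutputStream;
 import java.io.IOException;
 import java.io.OutputStreamWriter;
 import java.io.Writer;
 import org.apache.pdfbox.pdmodel.PDDocument;
 import org.fit.pdfdom.PDFDomTree;

 public class PdfToHtml {
     /*
       Converts a PDF file to HTML
      */
     public void pdfToHtmlTest(String inPdfPath, String outputHtmlPath) {
         // String outputPath = "C:\\works\\files\\ZSQ保密知识测试题库.html";
         // Resources declared inside try (...) are closed automatically. Closing the
         // writer also flushes it; without that the HTML file can end up truncated.
         try (PDDocument document = PDDocument.load(new File(inPdfPath));
              Writer out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(new File(outputHtmlPath)), "utf-8"))) {
             // The PDF can also be loaded from a byte array:
             // PDDocument document = PDDocument.load(bytes);
             PDFDomTree pdfDomTree = new PDFDomTree();
             pdfDomTree.writeText(document, out);
         } catch (Exception e) {
             e.printStackTrace();
         }
     }

     public static void main(String[] args) throws IOException {
         PdfToHtml ph = new PdfToHtml();
         String pdfPath = "C:\\works\\files\\武研中心行政考勤制度.pdf";
         String outputPath = "C:\\works\\files\\武研中心行政考勤制度.html";
         ph.pdfToHtmlTest(pdfPath, outputPath);
     }
 }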
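The comment in the demo notes that streams declared inside `try (...)` are closed automatically. The standalone sketch below illustrates that behavior with plain Java and no PDF library: resources are closed in reverse order of declaration, and they are closed before the `catch` block runs. The class `TryWithResourcesDemo` and the resource names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical demo class; the resource names are made up for illustration. */
final class TryWithResourcesDemo {
    /** A resource that records when it is opened and closed. */
    static final class Tracked implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Tracked(String name, List<String> log) {
            this.name = name;
            this.log = log;
            log.add("open " + name);
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    private TryWithResourcesDemo() {}

    /** Opens two resources, fails inside the body, and returns the recorded events. */
    static List<String> run() {
        List<String> log = new ArrayList<>();
        try (Tracked document = new Tracked("document", log);
             Tracked writer = new Tracked("writer", log)) {
            log.add("convert");
            throw new IllegalStateException("simulated conversion failure");
        } catch (IllegalStateException e) {
            // Both resources have already been closed when this block runs
            log.add("caught");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Running `main` prints `[open document, open writer, convert, close writer, close document, caught]`, which is why a writer declared this way is flushed and closed even when the conversion throws.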

Part 3: Converting TXT to HTML in Java

 // Imports needed: java.io.BufferedReader, java.io.BufferedWriter, java.io.File,
 // java.io.FileInputStream, java.io.FileOutputStream, java.io.InputStreamReader,
 // java.io.OutputStreamWriter

     /*
      * Converts a TXT file to HTML
        filePath: path of the source TXT file
        htmlPosition: path of the generated HTML file
      */
     public static void txtToHtml(String filePath, String htmlPosition) {
         //String encoding = "GBK";
         File file = new File(filePath);
         if (!file.isFile()) { // isFile() is also false when the file does not exist
             System.out.println("The specified file could not be found");
             return;
         }
         // The encoding matters: this demo assumes the TXT file is GBK-encoded
         // and writes the HTML in GBK as well
         try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(new FileInputStream(file), "GBK"));
              BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(new File(htmlPosition)), "GBK"))) {
             String lineTxt;
             while ((lineTxt = bufferedReader.readLine()) != null) {
                 // Indent each line with three non-breaking spaces and end it with a line break
                 bw.write("&nbsp;&nbsp;&nbsp;" + lineTxt + "<br>");
             }
         } catch (Exception e) {
             System.out.println("Failed to read the file content");
             e.printStackTrace();
         }
     }
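The method above copies every line into the HTML unchanged, so a text file containing `<`, `>` or `&` would be interpreted as markup by the browser. It also writes no charset declaration, which leaves the browser to guess the encoding. The following is a minimal sketch of how both points could be handled; the class `TxtHtmlSketch` and its methods are hypothetical, and wrapping the lines in a full HTML page is an assumption rather than what the original method does.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of the original post. */
final class TxtHtmlSketch {
    private TxtHtmlSketch() {}

    /** Escapes the characters that have a special meaning in HTML text. */
    static String escapeHtml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds a complete HTML page with one escaped, br-terminated line per input line. */
    static String toHtml(List<String> lines, String charset) {
        StringBuilder sb = new StringBuilder();
        sb.append("<!DOCTYPE html>\n<html>\n<head><meta charset=\"")
          .append(charset)
          .append("\"></head>\n<body>\n");
        for (String line : lines) {
            sb.append("&nbsp;&nbsp;&nbsp;").append(escapeHtml(line)).append("<br>\n");
        }
        sb.append("</body>\n</html>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toHtml(Arrays.asList("if (a < b && b > c)", "plain text"), "GBK"));
    }
}
```

The `charset` passed to `toHtml` should match the encoding used by the `OutputStreamWriter` that writes the page to disk.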
