Word document operations: converting doc to docx and merging multiple docx files

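Preface:

A new requirement came up at short notice: merge several .doc documents into one.

After searching online for a long time I finally found code that works (I can no longer find the original source, so I can't link to it for now). I'm recording it here for anyone who needs it.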

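Part 1: Converting doc to docx

Required jars: link: https://pan.baidu.com/s/1WQ33HDsON8lpFQKgLu8pCQ extraction code: n1xt

The conversion itself is done by Aspose.Words; the helper class DocxReplace then uses Apache POI to replace text in the converted file.

Code: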

import java.util.HashMap;
import java.util.Map;

import com.aspose.words.Document;

public class Doc2Docx {

    public static void main(String[] args) {
        String docFile = "D:/files/dfa3cbb9-a0a0-497a-aa9f-d26cbee9a25b_linux.doc";
        String docxFile = "D:/print/linux.docx";
        doc2docx(docFile, docxFile);
    }

    /**
     * Convert a .doc file to .docx.
     *
     * @param docFile  source file
     * @param docxFile target file
     */
    public static void doc2docx(String docFile, String docxFile) {
        try {
            // Intermediate file next to the target, e.g. linux.docx -> linux_temp.docx
            String tempFile = docxFile.substring(0, docxFile.lastIndexOf(".")) + "_temp"
                    + docxFile.substring(docxFile.lastIndexOf("."));
            // Aspose.Words chooses the output format from the file extension (.docx here)
            Document doc = new Document(docFile);
            doc.save(tempFile);
            // Text to remove from the converted file; the result is written to docxFile
            Map<String, String> map = new HashMap<>();
            map.put("Evaluation Only. Created with Aspose.Words. Copyright 2003-2018 Aspose Pty Ltd.", "");
            DocxReplace.replaceAndGenerateWord(tempFile, docxFile, map);
            // Optionally remove the intermediate file:
            // new File(tempFile).delete();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
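`doc2docx` derives its intermediate file name by inserting `_temp` before the last dot of the target path. That step can be isolated as a small stdlib-only helper, sketched below. The class `TempFileName` and method `tempName` are hypothetical names, and the handling of paths without an extension is an assumption: the original `substring(0, lastIndexOf("."))` throws `StringIndexOutOfBoundsException` when there is no dot, and a dot in a directory name (e.g. `D:/my.dir/report`) would be picked up instead of the extension.

```java
public class TempFileName {

    // Insert "_temp" before the extension: "D:/print/linux.docx" -> "D:/print/linux_temp.docx".
    // Assumption: a file name without an extension simply gets "_temp" appended.
    static String tempName(String path) {
        int dot = path.lastIndexOf('.');
        int sep = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        if (dot <= sep) {
            // No dot in the file-name part (a dot in a directory name does not count)
            return path + "_temp";
        }
        return path.substring(0, dot) + "_temp" + path.substring(dot);
    }

    public static void main(String[] args) {
        System.out.println(tempName("D:/print/linux.docx")); // D:/print/linux_temp.docx
        System.out.println(tempName("D:/my.dir/report"));    // D:/my.dir/report_temp
    }
}
```

Checking the separator position as well as the dot is what keeps directory names like `my.dir` from being treated as the extension.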

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

public class DocxReplace {

    // Return the placeholders that need replacing in a Word file, without duplicates.
    // Recommended regex argument: "\\$\\{[^{}]+\\}"
    // Returns null for unsupported files or if the file cannot be read.
    public ArrayList<String> getReplaceElementsInWord(String filePath, String regex) {
        String ext = getExtension(filePath);
        if (!ext.equalsIgnoreCase("doc") && !ext.equalsIgnoreCase("docx")) {
            return null; // no extension, or not a Word file
        }
        Pattern pattern = Pattern.compile(regex);
        ArrayList<String> al = new ArrayList<>();
        try (InputStream is = new FileInputStream(filePath)) {
            if (ext.equalsIgnoreCase("doc")) {
                // .doc: search the text of the whole document
                HWPFDocument document = new HWPFDocument(is);
                collectMatches(document.getRange().text(), pattern, al);
            } else {
                // .docx: search the paragraphs, then every table cell
                XWPFDocument document = new XWPFDocument(is);
                for (XWPFParagraph paragraph : document.getParagraphs()) {
                    collectMatches(paragraph.getText(), pattern, al);
                }
                for (XWPFTable table : document.getTables()) {
                    for (XWPFTableRow row : table.getRows()) {
                        for (XWPFTableCell cell : row.getTableCells()) {
                            collectMatches(cell.getText(), pattern, al);
                        }
                    }
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
        return al;
    }

    // Replace the map's keys with its values in a Word file and write the result to destPath
    public static boolean replaceAndGenerateWord(String srcPath, String destPath, Map<String, String> map) {
        String srcExt = getExtension(srcPath);
        String destExt = getExtension(destPath);
        if (srcExt.equalsIgnoreCase("docx")) {
            try (InputStream in = new FileInputStream(srcPath)) {
                XWPFDocument document = new XWPFDocument(in);
                // Replace text in paragraphs, run by run.
                // A key that Word has split across several runs is not found.
                for (XWPFParagraph paragraph : document.getParagraphs()) {
                    for (XWPFRun run : paragraph.getRuns()) {
                        String text = run.getText(0);
                        if (text == null) {
                            continue; // e.g. a run that only holds a picture or a break
                        }
                        for (Entry<String, String> entry : map.entrySet()) {
                            if (text.contains(entry.getKey())) {
                                text = text.replace(entry.getKey(), entry.getValue());
                                run.setText(text, 0);
                            }
                        }
                    }
                }
                // Replace text in table cells (the rewritten text loses its run formatting)
                for (XWPFTable table : document.getTables()) {
                    for (XWPFTableRow row : table.getRows()) {
                        for (XWPFTableCell cell : row.getTableCells()) {
                            String cellText = cell.getText();
                            for (Entry<String, String> e : map.entrySet()) {
                                if (cellText.contains(e.getKey())) {
                                    cellText = cellText.replace(e.getKey(), e.getValue());
                                    cell.removeParagraph(0);
                                    cell.setText(cellText);
                                }
                            }
                        }
                    }
                }
                try (OutputStream out = new FileOutputStream(destPath)) {
                    document.write(out);
                }
                return true;
            } catch (Exception e) {
                e.printStackTrace();
                return false;
            }
        } else if (srcExt.equalsIgnoreCase("doc") && destExt.equalsIgnoreCase("doc")) {
            // A .doc source can only be written as .doc; writing it as .docx fails
            try (InputStream in = new FileInputStream(srcPath)) {
                HWPFDocument document = new HWPFDocument(in);
                Range range = document.getRange();
                for (Entry<String, String> entry : map.entrySet()) {
                    range.replaceText(entry.getKey(), entry.getValue());
                }
                try (OutputStream out = new FileOutputStream(destPath)) {
                    document.write(out);
                }
                return true;
            } catch (IOException e) {
                e.printStackTrace();
                return false;
            }
        }
        return false;
    }

    // Add every regex match found in text to al, skipping ones already present
    private static void collectMatches(String text, Pattern pattern, List<String> al) {
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            if (!al.contains(matcher.group())) {
                al.add(matcher.group());
            }
        }
    }

    // File extension without the dot, or "" if the file name has none
    private static String getExtension(String path) {
        int dot = path.lastIndexOf('.');
        int sep = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        return dot > sep ? path.substring(dot + 1) : "";
    }
}
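`getReplaceElementsInWord` boils down to running one regex over each piece of text and keeping the distinct matches in first-seen order. The stdlib-only sketch below isolates that step with the `\\$\\{[^{}]+\\}` pattern the comment recommends. `PlaceholderScan` and `findPlaceholders` are hypothetical names, and the input strings merely stand in for the paragraph and cell texts POI would return. It uses a `LinkedHashSet` instead of `ArrayList.contains`, which keeps the order while avoiding a linear scan per match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderScan {

    // ${...} with at least one character and no braces inside
    static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{[^{}]+\\}");

    // Collect distinct matches across several text fragments, in first-seen order
    static List<String> findPlaceholders(List<String> texts) {
        Set<String> found = new LinkedHashSet<>();
        for (String text : texts) {
            Matcher m = PLACEHOLDER.matcher(text);
            while (m.find()) {
                found.add(m.group());
            }
        }
        return new ArrayList<>(found);
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList(
                "Dear ${name}, your order ${orderId} has shipped.",
                "Order: ${orderId}",        // duplicate, kept only once
                "Empty: ${} and {plain}");  // neither matches the pattern
        System.out.println(findPlaceholders(texts)); // [${name}, ${orderId}]
    }
}
```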

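Part 2: Merging multiple docx files

The first document serves as the base. For each further document, a page-break paragraph is added to the base, the document's pictures are copied into the base (recording their old and new relation IDs), and its body XML is appended to the base's body.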

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.poi.xwpf.usermodel.Document;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFPictureData;
import org.apache.xmlbeans.XmlOptions;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBody;

public class WordMergeUtil {

    public static void main(String[] args) throws Exception {
        File newFile = new File("D:\\wdpj\\t.docx");
        // Documents are merged in list order; the first one is the base
        List<File> srcfile = new ArrayList<>();
        srcfile.add(new File("D:\\wdpj\\2.docx"));
        srcfile.add(new File("D:\\wdpj\\3.docx"));
        srcfile.add(new File("D:\\wdpj\\函.docx"));

        List<XWPFDocument> documentList = new ArrayList<>();
        for (File file : srcfile) {
            // XWPFDocument loads the whole stream into memory, so it can be closed right away
            try (InputStream in = new FileInputStream(file)) {
                documentList.add(new XWPFDocument(in));
            }
        }

        // Append every other document to the first one;
        // appendBody inserts the page break before each appended document
        XWPFDocument doc = documentList.get(0);
        for (int i = 1; i < documentList.size(); i++) {
            appendBody(doc, documentList.get(i));
        }

        try (OutputStream dest = new FileOutputStream(newFile)) {
            doc.write(dest);
        }
    }

    public static void appendBody(XWPFDocument src, XWPFDocument append) throws Exception {
        // Empty paragraph marked "page break before": the appended content starts on a new page
        XWPFParagraph p = src.createParagraph();
        p.setPageBreak(true);

        CTBody src1Body = src.getDocument().getBody();
        CTBody src2Body = append.getDocument().getBody();

        // Record each picture's relation ID before and after copying it into the target
        Map<String, String> map = new HashMap<>();
        for (XWPFPictureData picture : append.getAllPictures()) {
            String before = append.getRelationId(picture);
            // Copy the picture into the target document, keeping its own format;
            // getPictureType() returns 0 for unrecognized formats, so fall back to PNG
            int type = picture.getPictureType();
            String after = src.addPictureData(picture.getData(), type != 0 ? type : Document.PICTURE_TYPE_PNG);
            map.put(before, after);
        }
        appendBody(src1Body, src2Body, map);
    }

    private static void appendBody(CTBody src, CTBody append, Map<String, String> map) throws Exception {
        XmlOptions optionsOuter = new XmlOptions();
        optionsOuter.setSaveOuter();
        String appendString = append.xmlText(optionsOuter);
        String srcString = src.xmlText();

        // Split the source body into opening tag, content and closing tag
        String prefix = srcString.substring(0, srcString.indexOf(">") + 1);
        String mainPart = srcString.substring(srcString.indexOf(">") + 1, srcString.lastIndexOf("<"));
        String suffix = srcString.substring(srcString.lastIndexOf("<"));
        // Content of the appended body, without its outer tags
        String addPart = appendString.substring(appendString.indexOf(">") + 1, appendString.lastIndexOf("<"));

        if (map != null && !map.isEmpty()) {
            // Replace picture IDs in the XML string. Whole quoted values are rewritten in
            // a single pass, so "rId1" does not also hit "rId10" and a new ID is never
            // replaced a second time.
            Matcher m = Pattern.compile("\"([^\"<>]*)\"").matcher(addPart);
            StringBuffer sb = new StringBuffer();
            while (m.find()) {
                String id = map.getOrDefault(m.group(1), m.group(1));
                m.appendReplacement(sb, Matcher.quoteReplacement("\"" + id + "\""));
            }
            m.appendTail(sb);
            addPart = sb.toString();
        }

        // Concatenate the XML of both bodies and parse it back into the source body
        CTBody makeBody = CTBody.Factory.parse(prefix + mainPart + addPart + suffix);
        src.set(makeBody);
    }
}
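The private `appendBody(CTBody, CTBody, Map)` works purely on strings: it keeps the source body's opening and closing tags, drops the appended body's outer tags, and concatenates the two inner parts before parsing the result with `CTBody.Factory.parse`. The string step can be tried without POI, as in the sketch below. `BodySplice` and `splice` are hypothetical names, and the inputs are simplified stand-ins for real `w:body` XML. Like the original, it assumes the first `>` ends the opening tag and the last `<` begins the closing tag.

```java
public class BodySplice {

    // Join the inner content of two XML elements under the first element's tags,
    // mirroring the prefix / mainPart / addPart / suffix split in appendBody
    static String splice(String srcXml, String appendXml) {
        String prefix = srcXml.substring(0, srcXml.indexOf('>') + 1);
        String mainPart = srcXml.substring(srcXml.indexOf('>') + 1, srcXml.lastIndexOf('<'));
        String suffix = srcXml.substring(srcXml.lastIndexOf('<'));
        String addPart = appendXml.substring(appendXml.indexOf('>') + 1, appendXml.lastIndexOf('<'));
        return prefix + mainPart + addPart + suffix;
    }

    public static void main(String[] args) {
        String a = "<w:body><w:p>A</w:p></w:body>";
        String b = "<w:body><w:p>B</w:p></w:body>";
        System.out.println(splice(a, b)); // <w:body><w:p>A</w:p><w:p>B</w:p></w:body>
    }
}
```

Only the source's outer element survives, which is why the original can call `src.set(...)` on the parsed result.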
