org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents
异常:org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents. You need to call a different part of POI to process this data (eg HSSF instead of XSSF)
1、场景
项目中需要使用到读取 word 文档中的内容,使用的工具是 apache poi 来实现 word 、ppt 、excel 等文件的读取。在开发过程中,读取文件的过程中,出现了异常: org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents. You need to call a different part of POI to process this data (eg HSSF instead of XSSF)

2、分析
office中,ppt 文档的保存是有 ppt(office 2003-2007) 和 pptx 两种格式的。在 apche poi 中,对 不同格式的 ppt 文档是不同类进行支持的。
图示中使用的是 XMLSlideShow 类读取 ppt 格式的文档,而 XMLSlideShow 是只支持 pptx 格式的文档的读取,所以会报错。
错误示例:

3、ppt 和 pptx 文档读取详解
现在对读取两种格式的ppt的读取,做正确的示例代码详解:
读取 ppt
// 使用 HSLFSlideShow 类读取 ppt 格式文档
// --------- ppt -----------
File file = new File("E:\\search-file\\44.ppt");
FileInputStream fis = null;
HSLFSlideShow document = null;
SlideShowExtractor extractor = null;
try {
fis = new FileInputStream(file);
document = new HSLFSlideShow(fis);
extractor = new SlideShowExtractor(document);
log.info("extractor.getText:{}", extractor.getText());
} catch (Exception e) {
e.printStackTrace();
}
格式使用错误就会报错:org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
读取 pptx
// 使用 XMLSlideShow 类读取 pptx 格式的文档
// --------- pptx -----------
File file = new File("E:\\search-file\\33.pptx");
FileInputStream fis = null;
XMLSlideShow document = null;
SlideShowExtractor extractor = null;
try {
fis = new FileInputStream(file);
document = new XMLSlideShow(fis);
extractor = new SlideShowExtractor(document);
log.info("extractor.getText:{}", extractor.getText());
} catch (Exception e) {
e.printStackTrace();
}
XWPFDocument 类读取 doc 格式文档使用错误会报错:org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents. You need to call a different part of POI to process this data (eg HSSF instead of XSSF)

4、总结
apache poi 工具还是很强大的,功能非常多,对具体使用也可参考 apache poi 的官方文档:
https://poi.apache.org/apidocs/index.html
请注意自己使用的 apache poi 的版本,参考对应版本的 javadocs
org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents的更多相关文章
- org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML.
org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Offic ...
- java解析excel2003和excel2007:The supplied data appears to be in the office 2007+XML Polonly supports OLE2 office documents
上传excel解析存到数据库时报: org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears ...
- hadoop错误org.apache.hadoop.yarn.exceptions.YarnException Unauthorized request to start container
错误: 14/04/29 02:45:07 INFO mapreduce.Job: Job job_1398704073313_0021 failed with state FAILED due to ...
- Why Apache Spark is a Crossover Hit for Data Scientists [FWD]
Spark is a compelling multi-purpose platform for use cases that span investigative, as well as opera ...
- org.apache.kafka.common.errors.SerializationException: Error deserializing... Caused by: org.apache.kafka.common.errors.SerializationException: Size of data received by IntegerDeserializer is not 4
原因,最近开发的kafka消息接收,突然报如下错: org.apache.kafka.common.errors.SerializationException: Error deserializing ...
- java.lang.IllegalAccessError: tried to access method org.apache.poi.util.POILogger.log from class org.apache.poi.openxml4j.opc.ZipPackage
代码说简单也简单,说复杂那还真是寸步难行. 之前好好的excel导出功能,本地启动调试的时候突然就不行了,一直报上面的错. 一直在本地折腾了半天,去测试环境上看,又是好的,可以正常导出excel. 搜 ...
- spark on yarn 动态资源分配报错的解决:org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist
组件:cdh5.14.0 spark是自己编译的spark2.1.0-cdh5.14.0 第一步:确认spark-defaults.conf中添加了如下配置: spark.shuffle.servic ...
- org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService: mapreduce_shuffle do
在yarn-site.xml 配置文件中增加: <property> <name>yarn.nodemanager.aux-services</name> < ...
- Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/exceptions/YarnException
这个是Flink 1.11.1 使用yarn-session 出现的错误:原因是在Flink1.11 之后不再提供flink-shaded-hadoop-*” jars 需要在yarn-sessio ...
- 根据xlsx模板生成excel数据文件发送邮件代码
package mail; import java.io.File; import java.io.FileInputStream; import java.io.FileNotFoundExcept ...
随机推荐
- Beats:如何安装Packetbeat
- Vmware虚拟机设置主机端口映射
转载自:https://blog.csdn.net/Mrqiang9001/article/details/80820321
- a除于b
a=eval(input()) b=eval(input()) if b!=0: print("{}".format(round(a/b,2))) else: print(&quo ...
- js基础知识--BOM
之前说过,在js的 运行环境为浏览器时,js就主要有三部分组成: ECMAScript核心语法.BOM.DOM.今天就和大家详细说一下BOM的一些基础知识. BOM BOM通常被称为浏览器对象模型,主 ...
- 洛谷P1395 会议 (树的重心)
这道题考察了树的重心的性质,所有点到中心的距离之和是最小的,所以我们一遍dfs求出树的重心,在跑一次dfs统计距离之和. 1 #include<bits/stdc++.h> 2 using ...
- 220501 T1 困难的图论 (tarjan 点双)
求满足题目要求的简单环,做出图中所有的点双,用vector存储点双中的边,如果该点双满足点数=边数,就是我们想要的,求边的异或和即可:如果该点双点数小于边数,说明有不只一个环覆盖,不满足题意. 1 # ...
- 简单将Springboot项目部署到linux服务器上
1.使用springboot的jar包方式 直接使用maven工具按照步骤点击就可以直接打包 2.到target目录下找到 jar包 3.将jar包放到linux的任意文件夹下(此项目是之前的kafk ...
- JUC(2)使用condition实现精准通知唤醒
package com.orderPC; import java.util.concurrent.locks.Condition; import java.util.concurrent.locks. ...
- 二十一、Pod的存储之Secret
Pod 的存储之Secret 一.Secret 存在意义 Secret 解决了密码.token.密钥等敏感数据的配置问题,而不需要把这些敏感数据暴露到镜像或者 Pod Spec中.Secret 可以 ...
- nodered获取简单的时间
1.添加simpletime 的节点 2. 添加一个inject节点用来每1s循环获取当点的信息 3.添加一个函数节点对simpletime发来的msg进行解析 var payload=msg;var ...