org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents
异常:org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents. You need to call a different part of POI to process this data (eg HSSF instead of XSSF)
1、场景
项目中需要使用到读取 word 文档中的内容,使用的工具是 apache poi 来实现 word 、ppt 、excel 等文件的读取。在开发过程中,读取文件的过程中,出现了异常: org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents. You need to call a different part of POI to process this data (eg HSSF instead of XSSF)

2、分析
office中,ppt 文档的保存是有 ppt(office 2003-2007) 和 pptx 两种格式的。在 apche poi 中,对 不同格式的 ppt 文档是不同类进行支持的。
图示中使用的是 XMLSlideShow 类读取 ppt 格式的文档,而 XMLSlideShow 是只支持 pptx 格式的文档的读取,所以会报错。
错误示例:

3、ppt 和 pptx 文档读取详解
现在对读取两种格式的ppt的读取,做正确的示例代码详解:
读取 ppt
// 使用 HSLFSlideShow 类读取 ppt 格式文档
// --------- ppt -----------
File file = new File("E:\\search-file\\44.ppt");
FileInputStream fis = null;
HSLFSlideShow document = null;
SlideShowExtractor extractor = null;
try {
fis = new FileInputStream(file);
document = new HSLFSlideShow(fis);
extractor = new SlideShowExtractor(document);
log.info("extractor.getText:{}", extractor.getText());
} catch (Exception e) {
e.printStackTrace();
}
格式使用错误就会报错:org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
读取 pptx
// 使用 XMLSlideShow 类读取 pptx 格式的文档
// --------- pptx -----------
File file = new File("E:\\search-file\\33.pptx");
FileInputStream fis = null;
XMLSlideShow document = null;
SlideShowExtractor extractor = null;
try {
fis = new FileInputStream(file);
document = new XMLSlideShow(fis);
extractor = new SlideShowExtractor(document);
log.info("extractor.getText:{}", extractor.getText());
} catch (Exception e) {
e.printStackTrace();
}
XWPFDocument 类读取 doc 格式文档使用错误会报错:org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents. You need to call a different part of POI to process this data (eg HSSF instead of XSSF)

4、总结
apache poi 工具还是很强大的,功能非常多,对具体使用也可参考 apache poi 的官方文档:
https://poi.apache.org/apidocs/index.html
请注意自己使用的 apache poi 的版本,参考对应版本的 javadocs
org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents的更多相关文章
- org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML.
org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Offic ...
- java解析excel2003和excel2007:The supplied data appears to be in the office 2007+XML Polonly supports OLE2 office documents
上传excel解析存到数据库时报: org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears ...
- hadoop错误org.apache.hadoop.yarn.exceptions.YarnException Unauthorized request to start container
错误: 14/04/29 02:45:07 INFO mapreduce.Job: Job job_1398704073313_0021 failed with state FAILED due to ...
- Why Apache Spark is a Crossover Hit for Data Scientists [FWD]
Spark is a compelling multi-purpose platform for use cases that span investigative, as well as opera ...
- org.apache.kafka.common.errors.SerializationException: Error deserializing... Caused by: org.apache.kafka.common.errors.SerializationException: Size of data received by IntegerDeserializer is not 4
原因,最近开发的kafka消息接收,突然报如下错: org.apache.kafka.common.errors.SerializationException: Error deserializing ...
- java.lang.IllegalAccessError: tried to access method org.apache.poi.util.POILogger.log from class org.apache.poi.openxml4j.opc.ZipPackage
代码说简单也简单,说复杂那还真是寸步难行. 之前好好的excel导出功能,本地启动调试的时候突然就不行了,一直报上面的错. 一直在本地折腾了半天,去测试环境上看,又是好的,可以正常导出excel. 搜 ...
- spark on yarn 动态资源分配报错的解决:org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist
组件:cdh5.14.0 spark是自己编译的spark2.1.0-cdh5.14.0 第一步:确认spark-defaults.conf中添加了如下配置: spark.shuffle.servic ...
- org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService: mapreduce_shuffle do
在yarn-site.xml 配置文件中增加: <property> <name>yarn.nodemanager.aux-services</name> < ...
- Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/exceptions/YarnException
这个是Flink 1.11.1 使用yarn-session 出现的错误:原因是在Flink1.11 之后不再提供flink-shaded-hadoop-*” jars 需要在yarn-sessio ...
- 根据xlsx模板生成excel数据文件发送邮件代码
package mail; import java.io.File; import java.io.FileInputStream; import java.io.FileNotFoundExcept ...
随机推荐
- 分布式MinIO快速入门
官方文档地址:http://docs.minio.org.cn/docs/master/distributed-minio-quickstart-guide Minio服务基于命令行传入的参数自动切换 ...
- Elasticsearch方案选型必须了解的10件事!
文章转载自: https://mp.weixin.qq.com/s?__biz=MzI2NDY1MTA3OQ==&mid=2247484372&idx=1&sn=e863e46 ...
- Logstash:如何处理 Logstash pipeline 错误信息
转载自:https://elasticstack.blog.csdn.net/article/details/114290663 在我们使用 Logstash 的时候经常会出现一些错误.比如当我们使用 ...
- Leetcode刷题笔记(双指针)
1.何为双指针 双指针主要用来遍历数组,两个指针指向不同的元素,从而协同完成任务.我们也可以类比这个概念,推广到多个数组的多个指针. 若两个指针指向同一数组,遍历方向相同且不会相交,可以称之为滑动窗口 ...
- Kafka 之 Streams
Kafka 之 Streams 一.概述 1.1 Kafka Streams Kafka Streams.Apache Kafka开源项目的一个组成部分.是一个功能强大,易于使用的库.用于在Kafka ...
- 2022年最新最详细的tomcat安装教程和常见问的解决
文章目录 1.官网直接下载 1.1.jdk的版本和tomcat版本应该相对应或者兼容 1.2. 在官网找对应的tomcat版本进行下载 1.3 .根据电脑版本下载64-bit windows zip( ...
- C语言------结构体和共用体
仅供借鉴.仅供借鉴.仅供借鉴(整理了一下大一C语言每个章节的练习题.没得题目.只有程序了) 文章目录 1 .实训名称 2 .实训目的及要求 3.源代码及运行截图 4 .小结 1 .实训名称 实训8:结 ...
- 函数柯里化实现sum函数
需求 实现sum函数,使其可以传入不定长参数,以及不定次数调用 //示例 console.log(sum(1,2)(3)()) //6 console.log(sum(2,3,4,5)(1,2)(3) ...
- MySQL索引报错
今天在MySQL 5.7版本的数据库中导库InnoDB表字段长度时遇到了"ERROR 1071 (42000): Specified key was too long; max key le ...
- python中的if条件语句
# 如果...就... # 1. print('1.') if 1+1 == 2: print('1+1是等于2的') print('1+1还是等于2的') print('1+1就等于2的') # 2 ...