异常:org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents. You need to call a different part of POI to process this data (eg HSSF instead of XSSF)

1、场景

项目中需要使用到读取 word 文档中的内容,使用的工具是 apache poi 来实现 word 、ppt 、excel 等文件的读取。在开发过程中,读取文件的过程中,出现了异常: org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents. You need to call a different part of POI to process this data (eg HSSF instead of XSSF)


2、分析

office中,ppt 文档的保存是有 ppt(office 2003-2007) 和 pptx 两种格式的。在 apche poi 中,对 不同格式的 ppt 文档是不同类进行支持的。

图示中使用的是 XMLSlideShow 类读取 ppt 格式的文档,而 XMLSlideShow 是只支持 pptx 格式的文档的读取,所以会报错。

错误示例:


3、ppt 和 pptx 文档读取详解

现在对读取两种格式的ppt的读取,做正确的示例代码详解:

读取 ppt

// 使用 HSLFSlideShow 类读取 ppt 格式文档

// --------- ppt -----------
File file = new File("E:\\search-file\\44.ppt");
FileInputStream fis = null;
HSLFSlideShow document = null;
SlideShowExtractor extractor = null;
try {
fis = new FileInputStream(file);
document = new HSLFSlideShow(fis);
extractor = new SlideShowExtractor(document);
log.info("extractor.getText:{}", extractor.getText());
} catch (Exception e) {
e.printStackTrace();
}

格式使用错误就会报错:org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)

读取 pptx

// 使用 XMLSlideShow 类读取 pptx 格式的文档

// --------- pptx -----------
File file = new File("E:\\search-file\\33.pptx");
FileInputStream fis = null;
XMLSlideShow document = null;
SlideShowExtractor extractor = null;
try {
fis = new FileInputStream(file);
document = new XMLSlideShow(fis);
extractor = new SlideShowExtractor(document);
log.info("extractor.getText:{}", extractor.getText());
} catch (Exception e) {
e.printStackTrace();
}

XWPFDocument 类读取 doc 格式文档使用错误会报错:org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents. You need to call a different part of POI to process this data (eg HSSF instead of XSSF)

4、总结

apache poi 工具还是很强大的,功能非常多,对具体使用也可参考 apache poi 的官方文档:

https://poi.apache.org/apidocs/index.html

请注意自己使用的 apache poi 的版本,参考对应版本的 javadocs

org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents的更多相关文章

  1. org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML.

    org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Offic ...

  2. java解析excel2003和excel2007:The supplied data appears to be in the office 2007+XML Polonly supports OLE2 office documents

    上传excel解析存到数据库时报: org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears ...

  3. hadoop错误org.apache.hadoop.yarn.exceptions.YarnException Unauthorized request to start container

    错误: 14/04/29 02:45:07 INFO mapreduce.Job: Job job_1398704073313_0021 failed with state FAILED due to ...

  4. Why Apache Spark is a Crossover Hit for Data Scientists [FWD]

    Spark is a compelling multi-purpose platform for use cases that span investigative, as well as opera ...

  5. org.apache.kafka.common.errors.SerializationException: Error deserializing... Caused by: org.apache.kafka.common.errors.SerializationException: Size of data received by IntegerDeserializer is not 4

    原因,最近开发的kafka消息接收,突然报如下错: org.apache.kafka.common.errors.SerializationException: Error deserializing ...

  6. java.lang.IllegalAccessError: tried to access method org.apache.poi.util.POILogger.log from class org.apache.poi.openxml4j.opc.ZipPackage

    代码说简单也简单,说复杂那还真是寸步难行. 之前好好的excel导出功能,本地启动调试的时候突然就不行了,一直报上面的错. 一直在本地折腾了半天,去测试环境上看,又是好的,可以正常导出excel. 搜 ...

  7. spark on yarn 动态资源分配报错的解决:org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist

    组件:cdh5.14.0 spark是自己编译的spark2.1.0-cdh5.14.0 第一步:确认spark-defaults.conf中添加了如下配置: spark.shuffle.servic ...

  8. org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService: mapreduce_shuffle do

    在yarn-site.xml 配置文件中增加: <property> <name>yarn.nodemanager.aux-services</name> < ...

  9. Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/exceptions/YarnException

    这个是Flink 1.11.1  使用yarn-session 出现的错误:原因是在Flink1.11 之后不再提供flink-shaded-hadoop-*” jars 需要在yarn-sessio ...

  10. 根据xlsx模板生成excel数据文件发送邮件代码

    package mail; import java.io.File; import java.io.FileInputStream; import java.io.FileNotFoundExcept ...

随机推荐

  1. 分布式MinIO快速入门

    官方文档地址:http://docs.minio.org.cn/docs/master/distributed-minio-quickstart-guide Minio服务基于命令行传入的参数自动切换 ...

  2. Elasticsearch方案选型必须了解的10件事!

    文章转载自: https://mp.weixin.qq.com/s?__biz=MzI2NDY1MTA3OQ==&mid=2247484372&idx=1&sn=e863e46 ...

  3. Logstash:如何处理 Logstash pipeline 错误信息

    转载自:https://elasticstack.blog.csdn.net/article/details/114290663 在我们使用 Logstash 的时候经常会出现一些错误.比如当我们使用 ...

  4. Leetcode刷题笔记(双指针)

    1.何为双指针 双指针主要用来遍历数组,两个指针指向不同的元素,从而协同完成任务.我们也可以类比这个概念,推广到多个数组的多个指针. 若两个指针指向同一数组,遍历方向相同且不会相交,可以称之为滑动窗口 ...

  5. Kafka 之 Streams

    Kafka 之 Streams 一.概述 1.1 Kafka Streams Kafka Streams.Apache Kafka开源项目的一个组成部分.是一个功能强大,易于使用的库.用于在Kafka ...

  6. 2022年最新最详细的tomcat安装教程和常见问的解决

    文章目录 1.官网直接下载 1.1.jdk的版本和tomcat版本应该相对应或者兼容 1.2. 在官网找对应的tomcat版本进行下载 1.3 .根据电脑版本下载64-bit windows zip( ...

  7. C语言------结构体和共用体

    仅供借鉴.仅供借鉴.仅供借鉴(整理了一下大一C语言每个章节的练习题.没得题目.只有程序了) 文章目录 1 .实训名称 2 .实训目的及要求 3.源代码及运行截图 4 .小结 1 .实训名称 实训8:结 ...

  8. 函数柯里化实现sum函数

    需求 实现sum函数,使其可以传入不定长参数,以及不定次数调用 //示例 console.log(sum(1,2)(3)()) //6 console.log(sum(2,3,4,5)(1,2)(3) ...

  9. MySQL索引报错

    今天在MySQL 5.7版本的数据库中导库InnoDB表字段长度时遇到了"ERROR 1071 (42000): Specified key was too long; max key le ...

  10. python中的if条件语句

    # 如果...就... # 1. print('1.') if 1+1 == 2: print('1+1是等于2的') print('1+1还是等于2的') print('1+1就等于2的') # 2 ...