异常:org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents. You need to call a different part of POI to process this data (eg HSSF instead of XSSF)

1、场景

项目中需要使用到读取 word 文档中的内容,使用的工具是 apache poi 来实现 word 、ppt 、excel 等文件的读取。在开发过程中,读取文件的过程中,出现了异常: org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents. You need to call a different part of POI to process this data (eg HSSF instead of XSSF)


2、分析

office中,ppt 文档的保存是有 ppt(office 2003-2007) 和 pptx 两种格式的。在 apche poi 中,对 不同格式的 ppt 文档是不同类进行支持的。

图示中使用的是 XMLSlideShow 类读取 ppt 格式的文档,而 XMLSlideShow 是只支持 pptx 格式的文档的读取,所以会报错。

错误示例:


3、ppt 和 pptx 文档读取详解

现在对读取两种格式的ppt的读取,做正确的示例代码详解:

读取 ppt

// 使用 HSLFSlideShow 类读取 ppt 格式文档

// --------- ppt -----------
File file = new File("E:\\search-file\\44.ppt");
FileInputStream fis = null;
HSLFSlideShow document = null;
SlideShowExtractor extractor = null;
try {
fis = new FileInputStream(file);
document = new HSLFSlideShow(fis);
extractor = new SlideShowExtractor(document);
log.info("extractor.getText:{}", extractor.getText());
} catch (Exception e) {
e.printStackTrace();
}

格式使用错误就会报错:org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)

读取 pptx

// 使用 XMLSlideShow 类读取 pptx 格式的文档

// --------- pptx -----------
File file = new File("E:\\search-file\\33.pptx");
FileInputStream fis = null;
XMLSlideShow document = null;
SlideShowExtractor extractor = null;
try {
fis = new FileInputStream(file);
document = new XMLSlideShow(fis);
extractor = new SlideShowExtractor(document);
log.info("extractor.getText:{}", extractor.getText());
} catch (Exception e) {
e.printStackTrace();
}

XWPFDocument 类读取 doc 格式文档使用错误会报错:org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents. You need to call a different part of POI to process this data (eg HSSF instead of XSSF)

4、总结

apache poi 工具还是很强大的,功能非常多,对具体使用也可参考 apache poi 的官方文档:

https://poi.apache.org/apidocs/index.html

请注意自己使用的 apache poi 的版本,参考对应版本的 javadocs

org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents的更多相关文章

  1. org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML.

    org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Offic ...

  2. java解析excel2003和excel2007:The supplied data appears to be in the office 2007+XML Polonly supports OLE2 office documents

    上传excel解析存到数据库时报: org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears ...

  3. hadoop错误org.apache.hadoop.yarn.exceptions.YarnException Unauthorized request to start container

    错误: 14/04/29 02:45:07 INFO mapreduce.Job: Job job_1398704073313_0021 failed with state FAILED due to ...

  4. Why Apache Spark is a Crossover Hit for Data Scientists [FWD]

    Spark is a compelling multi-purpose platform for use cases that span investigative, as well as opera ...

  5. org.apache.kafka.common.errors.SerializationException: Error deserializing... Caused by: org.apache.kafka.common.errors.SerializationException: Size of data received by IntegerDeserializer is not 4

    原因,最近开发的kafka消息接收,突然报如下错: org.apache.kafka.common.errors.SerializationException: Error deserializing ...

  6. java.lang.IllegalAccessError: tried to access method org.apache.poi.util.POILogger.log from class org.apache.poi.openxml4j.opc.ZipPackage

    代码说简单也简单,说复杂那还真是寸步难行. 之前好好的excel导出功能,本地启动调试的时候突然就不行了,一直报上面的错. 一直在本地折腾了半天,去测试环境上看,又是好的,可以正常导出excel. 搜 ...

  7. spark on yarn 动态资源分配报错的解决:org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist

    组件:cdh5.14.0 spark是自己编译的spark2.1.0-cdh5.14.0 第一步:确认spark-defaults.conf中添加了如下配置: spark.shuffle.servic ...

  8. org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService: mapreduce_shuffle do

    在yarn-site.xml 配置文件中增加: <property> <name>yarn.nodemanager.aux-services</name> < ...

  9. Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/exceptions/YarnException

    这个是Flink 1.11.1  使用yarn-session 出现的错误:原因是在Flink1.11 之后不再提供flink-shaded-hadoop-*” jars 需要在yarn-sessio ...

  10. 根据xlsx模板生成excel数据文件发送邮件代码

    package mail; import java.io.File; import java.io.FileInputStream; import java.io.FileNotFoundExcept ...

随机推荐

  1. Elasticsearch方案选型必须了解的10件事!

    文章转载自: https://mp.weixin.qq.com/s?__biz=MzI2NDY1MTA3OQ==&mid=2247484372&idx=1&sn=e863e46 ...

  2. Qemu/Limbo/KVM镜像:Ubuntu Mate 22.04+Wine 7.8

    链接: https://pan.baidu.com/s/1cf2c_ylu7-SUaYl8ddztog 提取码: b9mi 密码 空格 手机推荐使用termux里面的Qemu运行,速度最快. 镜像特征 ...

  3. 洛谷P2627 [USACO11OPEN]Mowing the Lawn G (单调队列优化DP)

    一道单调队列优化DP的入门题. f[i]表示到第i头牛时获得的最大效率. 状态转移方程:f[i]=max(f[j-1]-sum[j])+sum[i] ,i-k<=j<=i.j的意义表示断点 ...

  4. 从 Paxos 到 ZooKeeper

    分布式一致性 分布式文件系统.缓存系统和数据库等大型分布式存储系统中,分布式一致性都是一个重要的问题. 什么是分布式一致性?分布式一致性分为哪些类型?分布式系统达到一致性后将会是一个什么样的状态? 如 ...

  5. 详解ROMA Connect API 流控实现技术

    摘要:本文将详细描述API Gateway流控实现,揭开高性能秒级流控的技术细节. 1.概述 ROMA平台的核心系统ROMA Connect源自华为流程IT的集成平台,在华为内部有超过15年的企业业务 ...

  6. Python解决千年虫问题

    #避免千年虫(日期bug)问题 千年虫:部分计算机程序使用年份后两位作为记录年份,当日期跳转到00时候,默认会解析为1900,造成系统紊乱 lst=[45,89,1998,00,75,33,1968, ...

  7. Dubbo2.7详解

    Spring与Dubbo整合原理与源码分析 [1]注解@EnableDubbo @Target({ElementType.TYPE}) @Retention(RetentionPolicy.RUNTI ...

  8. 分支结构之二:switch-case

    1.格式 switch(表达式){case 常量1: 执行语句1; //break; case 常量2: 执行语句2; //break; ... default: 执行语句n; //break; } ...

  9. 驱动开发:内核枚举进程与线程ObCall回调

    在笔者上一篇文章<驱动开发:内核枚举Registry注册表回调>中我们通过特征码定位实现了对注册表回调的枚举,本篇文章LyShark将教大家如何枚举系统中的ProcessObCall进程回 ...

  10. Vue router简单配置入门案例

    { 注意驼峰命名法,不然会报错 } 1.在Views文件夹下创建Vue路由文件,例如: <template> </template>  <script> </ ...