org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents

Exception: org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents. You need to call a different part of POI to process this data (eg HSSF instead of XSSF)

1. Scenario

The project needs to read the contents of Word documents, and uses Apache POI to read Word, PowerPoint, and Excel files. During development, reading a file threw this exception: org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents. You need to call a different part of POI to process this data (eg HSSF instead of XSSF)

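2. Analysis

PowerPoint presentations come in two formats: .ppt (the binary OLE2 format used by Office 97-2003) and .pptx (the XML-based OOXML format used since Office 2007). Apache POI supports each format with a different class.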
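The "appears to be in the OLE2 Format" part of the message refers to the file's content, not its name. As a minimal sketch of how the two containers can be told apart, the hypothetical helper below (OfficeFormatSniffer is an illustrative name, not a POI class) compares the leading bytes against the well-known OLE2 and ZIP signatures. It assumes that any ZIP container is an OOXML file, which is a simplification.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical helper (not part of POI): guesses the container format from the leading bytes. */
final class OfficeFormatSniffer {

    enum Format { OLE2, OOXML, UNKNOWN }

    // OLE2 compound documents (.doc/.xls/.ppt) start with this 8-byte signature.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // OOXML files (.docx/.xlsx/.pptx) are ZIP archives, which start with "PK" 0x03 0x04.
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    /** Classifies a header that has already been read into memory. */
    static Format detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Format.OLE2;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Format.OOXML; // assumption: a ZIP container is treated as OOXML
        }
        return Format.UNKNOWN;
    }

    /** Reads up to the first 8 bytes of the file and classifies them. */
    static Format detectFile(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[OLE2_MAGIC.length];
            int n = 0;
            while (n < header.length) {
                int r = in.read(header, n, header.length - n);
                if (r < 0) {
                    break; // file is shorter than the signature
                }
                n += r;
            }
            return detect(Arrays.copyOf(header, n));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            System.out.println(detectFile(Paths.get(args[0])));
        }
    }
}
```

Recent POI versions also ship a FileMagic class (org.apache.poi.poifs.filesystem.FileMagic) for this kind of check; consult the javadocs of the version in use before relying on it.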

The failing code constructed an XMLSlideShow from a .ppt file. XMLSlideShow can only read .pptx documents, so POI threw the exception above.

3. Reading ppt and pptx documents in detail

Correct sample code for reading each of the two PowerPoint formats follows.

Reading ppt

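import java.io.File;
import java.io.FileInputStream;
import org.apache.poi.hslf.usermodel.HSLFShape;
import org.apache.poi.hslf.usermodel.HSLFSlideShow;
import org.apache.poi.hslf.usermodel.HSLFTextParagraph;
import org.apache.poi.sl.extractor.SlideShowExtractor;

// Use the HSLFSlideShow class to read a .ppt document
// --------- ppt -----------
// log is assumed to be an SLF4J Logger field of the enclosing class
File file = new File("E:\\search-file\\44.ppt");
// try-with-resources closes the stream and the slide show even if reading fails
try (FileInputStream fis = new FileInputStream(file);
     HSLFSlideShow document = new HSLFSlideShow(fis)) {
    SlideShowExtractor<HSLFShape, HSLFTextParagraph> extractor = new SlideShowExtractor<>(document);
    log.info("extractor.getText:{}", extractor.getText());
} catch (Exception e) {
    log.error("Failed to read ppt file: {}", file, e);
}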

If the class does not match the format, here HSLFSlideShow given a .pptx file, POI throws: org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)

Reading pptx

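import java.io.File;
import java.io.FileInputStream;
import org.apache.poi.sl.extractor.SlideShowExtractor;
import org.apache.poi.xslf.usermodel.XMLSlideShow;
import org.apache.poi.xslf.usermodel.XSLFShape;
import org.apache.poi.xslf.usermodel.XSLFTextParagraph;

// Use the XMLSlideShow class to read a .pptx document
// --------- pptx -----------
// log is assumed to be an SLF4J Logger field of the enclosing class
File file = new File("E:\\search-file\\33.pptx");
// try-with-resources closes the stream and the slide show even if reading fails
try (FileInputStream fis = new FileInputStream(file);
     XMLSlideShow document = new XMLSlideShow(fis)) {
    SlideShowExtractor<XSLFShape, XSLFTextParagraph> extractor = new SlideShowExtractor<>(document);
    log.info("extractor.getText:{}", extractor.getText());
} catch (Exception e) {
    log.error("Failed to read pptx file: {}", file, e);
}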

Conversely, using an OOXML class on an OLE2 file, such as XMLSlideShow on a .ppt file or XWPFDocument on a .doc file, throws: org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents. You need to call a different part of POI to process this data (eg HSSF instead of XSSF)
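The same pairing of an OLE2 class and an OOXML class applies to Word and Excel files as well. The lookup below is only an illustrative sketch: PoiReaderLookup and readerClassFor are hypothetical names, and the method merely returns the fully qualified name of the POI class that matches a file extension. It trusts the extension, so a mislabelled file would still fail at read time.

```java
import java.util.Locale;

/** Hypothetical lookup (not part of POI): file extension -> name of the POI class that reads it. */
final class PoiReaderLookup {

    static String readerClassFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            // OLE2 binary formats
            case "doc":
                return "org.apache.poi.hwpf.HWPFDocument";
            case "xls":
                return "org.apache.poi.hssf.usermodel.HSSFWorkbook";
            case "ppt":
                return "org.apache.poi.hslf.usermodel.HSLFSlideShow";
            // OOXML (ZIP-based) formats
            case "docx":
                return "org.apache.poi.xwpf.usermodel.XWPFDocument";
            case "xlsx":
                return "org.apache.poi.xssf.usermodel.XSSFWorkbook";
            case "pptx":
                return "org.apache.poi.xslf.usermodel.XMLSlideShow";
            default:
                throw new IllegalArgumentException("Unsupported extension: " + ext);
        }
    }

    public static void main(String[] args) {
        // The two sample files used in this post
        System.out.println(readerClassFor("44.ppt"));
        System.out.println(readerClassFor("33.pptx"));
    }
}
```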

4. Summary

Apache POI is a powerful library with a wide range of features. For details on specific usage, see the official Apache POI API documentation:

https://poi.apache.org/apidocs/index.html

Check which version of Apache POI you are using, and refer to the javadocs for that version.