Processing the Strings in a PowerPoint File with a C# Program

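Recently, a colleague discovered by chance that a Microsoft Office PowerPoint file (.pptx) can be treated as a compressed archive and extracted with WinRAR into a set of XML files. The extracted files include:

An index file named [Content_Types].xml,

A folder named ppt, which contains two important subfolders: slides and notesSlides

[Content_Types].xml records the relative path of every slide and of every slide note. Its contents are shown in the figure below:

[Figure: contents of [Content_Types].xml]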
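To give a rough idea of what this index file contains, here is a minimal sketch that parses a hand-written, heavily abbreviated [Content_Types].xml and lists the parts of a given content type. The sample XML, the part names slide1.xml and notesSlide1.xml, and the helper name PartsOfType are assumptions made for illustration; a real package lists many more parts.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class ContentTypesSketch
{
    // Hand-written, abbreviated sample (an assumption, not taken from a real deck).
    public const string Sample =
@"<Types xmlns=""http://schemas.openxmlformats.org/package/2006/content-types"">
  <Default Extension=""xml"" ContentType=""application/xml""/>
  <Override PartName=""/ppt/slides/slide1.xml"" ContentType=""application/vnd.openxmlformats-officedocument.presentationml.slide+xml""/>
  <Override PartName=""/ppt/notesSlides/notesSlide1.xml"" ContentType=""application/vnd.openxmlformats-officedocument.presentationml.notesSlide+xml""/>
</Types>";

    public const string SlideType =
        "application/vnd.openxmlformats-officedocument.presentationml.slide+xml";

    // Returns the PartName of every Override entry with the given content type.
    public static List<string> PartsOfType(string xml, string contentType)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        var parts = new List<string>();
        foreach (XmlNode node in doc.DocumentElement.ChildNodes)
        {
            // Skip comments and Default entries, which carry no PartName.
            var element = node as XmlElement;
            if (element == null || element.LocalName != "Override") continue;
            if (element.GetAttribute("ContentType") == contentType)
                parts.Add(element.GetAttribute("PartName"));
        }
        return parts;
    }

    static void Main()
    {
        foreach (string part in PartsOfType(Sample, SlideType))
            Console.WriteLine(part); // prints /ppt/slides/slide1.xml
    }
}
```

Each PartName starts with a slash and is relative to the root of the package, which is why the code further down strips the first character before combining it with the extraction folder.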

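We found that all of the text in the presentation is stored in <a:t></a:t> nodes of these XML files. So we export the content of every a:t node, modify it, and write the modified content back into the original files. Compressing the set of files again then produces the modified PowerPoint file. This process offers a convenient route for localizing PowerPoint content.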
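We did the extracting and re-compressing by hand with WinRAR. As a hedged alternative, the same two steps can be sketched with the ZipFile class from System.IO.Compression (on .NET Framework this needs a reference to System.IO.Compression.FileSystem). The helper names Unpack and Repack and all paths are assumptions for illustration, and the demo runs on a throwaway folder rather than on a real presentation.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class PackageZipSketch
{
    // Extracts a .pptx (a ZIP package) into a working folder.
    // Note: deletes the working folder first if it already exists.
    public static void Unpack(string pptxPath, string workFolder)
    {
        if (Directory.Exists(workFolder)) Directory.Delete(workFolder, true);
        ZipFile.ExtractToDirectory(pptxPath, workFolder);
    }

    // Compresses the contents of the working folder into a new package.
    // The two-argument overload puts the folder's contents at the archive root.
    public static void Repack(string workFolder, string pptxPath)
    {
        if (File.Exists(pptxPath)) File.Delete(pptxPath);
        ZipFile.CreateFromDirectory(workFolder, pptxPath);
    }

    static void Main()
    {
        // A temp folder stands in for an extracted package; demo.pptx below is
        // only a ZIP with one file, not a presentation PowerPoint could open.
        string baseDir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        string source = Path.Combine(baseDir, "source");
        string slideDir = Path.Combine(source, "ppt", "slides");
        Directory.CreateDirectory(slideDir);
        File.WriteAllText(Path.Combine(slideDir, "slide1.xml"), "<a:t>Hello</a:t>");

        string package = Path.Combine(baseDir, "demo.pptx");
        string unpacked = Path.Combine(baseDir, "unpacked");
        Repack(source, package);
        Unpack(package, unpacked);

        Console.WriteLine(File.ReadAllText(Path.Combine(unpacked, "ppt", "slides", "slide1.xml")));
        Directory.Delete(baseDir, true);
    }
}
```

One thing to watch: the export code below creates its SlideTXTs and NotesTXTs folders inside the extracted folder, so they should be moved out before repacking, otherwise they end up inside the package.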

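Compared with calling the APIs in Microsoft.Office.Interop.PowerPoint, this approach preserves 100% of the original formatting, so no reformatting pass over the slides is needed afterwards.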

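Below is the C# code we wrote. The idea is to export the strings of each slide to a txt file, translate the strings in the txt files with Trados, and then import the modified content into the corresponding XML files inside the PPT package.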

PPTZIPCommon

// using directives needed by the three classes in this post
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using System.Windows.Forms;
using System.Xml;

class PPTZIPCommon
{
    /// <summary>
    /// read file [Content_Types].xml
    /// </summary>
    /// <param name="root">folder the package was extracted to</param>
    /// <param name="SlideFiles">return slides </param>
    /// <param name="NotesFiles">return slide notes</param>
    internal static void ReadContentTypes(string root, ref List<string> SlideFiles, ref List<string> NotesFiles)
    {
        string ct_file = @"[Content_Types].xml";
        string ct_fullName = Path.Combine(root, ct_file);
        if (!File.Exists(ct_fullName))
        {
            MessageBox.Show(string.Format("[Content_Types].xml does not exist in {0}", root));
            return;
        }
        XmlDocument xml_doc = new XmlDocument();
        xml_doc.Load(ct_fullName);
        XmlElement rootElement = xml_doc.DocumentElement;
        string slide_types = "application/vnd.openxmlformats-officedocument.presentationml.slide+xml";
        string notes_types = "application/vnd.openxmlformats-officedocument.presentationml.notesSlide+xml";
        XmlNodeList nodes = rootElement.ChildNodes;
        foreach (XmlElement node in nodes)
        {
            if (node.Attributes["ContentType"].Value == slide_types)
            {
                // PartName starts with "/": drop it, then switch to Windows path separators
                string relatedPath = node.Attributes["PartName"].Value.Remove(0, 1).Replace(@"/", @"\");
                string file = Path.Combine(root, relatedPath);
                SlideFiles.Add(file);
            }
            else if (node.Attributes["ContentType"].Value == notes_types)
            {
                string relatedPath = node.Attributes["PartName"].Value.Remove(0, 1).Replace(@"/", @"\");
                string file = Path.Combine(root, relatedPath);
                NotesFiles.Add(file);
            }
        }
    }

    /// <summary>
    /// returns the last path segment up to its first ".", e.g. "Deck" for C:\work\Deck.pptx\
    /// </summary>
    internal static string GetPPTNameFromFullPath(string scanFolder)
    {
        // strip a trailing backslash, if any
        int lastIndexOfSlash = scanFolder.LastIndexOf(@"\");
        if (lastIndexOfSlash == scanFolder.Length - 1)
        {
            scanFolder = scanFolder.Remove(lastIndexOfSlash);
        }
        string lastString = scanFolder.Substring(scanFolder.LastIndexOf(@"\") + 1);
        string[] names = lastString.Split(new string[] { "." }, StringSplitOptions.RemoveEmptyEntries);
        return names[0];
    }
}

PPTZIP

class PPTZIP
{
    private static List<string> SlideFiles = new List<string>();
    private static List<string> NotesFiles = new List<string>();

    /// <summary>
    /// collect together all the <a:t>...</a:t> strings, put it in txt file
    /// txt file be saved to output\<original PPT name>_<fileName>.txt
    /// </summary>
    /// <param name="file">xml file that contains <a:t>...</a:t></param>
    /// <param name="output">the txt file be saved to the output folder</param>
    /// <param name="pptName">original PowerPoint file name</param>
    private static void ReadATContent2TXT(string file, string output, string pptName)
    {
        StringBuilder sb = new StringBuilder();
        using (StreamReader reader = new StreamReader(file))
        {
            string content = reader.ReadToEnd();
            // the pattern as posted had a stray "." after <a:t>, which skipped one-character strings
            string pattern = @"<a:t>[^<>]+</a:t>";
            MatchCollection mc = Regex.Matches(content, pattern);
            for (int i = 0; i < mc.Count; i++)
            {
                // keep only the text between <a:t> (5 characters) and </a:t>;
                // each string goes on its own line, terminated by "^"
                sb.AppendLine(string.Format("{0}^", mc[i].Value.Substring(5, mc[i].Value.LastIndexOf("<") - 5)));
            }
        }
        FileInfo fi = new FileInfo(file);
        string txtFile = Path.Combine(output, pptName + "_" + fi.Name + ".txt");
        using (StreamWriter writer = new StreamWriter(txtFile))
        {
            writer.Write(sb.ToString().Trim());
            writer.Flush();
            writer.Close();
        }
    }

    public static void Export2TXTs(string scanFolder)
    {
        string ppt_name = PPTZIPCommon.GetPPTNameFromFullPath(scanFolder);
        PPTZIPCommon.ReadContentTypes(scanFolder, ref SlideFiles, ref NotesFiles);
        if (null != SlideFiles && SlideFiles.Count > 0)
        {
            foreach (var file in SlideFiles)
            {
                string outputfolder = Path.Combine(scanFolder, "SlideTXTs");
                if (!Directory.Exists(outputfolder))
                    Directory.CreateDirectory(outputfolder);
                // folder the translated txt files are expected in later
                string transFolder = Path.Combine(scanFolder, "SlideTXTs_Trans");
                if (!Directory.Exists(transFolder))
                    Directory.CreateDirectory(transFolder);
                ReadATContent2TXT(file, outputfolder, ppt_name);
            }
        }
        if (null != NotesFiles && NotesFiles.Count > 0)
        {
            foreach (var file in NotesFiles)
            {
                string outputfolder = Path.Combine(scanFolder, "NotesTXTs");
                if (!Directory.Exists(outputfolder))
                    Directory.CreateDirectory(outputfolder);
                string transFolder = Path.Combine(scanFolder, "NotesTXTs_Trans");
                if (!Directory.Exists(transFolder))
                    Directory.CreateDirectory(transFolder);
                ReadATContent2TXT(file, outputfolder, ppt_name);
            }
        }
    }
}
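To make the exported txt format concrete, here is a small sketch of the round trip on an in-memory slide fragment: extract the a:t strings, write them one per line with a trailing ^, and split them back. The sample fragment and the helper names ExtractRuns, ToExportText and FromExportText are assumptions for illustration. Unlike the code above, the sketch trims line breaks from each entry instead of removing a fixed "\r\n" prefix.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

static class ExportFormatSketch
{
    // Collects the text of every non-empty <a:t>...</a:t> element.
    public static List<string> ExtractRuns(string slideXml)
    {
        var runs = new List<string>();
        foreach (Match m in Regex.Matches(slideXml, @"<a:t>([^<>]+)</a:t>"))
            runs.Add(m.Groups[1].Value);
        return runs;
    }

    // One string per line, each terminated by "^".
    public static string ToExportText(IEnumerable<string> runs)
    {
        var sb = new StringBuilder();
        foreach (string run in runs) sb.Append(run).Append("^\r\n");
        return sb.ToString().Trim();
    }

    // Splits on "^" and strips the line break left in front of each entry.
    public static List<string> FromExportText(string text)
    {
        var result = new List<string>();
        foreach (string piece in text.Split(new[] { '^' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string s = piece.Trim('\r', '\n');
            if (s.Length > 0) result.Add(s);
        }
        return result;
    }

    static void Main()
    {
        string xml = "<a:p><a:r><a:t>Hello</a:t></a:r><a:r><a:t>World &amp; more</a:t></a:r></a:p>";
        string exported = ToExportText(ExtractRuns(xml));
        Console.WriteLine(exported);
        foreach (string s in FromExportText(exported))
            Console.WriteLine("[" + s + "]");
    }
}
```

Two consequences of this format are worth keeping in mind: a string that itself contains ^ gets split in two on import, and XML entities such as &amp; appear in the txt file in their escaped form, so the translated text has to keep them escaped.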

PPTZIPWriter

class PPTZIPWriter
{
    private static List<string> SlideFiles = new List<string>();
    private static List<string> NotesFiles = new List<string>();

    // replaces every <a:t>original</a:t> in the file with <a:t>translated</a:t>
    private static void Replace(string file, List<string> original, List<string> translated)
    {
        string content = string.Empty;
        using (StreamReader reader = new StreamReader(file))
        {
            content = reader.ReadToEnd();
            for (int i = 0; i < original.Count; i++)
            {
                content = content.Replace(string.Format("<a:t>{0}</a:t>", original[i]), string.Format("<a:t>{0}</a:t>", translated[i]));
            }
            reader.Close();
        }
        using (StreamWriter writer = new StreamWriter(file))
        {
            writer.Write(content);
            writer.Flush();
            writer.Close();
        }
    }

    public static void Import2PPT(string scanFolder, string lan)
    {
        string ppt_name = PPTZIPCommon.GetPPTNameFromFullPath(scanFolder);
        // fill the two lists: SlideFiles and NotesFiles
        PPTZIPCommon.ReadContentTypes(scanFolder, ref SlideFiles, ref NotesFiles);
        string srcFolder = "SlideTXTs";
        string trgFolder = "SlideTXTs_Trans";
        string srcFullPath = Path.Combine(scanFolder, srcFolder);
        string trgFullPath = Path.Combine(scanFolder, trgFolder);
        foreach (var file in SlideFiles)
        {
            ReplaceATContent(file, srcFullPath, trgFullPath, ppt_name, lan);
        }
        string srcFolderNotes = "NotesTXTs";
        string trgFolderNotes = "NotesTXTs_Trans";
        string srcFullPath_trans = Path.Combine(scanFolder, srcFolderNotes);
        string trgFullPath_trans = Path.Combine(scanFolder, trgFolderNotes);
        foreach (var file in NotesFiles)
        {
            ReplaceATContent(file, srcFullPath_trans, trgFullPath_trans, ppt_name, lan);
        }
    }

    private static void ReplaceATContent(string file, string srcFolder, string trgFolder, string pptName, string lan)
    {
        if (!(Directory.Exists(srcFolder) && Directory.Exists(trgFolder)))
        {
            MessageBox.Show("SlideTXTs/NotesTXTs or SlideTXTs_Trans/NotesTXTs_Trans does not exist");
            return;
        }
        FileInfo fi = new FileInfo(file);
        string srcFileName = string.Format("{0}_{1}.txt", pptName, fi.Name);
        string srcFileFullPath = Path.Combine(srcFolder, srcFileName);
        // translated files may carry a language suffix: <ppt>_<file>_<lan>.txt
        string trgFileName = string.Empty;
        if (lan == string.Empty)
            trgFileName = string.Format("{0}_{1}.txt", pptName, fi.Name);
        else
            trgFileName = string.Format("{0}_{1}_{2}.txt", pptName, fi.Name, lan);
        string trgFileFullPath = Path.Combine(trgFolder, trgFileName);
        if (!(File.Exists(srcFileFullPath) && File.Exists(trgFileFullPath)))
        {
            MessageBox.Show(string.Format(@"File {0} not replaced", file));
            return;
        }
        List<string> originalString = new List<string>();
        using (StreamReader reader = new StreamReader(srcFileFullPath))
        {
            string content = reader.ReadToEnd().Trim();
            string[] strings = content.Split(new string[] { "^" }, StringSplitOptions.RemoveEmptyEntries);
            for (int i = 0; i < strings.Length; i++)
            {
                // every entry after the first starts with the "\r\n" that followed the previous "^"
                originalString.Add(strings[i].Contains("\r\n") ? strings[i].Remove(0, 2) : strings[i]);
            }
        }
        List<string> translatedString = new List<string>();
        using (StreamReader reader = new StreamReader(trgFileFullPath))
        {
            string content = reader.ReadToEnd().Trim();
            string[] strings = content.Split(new string[] { "^" }, StringSplitOptions.RemoveEmptyEntries);
            for (int i = 0; i < strings.Length; i++)
            {
                translatedString.Add(strings[i].Contains("\r\n") ? strings[i].Remove(0, 2) : strings[i]);
            }
        }
        if (originalString.Count != translatedString.Count)
        {
            MessageBox.Show(string.Format(@"translation string count does not match: {0}", file));
            return;
        }
        Replace(file, originalString, translatedString);
    }
}
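The Replace method above works on the raw XML text, so a translation containing a bare & or < would produce a broken slide file. As a hedged variant (a different technique from the string replacement we used), the sketch below swaps the text through XmlDocument with a namespace manager, which escapes such characters on write. It assumes one translated string per a:t element in document order; the helper name ReplaceRuns and the sample fragment are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class XmlReplaceSketch
{
    // Namespace that the "a" prefix is bound to in slide XML (DrawingML).
    const string DrawingMlNs = "http://schemas.openxmlformats.org/drawingml/2006/main";

    // Replaces the text of the i-th a:t element with translated[i].
    public static string ReplaceRuns(string slideXml, IList<string> translated)
    {
        var doc = new XmlDocument();
        doc.PreserveWhitespace = true; // leave the rest of the markup untouched
        doc.LoadXml(slideXml);

        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("a", DrawingMlNs);
        XmlNodeList runs = doc.SelectNodes("//a:t", ns);

        if (runs.Count != translated.Count)
            throw new InvalidOperationException("translation string count does not match");

        for (int i = 0; i < runs.Count; i++)
            runs[i].InnerText = translated[i]; // special characters are escaped on output

        return doc.OuterXml;
    }

    static void Main()
    {
        // Minimal made-up fragment, not a complete slide.
        string xml = "<p:sld xmlns:a=\"" + DrawingMlNs + "\" " +
                     "xmlns:p=\"http://schemas.openxmlformats.org/presentationml/2006/main\">" +
                     "<a:t>Hello</a:t><a:t>R&amp;D</a:t></p:sld>";
        Console.WriteLine(ReplaceRuns(xml, new List<string> { "Bonjour", "R&D (fr)" }));
    }
}
```

Note the difference in input: with this variant the translated strings are plain, unescaped text, whereas the txt files exported above contain the escaped form, so the two should not be mixed without unescaping first.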
