Creating PDF Files from TXT with Python (reportlab)

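Creating a PDF file with reportlab
E-books are usually distributed as txt files, but some e-readers, such as the DPT-RP1, cannot read txt documents. This post therefore uses Python to convert txt to PDF. The script generates a table of contents whose entries are clickable links (provided the position of each chapter can be identified in the txt file), and it supports Chinese text.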

For how to use reportlab, see the official reportlab documentation. The full txt-to-PDF code is as follows:

# coding: utf-8

# set the system default encoding to utf-8 (Python 2 only)
import sys
reload(sys)
sys.setdefaultencoding('utf-8')

from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
from reportlab.platypus import BaseDocTemplate, Frame, PageTemplate, Paragraph
from reportlab.platypus.tableofcontents import TableOfContents
from reportlab.platypus import PageBreak
from reportlab.lib.pagesizes import A4

pdfmetrics.registerFont(TTFont('STSONG', './STSONG.TTF'))  # register the body font
pdfmetrics.registerFont(TTFont('simhei', './simhei.ttf'))  # register the heading font
styles = getSampleStyleSheet()
# body text style
styles.add(ParagraphStyle(fontName='STSONG', name='STSONG', leading=20, fontSize=12, firstLineIndent=22, wordWrap='CJK'))
# chapter heading style (also picked up for the table of contents)
styles.add(ParagraphStyle(fontName='simhei', name='simhei', leading=25, fontSize=14, wordWrap='CJK'))

class MyDocTemplate(BaseDocTemplate):
    def __init__(self, filename, **kw):
        self.allowSplitting = 0
        BaseDocTemplate.__init__(self, filename, **kw)

    # Entries to the table of contents can be done either manually by
    # calling the addEntry method on the TableOfContents object or automatically
    # by sending a 'TOCEntry' notification in the afterFlowable method of
    # the DocTemplate you are using. The data to be passed to notify is a list
    # of three or four items containing a level number, the entry text, the page
    # number and an optional destination key which the entry should point to.
    # This list will usually be created in a document template's method like
    # afterFlowable(), making notification calls using the notify() method
    # with appropriate data.

    def afterFlowable(self, flowable):
        "Registers TOC entries."
        if flowable.__class__.__name__ == 'Paragraph':
            text = flowable.getPlainText()
            style = flowable.style.name
            if style == 'Heading1':
                level = 0
            elif style == 'simhei':
                level = 1
            else:
                return
            E = [level, text, self.page]
            # if we have a bookmark name, append that to our notify data
            bn = getattr(flowable, '_bookmarkName', None)
            if bn is not None:
                E.append(bn)
            self.notify('TOCEntry', tuple(E))

# this function makes our headings
def doHeading(data, text, sty):
    from hashlib import sha1
    # create the bookmark name (sha1 works on bytes, so encode the text first)
    bn = sha1(text.encode('utf-8')).hexdigest()
    # modify paragraph text to include an anchor point with name bn
    h = Paragraph(text + '<a name="%s"/>' % bn, sty)
    # store the bookmark name on the flowable so afterFlowable can see it
    h._bookmarkName = bn
    data.append(h)

# page number
def footer(canvas, doc):
    page_num = canvas.getPageNumber()
    canvas.saveState()
    P = Paragraph("%d" % page_num, styles['Normal'])
    w, h = P.wrap(doc.width, doc.bottomMargin)
    P.drawOn(canvas, doc.leftMargin + w/2, h)
    canvas.restoreState()

# load the txt file
def loadTxt(txt_path):
    with open(txt_path, 'r') as f:
        txt_datas = f.readlines()
    return txt_datas

def toPDF(txt_datas, pdf_path):
    PDF = MyDocTemplate(pdf_path, pagesize=A4)
    frame = Frame(PDF.leftMargin, PDF.bottomMargin, PDF.width, PDF.height,
                  id='normal')
    template = PageTemplate(frames=frame, onPage=footer)
    PDF.addPageTemplates([template])

    data = []

    # table of contents
    toc = TableOfContents()
    # set the font name and size for each TOC level
    toc.levelStyles = [
        ParagraphStyle(fontName='simhei', fontSize=20, name='TOCHeading1', leftIndent=20, firstLineIndent=-20, spaceBefore=10,
                       leading=16),
        ParagraphStyle(fontName='simhei', fontSize=18, name='TOCHeading2', leftIndent=40, firstLineIndent=-20, spaceBefore=5, leading=12),
    ]
    data.append(toc)  # add the table of contents
    data.append(PageBreak())  # start the body on the next page

    NUM = 0
    # add the text, one paragraph per line
    for txt_data in txt_datas:
        txt_data = txt_data.lstrip()  # strip leading whitespace
        if len(txt_data) == 0:  # skip empty lines
            continue
        try:
            txt_data = txt_data.decode("gb2312")
        except UnicodeDecodeError:
            txt_data = txt_data.decode("gbk")

        if txt_data[0] == u"第" and (u"章" in txt_data):
            doHeading(data, txt_data, styles['simhei'])
        else:
            data.append(Paragraph(txt_data, styles['STSONG']))
        NUM = NUM + 1
        print('{} line'.format(NUM))

    print('Build pdf!')
    PDF.multiBuild(data)

if __name__ == "__main__":
    txt_path = "财运天降.txt".decode("utf8")
    pdf_path = "财运天降.pdf".decode("utf8")
    txt_datas = loadTxt(txt_path)
    toPDF(txt_datas, pdf_path)
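
One caveat in doHeading: the bookmark name is the SHA-1 of the heading text alone, so two chapters with identical titles would get the same bookmark name. A minimal sketch of one way around this, assuming it is acceptable to mix a running counter into the hash. `make_bookmark_name` is a hypothetical helper, not part of reportlab or of the script above:

```python
# -*- coding: utf-8 -*-
from hashlib import sha1


def make_bookmark_name(text, index):
    """Return a bookmark name that stays unique even if heading text repeats.

    `index` is assumed to be a counter the caller increments per heading
    (hypothetical; the original script hashes the text alone).
    """
    key = u"{}:{}".format(index, text)
    return sha1(key.encode("utf-8")).hexdigest()


# Two headings with the same text get different anchors.
first = make_bookmark_name(u"第一章 开始", 0)
second = make_bookmark_name(u"第一章 开始", 1)
print(first != second)
```

In the script, the counter could be the position of the heading in `data`, but any value that differs per heading would do.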
This code was tested on Windows with Python 2. The main points to note are:

Setting the system default encoding:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
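
The `reload(sys)` / `sys.setdefaultencoding` idiom works only on Python 2. A minimal sketch of a more portable habit, assuming the encoding of the txt file is known in advance: decode at the file boundary with `io.open`, which accepts an `encoding` argument on both Python 2 and Python 3. `load_txt_unicode` is a hypothetical variant of `loadTxt`, and the gbk default is an assumption about the input file:

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def load_txt_unicode(txt_path, encoding="gbk"):
    """Read a text file and return its lines as Unicode strings."""
    with io.open(txt_path, "r", encoding=encoding) as f:
        return f.readlines()


# Demo: write a small GBK-encoded file, then read it back.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with io.open(demo_path, "w", encoding="gbk") as f:
    f.write(u"第一章 开始\n正文内容\n")
lines = load_txt_unicode(demo_path)
os.remove(demo_path)
print(len(lines))
```

Lines read this way are already Unicode, so they would not need the later `decode` calls.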
Chinese font support:
pdfmetrics.registerFont(TTFont('STSONG', './STSONG.TTF')) #register Font
pdfmetrics.registerFont(TTFont('simhei', './simhei.ttf')) #register Font
styles = getSampleStyleSheet()
styles.add(ParagraphStyle(fontName='STSONG', name='STSONG', leading=20, fontSize=12, firstLineIndent=22, wordWrap='CJK'))
styles.add(ParagraphStyle(fontName='simhei', name='simhei', leading=25, fontSize=14, wordWrap='CJK')) # content Font
Chinese fonts for the table of contents:
toc.levelStyles = [
    ParagraphStyle(fontName='simhei', fontSize=20, name='TOCHeading1', leftIndent=20, firstLineIndent=-20, spaceBefore=10,
                   leading=16),
    ParagraphStyle(fontName='simhei', fontSize=18, name='TOCHeading2', leftIndent=40, firstLineIndent=-20, spaceBefore=5, leading=12),
]
Locating chapter headings for the table of contents. Adjust this test to match the structure of your own txt file:
if txt_data[0] == u"第" and (u"章" in txt_data):
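
This test accepts any line that starts with 第 and contains 章 anywhere, so an ordinary sentence such as 第二天他读完了这一章 would also be treated as a heading. A stricter sketch using a regular expression, assuming headings look like 第N章 with Chinese or Arabic numerals. `is_chapter_heading` and the exact pattern are illustrative assumptions to adapt to the actual file:

```python
# -*- coding: utf-8 -*-
import re

# 第 + one or more Chinese or Arabic numerals + 章, anchored at the line start.
CHAPTER_RE = re.compile(u"^第[0-9零〇一二两三四五六七八九十百千]+章")


def is_chapter_heading(line):
    """Return True if the (Unicode) line looks like a chapter heading."""
    return CHAPTER_RE.match(line.strip()) is not None


print(is_chapter_heading(u"第十二章 风起"))
print(is_chapter_heading(u"第二天他读完了这一章"))
```

The pattern expects already decoded (Unicode) text, so it would run after the decoding step in the loop.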
Decoding Chinese text. Traditional Chinese characters cannot be decoded as gb2312, so a try-except falls back to gbk:
try:
    txt_data = txt_data.decode("gb2312")
except UnicodeDecodeError:
    txt_data = txt_data.decode("gbk")
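
The same fallback idea extends to more than two encodings. A minimal sketch, assuming the input lines are byte strings (for example, read from a file opened in 'rb' mode). `decode_line` and its list of candidate encodings are hypothetical and should be adjusted to the encodings actually expected:

```python
# -*- coding: utf-8 -*-
def decode_line(raw, encodings=("gb2312", "gbk", "gb18030")):
    """Decode bytes with the first candidate encoding that succeeds."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: decode anyway, replacing undecodable bytes.
    return raw.decode(encodings[-1], "replace")


simplified = u"财运天降".encode("gb2312")
traditional = u"財運天降".encode("gbk")
print(decode_line(simplified))
print(decode_line(traditional))
```

Catching only `UnicodeDecodeError` keeps unrelated failures, such as a keyboard interrupt, from being swallowed by the fallback.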
The result looks like this.
A txt file picked at random from the web:

The generated PDF table of contents:

The generated PDF content: