tesseract-ocr了解
安装文件下载地址
http://tesseract-ocr.googlecode.com/files/tesseract-ocr-setup-3.01-1.exe
README
For the latest online version of the README.md see:
https://github.com/tesseract-ocr/tesseract/blob/master/README.md
About
This package contains an OCR engine - libtesseract and a command line program - tesseract.
The lead developer is Ray Smith. The maintainer is Zdenko Podobny. For a list of contributors seeAUTHORS and github's log of contributors.
Tesseract has unicode (UTF-8) support, and can recognize more than 100 languages "out of the box". It can be trained to recognize other languages. See Tesseract Training for more information.
Tesseract supports various output formats: plain-text, hocr(html), pdf.
This project does not include a GUI application. If you need one, please see the 3rdParty wiki page.
You should note that in many cases, in order to get better OCR results, you'll need to improve the qualityof the image you are giving Tesseract.
The latest stable version is 3.04.01, released in February 2016.
Brief history
Tesseract was originally developed at Hewlett-Packard Laboratories Bristol and at Hewlett-Packard Co, Greeley Colorado between 1985 and 1994, with some more changes made in 1996 to port to Windows, and some C++izing in 1998.
In 2005 Tesseract was open sourced by HP. Since 2006 it is developed by Google.
For developers
Developers can use libtesseract C or C++ API to build their own application. If you need bindings tolibtesseract for other programming languages, please see the wrapper section on AddOns wiki page.
Documentation of Tesseract generated from source code by doxygen can be found on tesseract-ocr.github.io.
License
The code in this repository is licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
NOTE: This software depends on other packages that may be licensed under different open source licenses.
Installing Tesseract
You can either Install Tesseract via pre-built binary package or build it from source.
Running Tesseract
Basic command line usage:
tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfiles...]
For more information about the various command line options use tesseract --help or man tesseract.
Support
Mailing-lists: * tesseract-ocr - For tesseract users. * tesseract-dev - For tesseract developers.
Please read the FAQ before asking any question in the mailing-list or reporting an issue.
tesseract-ocr了解的更多相关文章
- tesseract ocr文字识别Android实例程序和训练工具全部源代码
tesseract ocr是一个开源的文字识别引擎,Android系统中也可以使用.可以识别50多种语言,通过自己训练识别库的方式,可以大大提高识别的准确率. 为了节省大家的学习时间,现将自己近期的学 ...
- Tesseract——OCR图像识别 入门篇
Tesseract——OCR图像识别 入门篇 最近给了我一个任务,让我研究图像识别,从我们项目的screenshot中识别文字信息,so我开始了学习,与大家分享下. 我看到目前OCR技术有很多,最主要 ...
- Tesseract Ocr引擎
Tesseract Ocr引擎 1.Tesseract介绍 tesseract 是一个google支持的开源ocr项目,其项目地址:https://github.com/tesseract-ocr/t ...
- 开源图片文字识别引擎——Tesseract OCR
Tessseract为一款开源.免费的OCR引擎,能够支持中文十分难得.虽然其识别效果不是很理想,但是对于要求不高的中小型项目来说,已经足够用了. 文字识别可应用于许多领域,如阅读.翻译.文献资料的检 ...
- Python下Tesseract Ocr引擎及安装介绍
1.Tesseract介绍 tesseract 是一个google支持的开源ocr项目,其项目地址:https://github.com/tesseract-ocr/tesseract,目前最新的源码 ...
- Tesseract OCR使用介绍
#Tesseract OCR使用介绍 ##目录[TOC] ##下载地址及介绍 官网介绍:http://code.google.com/p/tesseract-ocr/wiki/TrainingTess ...
- selenium使用笔记(二)——Tesseract OCR
在自动化测试过程中我们经常会遇到需要输入验证码的情况,而现在一般以图片验证码居多.通常我们处理这种情况应该用最简单的方式,让开发给个万能验证码或者直接将验证码这个环节跳过.之前在技术交流群里也跟朋友讨 ...
- alfresco install in linux, and integrated with tesseract ocr
本文描述在Linux系统上安装Alfresco的步骤: 1. 下载安装文件:alfresco-community-5.0.d-installer-linux-x64.bin 2. 增加执行权限并执行: ...
- 使用Tesseract OCR识别验证码
1.下载Tessrac OCR,默认安装 2.把验证码code.jpg图片放在D盘 3.打开cmd,进入D盘,输入:tesseract code.jpg result 4.进入D盘,生成了resul ...
- Tesseract ocr 3.02学习记录一
光学字符识别(OCR,Optical Character Recognition)是指对文本资料进行扫描,然后对图像文件进行分析处理,获取文字及版面信息的过程.OCR技术非常专业,一般多是印刷.打印行 ...
随机推荐
- less文件的样式无法生效的一个原因,通过WEB浏览器访问服务器less文件地址返回404错误
有一种情况容易导致less文件的样式无法生效,就是部分服务器(以IIS居多)会对未知后缀的文件返回404,导致无法正常读取.less文件.解决方案是在服务器中为.less文件配置MIME值为text/ ...
- 【海岛帝国系列赛】No.3 海岛帝国:运输资源
海岛帝国:运输资源 [试题描述] YSF考虑到“药师傅”帝国现在资源极度不平均,于是,商讨启用南水北调工程.YZM为首席工程师.现在,YSF由于工作紧张,准备军用物资和民用物资.但他要时时关注运输工程 ...
- 单例模式在Java和C#中的实现
单例模式算是最常见和最容易理解一种设计模式了.通常是指某一个类只有一实例存在,存在的空间我认为可以理解为该类所在的应用系统内,还有一种是在某一个容器内单一存在,比如像spring的IOC容器(作用域为 ...
- web跨页弹窗选值
最近在项目中看到这样一种效果——点击当前网页文本框,然后弹出一个缩小的网页,并在网页里选择或填写数据,然后又返回当前网页,小网页关闭.感觉非常不错,其实在以前网上也看见过,只是当时没有留心.今天抽时间 ...
- uploadify3.2.1加载时,报NetworkError 404 Not Found或NetworkError forbidden错误
我用的uploadify的版本是3.2.1 在打开配置了uploadify的页面的时候,什么操作都没有,仅仅是打开了页面,在火狐里可以看到一行报错信息,我的uploadify页面 在"/项目 ...
- Java常用快捷键
Ctrl+M 最大化当前的Edit或View (再按则反之) Crl+k:查找下一处 Ctrl+Shift+O 导包 Ctrl+W 关闭当前EditerCtrl+shift+W 关闭所有当前页 Ctr ...
- Bind[Exclude|Include]排除字段或只允许字段验证
public ActionResult xx([Bind(Exclude = "id")] xxModel xx, HttpPostedFileBase file)//排除id验证 ...
- Spring的javaMail邮件发送(带附件)
项目中经常用到邮件功能,在这里简单的做一下笔记,方便日后温习. 首先需要在配置文件jdbc.properties添加: #------------ Mail ------------ mail.smt ...
- ecshop添加商品选择品牌时如何按拼音排序
ECSHOP后台添加新商品时,有一个选择品牌的下拉框,如果品牌太多,在下拉框里查找起来很不方便. 我想给“下拉框里的品牌列表”按品牌名的拼音排序,比如有“中国水利出版社” “中国人民出版社” 这两个品 ...
- python: indentationerror: unexpected indent
以后遇到了IndentationError: unexpected indent你就要知道python编译器是在告诉你“Hi,老兄,你的文件里格式不对了,可能是tab和空格没对齐的问题,你需要检查下t ...