安装文件下载地址

http://tesseract-ocr.googlecode.com/files/tesseract-ocr-setup-3.01-1.exe

README

For the latest online version of the README.md see:

https://github.com/tesseract-ocr/tesseract/blob/master/README.md

About

This package contains an OCR engine - libtesseract and a command line program - tesseract.

The lead developer is Ray Smith. The maintainer is Zdenko Podobny. For a list of contributors seeAUTHORS and github's log of contributors.

Tesseract has unicode (UTF-8) support, and can recognize more than 100 languages "out of the box". It can be trained to recognize other languages. See Tesseract Training for more information.

Tesseract supports various output formats: plain-text, hocr(html), pdf.

This project does not include a GUI application. If you need one, please see the 3rdParty wiki page.

You should note that in many cases, in order to get better OCR results, you'll need to improve the qualityof the image you are giving Tesseract.

The latest stable version is 3.04.01, released in February 2016.

Brief history

Tesseract was originally developed at Hewlett-Packard Laboratories Bristol and at Hewlett-Packard Co, Greeley Colorado between 1985 and 1994, with some more changes made in 1996 to port to Windows, and some C++izing in 1998.

In 2005 Tesseract was open sourced by HP. Since 2006 it is developed by Google.

Release Notes

For developers

Developers can use libtesseract C or C++ API to build their own application. If you need bindings tolibtesseract for other programming languages, please see the wrapper section on AddOns wiki page.

Documentation of Tesseract generated from source code by doxygen can be found on tesseract-ocr.github.io.

License

The code in this repository is licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

NOTE: This software depends on other packages that may be licensed under different open source licenses.

Installing Tesseract

You can either Install Tesseract via pre-built binary package or build it from source.

Running Tesseract

Basic command line usage:

tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfiles...]

For more information about the various command line options use tesseract --help or man tesseract.

Support

Mailing-lists: * tesseract-ocr - For tesseract users. * tesseract-dev - For tesseract developers.

Please read the FAQ before asking any question in the mailing-list or reporting an issue.

tesseract-ocr了解的更多相关文章

  1. tesseract ocr文字识别Android实例程序和训练工具全部源代码

    tesseract ocr是一个开源的文字识别引擎,Android系统中也可以使用.可以识别50多种语言,通过自己训练识别库的方式,可以大大提高识别的准确率. 为了节省大家的学习时间,现将自己近期的学 ...

  2. Tesseract——OCR图像识别 入门篇

    Tesseract——OCR图像识别 入门篇 最近给了我一个任务,让我研究图像识别,从我们项目的screenshot中识别文字信息,so我开始了学习,与大家分享下. 我看到目前OCR技术有很多,最主要 ...

  3. Tesseract Ocr引擎

    Tesseract Ocr引擎 1.Tesseract介绍 tesseract 是一个google支持的开源ocr项目,其项目地址:https://github.com/tesseract-ocr/t ...

  4. 开源图片文字识别引擎——Tesseract OCR

    Tessseract为一款开源.免费的OCR引擎,能够支持中文十分难得.虽然其识别效果不是很理想,但是对于要求不高的中小型项目来说,已经足够用了. 文字识别可应用于许多领域,如阅读.翻译.文献资料的检 ...

  5. Python下Tesseract Ocr引擎及安装介绍

    1.Tesseract介绍 tesseract 是一个google支持的开源ocr项目,其项目地址:https://github.com/tesseract-ocr/tesseract,目前最新的源码 ...

  6. Tesseract OCR使用介绍

    #Tesseract OCR使用介绍 ##目录[TOC] ##下载地址及介绍 官网介绍:http://code.google.com/p/tesseract-ocr/wiki/TrainingTess ...

  7. selenium使用笔记(二)——Tesseract OCR

    在自动化测试过程中我们经常会遇到需要输入验证码的情况,而现在一般以图片验证码居多.通常我们处理这种情况应该用最简单的方式,让开发给个万能验证码或者直接将验证码这个环节跳过.之前在技术交流群里也跟朋友讨 ...

  8. alfresco install in linux, and integrated with tesseract ocr

    本文描述在Linux系统上安装Alfresco的步骤: 1. 下载安装文件:alfresco-community-5.0.d-installer-linux-x64.bin 2. 增加执行权限并执行: ...

  9. 使用Tesseract OCR识别验证码

    1.下载Tessrac OCR,默认安装 2.把验证码code.jpg图片放在D盘 3.打开cmd,进入D盘,输入:tesseract  code.jpg result 4.进入D盘,生成了resul ...

  10. Tesseract ocr 3.02学习记录一

    光学字符识别(OCR,Optical Character Recognition)是指对文本资料进行扫描,然后对图像文件进行分析处理,获取文字及版面信息的过程.OCR技术非常专业,一般多是印刷.打印行 ...

随机推荐

  1. Openstack的mysql数据多主galera的错误

    登录openstack的在dashboard,提示说权限验证错误,有2种情况: 1. 密码被人改了. 2. 系统发生了问题. 密码确认没人改,所以查看/var/log/keystone-all.log ...

  2. SQL SERVER2012秘钥

    来自网络: MICROSOFT SQL SERVER 2012 DEVELOPER 版(开发版)序列号:YQWTX-G8T4R-QW4XX-BVH62-GP68YMICROSOFT SQL SERVE ...

  3. centos rabbitmq

    1. 安装erlang  安装依赖环境 yum -y install make gcc gcc-c++ kernel-devel m4 ncurses-devel openssl-devel unix ...

  4. D5转Xe点滴

    1. VarIsNull 函数已经被 Variants 单元, 相应的 Var 相关都放在哪 2.原本在 SysUtils 下的部分格式函数或定义,例如 .DecimalSeparator:  Sho ...

  5. linux设备驱动归纳总结(四):2.进程调度的相关概念【转】

    本文转载自:http://blog.chinaunix.net/uid-25014876-id-65555.html linux设备驱动归纳总结(四):2.进程调度的相关概念 xxxxxxxxxxxx ...

  6. android 开发中的常见问题

    Android studio 使用极光推送, 显示获取sdk版本失败 在 build.gradle(Module.app) 添加 android {    sourceSets.main {      ...

  7. 测试过程中LR的关联报错

    在测试过程中,录制的脚本会做一些关联.在测试的过程中,常常出现关联失败的情况. 如果最后的结果有检查点,检查点失败而事务失败. 每次出现这样的情况,我都不知道如何办.为了不出现错误,我都在关联函数里面 ...

  8. DataGuard主备归档存在gap的处理办法

    DataGuard主备之间可能由于网络等原因,造成备库和主库之间的归档日志不一致,这样就产生了gap. 解决gap的步骤: 1.在备库获得gap的详细信息 2.将需要的归档日志从主库拷贝到备库 3.备 ...

  9. 记录整合sprinmvc+log4j的的过程

    简介 由于进一步的学习以及便于自己更好的调试程序中遇到的错误,开始了将log4j整合到web项目中,项目是基于springmvc的,所以就做了一个springmvc和web项目的整合demo,本篇博客 ...

  10. Android中的图片压缩

    1.android中计算图片占用堆内存的kB大小跟图片本身的kB大小无关,而是根据图片的尺寸来计算的. 比如一张 480*320大小的图片占用的堆内存大小为: 480*320*4/1024=600kB ...