http://www.pixel-technology.com/freeware/tessnet2/

Tessnet2: a .NET 2.0 open source OCR assembly using the Tesseract engine

Keywords: Open source, OCR, Tesseract, .NET, DOTNET, C#, VB.NET, C++/CLI

Current version: 2.04.0, 02SEP09 (see version history)

The big picture
Tesseract is an open source OCR engine written in C++. Tessnet2 is a .NET
assembly that exposes very simple methods to do OCR.
Tessnet2 is multi-threaded. It uses the engine the same way Tesseract.exe does,
whereas Tessdll uses another method (no thresholding).

License
Tessnet2 is released under the Apache 2 license (like Tesseract), meaning you
can use it however you want, including in commercial products. You can read the
full license information in the source code.

Quick Tessnet2 usage

  1. Download the binary here and add a reference
    to the Tessnet2.dll assembly in your .NET project.

  2. Download the language data definition file here and
    put it in a tessdata directory. The tessdata directory and your exe must be
    in the same directory.

  3. Look at the Program.cs sample

Note: Tessnet2.dll needs the Visual C++ 2008 runtime. When deploying
your application, be sure to install the C++ runtime (x86, x64).
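
The directory requirement in step 2 is easy to get wrong at deployment time, so it can help to check it before initializing the engine. The following is only a minimal sketch: the class name TessdataLocator and its methods are hypothetical helpers, not part of Tessnet2, and it assumes that the language data files are named with the language code as a prefix (for example eng.*).

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Tessnet2.
public static class TessdataLocator
{
    // Builds the path of the tessdata directory expected next to the executable.
    public static string ExpectedTessdataPath(string baseDirectory)
    {
        return Path.Combine(baseDirectory, "tessdata");
    }

    // Returns true when the directory exists and holds at least one file
    // whose name starts with "<language>." (assumed naming convention).
    public static bool HasLanguageData(string tessdataPath, string language)
    {
        return Directory.Exists(tessdataPath)
            && Directory.GetFiles(tessdataPath, language + ".*").Length > 0;
    }

    public static void Main()
    {
        string tessdata = ExpectedTessdataPath(AppDomain.CurrentDomain.BaseDirectory);
        if (HasLanguageData(tessdata, "eng"))
            Console.WriteLine("Language data found in " + tessdata);
        else
            Console.WriteLine("Missing language data in " + tessdata);
    }
}
```

Running a check like this at startup gives a readable message instead of a failure inside the native engine when the data files are missing.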

Tessnet2 usage

using System;
using System.Collections.Generic;
using System.Drawing;

Bitmap image = new Bitmap("eurotext.tif");
tessnet2.Tesseract ocr = new tessnet2.Tesseract();
ocr.SetVariable("tessedit_char_whitelist", "0123456789"); // If digit only
ocr.Init(@"c:\temp", "fra", false); // To use correct tessdata
List<tessnet2.Word> result = ocr.DoOCR(image, Rectangle.Empty);
foreach (tessnet2.Word word in result)
    Console.WriteLine("{0} : {1}", word.Confidence, word.Text);

Tessnet2 source code and recompiling

  1. Download the Tesseract source code here
    and expand it in a directory.

  2. Download the Tessnet2 source code here and
    expand it in the root directory of the Tesseract source code (it should
    create a dotnet subdirectory).

  3. Open the solution tessnet2.sln. It is a Visual
    Studio 2008 C++/CLI project.

Memory leak

The Tesseract C++ source code is full of memory leaks. Calling the
Tessnet2 assembly repeatedly will eventually exhaust memory. This is not a
Tessnet2 leak but a Tesseract leak; I spent two days in the Tesseract source
code trying to improve this, with no success.
See
what I think about this.

Tessnet2 demo
The Tessnet2 source code contains two C# demo projects. TesseractOCR is a
multi-threaded WinForms demo with a progress bar. TesseractConsole is a console demo.

In the demo output, the confidence score is shown in brackets. A value below 160 means the result is not bad.
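
To illustrate how the confidence score might be used, here is a minimal sketch that keeps only words scoring below 160 on the scale where 0 is perfect and 255 is reject. OcrWord and ConfidenceFilter are hypothetical names introduced for this example; OcrWord merely stands in for tessnet2.Word so that the sketch runs without the assembly. Treating a score of exactly 160 as bad is an assumption, since the text only says that values below 160 are not bad.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for tessnet2.Word, reduced to the two members used here.
public class OcrWord
{
    public string Text;
    public double Confidence; // 0 = perfect, 255 = reject

    public OcrWord(string text, double confidence)
    {
        Text = text;
        Confidence = confidence;
    }
}

// Hypothetical helper, not part of Tessnet2.
public static class ConfidenceFilter
{
    // Scores below this value are treated as acceptable (assumed cut-off).
    public const double BadThreshold = 160.0;

    // Returns the words whose score is below the threshold, in input order.
    public static List<OcrWord> KeepAcceptable(IEnumerable<OcrWord> words)
    {
        List<OcrWord> kept = new List<OcrWord>();
        foreach (OcrWord word in words)
        {
            if (word.Confidence < BadThreshold)
                kept.Add(word);
        }
        return kept;
    }

    public static void Main()
    {
        List<OcrWord> words = new List<OcrWord>();
        words.Add(new OcrWord("quick", 42));
        words.Add(new OcrWord("br0wn", 201));

        // Print each accepted word with its score in brackets, as in the demo.
        foreach (OcrWord word in KeepAcceptable(words))
            Console.WriteLine("[{0}] {1}", word.Confidence, word.Text);
    }
}
```

With real results, the same loop could run over the list returned by DoOCR, reading the Confidence and Text members of each word.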

Version History

07JUN08: First release, based on Tesseract 2.03

10JUN08: Version 2.03.1. Changed the Confidence
behavior: it is now calculated from every letter of the word and not just from the first letter.
Its type changed from byte to double. 0 = perfect, 100 = reject

13JUN08: Version 2.03.2

After 3 days in the Tesseract code (urgh), here is Tessnet2 version
2.03.2.
The corrections deal with the following problems:
* Confidence was not very useful; the value was strange. This has been corrected
by setting the variable tessedit_write_ratings=true. After many tests I found this mode
is the best for confidence accuracy. Values range from 0 (perfect) to 255 (reject).
A value over 160 really means the OCR was bad.
* Calling DoOCR twice did not give the same result. It was, as expected, a problem
with global variables. The problem is almost fixed; sometimes it still doesn’t work, but
right now I can’t find what is not correctly reinitialized.
