Using Tesseract for OCR from Java on CentOS


1. Installation

Install the OCR engine:

yum install tesseract-ocr

Note: the rpm query in section 3 lists the installed package as tesseract, so if yum reports that tesseract-ocr is not available, yum install tesseract is probably the name to use.

Search for the Chinese language pack:
yum search tesseract-ocr | grep sim

Install the Simplified Chinese language pack:
yum install tesseract-langpack-chi_sim

Installed version:

$ tesseract -v
tesseract 3.04.00
leptonica-1.72
libgif 4.1.6(?) : libjpeg 6b (libjpeg-turbo 1.2.90) : libpng 1.5.13 : libtiff 4.0.3 : zlib 1.2.7 : libwebp 0.3.0
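
Because the Java binding has to match the native library, it can help to read the installed version from code. The following is only a sketch: the class name TesseractVersion and its methods are hypothetical and not part of tess4j, and the parser assumes the first output line looks like "tesseract 3.04.00", as in the output above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of tess4j.
final class TesseractVersion {
    // Assumed format of the first output line: "tesseract <version>".
    private static final Pattern VERSION_LINE =
            Pattern.compile("^tesseract\\s+v?(\\d+(?:\\.\\d+)*)");

    /** Extracts the version string from the first line; empty if the line does not match. */
    static Optional<String> parse(String firstLine) {
        if (firstLine == null) {
            return Optional.empty();
        }
        Matcher m = VERSION_LINE.matcher(firstLine.trim());
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Runs "tesseract -v" and parses the first line it prints. */
    static Optional<String> installedVersion() throws IOException, InterruptedException {
        Process p = new ProcessBuilder("tesseract", "-v")
                .redirectErrorStream(true) // read stderr too, in case the version is printed there
                .start();
        String first;
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            first = r.readLine();
            while (r.readLine() != null) {
                // drain the remaining output so the process can exit
            }
        }
        p.waitFor();
        return parse(first);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(installedVersion().orElse("version line not recognized"));
    }
}
```

The parsing is kept separate from the process call so it can be checked without Tesseract installed.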

2. Java development

Mind the version match: the installed Tesseract is 3.04.00, so use the matching tess4j release:

<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>3.0.0</version>
</dependency>

Simple test code:

// Imports needed at the top of the enclosing class file
import java.awt.image.BufferedImage;
import java.net.URL;
import java.nio.ByteBuffer;
import javax.imageio.ImageIO;

import com.sun.jna.Pointer;
import net.sourceforge.tess4j.ITessAPI;
import net.sourceforge.tess4j.TessAPI1;
import net.sourceforge.tess4j.util.ImageIOHelper;

public String ocr(String url) {
    String datapath = "/usr/share/tesseract/";
    String language = "chi_sim";
    // Run the recognition
    try {
        url = url.trim();
        System.out.println("url is:" + url);
        URL targetUrl = new URL(url);
        BufferedImage image = ImageIO.read(targetUrl);
        ByteBuffer buf = ImageIOHelper.convertImageData(image);
        int bpp = image.getColorModel().getPixelSize();               // bits per pixel
        int bytespp = bpp / 8;                                        // bytes per pixel
        int bytespl = (int) Math.ceil(image.getWidth() * bpp / 8.0);  // bytes per line
        System.out.println("bpp is:" + bpp + ";bytespp is:" + bytespp + ";bytespl is:" + bytespl);
        // Initialize the engine
        ITessAPI.TessBaseAPI handle = TessAPI1.TessBaseAPICreate();
        TessAPI1.TessBaseAPIInit3(handle, datapath, language);
        TessAPI1.TessBaseAPISetPageSegMode(handle, ITessAPI.TessPageSegMode.PSM_AUTO);
        // Recognize the whole image: left = 0, top = 0, full width and height
        Pointer utf8Text = TessAPI1.TessBaseAPIRect(handle, buf, bytespp, bytespl,
                0, 0, image.getWidth(), image.getHeight());
        String result = utf8Text.getString(0);
        // Release the native text buffer and the engine handle
        TessAPI1.TessDeleteText(utf8Text);
        TessAPI1.TessBaseAPIDelete(handle);
        System.out.println("==============================================");
        System.out.println("result is:" + result);
        System.out.println("==============================================");
        if (result.isEmpty()) {
            System.out.println("no detected words!!");
        }
        return result;
    } catch (Exception ex) {
        ex.printStackTrace();
    }
    return "no detected words!!";
}
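
The two size arguments computed in the test code are plain arithmetic on the pixel depth. The sketch below isolates that calculation so it can be tried without Tesseract; the class name ImageStride and its method names are hypothetical, and the 1024x800 image is just an example size.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper that mirrors the bytespp / bytespl arithmetic of the test code.
final class ImageStride {
    /** Bytes per pixel: integer division, so depths below 8 bits yield 0. */
    static int bytesPerPixel(int bitsPerPixel) {
        return bitsPerPixel / 8;
    }

    /** Bytes per line: width * bits per pixel, rounded up to whole bytes. */
    static int bytesPerLine(int widthPx, int bitsPerPixel) {
        return (int) Math.ceil(widthPx * bitsPerPixel / 8.0);
    }

    public static void main(String[] args) {
        // An in-memory image stands in for the one downloaded from the URL.
        BufferedImage image = new BufferedImage(1024, 800, BufferedImage.TYPE_3BYTE_BGR);
        int bpp = image.getColorModel().getPixelSize();
        System.out.println("bpp is:" + bpp
                + ";bytespp is:" + bytesPerPixel(bpp)
                + ";bytespl is:" + bytesPerLine(image.getWidth(), bpp));
    }
}
```

Rounding up matters only when the row width in bits is not a multiple of 8, for example a 10-pixel-wide 1-bit image needs 2 bytes per line.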

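Note: datapath must be the parent directory of tessdata. With the language files in /usr/share/tesseract/tessdata, datapath is /usr/share/tesseract/.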
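
A wrong datapath is easy to catch before the engine is initialized. The sketch below checks that the language file sits under datapath/tessdata; the class name TessDataCheck and its methods are hypothetical, and the file name pattern <language>.traineddata is an assumption based on eng.traineddata in the package listing in section 3.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight check, not part of tess4j.
final class TessDataCheck {
    /** Expected location of a language file: <datapath>/tessdata/<language>.traineddata. */
    static Path trainedDataFile(String datapath, String language) {
        return Paths.get(datapath, "tessdata", language + ".traineddata");
    }

    /** True if the language file exists under the tessdata child of datapath. */
    static boolean hasLanguage(String datapath, String language) {
        return Files.isRegularFile(trainedDataFile(datapath, language));
    }

    public static void main(String[] args) {
        // Defaults are the values used in the test code.
        String datapath = args.length > 0 ? args[0] : "/usr/share/tesseract/";
        String language = args.length > 1 ? args[1] : "chi_sim";
        System.out.println(trainedDataFile(datapath, language)
                + " exists: " + hasLanguage(datapath, language));
    }
}
```

Passing the tessdata directory itself as datapath makes the check fail, which is the mistake the note warns about.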

3. Commands for finding where yum installed the files

# List the related packages
$ rpm -qa | grep tesseract
tesseract-langpack-chi_sim-3.04.00-3.el7.noarch
tesseract-3.04.00-3.el7.x86_64

# List the files installed by a package
$ rpm -ql tesseract-3.04.00-3.el7.x86_64
/usr/bin/ambiguous_words
/usr/bin/classifier_tester
/usr/bin/cntraining
/usr/bin/combine_tessdata
/usr/bin/dawg2wordlist
/usr/bin/mftraining
/usr/bin/set_unicharset_properties
/usr/bin/shapeclustering
/usr/bin/tesseract
/usr/bin/text2image
/usr/bin/unicharset_extractor
/usr/bin/wordlist2dawg
/usr/lib64/libtesseract.so.3
/usr/lib64/libtesseract.so.3.0.4
/usr/share/doc/tesseract-3.04.00
/usr/share/doc/tesseract-3.04.00/AUTHORS
/usr/share/doc/tesseract-3.04.00/ChangeLog
/usr/share/doc/tesseract-3.04.00/NEWS
/usr/share/doc/tesseract-3.04.00/README
/usr/share/doc/tesseract-3.04.00/eurotext.tif
/usr/share/doc/tesseract-3.04.00/phototest.tif
/usr/share/licenses/tesseract-3.04.00
/usr/share/licenses/tesseract-3.04.00/COPYING
/usr/share/man/man1/ambiguous_words.1.gz
/usr/share/man/man1/cntraining.1.gz
/usr/share/man/man1/combine_tessdata.1.gz
/usr/share/man/man1/dawg2wordlist.1.gz
/usr/share/man/man1/mftraining.1.gz
/usr/share/man/man1/shapeclustering.1.gz
/usr/share/man/man1/tesseract.1.gz
/usr/share/man/man1/unicharset_extractor.1.gz
/usr/share/man/man1/wordlist2dawg.1.gz
/usr/share/man/man5/unicharambigs.5.gz
/usr/share/man/man5/unicharset.5.gz
/usr/share/tesseract
/usr/share/tesseract/tessdata
/usr/share/tesseract/tessdata/configs
/usr/share/tesseract/tessdata/configs/ambigs.train
/usr/share/tesseract/tessdata/configs/api_config
/usr/share/tesseract/tessdata/configs/bigram
/usr/share/tesseract/tessdata/configs/box.train
/usr/share/tesseract/tessdata/configs/box.train.stderr
/usr/share/tesseract/tessdata/configs/digits
/usr/share/tesseract/tessdata/configs/hocr
/usr/share/tesseract/tessdata/configs/inter
/usr/share/tesseract/tessdata/configs/kannada
/usr/share/tesseract/tessdata/configs/linebox
/usr/share/tesseract/tessdata/configs/logfile
/usr/share/tesseract/tessdata/configs/makebox
/usr/share/tesseract/tessdata/configs/pdf
/usr/share/tesseract/tessdata/configs/quiet
/usr/share/tesseract/tessdata/configs/rebox
/usr/share/tesseract/tessdata/configs/strokewidth
/usr/share/tesseract/tessdata/configs/unlv
/usr/share/tesseract/tessdata/eng.cube.bigrams
/usr/share/tesseract/tessdata/eng.cube.fold
/usr/share/tesseract/tessdata/eng.cube.lm
/usr/share/tesseract/tessdata/eng.cube.nn
/usr/share/tesseract/tessdata/eng.cube.params
/usr/share/tesseract/tessdata/eng.cube.size
/usr/share/tesseract/tessdata/eng.cube.word-freq
/usr/share/tesseract/tessdata/eng.tesseract_cube.nn
/usr/share/tesseract/tessdata/eng.traineddata
/usr/share/tesseract/tessdata/pdf.ttf
/usr/share/tesseract/tessdata/tessconfigs
/usr/share/tesseract/tessdata/tessconfigs/batch
/usr/share/tesseract/tessdata/tessconfigs/batch.nochop
/usr/share/tesseract/tessdata/tessconfigs/matdemo
/usr/share/tesseract/tessdata/tessconfigs/msdemo
/usr/share/tesseract/tessdata/tessconfigs/nobatch
/usr/share/tesseract/tessdata/tessconfigs/segdemo

Inspect the symbols exported by a .so file:

nm -D xxx.so
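
The nm output can be filtered from Java to check that a function the binding calls is really exported. This is a rough sketch: the class name NmSymbols is hypothetical, and it assumes the common three-column nm layout (address, type, name), where type T marks a function defined in the library.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical filter for "nm -D" output read from standard input.
final class NmSymbols {
    /** Returns the names of symbols whose type column is "T". */
    static List<String> definedFunctions(List<String> nmLines) {
        List<String> names = new ArrayList<>();
        for (String line : nmLines) {
            String[] cols = line.trim().split("\\s+");
            // Defined symbols have three columns; undefined ones (type U) have no address.
            if (cols.length == 3 && cols[1].equals("T")) {
                names.add(cols[2]);
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        definedFunctions(lines).forEach(System.out::println);
    }
}
```

Once compiled, it can be fed directly from the command, for example nm -D /usr/lib64/libtesseract.so.3 | java NmSymbols, using the library path from the package listing.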
