Nutch: invoking it from Java rather than the command line?

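Am I being thick, or is there really no way to call Apache Nutch programmatically from Java code? Where is the documentation (or a guide, or a tutorial) on how to do this? Google has failed me, so I actually tried Bing. (Yes, I know, pathetic.) Any ideas? Thanks in advance.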

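(Also, if Nutch turns out to be rubbish, are there any other crawlers written in Java that have proven reliable at internet scale and come with actual documentation?)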

Solution

If you look inside the bin/nutch script, you will see that it invokes the Java class corresponding to your command:
    # figure out which class to run
    if [ "$COMMAND" = "crawl" ] ; then
      CLASS=org.apache.nutch.crawl.Crawl
    elif [ "$COMMAND" = "inject" ] ; then
      CLASS=org.apache.nutch.crawl.Injector
    elif [ "$COMMAND" = "generate" ] ; then
      CLASS=org.apache.nutch.crawl.Generator
    elif [ "$COMMAND" = "freegen" ] ; then
      CLASS=org.apache.nutch.tools.FreeGenerator
    elif [ "$COMMAND" = "fetch" ] ; then
      CLASS=org.apache.nutch.fetcher.Fetcher
    elif [ "$COMMAND" = "fetch2" ] ; then
      CLASS=org.apache.nutch.fetcher.Fetcher2
    elif [ "$COMMAND" = "parse" ] ; then
      CLASS=org.apache.nutch.parse.ParseSegment
    elif [ "$COMMAND" = "readdb" ] ; then
      CLASS=org.apache.nutch.crawl.CrawlDbReader
    elif [ "$COMMAND" = "convdb" ] ; then
      CLASS=org.apache.nutch.tools.compat.CrawlDbConverter
    elif [ "$COMMAND" = "mergedb" ] ; then
      CLASS=org.apache.nutch.crawl.CrawlDbMerger
    elif [ "$COMMAND" = "readlinkdb" ] ; then
      CLASS=org.apache.nutch.crawl.LinkDbReader
    elif [ "$COMMAND" = "readseg" ] ; then
      CLASS=org.apache.nutch.segment.SegmentReader
    elif [ "$COMMAND" = "segread" ] ; then
      echo "[DEPRECATED] Command 'segread' is deprecated, use 'readseg' instead."
      CLASS=org.apache.nutch.segment.SegmentReader
    elif [ "$COMMAND" = "mergesegs" ] ; then
      CLASS=org.apache.nutch.segment.SegmentMerger
    elif [ "$COMMAND" = "updatedb" ] ; then
      CLASS=org.apache.nutch.crawl.CrawlDb
    elif [ "$COMMAND" = "invertlinks" ] ; then
      CLASS=org.apache.nutch.crawl.LinkDb
    elif [ "$COMMAND" = "mergelinkdb" ] ; then
      CLASS=org.apache.nutch.crawl.LinkDbMerger
    elif [ "$COMMAND" = "index" ] ; then
      CLASS=org.apache.nutch.indexer.Indexer
    elif [ "$COMMAND" = "solrindex" ] ; then
      CLASS=org.apache.nutch.indexer.solr.SolrIndexer
    elif [ "$COMMAND" = "dedup" ] ; then
      CLASS=org.apache.nutch.indexer.DeleteDuplicates
    elif [ "$COMMAND" = "solrdedup" ] ; then
      CLASS=org.apache.nutch.indexer.solr.SolrDeleteDuplicates
    elif [ "$COMMAND" = "merge" ] ; then
      CLASS=org.apache.nutch.indexer.IndexMerger
    elif [ "$COMMAND" = "plugin" ] ; then
      CLASS=org.apache.nutch.plugin.PluginRepository
    elif [ "$COMMAND" = "server" ] ; then
      CLASS='org.apache.nutch.searcher.DistributedSearch$Server'
    else
      CLASS=$COMMAND
    fi

    # run it
    exec "$JAVA" $JAVA_HEAP_MAX $NUTCH_OPTS -classpath "$CLASSPATH" $CLASS "$@"

From there, it is just a matter of reading the API docs and, where necessary, the source code of those classes.
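Since the script simply starts a JVM on the chosen class, the same lookup can be reproduced from Java. The following is a minimal sketch, not an official Nutch API: `NutchLauncher`, `resolveClass` and `run` are hypothetical names introduced here, and only a handful of the mappings from the script are copied over. It assumes the Nutch jars and configuration directory are already on the classpath, as the script arranges through `$CLASSPATH`, and that the target class exposes a standard `public static void main(String[])`, which is what the `java` launcher itself requires.

```java
import java.lang.reflect.Method;
import java.util.Map;

// Hypothetical helper (not part of Nutch): mirrors the command-to-class
// lookup performed by bin/nutch and calls the class's main method in-process.
class NutchLauncher {

    // A few entries copied from the script; add the others as needed.
    private static final Map<String, String> COMMANDS = Map.of(
        "inject", "org.apache.nutch.crawl.Injector",
        "generate", "org.apache.nutch.crawl.Generator",
        "fetch", "org.apache.nutch.fetcher.Fetcher",
        "parse", "org.apache.nutch.parse.ParseSegment",
        "updatedb", "org.apache.nutch.crawl.CrawlDb",
        "invertlinks", "org.apache.nutch.crawl.LinkDb");

    // Like the script's final "else" branch, an unknown command is
    // treated as a fully qualified class name.
    static String resolveClass(String command) {
        return COMMANDS.getOrDefault(command, command);
    }

    // Loads the class by name and invokes its public static main(String[]).
    // Throws ClassNotFoundException if the class is not on the classpath.
    static void run(String command, String... args) throws Exception {
        Class<?> cls = Class.forName(resolveClass(command));
        Method main = cls.getMethod("main", String[].class);
        // The cast passes the array as a single argument rather than as varargs.
        main.invoke(null, (Object) args);
    }
}
```

A call such as `NutchLauncher.run("inject", "crawl/crawldb", "urls")` would then correspond to `bin/nutch inject crawl/crawldb urls`; the two paths are placeholders. One caveat: if a class's `main` ends with `System.exit`, it will terminate the calling JVM as well, in which case the API docs and source of that class are the place to look for an entry point that can be called directly.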
