An Application of SAX


The previous article gave a quick introduction to SAX. Now let's look at a concrete application.

Suppose we have the following XML document:

    <?xml version="1.0" encoding="UTF-8"?>
    <store>
        <other_tag1 />
        <type name="book" type_id="1" />
        <other_tag3 />
        <bookstore>
            <other_tag2 />
            <address addr="Shanghai,China" />
            <other_tag4 />
            <book category="COOKING" title="Everyday Italian" author="Giada De Laurentiis" year="2005" price="30.00" />
            <book category="CHILDREN" title="Harry Potter" author="J K. Rowling" year="2005" price="29.99" />
        </bookstore>
    </store>

We want to use SAX to extract the information in the nodes <type name="book" type_id="1" />, <address addr="Shanghai,China" /> and <book/>. How should we go about it?

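One idea is this. With SAX, whenever we meet a node whose localName is "type", "address" or "book", we stop and grab its information. The obvious way to do that is an if / else if / else chain inside startElement. That is direct and easy to read, but clumsy. It works when there is little XML to parse. But if you have a large amount of information to extract, how many if / else if / else branches will you need?!

So let's handle this differently: arrange the elements we need to track into an "XML-like element tree". For example, if the elements we want to track are "store", "type", "bookstore", "address" and "book", the tree is: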

    store
    +-- type
    +-- bookstore
        +-- address
        +-- book

The tracking code:

    root.track("store", new StoreTracker());
    root.track("store/type", new TypeTracker());
    root.track("store/bookstore", new BookStoreTracker());
    root.track("store/bookstore/address", new AddrTracker());
    root.track("store/bookstore/book", new BookTracker());
The key code that builds the "XML element tree" is this part of TagTracker.java:

    public void track(String tagName, TagTracker tracker) {
        int slashOffset = tagName.indexOf("/");
        if (slashOffset < 0) {
            // a simple tag name (no "/" separator): register it directly
            trackers.put(tagName, tracker);
        } else if (slashOffset == 0) {
            // "/a/b" --> "a/b" and continue
            track(tagName.substring(1), tracker);
        } else {
            // "a/b/c": find or create the child tracker for "a", then register "b/c" under it
            String topTagName = tagName.substring(0, slashOffset);
            String remainderOfTagName = tagName.substring(slashOffset + 1);
            TagTracker child = trackers.get(topTagName);
            if (child == null) {
                child = new TagTracker();
                trackers.put(topTagName, child);
            }
            child.track(remainderOfTagName, tracker);
        }
    }

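As a result, the "root" element holds a "store" node. "store" holds the "type" and "bookstore" nodes, and "bookstore" holds the "address" and "book" nodes. Together they form the "element tree" we actually need to track.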
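To see the recursion in track on its own, here is a minimal sketch that keeps only the tree-building part. PathNode, describe and the name[child, ...] output format are hypothetical, introduced only for this illustration, and the nodes carry no handlers.

There is one deliberate difference. The original uses put, which replaces an existing entry with the new tracker. So in the article's code a parent such as "store" should be registered before its children; otherwise the intermediate TagTracker holding them is overwritten. This sketch uses putIfAbsent, so registration order does not matter.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, stripped-down stand-in for TagTracker: only the tree-building part.
class PathNode {
    // children in insertion order, keyed by element name
    final Map<String, PathNode> children = new LinkedHashMap<>();

    // Registers a slash-separated path such as "store/bookstore/book",
    // creating intermediate nodes on demand (same recursion as TagTracker.track).
    void track(String path) {
        int slash = path.indexOf('/');
        if (slash < 0) {
            children.putIfAbsent(path, new PathNode());     // leaf: add if missing
        } else if (slash == 0) {
            track(path.substring(1));                       // "/a/b" -> "a/b"
        } else {
            String top = path.substring(0, slash);
            children.computeIfAbsent(top, k -> new PathNode())
                    .track(path.substring(slash + 1));      // recurse into "a" with "b/..."
        }
    }

    // Renders the subtree as name[child, child] for easy inspection.
    String describe() {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (Map.Entry<String, PathNode> e : children.entrySet()) {
            if (!first) sb.append(", ");
            first = false;
            sb.append(e.getKey());
            if (!e.getValue().children.isEmpty()) {
                sb.append('[').append(e.getValue().describe()).append(']');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PathNode root = new PathNode();
        root.track("store");
        root.track("store/type");
        root.track("store/bookstore");
        root.track("store/bookstore/address");
        root.track("store/bookstore/book");
        System.out.println(root.describe()); // store[type, bookstore[address, book]]
    }
}
```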

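SAX delivers an element's URI, localName, attributes and similar information through two callbacks: startElement(String namespaceURI, String localName, String qName, Attributes attr) and endElement(String namespaceURI, String localName, String qName). To get the text inside an element, you also need characters(char[] ch, int start, int length). Each element should have its own dedicated method that collects its information, including its text content: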
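Here is a minimal, JDK-only sketch of the three callbacks just described. It uses javax.xml.parsers.SAXParserFactory rather than the XMLReaderFactory that SaxMapper uses later. CallbackLogger, its event strings and the <note> element in the sample input are hypothetical. Note the setNamespaceAware(true) call: with SAXParserFactory, localName is only reliably filled in when namespace processing is enabled.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo handler: records each SAX callback as a line of text.
class CallbackLogger extends DefaultHandler {
    final List<String> events = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attr) {
        text.setLength(0);                       // text buffer restarts at each element
        events.add("start " + localName + " attrs=" + attr.getLength());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);          // may be called several times per text node
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end " + localName + " text='" + text.toString().trim() + "'");
        text.setLength(0);
    }

    // Parses an XML string and returns the recorded callbacks.
    static List<String> run(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);     // without this, localName may be empty
            CallbackLogger handler = new CallbackLogger();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        run("<store><type name=\"book\" type_id=\"1\"/><note>hi</note></store>")
                .forEach(System.out::println);
    }
}
```

For the input in main it prints, in order: start store attrs=0, start type attrs=2, end type text='', start note attrs=0, end note text='hi', end store text=''.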

    // collects the information of the element "type"
    private class TypeTracker extends TagTracker {
        public TypeTracker() {
        }

        @Override
        public void onStart(String namespaceURI, String localName, String qName, Attributes attr)
                throws Exception {
            String name = attr.getValue("name");
            String typeId = attr.getValue("type_id");
            // handle this information...
        }

        @Override
        public void onEnd(String namespaceURI, String localName, String qName, CharArrayWriter contents) {
            // get the character data inside the element
            String text = contents.toString();
            // handle this text...
        }
    }
SAX's startElement and endElement end up calling these two methods respectively (through TagTracker), passing along namespaceURI, localName, qName, the attributes and so on.

Now let's see how SAX is made to pick out the elements we want to track and parse them, while automatically ignoring every other element.

The complete TagTracker.java:

    package com.desmond.xml.sax;

    import java.io.CharArrayWriter;
    import java.util.Hashtable;
    import java.util.Stack;

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.xml.sax.Attributes;

    public class TagTracker {
        private static Log log = LogFactory.getLog(TagTracker.class);

        // child trackers, keyed by the child element's local name
        private Hashtable<String, TagTracker> trackers = new Hashtable<String, TagTracker>();

        // shared tracker used to skip elements that are not tracked
        private static SkippingTagTracker skip = new SkippingTagTracker();

        public TagTracker() {
        }

        /**
         * Registers an element to be tracked.
         * @param tagName the absolute path of the tracked element, e.g. "store/bookstore/book"
         * @param tracker the handler that parses this particular element
         */
        public void track(String tagName, TagTracker tracker) {
            int slashOffset = tagName.indexOf("/");

            if (slashOffset < 0) {
                // a simple tag name (no "/" separator): simply add it
                trackers.put(tagName, tracker);
            } else if (slashOffset == 0) {
                // "/a/b" --> "a/b" and continue
                track(tagName.substring(1), tracker);
            } else {
                // "a/b/c": find or create the child for "a", then register "b/c" under it
                String topTagName = tagName.substring(0, slashOffset);
                String remainderOfTagName = tagName.substring(slashOffset + 1);
                TagTracker child = trackers.get(topTagName);
                if (child == null) {
                    child = new TagTracker();
                    trackers.put(topTagName, child);
                }
                child.track(remainderOfTagName, tracker);
            }
        }

        /**
         * Starts parsing an element; invoked from SAX's startElement.
         * @param namespaceURI
         * @param localName
         * @param qName
         * @param attr
         * @param tagStack the stack of active trackers ("tracked element tree")
         * @throws Exception
         */
        public void startElement(String namespaceURI, String localName, String qName,
                Attributes attr, Stack<TagTracker> tagStack) throws Exception {
            TagTracker tracker = trackers.get(localName);

            if (tracker == null) {
                // not tracked: push the skipper so the element's whole subtree is ignored
                log.debug("Skipping tag:[" + localName + "]");
                tagStack.push(skip);
            } else {
                log.debug("Tracking tag:[" + localName + "]");
                onDeactivate();
                tracker.onStart(namespaceURI, localName, qName, attr);
                tagStack.push(tracker);
            }
        }

        /**
         * Finishes parsing an element; invoked from SAX's endElement.
         * @param namespaceURI
         * @param localName
         * @param qName
         * @param contents the text collected by characters()
         * @param tagStack the stack of active trackers; its top is the current element
         * @throws Exception
         */
        public void endElement(String namespaceURI, String localName, String qName,
                CharArrayWriter contents, Stack<TagTracker> tagStack) throws Exception {
            log.debug("Finished tracking tag:[" + localName + "]");
            onEnd(namespaceURI, localName, qName, contents);

            // clean up the stack
            tagStack.pop();

            // send the reactivate event to the parent tracker, if there is one
            if (!tagStack.isEmpty()) {
                log.debug("Reactivating previous tag tracker.");
                tagStack.peek().onReactivate();
            }
        }

        /** Hook: parses the start of this particular element (attributes etc.). */
        public void onStart(String namespaceURI, String localName, String qName, Attributes attr)
                throws Exception {
        }

        /** Hook: called on this tracker when one of its tracked children starts. */
        public void onDeactivate() throws Exception {
        }

        /** Hook: parses the end of this particular element (its text content). */
        public void onEnd(String namespaceURI, String localName, String qName, CharArrayWriter contents) {
        }

        /** Hook: called on this tracker when one of its tracked children has ended. */
        public void onReactivate() throws Exception {
        }
    }
In startElement, we look the current element up in the "element tree" to decide whether it should be parsed:
    TagTracker tracker = trackers.get(localName);

    // this tag is not tracked
    if (tracker == null) {
        log.debug("Skipping tag:[" + localName + "]");
        tagStack.push(skip);
    } else {
        log.debug("Tracking tag:[" + localName + "]");
        onDeactivate();
        tracker.onStart(namespaceURI, localName, qName, attr);
        tagStack.push(tracker);
    }

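If tracker is null, the element is not one we want to parse, so it has to be "skipped". Skipping is handled by another class, SkippingTagTracker. Its only job is to skip the element, together with everything inside it. Its code is: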

    package com.desmond.xml.sax;

    import java.io.CharArrayWriter;
    import java.util.Stack;

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.xml.sax.Attributes;

    public class SkippingTagTracker extends TagTracker {
        private static Log log = LogFactory.getLog(SkippingTagTracker.class);

        // every child of a skipped element is skipped too: just push this tracker again
        @Override
        public void startElement(String namespaceURI, String localName, String qName,
                Attributes attr, Stack<TagTracker> tagStack) {
            log.debug("Skipping tag:[" + localName + "]...");
            tagStack.push(this);
        }

        // skipped elements get no onEnd/onReactivate: just pop
        @Override
        public void endElement(String namespaceURI, String localName, String qName,
                CharArrayWriter contents, Stack<TagTracker> tagStack) {
            log.debug("Finished skipping tag:[" + localName + "]");
            tagStack.pop();
        }
    }
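The skipping rule can be checked without an XML parser. The sketch below feeds a hand-written event list to the same logic: "+name" is a start tag and "-name" an end tag, a format invented here. It mirrors startElement and SkippingTagTracker with a Deque and a shared SKIP sentinel that has no children.

SkipDemo, Node and tracked are hypothetical names. So is the input that nests a <book> inside <other_tag2>, which the article's XML does not do. Because the sentinel never finds a child, everything inside an untracked element is skipped.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of TagTracker + SkippingTagTracker on a hand-written event list.
class SkipDemo {
    static class Node {
        final Map<String, Node> children = new HashMap<>();

        // find or create a child node (used only to build the tracked tree)
        Node child(String name) {
            return children.computeIfAbsent(name, k -> new Node());
        }
    }

    // Shared sentinel: like SkippingTagTracker it has no children,
    // so every descendant of a skipped element is skipped as well.
    static final Node SKIP = new Node();

    // Returns the names of the elements that were tracked, in document order.
    static List<String> tracked(Node root, String... events) {
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        List<String> out = new ArrayList<>();
        for (String ev : events) {
            String name = ev.substring(1);
            if (ev.charAt(0) == '+') {
                Node next = stack.peek().children.get(name);
                if (next == null) {
                    stack.push(SKIP);       // untracked: skip it and its subtree
                } else {
                    out.add(name);          // tracked: this is where onStart would run
                    stack.push(next);
                }
            } else {
                stack.pop();                // end tag: leave the current element
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Node root = new Node();
        root.child("store").child("bookstore").child("book");
        System.out.println(tracked(root,
                "+store", "+other_tag2", "+book", "-book", "-other_tag2",
                "+bookstore", "+book", "-book", "-bookstore", "-store"));
        // prints [store, bookstore, book]: the <book> inside <other_tag2> is skipped
    }
}
```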
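If tracker is not null, we start parsing the element by calling the "dedicated method that collects the element's information" mentioned earlier:

    tracker.onStart(namespaceURI, localName, qName, attr);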
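After that, we push the current element's tracker onto the stack. If the element has no child elements, its tracker is popped off the stack in endElement. If it does have children, they are processed first, and endElement is called to finish this element only after all of them are done. (This is simply the order in which SAX reports elements; we are just taking advantage of it.)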
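Here is a sketch of the resulting call order, assuming every element is tracked. In the article's code, onDeactivate/onReactivate only fire around tracked children. HookOrder is a hypothetical name, and it pushes element names onto a Deque where the article pushes TagTracker objects. The point is that a parent's end always comes after all of its children, with the parent deactivated and reactivated around each child.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records the order in which TagTracker-style hooks would fire.
class HookOrder extends DefaultHandler {
    final List<String> log = new ArrayList<>();
    private final Deque<String> stack = new ArrayDeque<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes a) {
        if (!stack.isEmpty()) log.add("deactivate " + stack.peek()); // parent pauses
        log.add("start " + localName);
        stack.push(localName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        log.add("end " + stack.pop());
        if (!stack.isEmpty()) log.add("reactivate " + stack.peek()); // parent resumes
    }

    static List<String> run(String xml) {
        try {
            SAXParserFactory f = SAXParserFactory.newInstance();
            f.setNamespaceAware(true);
            HookOrder h = new HookOrder();
            f.newSAXParser().parse(new InputSource(new StringReader(xml)), h);
            return h.log;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        run("<bookstore><address/><book/></bookstore>").forEach(System.out::println);
    }
}
```

For the input in main it prints: start bookstore, deactivate bookstore, start address, end address, reactivate bookstore, deactivate bookstore, start book, end book, reactivate bookstore, end bookstore.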

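To sum up, the whole process runs in four steps:

use TagTracker to declare the elements to track ---> the recursive track method builds the "element tree" ---> use this "element tree" to decide whether the current element should be parsed ---> if not, skip it; if so, go on to examine its child elements.

The whole parse is completed recursively in this way.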
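For reference, the whole flow summarized above fits into one small JDK-only class. This is a sketch under stated assumptions, not the author's code:

- MiniMapper and Tracker are hypothetical names.
- Handlers are BiConsumer lambdas instead of TagTracker subclasses.
- track walks the path iteratively instead of recursing.
- onEnd, onDeactivate and onReactivate are left out to keep it short.

Because track only attaches a handler to an existing or newly created node, registering a child before its parent does not lose anything.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// JDK-only sketch of the "element tree + stack" idea (hypothetical names throughout).
class MiniMapper extends DefaultHandler {
    static class Tracker {
        final Map<String, Tracker> children = new HashMap<>();
        BiConsumer<String, Attributes> onStart = (name, attr) -> { }; // default: do nothing

        // Walks/creates the nodes for "a/b/c" and attaches the handler to the last one.
        void track(String path, BiConsumer<String, Attributes> handler) {
            Tracker node = this;
            for (String part : path.split("/")) {
                if (part.isEmpty()) continue;                 // tolerate a leading "/"
                node = node.children.computeIfAbsent(part, k -> new Tracker());
            }
            node.onStart = handler;
        }
    }

    private static final Tracker SKIP = new Tracker();        // no children: skips a whole subtree
    private final Deque<Tracker> stack = new ArrayDeque<>();

    MiniMapper(Tracker root) {
        stack.push(root);                                     // root stays at the bottom
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attr) {
        Tracker next = stack.peek().children.get(localName);
        if (next == null) {
            stack.push(SKIP);
        } else {
            next.onStart.accept(localName, attr);             // Attributes is only valid during this call
            stack.push(next);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        stack.pop();
    }

    static void parse(String xml, Tracker root) {
        try {
            SAXParserFactory f = SAXParserFactory.newInstance();
            f.setNamespaceAware(true);                        // needed for localName
            f.newSAXParser().parse(new InputSource(new StringReader(xml)), new MiniMapper(root));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<store><other_tag1/><type name=\"book\" type_id=\"1\"/>"
                + "<bookstore><address addr=\"Shanghai,China\"/>"
                + "<book title=\"Everyday Italian\" price=\"30.00\"/>"
                + "<book title=\"Harry Potter\" price=\"29.99\"/></bookstore></store>";
        List<String> found = new ArrayList<>();
        Tracker root = new Tracker();
        root.track("store/type", (n, a) -> found.add("type=" + a.getValue("name")));
        root.track("store/bookstore/address", (n, a) -> found.add("addr=" + a.getValue("addr")));
        root.track("store/bookstore/book", (n, a) -> found.add("book=" + a.getValue("title")));
        parse(xml, root);
        found.forEach(System.out::println);
    }
}
```

Run on the string in main, it prints type=book, addr=Shanghai,China, book=Everyday Italian and book=Harry Potter, each on its own line. <other_tag1> is skipped because the tree has no entry for it.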

The complete code follows (four classes: SaxMapper.java, SkippingTagTracker.java, TagTracker.java and TestMain.java).

SaxMapper.java

    package com.desmond.xml.sax;

    import java.io.ByteArrayInputStream;
    import java.io.CharArrayWriter;
    import java.io.File;
    import java.io.IOException;
    import java.util.Stack;

    import org.apache.commons.io.FileUtils;
    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.xml.sax.Attributes;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;
    import org.xml.sax.XMLReader;
    import org.xml.sax.helpers.DefaultHandler;
    import org.xml.sax.helpers.XMLReaderFactory;

    public class SaxMapper extends DefaultHandler {
        private static final Log log = LogFactory.getLog(SaxMapper.class);
        private String file = "";
        protected Stack<TagTracker> tagStack = new Stack<TagTracker>();
        protected XMLReader xr;
        // text collected by characters() for the current element
        protected CharArrayWriter contents = new CharArrayWriter();

        public SaxMapper() throws Exception {
            xr = XMLReaderFactory.createXMLReader();
            log.info("Creating the tag tracker network.");
            tagStack.push(createTagTrackerNetwork());   // the root tracker stays at the bottom
            log.info("Tag Tracker network created.");
        }

        @Override
        public void startElement(String namespaceURI, String localName, String qName, Attributes attr)
                throws SAXException {
            contents.reset();
            TagTracker activeTracker = tagStack.peek();
            try {
                activeTracker.startElement(namespaceURI, localName, qName, attr, tagStack);
            } catch (Exception e) {
                throw new SAXException(e);
            }
        }

        @Override
        public void endElement(String namespaceURI, String localName, String qName)
                throws SAXException {
            TagTracker activeTracker = tagStack.peek();
            try {
                activeTracker.endElement(namespaceURI, localName, qName, contents, tagStack);
            } catch (Exception e) {
                throw new SAXException(e);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            contents.write(ch, start, length);
        }

        protected InputSource getSource(String fileName) throws IOException {
            File xmlFile = new File(fileName);
            byte[] xmlBytes = FileUtils.readFileToByteArray(xmlFile);
            return new InputSource(new ByteArrayInputStream(xmlBytes));
        }

        protected void parseXML() throws Exception {
            parse(getSource(getFileName()));
        }

        protected void parse(InputSource in) throws Exception {
            xr.setContentHandler(this);
            log.info("start to parse...");
            xr.parse(in);
            log.info("end to parse...");
        }

        // builds the "element tree"; register parents before their children
        protected TagTracker createTagTrackerNetwork() {
            TagTracker root = new TagTracker();
            root.track("store", new StoreTracker());
            root.track("store/type", new TypeTracker());
            root.track("store/bookstore", new BookStoreTracker());
            root.track("store/bookstore/address", new AddrTracker());
            root.track("store/bookstore/book", new BookTracker());
            return root;
        }

        protected String getFileName() {
            return file;
        }

        protected void setFileName(String fileName) {
            this.file = fileName;
        }

        // <store> carries no data of its own; it only holds child trackers
        private class StoreTracker extends TagTracker {
        }

        // collects the information of the element "type"
        private class TypeTracker extends TagTracker {
            @Override
            public void onStart(String namespaceURI, String localName, String qName, Attributes attr)
                    throws Exception {
                SaxMapper.log.info("type: name=" + attr.getValue("name")
                        + ", type_id=" + attr.getValue("type_id"));
            }

            @Override
            public void onEnd(String namespaceURI, String localName, String qName, CharArrayWriter contents) {
                // get the character data inside the element
                String text = contents.toString();
                // handle this text...
            }
        }

        private class BookStoreTracker extends TagTracker {
        }

        // collects the information of the element "address"
        private class AddrTracker extends TagTracker {
            @Override
            public void onStart(String namespaceURI, String localName, String qName, Attributes attr) {
                SaxMapper.log.info("address: " + attr.getValue("addr"));
            }
        }

        // collects the information of each element "book"
        private class BookTracker extends TagTracker {
            @Override
            public void onStart(String namespaceURI, String localName, String qName, Attributes attr) {
                SaxMapper.log.info("book: " + attr.getValue("title") + " by " + attr.getValue("author")
                        + ", price " + attr.getValue("price"));
            }
        }
    }

SkippingTagTracker.java
(Identical to the SkippingTagTracker.java listing shown earlier.)

TagTracker.java

(Identical to the complete TagTracker.java listing shown earlier.)

TestMain.java
    package com.desmond.xml.sax;

    public class TestMain {

        /**
         * @param args args[0] is the path of the XML file to parse
         * @throws Exception
         */
        public static void main(String[] args) throws Exception {
            SaxMapper mapper = new SaxMapper();
            if (args.length > 0) {
                mapper.setFileName(args[0]);
                mapper.parseXML();
            } else {
                System.out.println("No file specified! Please pass the path of the XML file as the first argument.");
            }
        }
    }
