Getting the innerHTML of an HTML Tag with a Regular Expression

This builds on the regex described at http://www.imkevinyang.com/2010/07/javajs%e5%a6%82%e4%bd%95%e4%bd%bf%e7%94%a8%e6%ad%a3%e5%88%99%e8%a1%a8%e8%be%be%e5%bc%8f%e5%8c%b9%e9%85%8d%e5%b5%8c%e5%a5%97html%e6%a0%87%e7%ad%be.html, adapted to my own needs so that it also captures everything inside the matched tag.

Test data:

<div style="background-color:gray;" id="footer">
<a id="gotop" href="#" onclick="MGJS.goTop();return false;">Top</a>
<a id="powered" href="http://wordpress.org/">WordPress</a>
<div id="copyright">
Copyright &copy; 2009 简单生活 —— Kevin Yang的博客 </div>
<div id="themeinfo">
Theme by <a href="http://www.neoease.com/">mg12</a>.
Valid <a href="http://validator.w3.org/check?uri=referer">XHTML 1.1</a>
and <a href="http://jigsaw.w3.org/css-validator/">CSS 3</a>
</div>
<div/>
<p/>
</div>

The modified regex:

    <(?<HtmlTag>[\w]+)[^>]*\s[iI][dD]=(?<Quote>["']?)footer(?(Quote)\k<Quote>)[^>]*?(/>|>(?<innerHtml>((?<Nested><\k<HtmlTag>[^>]*>)|</\k<HtmlTag>>(?<-Nested>)|[\s\S]*?)*)</\k<HtmlTag>>)
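The one-line pattern is dense, so here is a sketch of the same pattern spread over several lines with RegexOptions.IgnorePatternWhitespace, which lets each part carry a comment. The class name AnnotatedFooterRegex and the short input in Main are made up for illustration; only the pattern itself comes from the post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnnotatedFooterRegex
{
    // Same pattern as in the post; only the layout and the comments are new.
    public static readonly Regex Pattern = new Regex(@"
        <(?<HtmlTag>[\w]+)             # tag name of the outer element, e.g. div
        [^>]*\s[iI][dD]=               # any attributes, then id= in any letter case
        (?<Quote>[""']?)footer         # optional opening quote, then the id value
        (?(Quote)\k<Quote>)            # the same quote character again
        [^>]*?                         # remaining attributes
        (
          />                           # self-closing element, no inner content
          |
          >
          (?<innerHtml>                # everything between the outer tags
            (
              (?<Nested><\k<HtmlTag>[^>]*>)    # open tag with the same name: push
              |
              </\k<HtmlTag>>(?<-Nested>)       # close tag with the same name: pop
              |
              [\s\S]*?                         # any other text, newlines included
            )*
          )
          </\k<HtmlTag>>               # close tag of the outer element
        )",
        RegexOptions.IgnorePatternWhitespace);

    public static void Main()
    {
        // Hypothetical input, much smaller than the test data above.
        string html = "<div id=\"footer\">a<div>b</div>c</div>";
        Match match = Pattern.Match(html);
        Console.WriteLine(match.Success ? match.Groups["innerHtml"].Value : "(no match)");
    }
}
```

The Nested group acts as a counter: each nested open tag pushes a capture, and a close tag can only be consumed inside the loop while a capture is left to pop, so the final close tag is left for the outer element.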

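Two main changes:

1. Added a named group, innerHtml, that captures everything inside the element.

2. In the last alternative of the loop, changed .*? to [\s\S]*? so that the match can span multiple lines.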
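The second change matters because, unless RegexOptions.Singleline is set, the dot in a .NET regex does not match a newline (\n), while [\s\S] matches any character. A minimal sketch of the difference follows; the class, the Capture helper, and the two-line input are hypothetical and not part of the original post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotVersusAnyChar
{
    // Hypothetical helper: returns the text captured by the "body" group,
    // or the string "(no match)" when the pattern does not match.
    public static string Capture(string pattern, string input,
                                 RegexOptions options = RegexOptions.None)
    {
        Match m = Regex.Match(input, pattern, options);
        return m.Success ? m.Groups["body"].Value : "(no match)";
    }

    public static void Main()
    {
        string input = "<p>line one\nline two</p>";

        // '.' stops at the newline, so this pattern never reaches </p>.
        Console.WriteLine(Capture(@"<p>(?<body>.*?)</p>", input));

        // [\s\S] matches every character, including '\n'.
        Console.WriteLine(Capture(@"<p>(?<body>[\s\S]*?)</p>", input));

        // Alternative: keep '.' and switch on Singleline mode.
        Console.WriteLine(Capture(@"<p>(?<body>.*?)</p>", input, RegexOptions.Singleline));
    }
}
```

Using [\s\S] keeps the behavior inside the pattern itself, so it does not depend on which RegexOptions the caller passes.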


C# calling code

    using System.Text.RegularExpressions;
    using System.Windows.Forms; // MessageBox

    // Regex match
    RegexOptions options = RegexOptions.None;
    Regex regex = new Regex(@"<(?<HtmlTag>[\w]+)[^>]*\s[iI][dD]=(?<Quote>[""']?)footer(?(Quote)\k<Quote>)[^>]*?(/>|>(?<innerHtml>((?<Nested><\k<HtmlTag>[^>]*>)|</\k<HtmlTag>>(?<-Nested>)|[\s\S]*?)*)</\k<HtmlTag>>)", options);
    string input = @"<div style=""background-color:gray;"" id=""footer"">
    <a id=""gotop"" href=""#"" onclick=""MGJS.goTop();return false;"">Top</a>
    <a id=""powered"" href=""http://wordpress.org/"">WordPress</a>
    <div id=""copyright"">
    Copyright &copy; 2009 简单生活 —— Kevin Yang的博客 </div>
    <div id=""themeinfo"">
    Theme by <a href=""http://www.neoease.com/"">mg12</a>.
    Valid <a href=""http://validator.w3.org/check?uri=referer"">XHTML 1.1</a>
    and <a href=""http://jigsaw.w3.org/css-validator/"">CSS 3</a>
    </div>
    <div/>
    <p/>
    </div>";

    // Check for match
    bool isMatch = regex.IsMatch(input);
    if (isMatch)
    {
        // TODO: Do something with result
        MessageBox.Show(input, "IsMatch");
    }

    // Get all matches
    MatchCollection matches = regex.Matches(input);
    for (int i = 0; i < matches.Count; ++i)
    {
        // TODO: Do something with result
        MessageBox.Show(matches[i].Value, "Match");
    }

    // Get the first match; read its groups only if it succeeded
    Match match = regex.Match(input);
    if (match.Success)
    {
        // Numbered groups
        for (int i = 0; i < match.Groups.Count; ++i)
        {
            Group group = match.Groups[i];

            // TODO: Do something with result
            MessageBox.Show(group.Value, "Group: " + i);
        }

        // Named groups: the tag name and the captured inner HTML
        string groupA = match.Groups["HtmlTag"].Value;
        string groupB = match.Groups["innerHtml"].Value;

        // TODO: Do something with result
        MessageBox.Show(groupA, "Group: HtmlTag");
        MessageBox.Show(groupB, "Group: innerHtml");
    }
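The id value footer is hard-coded in the pattern above. As a sketch of how the same pattern might be reused for other ids, the hypothetical helper below builds it from a parameter and escapes the id with Regex.Escape. The names InnerHtmlById, BuildRegex, and TryGetInnerHtml, and the sample input, are assumptions for illustration and do not appear in the original post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InnerHtmlById
{
    // Builds the pattern from the post with the id value supplied by the caller.
    // Regex.Escape keeps characters such as '.' in an id from acting as regex syntax.
    public static Regex BuildRegex(string id)
    {
        string pattern =
            @"<(?<HtmlTag>[\w]+)[^>]*\s[iI][dD]=(?<Quote>[""']?)" +
            Regex.Escape(id) +
            @"(?(Quote)\k<Quote>)[^>]*?(/>|>(?<innerHtml>((?<Nested><\k<HtmlTag>[^>]*>)|</\k<HtmlTag>>(?<-Nested>)|[\s\S]*?)*)</\k<HtmlTag>>)";
        return new Regex(pattern);
    }

    // Returns true and the captured inner HTML when an element with the id is found.
    public static bool TryGetInnerHtml(string html, string id, out string innerHtml)
    {
        Match match = BuildRegex(id).Match(html);
        innerHtml = match.Success ? match.Groups["innerHtml"].Value : string.Empty;
        return match.Success;
    }

    public static void Main()
    {
        // Hypothetical input with one nested element of the same tag name.
        string html = "<div id=\"box\"><div>x</div></div>";

        if (TryGetInnerHtml(html, "box", out string inner))
            Console.WriteLine(inner);
        else
            Console.WriteLine("No element with that id was found.");
    }
}
```

This stays a regex-based approach with the same limits as the original pattern. For arbitrary real-world HTML, a dedicated HTML parser is usually the more robust choice.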





