xml - XSLT: Merging two log files with different structures and time representations

As Dimitre Novatchev requested, I have created a new question, because some parts of the old question have changed.

(Link to the old question: Merging two different XML log files (trace and messages) using date and timestamp?)

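I need to merge two XML log files (up to 700 MB). One log file contains a trace with position updates. The other log file contains received messages. Several messages may be received with no position update in between, and vice versa.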

Both logs carry timestamps that include milliseconds (123 in this example):

- The trace log uses <date> (e.g. 14.7.2012 11:08:07.123)
- The message log uses a Unix timestamp in <timeStamp> (e.g. 1342264087123)
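
To make the relationship between the two representations concrete, here is a minimal XSLT 2.0 sketch that turns the Unix value from the example into the trace log's date layout. It assumes the Unix value counts milliseconds since 1970-01-01T00:00:00 and ignores time zones. The entry template name `main` is a hypothetical choice, not something from the logs.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs"
                version="2.0">

  <xsl:output method="text"/>

  <!-- Named entry template (hypothetical name), so no source document is needed. -->
  <xsl:template name="main">
    <!-- Unix time in milliseconds, taken from the example above. -->
    <xsl:variable name="millis" select="1342264087123" as="xs:integer"/>
    <!-- Epoch plus millis times one millisecond gives an xs:dateTime without a time zone. -->
    <xsl:variable name="instant"
                  select="xs:dateTime('1970-01-01T00:00:00')
                          + $millis * xs:dayTimeDuration('PT0.001S')"/>
    <!-- Day and month unpadded, milliseconds as three digits, as in the trace log. -->
    <xsl:value-of select="format-dateTime($instant,
                          '[D1].[M1].[Y0001] [H01]:[m01]:[s01].[f001]')"/>
  </xsl:template>

</xsl:stylesheet>
```

Read without any offset, 1342264087123 ms corresponds to 14.7.2012 11:08:07.123, which is the value shown for the trace log. With Saxon, a named template like this can be started with the `-it:main` option.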

The message log contains other <timeStamp> elements as well, but only the one at the path messageList/Message/originator/originatorPosition/timeStamp is relevant.

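The structures below are slightly simplified: additional content such as "acceleration" has been omitted. That additional content only needs to be copied along with the rest of each message/item.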

The position trace is structured as follows:

    <itemList>
      <item>
        <date>14.7.2012 12:13:05.123</date>
        <FilteredPosition>
          <Latitude>51.12235</Latitude>
          <Longitude>9.347214</Longitude>
        </FilteredPosition>
      </item>
      <item>
        <date>14.7.2012 12:13:07.456</date>
        <FilteredPosition>
          <Latitude>51.12235</Latitude>
          <Longitude>9.347214</Longitude>
        </FilteredPosition>
      </item>
    </itemList>

The message log is structured as follows:

    <messageList>
      <Message>
        <messageId>1234</messageId>
        <originator>
          <originatorPosition>
            <nodeId>2345</nodeId>
            <timeStamp>1342264087061</timeStamp>
          </originatorPosition>
          <senderPosition>
            <nodeId>2345</nodeId>
            <timeStamp>1342264087234</timeStamp>
          </senderPosition>
          <medium></medium>
        </originator>
        <MessagePayload>
          <generationTime>
            <timeStamp>1342264087</timeStamp>
            <milliSec>42</milliSec>
          </generationTime>
        </MessagePayload>
      </Message>
      <Message>
        <messageId>1234</messageId>
        <originator>
          <originatorPosition>
            <nodeId>2345</nodeId>
            <timeStamp>1342264088064</timeStamp>
          </originatorPosition>
          <senderPosition>
            <nodeId>2345</nodeId>
            <timeStamp>1342264088254</timeStamp>
          </senderPosition>
          <medium></medium>
        </originator>
        <MessagePayload>
          <generationTime>
            <timeStamp>1342264088</timeStamp>
            <milliSec>42</milliSec>
          </generationTime>
        </MessagePayload>
      </Message>
    </messageList>

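When merging, the timestamps have to be read (which also means converting and comparing the "date" and "timestamp" values, including the milliseconds, in the format "14.7.2012 11:08:07.123"), and all positions and messages have to be added in the correct order.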

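The position data can be added as is. The messages, however, should be placed inside an <item> tag, a <date> tag should be added (based on the message's Unix time with milliseconds), and the <Message> tag should be replaced by an <m:Message type="received"> tag. These items go inside the root <itemList>, just like the position traces.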

The result could look like this:

    <itemList>
      <item>
        <date>14.7.2012 12:13:05.123</date>
        <FilteredPosition>
          <Latitude>51.12235</Latitude>
          <Longitude>9.347214</Longitude>
        </FilteredPosition>
      </item>
      <item>
        <date>14.7.2012 12:13:07.061</date>
        <m:Message type="received">
          <messageId>1234</messageId>
          <originator>
            <originatorPosition>
              <nodeId>2345</nodeId>
              <timeStamp>1342264087061</timeStamp>
            </originatorPosition>
            <senderPosition>
              <nodeId>2345</nodeId>
              <timeStamp>1342264087234</timeStamp>
            </senderPosition>
            <medium></medium>
          </originator>
          <MessagePayload>
            <generationTime>
              <timeStamp>1342264087</timeStamp>
              <milliSec>63</milliSec>
            </generationTime>
          </MessagePayload>
        </m:Message>
      </item>
      <item>
        <date>14.7.2012 12:13:07.456</date>
        <FilteredPosition>
          <Latitude>51.12235</Latitude>
          <Longitude>9.347214</Longitude>
        </FilteredPosition>
      </item>
      <item>
        <date>14.7.2012 12:13:08.064</date>
        <m:Message type="received">
          <messageId>1234</messageId>
          <originator>
            <originatorPosition>
              <nodeId>2345</nodeId>
              <timeStamp>1342264088064</timeStamp>
            </originatorPosition>
            <senderPosition>
              <nodeId>2345</nodeId>
              <timeStamp>1342264088254</timeStamp>
            </senderPosition>
            <medium></medium>
          </originator>
          <MessagePayload>
            <generationTime>
              <timeStamp>1342264088</timeStamp>
              <milliSec>70</milliSec>
            </generationTime>
          </MessagePayload>
        </m:Message>
      </item>
    </itemList>

The position log file also contains some <item> elements that have no timestamp (and no "FilteredPosition"). These items can be ignored and do not need to be copied.
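
One way to drop such items explicitly is an empty template rule. The following is a minimal sketch, assuming the trace log is the source document and that a missing <date> child is what marks an item as having no timestamp. It only filters the trace and does not merge anything.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <xsl:output method="xml" indent="yes"/>

  <!-- Rebuild the root element and process each item. -->
  <xsl:template match="/itemList">
    <itemList>
      <xsl:apply-templates select="item"/>
    </itemList>
  </xsl:template>

  <!-- Items that carry a <date> are copied unchanged. -->
  <xsl:template match="item[date]">
    <xsl:copy-of select="."/>
  </xsl:template>

  <!-- Items without a <date> match this empty rule and produce no output. -->
  <xsl:template match="item[not(date)]"/>

</xsl:stylesheet>
```

Without the empty rule, such items would fall through to the built-in template rules, which emit their text content rather than nothing.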

I would appreciate any help with the XSLT code, as I am quite new to this topic... :-/

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:m="http://www.example.com/"
        exclude-result-prefixes="xs"
        version="2.0">

      <xsl:output indent="yes" method="xml"/>

      <!-- The two source documents. -->
      <xsl:variable name="doc1" select="doc('log1.xml')"/>
      <xsl:variable name="doc2" select="doc('log2.xml')"/>

      <!-- Timezone adjustment, in hours. -->
      <xsl:variable name="timezoneAdjustment" select="1"/>

      <!-- Root template to start the transformation. -->
      <xsl:template match="/">
        <!-- Transform and collect all the elements. -->
        <xsl:variable name="data" as="node()*">
          <xsl:apply-templates select="$doc1/itemList/item"/>
          <xsl:apply-templates select="$doc2/messageList/Message"/>
        </xsl:variable>
        <!-- Sort by the timestamp, and discard the wrapper. -->
        <itemList>
          <xsl:for-each select="$data">
            <xsl:sort select="@timestamp" data-type="number"/>
            <xsl:copy-of select="item"/>
          </xsl:for-each>
        </itemList>
      </xsl:template>

      <!--
        Template to transform <item> elements in the first format.
        It just parses the date, and adds a wrapper with the timestamp.
      -->
      <xsl:template match="item[date]">
        <xsl:variable name="dateTimeString" select="date" as="xs:string"/>
        <xsl:variable name="datePart" select="substring-before($dateTimeString,' ')"/>
        <xsl:variable name="day" select="xs:integer(substring-before($datePart,'.'))"/>
        <xsl:variable name="month" select="xs:integer(substring-before(substring-after($datePart,'.'),'.'))"/>
        <xsl:variable name="year" select="xs:integer(substring-after(substring-after($datePart,'.'),'.'))"/>
        <xsl:variable name="timePart" select="substring-after($dateTimeString,' ')"/>
        <!-- Rebuild the value as an ISO 8601 string, e.g. 2012-07-14T12:13:05.123 -->
        <xsl:variable name="reformatted" select="concat(format-number($year,'0000'),'-',format-number($month,'00'),'-',format-number($day,'00'),'T',$timePart)"/>
        <!-- Milliseconds since the epoch, shifted back by the timezone adjustment. -->
        <xsl:variable name="timestamp" select="( xs:dateTime($reformatted) - xs:dateTime('1970-01-01T00:00:00') - $timezoneAdjustment * xs:dayTimeDuration('PT1H') ) div xs:dayTimeDuration('PT0.001S')"/>
        <wrapper timestamp="{$timestamp}">
          <xsl:copy-of select="self::*"/>
        </wrapper>
      </xsl:template>

      <!--
        Template to transform <Message> elements in the second log format.
        It generates an item with the date, and wraps it with the timestamp.
      -->
      <xsl:template match="Message[originator/originatorPosition/timeStamp]">
        <xsl:variable name="timestamp" select="originator/originatorPosition/timeStamp" as="xs:integer"/>
        <xsl:variable name="date" select="xs:dateTime('1970-01-01T00:00:00') + $timezoneAdjustment * xs:dayTimeDuration('PT1H') + $timestamp * xs:dayTimeDuration('PT0.001S')"/>
        <wrapper timestamp="{$timestamp}">
          <item>
            <date>
              <xsl:value-of select="format-dateTime($date,'[D01].[M01].[Y0001] [H01]:[m01]:[s01].[f001]')"/>
            </date>
            <m:Message type="received">
              <xsl:copy-of select="*"/>
            </m:Message>
          </item>
        </wrapper>
      </xsl:template>

    </xsl:stylesheet>

Edit: I added a variable for the timezone adjustment of the messages.

Edit: Fixed the attribute name, so the items are now sorted correctly.
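
The date parsing and the timezone adjustment could also be factored into a reusable function. The following is a rough sketch of that idea, not part of the stylesheet above: the function name `local:to-millis`, its namespace URI, the `offsetHours` parameter, and the entry template `main` are all hypothetical. It assumes the trace format "D.M.YYYY HH:MM:SS.mmm" with a two-digit hour.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:local="urn:example:local"
                exclude-result-prefixes="xs local"
                version="2.0">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: 'D.M.YYYY HH:MM:SS.mmm' to milliseconds since the epoch. -->
  <xsl:function name="local:to-millis" as="xs:decimal">
    <xsl:param name="text" as="xs:string"/>
    <xsl:param name="offsetHours" as="xs:integer"/>
    <!-- Split the date part on '.', giving day, month, year in that order. -->
    <xsl:variable name="dmy" select="tokenize(substring-before($text, ' '), '\.')"/>
    <!-- Rebuild an ISO 8601 string that xs:dateTime() accepts. -->
    <xsl:variable name="iso"
                  select="concat(format-number(number($dmy[3]), '0000'), '-',
                                 format-number(number($dmy[2]), '00'), '-',
                                 format-number(number($dmy[1]), '00'), 'T',
                                 substring-after($text, ' '))"/>
    <!-- Duration since the epoch, minus the offset, divided by one millisecond. -->
    <xsl:sequence select="(xs:dateTime($iso)
                           - xs:dateTime('1970-01-01T00:00:00')
                           - $offsetHours * xs:dayTimeDuration('PT1H'))
                          div xs:dayTimeDuration('PT0.001S')"/>
  </xsl:function>

  <!-- Named entry template (hypothetical name), so no source document is needed. -->
  <xsl:template name="main">
    <xsl:value-of select="local:to-millis('14.7.2012 11:08:07.123', 0)"/>
  </xsl:template>

</xsl:stylesheet>
```

With an offset of 0, the example date corresponds to 1342264087123 ms, the value used for the message log in the question. Passing the offset as a parameter keeps the adjustment in one place instead of repeating it in each template.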
