Grouping by year

2024-05-10 • Q&A

First, the overall goal: I have an array of global daily mean sea surface temperatures:

>>> sst
<xarray.DataArray 'sst' (time: 10957, lat: 720, lon: 1440)>
dask.array<rechunk-merge, shape=(10957, 720, 1440), dtype=float32, chunksize=(10957, 1, 1), chunktype=numpy.ndarray>
Coordinates:
  * lon      (lon) float32 0.125 0.375 0.625 0.875 ... 359.375 359.625 359.875
  * lat      (lat) float32 -89.875 -89.625 -89.375 ... 89.375 89.625 89.875
  * time     (time) datetime64[ns] 1989-01-01 1989-01-02 ... 2018-12-31
Attributes:
    long_name:    Daily Sea Surface Temperature

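What I need to find are heat waves, where a heat wave is defined as "a period of 5 days or longer during which the temperature at a given location is above the 90th percentile for that location and time of year."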
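To make the definition concrete, here is a minimal sketch of the "5 days or longer" part for a single grid point. It assumes the exceedance test against the 90th percentile has already produced a boolean series; the function name `heatwave_mask`, the toy temperatures, and the fixed threshold of 25.0 are illustrative assumptions, not part of my actual workflow.

```python
import numpy as np

def heatwave_mask(exceeds, min_len=5):
    """Mark days that belong to a run of at least `min_len` consecutive True values."""
    exceeds = np.asarray(exceeds, dtype=bool)
    # Pad with False so every run has a well-defined start and end.
    padded = np.concatenate(([False], exceeds, [False]))
    changes = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(changes == 1)    # index of the first True in each run
    ends = np.flatnonzero(changes == -1)     # index one past the last True in each run
    mask = np.zeros(exceeds.shape, dtype=bool)
    for s, e in zip(starts, ends):
        if e - s >= min_len:
            mask[s:e] = True
    return mask

# Toy series for one location; 25.0 stands in for the per-day 90th percentile.
temps = np.array([20, 26, 27, 28, 26, 27, 21, 26, 27, 20], dtype=float)
hot = heatwave_mask(temps > 25.0)
```

In this toy series only the five-day run counts as a heat wave; the later two-day run is too short.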

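This definition wants me to compare each date with the same date in other years, so I can't just use time.dayofyear (in leap years every day after February 29 is shifted by one), but that's fine. I created a new array, monthday: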

>>> monthday = xr.DataArray(
...     sst.time.dt.month*100 + sst.time.dt.day,
...     name='monthday', dims='time', coords={'time': sst['time']}
... )
>>> sst['monthday'] = monthday

Now I can group by monthday.
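As a rough sketch of what that grouping looks like, here is a per-monthday 90th percentile on a small synthetic array. The toy data, its 2x3 grid, and the four-year span are assumptions for illustration; only the month*100 + day encoding comes from the snippet above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small synthetic stand-in for sst: four years of daily data on a 2x3 grid.
time = pd.date_range("1989-01-01", "1992-12-31", freq="D")
rng = np.random.default_rng(0)
toy = xr.DataArray(
    rng.normal(15.0, 2.0, size=(time.size, 2, 3)).astype("float32"),
    dims=("time", "lat", "lon"),
    coords={"time": time},
    name="sst",
)

# Same month*100 + day encoding, attached as a coordinate along time.
toy = toy.assign_coords(
    monthday=("time", (toy.time.dt.month * 100 + toy.time.dt.day).values)
)

# 90th percentile over all years for each calendar day and grid point.
p90 = toy.groupby("monthday").quantile(0.9, dim="time")
```

Note that in this toy data the group for monthday 229 contains a single sample (1992), which is the same sparsity problem on a smaller scale.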

My current problem is leap years. As you can see, this dataset contains only 7 leap years, and taking the 90th percentile of just 7 values makes no sense.

So, to compute the 90th percentile for February 29, it was decided that for non-leap years we should use the average of February 28 and March 1.

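Short of creating a new array with dimensions (year, monthday) for each horizontal grid point, filling in those values manually, and finally running groupby and percentile, I don't know how to do this.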
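For reference, one possible alternative to the manual (year, monthday) array is sketched below: append a synthetic February 29 sample to each non-leap year and then group as usual. This is only a sketch under stated assumptions. The helper name `add_synthetic_feb29`, the toy data, and the choice to stamp the synthetic samples at noon on February 28 (so that time stays unique) are all hypothetical, and I have not tried it on the full dask-backed array.

```python
import numpy as np
import pandas as pd
import xarray as xr

def add_synthetic_feb29(da):
    """Return `da` plus one extra sample per non-leap year, labelled monthday=229.

    Each extra sample is the mean of that year's Feb 28 and Mar 1 fields.
    Assumes `da` has a `monthday` coordinate (month*100 + day) along `time`
    and that every non-leap year contains both Feb 28 and Mar 1.
    """
    md = da["monthday"].values
    non_leap = ~da.time.dt.is_leap_year.values
    feb28 = da.isel(time=(md == 228) & non_leap).drop_vars("monthday")
    mar01 = da.isel(time=(md == 301) & non_leap).drop_vars("monthday")
    # Give Mar 1 the Feb 28 timestamps so the two align element by element.
    mar01 = mar01.assign_coords(time=feb28.time.values)
    synth = 0.5 * (feb28 + mar01)
    # Hypothetical labelling choice: a noon timestamp keeps `time` unique.
    synth = synth.assign_coords(time=feb28.time.values + np.timedelta64(12, "h"))
    synth = synth.assign_coords(
        monthday=("time", np.full(synth.sizes["time"], 229))
    )
    return xr.concat([da, synth], dim="time").sortby("time")

# Toy data: the value at each time step is simply its index, on a 2x3 grid.
time = pd.date_range("1989-01-01", "1992-12-31", freq="D")
values = np.arange(time.size, dtype="float64")[:, None, None] * np.ones((1, 2, 3))
toy = xr.DataArray(
    values, dims=("time", "lat", "lon"), coords={"time": time}, name="sst"
)
toy = toy.assign_coords(
    monthday=("time", (toy.time.dt.month * 100 + toy.time.dt.day).values)
)

extended = add_synthetic_feb29(toy)
p90 = extended.groupby("monthday").quantile(0.9, dim="time")
```

With this, every year contributes one sample to the monthday 229 group. The selection and averaging use only indexing and arithmetic, so they should stay lazy on a dask-backed array, though that is an expectation rather than something verified here.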

Any suggestions for making this more elegant (preferably in a way that keeps the ability to parallelize with dask)?

Thanks
