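Should chunking in xarray/dask behave the same way in the following two use cases?
(a) when opening a dataset from a netCDF file with the chunks option;
(b) when rechunking an existing dataset with the Dataset.chunk method.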
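I am interested in how slicing along different dimensions performs. In my case the behaviour is completely different; compare Case 1 and Case 3 below:
(Case 1): open the dataset with a single chunk along the station dimension (slicing one time step is fast)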
In [1]: import xarray as xr
In [2]: dset = xr.open_dataset(
   ...:     "/tmp/spectra.nc",
   ...:     chunks={"station": None}
   ...: )
In [3]: dset
Out[3]:
<xarray.Dataset>
Dimensions:    (direction: 24, frequency: 25, station: 14048, time: 249)
Coordinates:
* time (time) datetime64[ns] 2017-01-01 ... 2017-02-01
* station (station) float64 1.0 2.0 3.0 ... 1.405e+04 1.405e+04
* frequency (frequency) float32 0.04118 0.045298003 ... 0.40561208
* direction (direction) float32 90.0 75.0 60.0 45.0 ... 135.0 120.0 105.0
Data variables:
    longitude  (time, station) float32 dask.array<chunksize=(249, 14048), meta=np.ndarray>
    latitude   (time, station) float32 dask.array<chunksize=(249, 14048), meta=np.ndarray>
    efth       (time, station, frequency, direction) float32 dask.array<chunksize=(249, 14048, 25, 24), meta=np.ndarray>
In [4]: %time lats = dset.latitude.isel(time=0).values
CPU times: user 171 ms, sys: 49.2 ms, total: 220 ms
Wall time: 219 ms
(Case 2): open the dataset with many size-1 chunks along the station dimension (slicing one time step is slow, slicing one station is fast)
In [5]: dset = xr.open_dataset(
   ...:     "/tmp/spectra.nc",
   ...:     chunks={"station": 1}
   ...: )
In [6]: dset
Out[6]:
<xarray.Dataset>
Dimensions:    (direction: 24, ...)
...
In [7]: %time lats = dset.latitude.isel(time=0).values
CPU times: user 13.1 s, sys: 1.94 s, total: 15 s
Wall time: 11.1 s
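To make the difference between Case 1 and Case 2 concrete, here is a rough, dependency-free sketch that counts how many chunks (and so, roughly, how many dask read tasks) a selection on latitude touches for a given chunking. It assumes one task per chunk and ignores dask's graph optimizations; chunk_bounds and chunks_touched are hypothetical helpers, not xarray or dask API.

```python
from math import prod


def chunk_bounds(size, chunk):
    """Return (start, stop) pairs for a dimension of `size` split into blocks of `chunk`."""
    return [(start, min(start + chunk, size)) for start in range(0, size, chunk)]


def chunks_touched(shape, chunks, selection):
    """Count the chunks that intersect a selection.

    shape: dimension sizes; chunks: chunk size per dimension;
    selection: an int index, or None for the whole dimension, per dimension.
    """
    counts = []
    for size, chunk, sel in zip(shape, chunks, selection):
        bounds = chunk_bounds(size, chunk)
        if sel is None:
            counts.append(len(bounds))  # whole dimension: every block is needed
        else:
            counts.append(sum(1 for lo, hi in bounds if lo <= sel < hi))
    return prod(counts)


shape = (249, 14048)  # latitude(time, station)
case1 = (249, 14048)  # Case 1: one chunk along station
case2 = (249, 1)      # Case 2: size-1 chunks along station

print(chunks_touched(shape, case1, (0, None)))  # isel(time=0), Case 1
print(chunks_touched(shape, case2, (0, None)))  # isel(time=0), Case 2
print(chunks_touched(shape, case2, (None, 0)))  # isel(station=0), Case 2
```

Under this model a single-time slice is one task in Case 1 but 14048 tasks in Case 2, while a single-station slice in Case 2 is one task; per-task overhead would then dominate the Case 2 timing.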
(Case 3): try to rechunk station back into a single chunk (slicing one time step is still slow; shouldn't it be faster?)
In [8]: dset = dset.chunk({"station": None})
In [8]: dset
Out[8]:
<xarray.Dataset>
Dimensions:    (direction: 24, ...)
...
In [9]: %time lats = dset.latitude.isel(time=0).values
CPU times: user 9.06 s, sys: 1.13 s, total: 10.2 s
Wall time: 7.7 s
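One way to reason about Case 3 is that rechunking adds a step on top of the existing graph rather than replacing it, so the single merged chunk still depends on every size-1 read task. The toy sketch below models a task graph as a dict and counts the tasks needed for a key by walking its dependencies (similar in spirit to dask's graph culling). The task names and the single merge task are simplifying assumptions; a real dask rechunk graph may contain intermediate tasks.

```python
def tasks_needed(graph, target):
    """Return the set of task keys that must run to produce `target`.

    `graph` maps each key to the list of keys it depends on.
    """
    needed, stack = set(), [target]
    while stack:
        key = stack.pop()
        if key not in needed:
            needed.add(key)
            stack.extend(graph[key])
    return needed


n_station = 14048

# Case 2 (simplified): one independent read task per size-1 station chunk.
case2 = {("read", i): [] for i in range(n_station)}

# Case 3 (assumed model): the same read tasks, plus one task that concatenates
# them into a single station chunk, plus the time=0 slice of that chunk.
case3 = dict(case2)
case3["rechunked"] = [("read", i) for i in range(n_station)]
case3["slice_time0"] = ["rechunked"]

print(len(tasks_needed(case2, ("read", 0))))     # one station in Case 2
print(len(tasks_needed(case3, "slice_time0")))  # one time step in Case 3
```

In this model, slicing one time step after rechunking still pulls in all 14048 read tasks, which is consistent with Case 3 staying close to Case 2 rather than to Case 1.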
For reference, this dataset is stored on disk as netCDF4 with size-1 chunks along the station dimension:
$ ncdump -hs /tmp/spectra.nc
netcdf spectra {
dimensions:
time = UNLIMITED ; // (249 currently)
station = 14048 ;
frequency = 25 ;
direction = 24 ;
variables:
double time(time) ;
time:long_name = "julian day (UT)" ;
time:standard_name = "time" ;
time:units = "days since 1990-01-01 00:00:00" ;
time:_Storage = "chunked" ;
time:_ChunkSizes = 512 ;
time:_DeflateLevel = 9 ;
time:_Shuffle = "true" ;
time:_Endianness = "little" ;
int station(station) ;
station:long_name = "station id" ;
station:_FillValue = -2147483647 ;
station:_Storage = "chunked" ;
station:_ChunkSizes = 1 ;
station:_DeflateLevel = 9 ;
station:_Shuffle = "true" ;
station:_Endianness = "little" ;
short longitude(time,station) ;
longitude:long_name = "longitude" ;
longitude:standard_name = "longitude" ;
longitude:units = "degree_east" ;
longitude:_FillValue = 9.96921e+36f ;
longitude:scale_factor = -0.00547824f ;
longitude:add_offset = 180.f ;
longitude:_Storage = "chunked" ;
longitude:_ChunkSizes = 249,1 ;
longitude:_DeflateLevel = 9 ;
longitude:_Shuffle = "true" ;
longitude:_Endianness = "little" ;
short latitude(time,station) ;
latitude:long_name = "latitude" ;
latitude:standard_name = "latitude" ;
latitude:units = "degree_north" ;
latitude:_FillValue = 9.96921e+36f ;
latitude:scale_factor = -0.0006866874f ;
latitude:add_offset = -54.f ;
latitude:_Storage = "chunked" ;
latitude:_ChunkSizes = 249,1 ;
latitude:_DeflateLevel = 9 ;
latitude:_Shuffle = "true" ;
latitude:_Endianness = "little" ;
float frequency(frequency) ;
frequency:long_name = "frequency of center band" ;
frequency:standard_name = "sea_surface_wave_frequency" ;
frequency:units = "s-1" ;
frequency:scale_factor = 1.f ;
frequency:add_offset = 0.f ;
frequency:_FillValue = 9.96921e+36f ;
frequency:_Storage = "chunked" ;
frequency:_ChunkSizes = 25 ;
frequency:_DeflateLevel = 9 ;
frequency:_Shuffle = "true" ;
frequency:_Endianness = "little" ;
float direction(direction) ;
direction:long_name = "sea surface wave to direction" ;
direction:standard_name = "sea_surface_wave_to_direction" ;
direction:units = "degree" ;
direction:scale_factor = 1.f ;
direction:add_offset = 0.f ;
direction:_FillValue = 9.96921e+36f ;
direction:_Storage = "chunked" ;
direction:_ChunkSizes = 24 ;
direction:_DeflateLevel = 9 ;
direction:_Shuffle = "true" ;
direction:_Endianness = "little" ;
short efth(time, station, frequency, direction) ;
efth:long_name = "sea surface wave directional variance spectral density" ;
efth:standard_name = "sea_surface_wave_directional_variance_spectral_density" ;
efth:units = "m2 s rad-1" ;
efth:_FillValue = 9.96921e+36f ;
efth:scale_factor = -0.004410254f ;
efth:add_offset = 144.5064f ;
efth:_Storage = "chunked" ;
efth:_ChunkSizes = 249, 1, ..., 24 ;
efth:_DeflateLevel = 9 ;
efth:_Shuffle = "true" ;
efth:_Endianness = "little" ;
// global attributes:
:nco_openmp_thread_number = 1 ;
:_NCProperties = "version=2,netcdf=4.6.2,hdf5=1.10.4" ;
:_SuperblockVersion = 2 ;
:_Isnetcdf4 = 1 ;
:_Format = "netCDF-4" ;
}
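The header shows latitude stored as short values on a (249, 14048) grid with _ChunkSizes = 249,1. As a back-of-the-envelope sketch, assuming every HDF5 chunk touched must be read and decompressed whole (and ignoring the compression ratio and any chunk cache), the helper below estimates how much data is decompressed for a single-time slice versus a single-station slice; decompressed_bytes is a hypothetical helper.

```python
from math import ceil


def decompressed_bytes(shape, chunk_shape, itemsize, selection):
    """Return (chunks read, bytes decompressed) when each touched chunk is read whole.

    selection: an int index, or None for the whole dimension, per dimension.
    """
    n_chunks = 1
    for size, chunk, sel in zip(shape, chunk_shape, selection):
        # A whole dimension touches every block; a single index touches one.
        n_chunks *= ceil(size / chunk) if sel is None else 1
    chunk_bytes = itemsize
    for chunk in chunk_shape:
        chunk_bytes *= chunk
    return n_chunks, n_chunks * chunk_bytes


shape = (249, 14048)    # latitude(time, station)
chunk_shape = (249, 1)  # latitude:_ChunkSizes
itemsize = 2            # short

print(decompressed_bytes(shape, chunk_shape, itemsize, (0, None)))  # one time step
print(decompressed_bytes(shape, chunk_shape, itemsize, (None, 0)))  # one station
```

Under these assumptions, one time step means opening 14048 chunks and decompressing about 7 MB to keep 28096 bytes, whereas one station is a single 498-byte chunk, regardless of how dask chunks the array afterwards.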