Serialization¶
virtualizarr.accessor.VirtualiZarrDatasetAccessor.to_icechunk ¶
to_icechunk(
store: IcechunkStore,
*,
group: str | None = None,
append_dim: str | None = None,
region: Literal["auto"]
| Mapping[str, Literal["auto"] | slice]
| None = None,
validate_containers: bool = True,
last_updated_at: datetime | None = None,
) -> None
Write an xarray dataset to an Icechunk store.
Any variables backed by ManifestArray objects will be written as virtual references. Any other variables will be loaded into memory before their binary chunk data is written into the store.
If append_dim is provided, the virtual dataset will be appended to the
existing IcechunkStore along the append_dim dimension.
If last_updated_at is provided, it will be used as a checksum for any virtual
chunks written to the store with this operation. At read time, if any of the
virtual chunks have been updated since this provided datetime, an error will be
raised. This protects against reading outdated virtual chunks that have been
updated since the last read. When not provided, the current time is used. This
value is stored in Icechunk with seconds precision, so be sure to take that into
account when providing this value.
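Because the value is stored with seconds precision, any sub-second component of the datetime you pass is effectively discarded. A minimal stdlib sketch of truncating to whole seconds before passing the value:

```python
from datetime import datetime

# Icechunk stores last_updated_at with seconds precision, so the
# microsecond component of the datetime you pass is effectively dropped.
now = datetime(2024, 5, 1, 12, 30, 45, 678901)

# Truncate to whole seconds up front, so the value you passed matches
# what the store will later report.
now_truncated = now.replace(microsecond=0)

print(now_truncated.isoformat())  # 2024-05-01T12:30:45
```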
Parameters:

- store (IcechunkStore) – Store to write dataset into.
- group (str | None, default: None) – Path of the group to write the dataset into (default: the root group).
- append_dim (str | None, default: None) – Dimension along which to append the virtual dataset.
- region (Literal['auto'] | Mapping[str, Literal['auto'] | slice] | None, default: None) – Optional mapping from dimension names to either a) "auto", or b) integer slices, indicating the region of existing zarr array(s) in which to write this dataset's data. See the xarray.Dataset.to_zarr documentation for details.
- validate_containers (bool, default: True) – If True, raise if any virtual chunks refer to locations that don't match any existing virtual chunk container set on this Icechunk repository. Setting this to False is generally not recommended, because it can lead to confusing runtime results and errors when reading data back.
- last_updated_at (datetime | None, default: None) – Datetime to use as a checksum for any virtual chunks written to the store with this operation. When not provided, the current time is used.

Raises:

- ValueError – If the store is read-only.
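Per the signature above, region accepts either the string "auto" or an explicit per-dimension mapping of integer slices. A small pure-Python sketch of both shapes (the dimension name "time" is hypothetical, and the actual to_icechunk call is shown only as a comment since it requires a live IcechunkStore):

```python
# a) let the region be inferred automatically for every dimension:
region_auto = "auto"

# b) spell out integer slices explicitly, one entry per dimension
#    ("time" is a made-up dimension name for illustration):
region_explicit = {"time": slice(0, 100)}

# The actual call would look like this (requires a live IcechunkStore
# and a virtual dataset, so it is not executed here):
# vds.vz.to_icechunk(store, region=region_explicit)

# The slice bounds select 100 elements along the hypothetical dimension.
print(region_explicit["time"].stop - region_explicit["time"].start)  # 100
```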
virtualizarr.accessor.VirtualiZarrDatasetAccessor.to_kerchunk ¶
to_kerchunk(filepath: None, format: Literal['dict']) -> KerchunkStoreRefs
to_kerchunk(
filepath: str | Path | None = None,
format: Literal["dict", "json", "parquet"] = "dict",
record_size: int = 100000,
categorical_threshold: int = 10,
) -> KerchunkStoreRefs | None
Serialize all virtualized arrays in this xarray dataset into the kerchunk references format.
Parameters:

- filepath (str | Path | None, default: None) – File path to write kerchunk references into. Not required if format is 'dict'.
- format (Literal['dict', 'json', 'parquet'], default: 'dict') – Format to serialize the kerchunk references as. If 'json' or 'parquet' then the filepath argument is required.
- record_size (int, default: 100000) – Number of references to store in each reference file (default 100,000). Bigger values mean fewer read requests but larger memory footprint. Only available when format is 'parquet'.
- categorical_threshold (int, default: 10) – Encode urls as pandas.Categorical to reduce memory footprint if the ratio of the number of unique urls to total number of refs for each variable is greater than or equal to this number (default 10). Only available when format is 'parquet'.
Returns:

- KerchunkStoreRefs | None – The kerchunk references, if format is 'dict'; otherwise None.
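The 'json' format simply persists the same nested structure that 'dict' returns, which is why a filepath is required for it. A minimal stdlib sketch of that structure, assuming the standard kerchunk version-1 layout (the references here are hand-written for illustration, not produced by virtualizarr, and the S3 URL and byte ranges are invented):

```python
import json
import tempfile

# A hand-written reference set in the kerchunk v1 layout: inline zarr
# metadata strings plus [url, offset, length] triples for binary chunks.
# (The URL and byte ranges below are invented for illustration.)
refs = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "air/0.0": ["s3://example-bucket/file.nc", 1024, 4096],
    },
}

# format='json' amounts to dumping this structure to the given filepath.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(refs, f)
    path = f.name

# Reading the file back recovers the same references.
with open(path) as f:
    loaded = json.load(f)

print(loaded["refs"]["air/0.0"])  # ['s3://example-bucket/file.nc', 1024, 4096]
```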
virtualizarr.accessor.VirtualiZarrDataTreeAccessor.to_icechunk ¶
to_icechunk(
store: IcechunkStore,
*,
write_inherited_coords: bool = False,
validate_containers: bool = True,
last_updated_at: datetime | None = None,
**kwargs,
) -> None
Write an xarray DataTree to an Icechunk store.
Any variables backed by ManifestArray objects will be written as virtual references. Any other variables will be loaded into memory before their binary chunk data is written into the store.
If last_updated_at is provided, it will be used as a checksum for any
virtual chunks written to the store with this operation. At read time, if any
of the virtual chunks have been updated since this provided datetime, an error
will be raised. This protects against reading outdated virtual chunks that have
been updated since the last read. When not provided, no check is performed.
This value is stored in Icechunk with seconds precision, so be sure to take that
into account when providing this value.
Parameters:

- store (IcechunkStore) – Store to write dataset into.
- write_inherited_coords (bool, default: False) – If True, replicate inherited coordinates on all descendant nodes. Otherwise, only write coordinates at the level at which they are originally defined. This saves disk space, but requires opening the full tree to load inherited coordinates.
- validate_containers (bool, default: True) – If True, raise if any virtual chunks refer to locations that don't match any existing virtual chunk container set on this Icechunk repository. Setting this to False is generally not recommended, because it can lead to confusing runtime results and errors when reading data back.
- last_updated_at (datetime | None, default: None) – Datetime to use as a checksum for any virtual chunks written to the store with this operation. When not provided, no check is performed.
- **kwargs – Additional keyword arguments to be passed to xarray.Dataset.vz.to_icechunk.

Raises:

- ValueError – If the store is read-only.
Examples:
To ensure an error is raised if the files containing referenced virtual chunks
are modified at any time from now on, pass the current time to
last_updated_at.
>>> from datetime import datetime
>>> vdt.vz.to_icechunk(
... icechunkstore,
... last_updated_at=datetime.now(),
... )