Data Format & Architecture
Overview
Peddy.jl uses DimensionalData.jl for all data representation. This provides labeled, dimension-aware arrays that make code more readable and less error-prone than plain matrices.
DimensionalData Basics
What is a DimArray?
A DimArray is a labeled array with named dimensions and coordinates:
using DimensionalData
# Create a simple DimArray
data = rand(3, 100)
vars = [:Ux, :Uy, :Uz]
times = DateTime(2024, 1, 1):Millisecond(100):DateTime(2024, 1, 1, 0, 0, 10)
arr = DimArray(data, (Var(vars), Ti(times)))
# Access by label (not index!)
ux = arr[Var=At(:Ux)] # Get Ux variable
slice = arr[Ti=At(times[1])] # Get first time sliceWhy DimensionalData?
- Type safety: Dimensions are checked at compile time
- Readability:
arr[Var=At(:Ux)]is clearer thanarr[1, :] - Robustness: Reordering dimensions doesn't break code
- Metadata: Can attach units, descriptions, etc.
Peddy Data Format
High-Frequency Data
High-frequency data represents fast measurements (typically 10-20 Hz):
using DimensionalData
using Dates
# Typical structure
times_hf = DateTime(2024, 1, 1, 0, 0, 0):Millisecond(50):DateTime(2024, 1, 1, 23, 59, 59)
vars_hf = [:Ux, :Uy, :Uz, :Ts, :H2O, :P, :CO2, :diag_sonic]
hf = DimArray(
rand(length(vars_hf), length(times_hf)),
(Var(vars_hf), Ti(times_hf))
)Dimensions:
Var: Variable names (symbols)Ti: Time axis (DateTime objects)
Required Variables (sensor-dependent):
Ux, Uy, Uz: Wind components (m/s)Ts: Sonic temperature (°C)diag_sonic: Sonic anemometer diagnostics (0 = good)
Optional Variables:
H2O: Water vapor concentrationCO2: Carbon dioxide concentrationP: Atmospheric pressure (Pa)LI_H2Om: LI-COR H2O measurementLI_H2Om_corr: Corrected H2O
Low-Frequency Data
Low-frequency data represents slow measurements (typically 1 Hz or slower):
times_lf = DateTime(2024, 1, 1, 0, 0, 0):Second(1):DateTime(2024, 1, 1, 23, 59, 59)
vars_lf = [:TA, :RH, :P, :other_var]
lf = DimArray(
rand(length(vars_lf), length(times_lf)),
(Var(vars_lf), Ti(times_lf))
)Common Variables:
TA: Air temperature (°C)RH: Relative humidity (%)P: Atmospheric pressure (Pa)PRECIP: Precipitation (mm)WS: Wind speed (m/s)
Low-frequency data is optional and only needed for certain steps (e.g., H2O correction).
Accessing Data
By Variable
# Get all values for a variable
ux = hf[Var=At(:Ux)] # Returns 1D array
uy = hf[Var=At(:Uy)]
# Get multiple variables
wind = hf[Var=In([:Ux, :Uy, :Uz])] # Returns 3×N arrayBy Time
# Get data at specific time
t0 = DateTime(2024, 1, 1, 12, 0, 0)
slice = hf[Ti=At(t0)] # Returns all variables at t0
# Get time range
t_start = DateTime(2024, 1, 1, 0, 0, 0)
t_end = DateTime(2024, 1, 1, 1, 0, 0)
subset = hf[Ti=Between(t_start, t_end)]By Index
# Get by integer index (less preferred, but works)
first_row = hf[1, :] # First variable, all times
first_col = hf[:, 1] # All variables, first timeUsing Views
For in-place modification, use @view:
# Create a view (doesn't copy data)
ux_view = @view hf[Var=At(:Ux)]
ux_view[1] = 999.0 # Modifies original hf
# Without @view (creates a copy)
ux_copy = hf[Var=At(:Ux)]
ux_copy[1] = 999.0 # Doesn't modify hfDimensions and Coordinates
Var Dimension
The Var dimension holds variable names:
var_dim = dims(hf, Var)
var_names = val(var_dim) # Get the actual symbols
@show var_names # [:Ux, :Uy, :Uz, ...]
# Check if variable exists
:Ux in var_names # true
:unknown in var_names # falseTi Dimension
The Ti dimension holds time coordinates:
time_dim = dims(hf, Ti)
times = collect(time_dim) # Convert to Vector
n_samples = length(time_dim)
# Get time range
t_first = times[1]
t_last = times[end]
duration = t_last - t_first
# Check time regularity
time_diffs = diff(times)
is_regular = all(x -> x == time_diffs[1], time_diffs)Working with Missing Data
Peddy.jl uses NaN to represent missing values:
# Check for missing values
n_missing = count(isnan, hf)
n_valid = count(isfinite, hf)
# Get valid data only
valid_data = hf[isfinite.(hf)]
# Use skipmissing for statistics
using Statistics
mean_ux = mean(skipmissing(hf[Var=At(:Ux)]))
std_ux = std(skipmissing(hf[Var=At(:Ux)]))
# Peddy's mean_skipnan function
mean_ux = Peddy.mean_skipnan(hf[Var=At(:Ux)])Data Validation
Check Required Variables
function validate_data(hf, lf, sensor)
# Check dimensions
if !(:Var in DimensionalData.name.(dims(hf)))
error("High-frequency data must have Var dimension")
end
if !(:Ti in DimensionalData.name.(dims(hf)))
error("High-frequency data must have Ti dimension")
end
# Check required variables
required = Peddy.needs_data_cols(sensor)
available = val(dims(hf, Var))
for var in required
if var ∉ available
error("Missing required variable: $var")
end
end
endCheck Time Axis
function validate_time_axis(hf)
times = collect(dims(hf, Ti))
# Check monotonicity
if !issorted(times)
error("Time axis is not sorted")
end
# Check for duplicates
if length(unique(times)) < length(times)
error("Time axis has duplicate values")
end
# Check regularity (optional)
time_diffs = diff(times)
if !all(x -> x == time_diffs[1], time_diffs)
@warn "Time axis is not regular"
end
endData Shapes and Sizes
Understanding Array Shapes
hf = DimArray(
rand(5, 1000),
(Var([:Ux, :Uy, :Uz, :Ts, :H2O]), Ti(times))
)
# Shape information
size(hf) # (5, 1000)
size(hf, 1) # 5 (number of variables)
size(hf, 2) # 1000 (number of time points)
# Dimension sizes
length(dims(hf, Var)) # 5
length(dims(hf, Ti)) # 1000
# Parent array (underlying matrix)
parent_data = parent(hf) # Returns the 5×1000 matrixCreating Arrays of Different Shapes
# Column-major (variables × time) - Peddy standard
hf = DimArray(
rand(5, 1000),
(Var(vars), Ti(times))
)
# Row-major (time × variables) - requires transpose
data_row_major = rand(1000, 5)
hf = DimArray(
data_row_major', # Transpose to (vars, times)
(Var(vars), Ti(times))
)Data Type Considerations
Float Precision
# Float64 (default, recommended)
hf = DimArray(
rand(Float64, 5, 1000),
(Var(vars), Ti(times))
)
# Float32 (saves memory, less precision)
hf = DimArray(
rand(Float32, 5, 1000),
(Var(vars), Ti(times))
)
# Check element type
eltype(hf) # Float64 or Float32Integer Diagnostics
Diagnostic fields are often integers:
# Diagnostic with integer type
diag = zeros(Int32, 1000)
hf_with_diag = DimArray(
hcat(wind_data, diag),
(Var([:Ux, :Uy, :Uz, :diag]), Ti(times))
)Performance Considerations
Memory Layout
# Efficient: contiguous in memory
hf = DimArray(
rand(5, 1000),
(Var(vars), Ti(times))
)
# Less efficient: non-contiguous
hf_transposed = permutedims(hf)In-Place Operations
# Efficient: modifies in-place
ux_view = @view hf[Var=At(:Ux)]
ux_view .= ux_view .* 2.0
# Less efficient: creates intermediate arrays
hf[Var=At(:Ux)] = hf[Var=At(:Ux)] .* 2.0Iteration
# Efficient: iterate over variables
for var in val(dims(hf, Var))
data = @view hf[Var=At(var)]
# Process data
end
# Less efficient: iterate over indices
for i in 1:size(hf, 1)
data = @view hf[i, :]
# Process data
endModifying Data
In-Place Modifications
# Modify a variable
hf[Var=At(:Ux)] .= 0.0
# Modify with condition
ux = @view hf[Var=At(:Ux)]
ux[ux .> 100] .= NaN
# Modify time slice
hf[:, 1] .= NaNCreating New Arrays
# Copy entire array
hf_copy = copy(hf)
# Copy specific variable
ux_copy = copy(hf[Var=At(:Ux)])
# Create new array with subset of variables
subset_vars = [:Ux, :Uy, :Uz]
hf_subset = hf[Var=In(subset_vars)]Adding Variables
# Add a new variable
new_var_data = rand(length(dims(hf, Ti)))
hf_new = DimArray(
hcat(parent(hf), new_var_data),
(Var(vcat(val(dims(hf, Var)), :new_var)), Ti(dims(hf, Ti)))
)Coordinate Systems
Time Coordinates
# Regular time axis
times = DateTime(2024, 1, 1):Millisecond(50):DateTime(2024, 1, 1, 1, 0, 0)
# Irregular time axis (still valid)
times = [
DateTime(2024, 1, 1, 0, 0, 0),
DateTime(2024, 1, 1, 0, 0, 0.050),
DateTime(2024, 1, 1, 0, 0, 0.150), # Gap here
DateTime(2024, 1, 1, 0, 0, 0.200),
]
hf = DimArray(data, (Var(vars), Ti(times)))Variable Coordinates
# Variables are just symbols
vars = [:Ux, :Uy, :Uz, :Ts]
# Can use any symbols
vars = [:u, :v, :w, :T]
vars = [:wind_x, :wind_y, :wind_z, :temperature]
hf = DimArray(data, (Var(vars), Ti(times)))Metadata and Attributes
Adding Metadata
using DimensionalData
# Create DimArray with metadata
hf = DimArray(
data,
(Var(vars), Ti(times)),
name="high_frequency_data",
metadata=Dict(
"site" => "Davos",
"elevation" => 1639,
"sampling_rate" => "20 Hz"
)
)
# Access metadata
@show hf.metadataVariable Metadata
Peddy.jl provides VariableMetadata for detailed variable information:
using Peddy
# Get default metadata
meta = get_default_metadata(:Ux)
@show meta.long_name
@show meta.units
@show meta.standard_name
# Get metadata for specific variable
meta = metadata_for(:H2O)Data Transformation Examples
Resampling
# Downsample to lower frequency
downsample_factor = 10
hf_downsampled = DimArray(
hf[:, 1:downsample_factor:end],
(Var(dims(hf, Var)), Ti(collect(dims(hf, Ti))[1:downsample_factor:end]))
)Unit Conversion
# Convert wind speed from m/s to km/h
hf_kmh = copy(hf)
hf_kmh[Var=At(:Ux)] .*= 3.6
hf_kmh[Var=At(:Uy)] .*= 3.6
hf_kmh[Var=At(:Uz)] .*= 3.6Coordinate Transformation
# Rotate wind components
using LinearAlgebra
angle = π / 4 # 45 degrees
ux = hf[Var=At(:Ux)]
uy = hf[Var=At(:Uy)]
ux_rot = ux .* cos(angle) - uy .* sin(angle)
uy_rot = ux .* sin(angle) + uy .* cos(angle)
hf[Var=At(:Ux)] .= ux_rot
hf[Var=At(:Uy)] .= uy_rotCommon Patterns
Processing All Variables
for var in val(dims(hf, Var))
data = @view hf[Var=At(var)]
# Process data
data .= process_function(data)
endProcessing Time Windows
window_size = 1000 # samples
for i in 1:window_size:size(hf, 2)
window_end = min(i + window_size - 1, size(hf, 2))
window = hf[:, i:window_end]
# Process window
endCombining High and Low Frequency
# Interpolate low-frequency to high-frequency times
hf_times = collect(dims(hf, Ti))
lf_times = collect(dims(lf, Ti))
# Find corresponding LF indices for each HF time
for (i, hf_time) in enumerate(hf_times)
# Find nearest LF time
idx = searchsortednearest(lf_times, hf_time)
# Use lf[idx] for this HF time
endSee Also
- API Reference - Function documentation
- Tutorial - Practical examples
- DimensionalData.jl Documentation