Troubleshooting & FAQ

Common Issues

Data Loading

Issue: "High frequency data must have a Var dimension"

Cause: Your data doesn't have the required Var dimension from DimensionalData.jl.

Solution: Ensure your data is a DimArray with proper dimensions:

using DimensionalData

# Correct format
hf = DimArray(
    data_matrix,
    (Var([:Ux, :Uy, :Uz, :Ts]), Ti(times))
)

# Wrong format (will fail)
hf = data_matrix  # Just a matrix

Issue: "Var dimension must have a Ux variable"

Cause: Missing required variables for the selected sensor.

Solution: Check what variables your sensor needs:

sensor = CSAT3()
required = Peddy.needs_data_cols(sensor)
@show required  # Shows [:Ux, :Uy, :Uz, :Ts, :diag_sonic]

Ensure your data includes all required variables.

Issue: Time format parsing fails

Cause: Incorrect time_format specification in FileOptions.

Solution: Match the format exactly:

using Dates  # provides the dateformat"..." string macro

# For "2024-01-01 12:30:45.123"
FileOptions(time_format=dateformat"yyyy-mm-dd HH:MM:SS.s")

# For "2024-01-01 12:30:45"
FileOptions(time_format=dateformat"yyyy-mm-dd HH:MM:SS")

# For "01/01/2024 12:30"
FileOptions(time_format=dateformat"mm/dd/yyyy HH:MM")
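
Before passing a format to FileOptions, it can help to test it against one timestamp copied from your file. The sketch below uses only Julia's Dates standard library. The sample strings are illustrative, and it assumes FileOptions parses timestamps with the same Dates machinery.

```julia
using Dates

sample = "2024-01-01 12:30:45.123"   # paste one timestamp from your file here
fmt = dateformat"yyyy-mm-dd HH:MM:SS.s"

# DateTime(str, fmt) throws on a mismatch; tryparse returns nothing instead
parsed = DateTime(sample, fmt)
mismatch = tryparse(DateTime, "01/01/2024 12:30", fmt)

@show parsed
@show mismatch === nothing
```

If tryparse returns nothing for your sample, the format string does not match the file.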

Pipeline Execution

Issue: "Variable X not found in high frequency data"

Cause: A pipeline step references a variable that doesn't exist in your data.

Solution: Check available variables:

vars = val(dims(high_frequency_data, Var))
@show vars

# Then configure steps only for variables you have
desp = SimpleSigmundDespiking(
    variable_groups=[
        VariableGroup("Wind", [:Ux, :Uy, :Uz], spike_threshold=6.0)
    ]
)

Issue: Pipeline runs but produces all NaN results

Cause: Quality control or despiking is too aggressive, removing all data.

Solution: Relax thresholds or disable the step:

# Option 1: Relax QC bounds
qc = PhysicsBoundsCheck(
    Ux=Limit(-200, 200),  # Wider range
    Uy=Limit(-200, 200),
    Uz=Limit(-100, 100)
)

# Option 2: Disable QC
pipeline = EddyPipeline(
    sensor=sensor,
    quality_control=nothing,  # Skip QC
    despiking=desp,
    output=output
)

# Option 3: Relax despiking threshold
desp = SimpleSigmundDespiking(
    variable_groups=[
        VariableGroup("Wind", [:Ux, :Uy, :Uz], spike_threshold=10.0)  # Higher = less aggressive
    ]
)

Issue: "Block size calculation failed" or "Not enough samples"

Cause: Data is too short for the requested processing.

Solution: Ensure sufficient data:

# For double rotation with 30-minute blocks, need at least 30 minutes of data
# For MRD with M=11, need at least 2^11 = 2048 samples

# Check your data length
n_samples = length(dims(high_frequency_data, Ti))
duration_minutes = n_samples * 50 / 1000 / 60  # Assuming 50 ms sampling

# Adjust parameters for short data
rot = WindDoubleRotation(block_duration_minutes=5.0)  # Shorter blocks
mrd = OrthogonalMRD(M=8)  # Smaller maximum scale
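
The arithmetic above can be wrapped in two small helpers. This is a minimal sketch in plain Julia; the function names are hypothetical and not part of Peddy.jl, and the 50 ms sampling interval is only an example.

```julia
# Duration in minutes covered by n_samples at a fixed sampling interval
duration_minutes(n_samples, interval_ms) = n_samples * interval_ms / 1000 / 60

# Largest M such that 2^M <= n_samples (integer arithmetic, no rounding issues)
max_mrd_scale(n_samples) = ndigits(n_samples, base=2) - 1

n_samples = 36_000   # example: 30 minutes at 50 ms (20 Hz)
@show duration_minutes(n_samples, 50)
@show max_mrd_scale(n_samples)
```

Pick a block duration no longer than the first value and an M no larger than the second.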

Quality Control

Issue: Too many points marked as invalid

Cause: Physical bounds are too restrictive for your site conditions.

Solution: Inspect your data and adjust bounds:

# Check data ranges (Peddy.jl marks missing values as NaN, so filter those out)
ux = high_frequency_data[Var=At(:Ux)]
@show extrema(x for x in ux if !isnan(x))

# Set bounds based on your data
qc = PhysicsBoundsCheck(
    Ux=Limit(-50, 50),  # Adjust to your site
    Uy=Limit(-50, 50),
    Uz=Limit(-30, 30),
    Ts=Limit(-30, 50)
)
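
To judge whether candidate bounds fit your site, you can count how many valid samples they would reject before running the pipeline. The sketch below works on a plain vector with made-up values. The variable names are illustrative, and the bounds are not recommendations.

```julia
ux = [1.2, -3.4, NaN, 55.0, 0.7]   # illustrative values only
lower, upper = -50.0, 50.0          # candidate bounds to evaluate

valid = filter(!isnan, ux)          # ignore values already marked as NaN
observed_range = extrema(valid)
n_rejected = count(x -> x < lower || x > upper, valid)
fraction_rejected = n_rejected / length(valid)

@show observed_range
@show fraction_rejected
```

A large rejected fraction on data you trust suggests the bounds are too tight.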

Issue: Sensor diagnostics always fail

Cause: Diagnostic field has non-zero values (sensor issues).

Solution: Either fix the sensor or disable diagnostic checks:

# Option 1: Check only the physical bounds and skip the diagnostics.
# (OnlyDiagnostics() does the opposite: it checks only the diagnostics.)
qc = PhysicsBoundsCheck()

# Option 2: Manually clean diagnostic field (this discards the sensor's own flags)
high_frequency_data[Var=At(:diag_sonic)] .= 0.0

Despiking

Issue: Despiking removes too much data

Cause: The threshold is too low. A lower threshold flags more points as spikes.

Solution: Increase the threshold:

# Lower threshold = more aggressive (removes more spikes)
# Higher threshold = less aggressive (keeps more data)

desp = SimpleSigmundDespiking(
    window_minutes=5.0,
    variable_groups=[
        VariableGroup("Wind", [:Ux, :Uy, :Uz], spike_threshold=8.0)  # Higher = less aggressive
    ]
)
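
To build intuition for how the threshold changes the number of flagged points, the sketch below scores each sample by its distance from the median in units of the scaled median absolute deviation (MAD). This is a generic robust outlier score used for illustration. It is an assumption, not necessarily the formula SimpleSigmundDespiking applies, and spike_scores is a hypothetical helper.

```julia
using Statistics

# Generic robust spike score: distance from the median in units of scaled MAD.
# Illustration only; not taken from the Peddy.jl source.
function spike_scores(x)
    med = median(x)
    dev = abs.(x .- med)
    mad = median(dev) / 0.6745   # scale MAD to be comparable to a standard deviation
    return dev ./ mad
end

x = [0.1, -0.2, 0.0, 0.3, -0.1, 8.0, 0.2, -0.3, 0.1, 0.0]   # one obvious spike
scores = spike_scores(x)

for threshold in (1.5, 6.0)
    println("threshold = ", threshold, " flags ", count(>(threshold), scores), " points")
end
```

The lower threshold flags an ordinary fluctuation along with the spike, while the higher one flags only the spike.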

Issue: Despiking doesn't remove obvious spikes

Cause: Threshold is too high or window size is wrong.

Solution: Lower the threshold or adjust window:

desp = SimpleSigmundDespiking(
    window_minutes=2.0,  # Shorter window for faster response
    variable_groups=[
        VariableGroup("Wind", [:Ux, :Uy, :Uz], spike_threshold=4.0)  # Lower = more aggressive
    ]
)

Gap Filling

Issue: Gaps remain after gap filling

Cause: Gaps are larger than max_gap_size.

Solution: Increase the maximum gap size:

gap = GeneralInterpolation(
    max_gap_size=50,  # Fill gaps up to 50 consecutive missing values
    method=Cubic()
)
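
The effect of max_gap_size can be seen on a plain vector. The sketch below is a hypothetical stand-in that fills runs of NaN by linear interpolation only when the run is short enough. It illustrates the rule described above and is not the GeneralInterpolation implementation.

```julia
# Fill interior NaN runs of at most max_gap samples by linear interpolation.
# Longer runs, and runs touching either end of the series, are left as NaN.
function fill_small_gaps(x::AbstractVector{<:AbstractFloat}, max_gap::Integer)
    y = copy(x)
    n = length(y)
    i = 1
    while i <= n
        if isnan(y[i])
            j = i
            while j < n && isnan(y[j + 1])   # extend to the end of the NaN run
                j += 1
            end
            gap_len = j - i + 1
            if i > 1 && j < n && gap_len <= max_gap
                left, right = y[i - 1], y[j + 1]
                for k in i:j
                    y[k] = left + (right - left) * (k - i + 1) / (gap_len + 1)
                end
            end
            i = j + 1
        else
            i += 1
        end
    end
    return y
end

x = [1.0, NaN, 3.0, NaN, NaN, NaN, 7.0]
@show fill_small_gaps(x, 2)   # the 3-sample gap exceeds the limit and stays NaN
@show fill_small_gaps(x, 3)   # both gaps are filled
```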

Issue: Interpolation creates unrealistic values

Cause: Using linear interpolation for nonlinear data, or gaps are too large.

Solution: Use higher-order interpolation or reduce max gap size:

# Option 1: Use cubic spline
gap = GeneralInterpolation(
    max_gap_size=10,
    method=Cubic()
)

# Option 2: Reduce max gap size
gap = GeneralInterpolation(
    max_gap_size=5,  # Only fill very small gaps
    method=Linear()
)

H2O Correction

Issue: "Variable H2O not found" or "Variable P not found"

Cause: High-frequency data missing required variables.

Solution: Ensure your data has H2O and pressure:

# Check available variables
vars = val(dims(high_frequency_data, Var))
@show vars

# H2O correction requires:
# - High-frequency: :H2O, :P
# - Low-frequency: :TA, :RH

# If missing, disable H2O correction
pipeline = EddyPipeline(
    gas_analyzer=nothing,  # Skip H2O correction
    # ... other steps
)

Issue: "Variable TA not found" or "Variable RH not found"

Cause: Low-frequency data missing temperature or relative humidity.

Solution: Provide low-frequency data with required variables:

# Low-frequency data must have :TA and :RH
lf = DimArray(
    lf_data_matrix,
    (Var([:TA, :RH, :other_vars]), Ti(lf_times))
)

# Or disable H2O correction if LF data unavailable
pipeline = EddyPipeline(
    gas_analyzer=nothing,
    # ... other steps
)

Issue: H2O correction produces NaN values

Cause: Missing calibration coefficients or invalid input data.

Solution: Provide calibration coefficients:

sensor = LICOR(
    calibration_coefficients=H2OCalibrationCoefficients(
        A=4.82004e3,
        B=3.79290e6,
        C=-1.15477e8,
        H2O_Zero=0.7087,
        H20_Span=0.9885
    )
)

# Or check for NaN in input data
@show count(isnan, high_frequency_data[Var=At(:H2O)])
@show count(isnan, low_frequency_data[Var=At(:TA)])

Double Rotation

Issue: "Variable Ux not found" (or Uy, Uz)

Cause: Wind components missing from data.

Solution: Ensure wind components exist:

required_wind = [:Ux, :Uy, :Uz]
vars = val(dims(high_frequency_data, Var))

if all(w -> w in vars, required_wind)
    rot = WindDoubleRotation()
else
    @warn "Wind components missing, skipping double rotation"
    rot = nothing
end

Issue: Rotation angles are all zero

Cause: The wind is perfectly aligned with the coordinate system (rare), or there is a data quality issue.

Solution: Check your data:

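using Statistics  # for mean

ux = high_frequency_data[Var=At(:Ux)]
uy = high_frequency_data[Var=At(:Uy)]

# Peddy.jl marks missing values as NaN, so skip those when averaging
@show mean(x for x in ux if !isnan(x))  # Should be non-zero
@show mean(x for x in uy if !isnan(x))  # Should be non-zero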

MRD

Issue: "Variable Uz not found" or "Variable Ts not found"

Cause: Specified variables don't exist in data.

Solution: Check available variables and adjust MRD configuration:

vars = val(dims(high_frequency_data, Var))
@show vars

# Use variables that exist
mrd = OrthogonalMRD(
    a=:Uz,      # Change if not available
    b=:Ts       # Change if not available
)

Issue: MRD results are all NaN

Cause: Data has too many gaps or insufficient samples.

Solution: Check data quality and adjust parameters:

using Dates  # for Millisecond

# Check for gaps
times = collect(dims(high_frequency_data, Ti))
time_diffs = diff(times)
large_gaps = count(x -> x > Millisecond(1000), time_diffs)
@show large_gaps

# Adjust gap threshold
mrd = OrthogonalMRD(
    gap_threshold_seconds=20.0,  # Allow larger gaps
    regular_grid=true            # Backfill invalid blocks
)
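
To choose a sensible gap_threshold_seconds, it helps to know where the gaps are and how long the longest one is. The sketch below runs on a synthetic DateTime vector using only the Dates standard library. With real data you would use the collected Ti values instead.

```julia
using Dates

t0 = DateTime(2024, 1, 1)
offsets_ms = [0, 50, 100, 2100, 2150]        # synthetic: one 2-second gap
times = t0 .+ Millisecond.(offsets_ms)

gaps = diff(times)                            # Vector of Millisecond periods
gap_positions = findall(>(Millisecond(1000)), gaps)
longest_gap_seconds = Dates.value(maximum(gaps)) / 1000

@show gap_positions
@show longest_gap_seconds
```

If longest_gap_seconds exceeds the configured threshold, the affected blocks are the likely source of the NaN results.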

Issue: MRD computation is very slow

Cause: Large M value or small shift parameter.

Solution: Reduce computational load:

# Option 1: Reduce maximum scale
mrd = OrthogonalMRD(M=10)  # Instead of 11

# Option 2: Increase shift (fewer blocks)
mrd = OrthogonalMRD(shift=512)  # Instead of 256

# Option 3: Use regular_grid=false to skip invalid blocks
mrd = OrthogonalMRD(regular_grid=false)

Output

Issue: "Cannot write to output directory"

Cause: Directory doesn't exist or permission denied.

Solution: Create the directory first, and check that you have write permission on it:

# mkpath is exported from Base, so no import is needed

output_dir = "/path/to/output"
mkpath(output_dir)  # Create if doesn't exist

out = ICSVOutput(output_dir)
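
mkpath does not help when the cause is a permission problem. One way to check both conditions up front is to create the directory and write a throwaway file into it. The helper below is a hypothetical sketch using only Base functions, and the temporary path is for demonstration.

```julia
# Returns true if dir exists (or can be created) and a file can be written into it
function ensure_writable_dir(dir::AbstractString)
    try
        mkpath(dir)
        probe = joinpath(dir, ".write_probe")
        touch(probe)   # fails if the directory is not writable
        rm(probe)
        return true
    catch err
        @warn "Output directory is not writable" dir exception = err
        return false
    end
end

output_dir = joinpath(mktempdir(), "peddy", "output")   # demo path only
@show ensure_writable_dir(output_dir)
```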

Issue: Output files are empty

Cause: Data is all NaN or write_data wasn't called.

Solution: Check data before writing:

# Verify data has valid values
@show count(isfinite, high_frequency_data)
@show size(high_frequency_data)

# Ensure output step is in pipeline
pipeline = EddyPipeline(
    # ... other steps
    output=ICSVOutput("/path/to/output")  # Must be included
)

Performance Issues

Issue: Pipeline runs very slowly

Cause: Large dataset or expensive operations.

Solution: Profile and optimize:

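# @time is built into Julia, so no extra package is needed.
# The first call of each function also includes compilation time.

# Time individual steps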
@time quality_control!(qc, hf, lf, sensor)
@time despike!(desp, hf, lf)
@time fill_gaps!(gap, hf, lf)

# Disable expensive steps if not needed
pipeline = EddyPipeline(
    quality_control=nothing,  # Skip if not needed
    mrd=nothing,              # MRD is expensive
    output=output
)
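
Because the first call of any Julia function includes compilation, a single timing can be misleading. The sketch below shows a warm-up call followed by a measured call, using a hypothetical stand-in step (clip_step!) rather than a real Peddy.jl step. Each call works on a fresh copy because in-place steps change their input.

```julia
# Stand-in for an in-place pipeline step (hypothetical, for illustration)
function clip_step!(x)
    x .= clamp.(x, -50.0, 50.0)
    return x
end

data = randn(1_000_000) .* 100

clip_step!(copy(data))                      # warm-up call triggers compilation
work = copy(data)                           # fresh copy for the measured run
elapsed_seconds = @elapsed clip_step!(work)

@show elapsed_seconds
```

The same pattern applies to the real steps: time the second call, not the first.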

Issue: Memory usage is very high

Cause: Large dataset or inefficient operations.

Solution: Process in chunks or reduce data:

# Option 1: Process shorter time periods
# Instead of 1 year, process 1 month at a time

# Option 2: Reduce sampling rate before processing
# Downsample if high-frequency data is not needed

# Option 3: Use MemoryOutput only for testing
# Use ICSVOutput or NetCDFOutput for production

Data Quality

Issue: Results don't match expected values

Cause: Data preprocessing differences or parameter mismatches.

Solution: Verify pipeline configuration:

# 1. Check what steps are enabled
@show pipeline.quality_control
@show pipeline.despiking
@show pipeline.gap_filling

# 2. Verify parameters match expectations
@show pipeline.despiking.window_minutes
@show pipeline.gap_filling.max_gap_size

# 3. Compare with reference implementation
# Run with minimal pipeline first
minimal_pipeline = EddyPipeline(
    sensor=sensor,
    output=MemoryOutput()
)

Issue: NaN values increase through pipeline

Cause: Each step may introduce NaN values (expected behavior).

Solution: Monitor NaN count:

function count_nans(data)
    return count(isnan, data)
end

n_nan_initial = count_nans(hf)
@show n_nan_initial

process!(pipeline, hf, lf)

n_nan_final = count_nans(hf)
@show n_nan_final
@show n_nan_final - n_nan_initial  # Additional NaNs introduced
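
A total count hides which variable is losing data. The sketch below counts NaN values per variable on a plain matrix. It assumes the Var × Ti layout used in the data loading examples, with one row per variable, and the names and values are made up.

```julia
var_names = [:Ux, :Uy, :Ts]
data = [1.0 NaN 3.0 4.0;
        NaN NaN 2.0 1.0;
        0.5 0.6 0.7 0.8]   # rows are variables, columns are time steps

nan_per_variable = Dict(
    name => count(isnan, row) for (name, row) in zip(var_names, eachrow(data))
)

@show nan_per_variable
```

Comparing this dictionary before and after a step shows which variable that step affects most.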

Debugging

Enable Debug Logging

using Logging

# Enable debug messages
logger = ConsoleLogger(stderr, Logging.Debug)
with_logger(logger) do
    process!(pipeline, hf, lf)
end
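
For long runs it can be more convenient to send debug messages to a file. The sketch below uses SimpleLogger from the Logging standard library. The log path is a temporary demo location, and the two log messages stand in for a real process! call.

```julia
using Logging

log_path = joinpath(mktempdir(), "peddy_debug.log")   # demo location only

open(log_path, "w") do io
    with_logger(SimpleLogger(io, Logging.Debug)) do
        # In practice, call process!(pipeline, hf, lf) here instead
        @debug "starting pipeline run"
        @info "pipeline run finished"
    end
end

print(read(log_path, String))
```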

Inspect Data at Each Step

# Manually run steps to inspect intermediate results
check_data(hf, lf, sensor)

quality_control!(pipeline.quality_control, hf, lf, sensor)
@show count(isnan, hf)

despike!(pipeline.despiking, hf, lf)
@show count(isnan, hf)

fill_gaps!(pipeline.gap_filling, hf, lf)
@show count(isnan, hf)

Use Processing Logger

logger = ProcessingLogger()

pipeline = EddyPipeline(
    sensor=sensor,
    output=output,
    logger=logger
)

process!(pipeline, hf, lf)

# Write log to file
write_processing_log(logger, "/path/to/log.csv")

# Inspect events
@show logger.events
@show logger.stage_times

FAQ

Q: What Julia version should I use?

A: Julia 1.11 or later. The project specifies julia = "1.11" in Project.toml. With juliaup installed, you can select that version explicitly:

julia +1.11 --project=.

Q: Can I use Peddy.jl with my custom sensor?

A: Yes! Define a custom sensor type as a subtype of AbstractSensor. See Extending Peddy.jl.
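
The general shape of such a type can be sketched with stand-ins. In the block below, AbstractSensorSketch and needs_data_cols_sketch are hypothetical placeholders for Peddy's AbstractSensor and Peddy.needs_data_cols, so the code runs without the package. The actual interface a sensor must implement is described in the extension guide.

```julia
# Stand-ins so the sketch runs without Peddy.jl (hypothetical names)
abstract type AbstractSensorSketch end

# A custom sensor is a concrete subtype of the abstract sensor type
struct MySonic <: AbstractSensorSketch end

# Declare which variables this sensor needs in the high-frequency data
needs_data_cols_sketch(::MySonic) = (:Ux, :Uy, :Uz, :Ts)

# Report required variables that are absent from the available ones
missing_cols(sensor, available) =
    [c for c in needs_data_cols_sketch(sensor) if c ∉ available]

@show missing_cols(MySonic(), [:Ux, :Uy, :Ts])
```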

Q: How do I handle missing data?

A: Peddy.jl uses NaN to represent missing values. The pipeline handles NaN gracefully:

Quality control marks invalid data as NaN
Gap filling interpolates small gaps
Most functions use mean_skipnan to ignore NaN
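
Note that Julia's skipmissing does not skip NaN, and a plain mean over data containing NaN returns NaN. The sketch below shows a NaN-skipping mean written with the Statistics standard library. nanmean_sketch is a hypothetical helper, and the signature of Peddy's own mean_skipnan may differ.

```julia
using Statistics

# Mean of the non-NaN values; NaN if nothing valid remains
function nanmean_sketch(x)
    valid = filter(!isnan, x)
    return isempty(valid) ? NaN : mean(valid)
end

x = [1.0, NaN, 3.0]
@show mean(x)             # NaN propagates through a plain mean
@show nanmean_sketch(x)   # NaN entries are ignored
```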

Q: Can I run multiple pipeline configurations on the same data?

A: Yes, but remember that in-place modifications persist:

# Create a copy for each pipeline
hf1 = copy(high_frequency_data)
hf2 = copy(high_frequency_data)

process!(pipeline1, hf1, lf)
process!(pipeline2, hf2, lf)

Q: How do I combine multiple output formats?

A: Use OutputSplitter:

out = OutputSplitter(
    ICSVOutput("/path/csv"),
    NetCDFOutput("/path/nc"),
    MemoryOutput()
)

Q: What's the difference between `max_gap_size` and `gap_threshold_seconds`?

A: They belong to different steps and use different units (samples versus seconds):

max_gap_size (gap filling): Maximum number of consecutive missing values to interpolate
gap_threshold_seconds (MRD): Maximum time gap allowed within an MRD block
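
Since one parameter counts samples and the other counts seconds, converting between them requires the sampling interval. The sketch below assumes 50 ms sampling, as in the examples above. The numbers are illustrative.

```julia
sampling_interval_ms = 50        # assumed 50 ms sampling (20 Hz)

max_gap_size = 50                # samples, as used by gap filling
max_gap_seconds = max_gap_size * sampling_interval_ms / 1000

gap_threshold_seconds = 20.0     # seconds, as used by MRD
samples_in_threshold = round(Int, gap_threshold_seconds * 1000 / sampling_interval_ms)

@show max_gap_seconds
@show samples_in_threshold
```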

Q: How do I visualize MRD results?

A: Use the built-in plotting:

using Plots

mrd = OrthogonalMRD(a=:Uz, b=:Ts)
decompose!(mrd, hf, lf)
results = get_mrd_results(mrd)

if results !== nothing
    plot(results)  # Heatmap of MRD values
end

Q: Can I process data in real-time or streaming mode?

A: Not currently. Peddy.jl is designed for batch processing of complete datasets.

Q: How do I contribute improvements or report bugs?

A: Follow these steps:

Check GitHub Issues
Create a minimal reproducible example
Submit an issue or pull request

Q: Where can I find example datasets?

A: See the tutorial for synthetic data examples. For real data, contact the package maintainers.

Q: How do I cite Peddy.jl?

A: See the README for citation information and DOI.

Getting Help

Check the documentation: Tutorial, API Reference, Extension Guide
Enable debug logging: See "Debugging" section above
Inspect intermediate results: Run steps manually to identify where issues occur
Create a minimal example: Reproduce the issue with synthetic data
Open an issue: Provide code, data sample, and error message