
MoppleIT Tech Blog


Deterministic JSON for Cleaner Diffs in PowerShell

JSON churn can bury real changes under a pile of noisy diffs: property reordering, inconsistent indentation, and stray byte order marks (BOMs). You can eliminate that noise by making your JSON output deterministic in PowerShell: recursively sort keys, preserve property order, and save as UTF-8 without BOM. The result is stable, predictable files that make code reviews faster and clearer.

Why Deterministic JSON Matters

JSON objects are inherently unordered, but most tools and humans read them as if order matters. When serialization order changes between runs, your pull requests show large, meaningless diffs. Deterministic JSON fixes that by enforcing a single, repeatable ordering.

  • Cleaner diffs: Only semantic changes appear in reviews.
  • Reproducible builds: Identical inputs produce identical artifacts.
  • Predictable automation: Scripts, CI, and policy checks behave consistently.
  • Easier merges: Less chance of unnecessary conflicts from reordered properties.

Implementing Deterministic JSON in PowerShell

Core strategy

  1. Recursively sort keys for every object in your graph using a consistent comparer.
  2. Preserve property order using [ordered] hashtables so ConvertTo-Json emits keys in the order you chose.
  3. Control depth to serialize complete objects without truncation.

A deterministic serializer function

The function below:

  • Walks the object graph and sorts all object keys using an ordinal comparer (culture-agnostic and fully deterministic).
  • Preserves that order with [ordered] hashtables.
  • Leaves arrays in their existing order (semantic order), but still normalizes any objects they contain.

function ConvertTo-DeterministicJson {
  [CmdletBinding()]
  param(
    [Parameter(Mandatory)][object]$InputObject,
    [int]$Depth = 20
  )

  function Order-Keys { param($o)
    if ($null -eq $o) { return $null }

    if ($o -is [pscustomobject] -or $o -is [System.Collections.IDictionary]) {
      # Normalize to a hashtable first
      $h = @{}
      if ($o -is [pscustomobject]) {
        $o.PSObject.Properties | ForEach-Object { $h[$_.Name] = $_.Value }
      } else {
        foreach ($k in $o.Keys) { $h[$k] = $o[$k] }
      }

      # Sort keys using an ordinal comparer for culture-invariant determinism
      $keys = [string[]]$h.Keys
      [Array]::Sort($keys, [System.StringComparer]::Ordinal)

      # Rebuild in a stable order
      $out = [ordered]@{}
      foreach ($k in $keys) { $out[$k] = Order-Keys $h[$k] }
      return $out
    }
    elseif ($o -is [System.Collections.IEnumerable] -and -not ($o -is [string])) {
      # Preserve array order; just normalize elements.
      # The leading comma keeps empty and single-element arrays intact
      # instead of letting PowerShell unroll them to $null or a scalar.
      return , @($o | ForEach-Object { Order-Keys $_ })
    }
    else {
      return $o
    }
  }

  # Use -InputObject so a top-level array is serialized as one JSON array
  # rather than being enumerated into separate documents by the pipeline
  ConvertTo-Json -InputObject (Order-Keys $InputObject) -Depth $Depth
}

# Example: produce stable JSON for diffs
# (ConvertFrom-Json -Depth requires PowerShell 6.2 or later)
$src = './settings.json'
$obj = Get-Content -Path $src -Raw | ConvertFrom-Json -Depth 50
$out = ConvertTo-DeterministicJson -InputObject $obj -Depth 50

# Write UTF-8 without BOM + trailing newline for clean VCS diffs
[IO.File]::WriteAllText(
  './settings.sorted.json',
  $out + [Environment]::NewLine,
  [Text.UTF8Encoding]::new($false)
)

This approach ensures the same input object always serializes to the exact same string, independent of machine locale or PowerShell version nuances that influence property enumeration.

Why ordinal sorting?

Default sorting in PowerShell can be culture-aware, which means collation rules and case handling may differ between environments. Using [System.StringComparer]::Ordinal guarantees a byte-wise comparison that is identical on every system. That makes your JSON order fully deterministic.
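As a quick illustration, compare the two comparers on the same keys (the culture-aware result is shown only as a possibility; it can vary by machine):

```powershell
# Culture-aware vs. ordinal sorting of the same keys
$keys = [string[]]('Zebra', 'apple', 'Apple')

# Default comparison is culture-aware and case-insensitive,
# so the result can differ between machines and locales
$cultural = $keys | Sort-Object

# Ordinal comparison is byte-wise and identical everywhere:
# all uppercase letters sort before all lowercase letters
$ordinal = [string[]]($keys.Clone())
[Array]::Sort($ordinal, [System.StringComparer]::Ordinal)
$ordinal -join ', '   # Apple, Zebra, apple
```

The ordinal result never changes, which is exactly the property deterministic JSON needs.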

Depth and data types

  • Depth: ConvertTo-Json defaults to depth 2, which truncates nested data. Always pass a depth that fits your data, e.g., -Depth 20 or higher for complex graphs.
  • Arrays: Do not sort arrays by default; order is typically meaningful. If you need deterministic ordering for arrays of objects during code generation (e.g., schema bundles), explicitly sort them by a stable key before calling the function.
  • Numbers, booleans, null: These round-trip cleanly via ConvertFrom-Json/ConvertTo-Json. Dates are emitted as strings by ConvertTo-Json; ensure your consumers agree on a format (e.g., ISO 8601).
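To see the depth truncation concretely, here is a small sketch (the property names are illustrative):

```powershell
# ConvertTo-Json truncates below its depth limit (default: 2):
# anything deeper is flattened to a type-name string
$nested = @{ level1 = @{ level2 = @{ level3 = @{ value = 'deep' } } } }

$truncated = $nested | ConvertTo-Json               # default -Depth 2
$complete  = $nested | ConvertTo-Json -Depth 10

$truncated -match 'deep'   # False: the innermost value was lost
$complete  -match 'deep'   # True: the full graph survived
```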

Clean File Output and Workflow Integration

Save as UTF-8 without BOM

Many Windows tools historically add a UTF-8 BOM, which shows up as noise in diffs and can confuse parsers. Write files explicitly without a BOM:

# Works on Windows PowerShell 5.1 and PowerShell 7+
[IO.File]::WriteAllText(
  $path,
  $json + [Environment]::NewLine,
  [Text.UTF8Encoding]::new($false)
)

# PowerShell 7+ alternative
$json | Set-Content -Path $path -Encoding utf8NoBOM

Also consider always ending files with a single trailing newline. It keeps diffs consistent across editors and platforms.

Make it automatic: pre-commit hook

Automate sorting so humans never have to think about it. Here is a simple Git pre-commit hook (.git/hooks/pre-commit) that normalizes staged JSON files:

#!/usr/bin/env pwsh
$ErrorActionPreference = 'Stop'

# Load the serializer so ConvertTo-DeterministicJson is available here;
# adjust the path to wherever your repository keeps it
. "$PSScriptRoot/../../ConvertTo-DeterministicJson.ps1"

# Find staged JSON files
$staged = git diff --cached --name-only --diff-filter=ACM | Where-Object { $_ -like '*.json' }
if (-not $staged) { exit 0 }

foreach ($file in $staged) {
  try {
    $raw = Get-Content -Path $file -Raw
    $obj = $raw | ConvertFrom-Json -Depth 50
    $json = ConvertTo-DeterministicJson -InputObject $obj -Depth 50

    [IO.File]::WriteAllText($file, $json + [Environment]::NewLine, [Text.UTF8Encoding]::new($false))
    git add -- $file
  } catch {
    Write-Warning "Skipping ${file}: $($_.Exception.Message)"
  }
}

With this hook in place, every commit keeps JSON stable by default.

CI verification step

Guard against accidental drift by adding a CI job that fails if any JSON would be resorted:

pwsh -NoProfile -Command @'
# Load the serializer first (adjust the path for your repository)
. ./ConvertTo-DeterministicJson.ps1

$files = Get-ChildItem -Recurse -File -Include *.json | Select-Object -Expand FullName
$bad = @()
foreach ($f in $files) {
  try {
    $orig = Get-Content $f -Raw
    $obj = $orig | ConvertFrom-Json -Depth 50
    $resorted = ConvertTo-DeterministicJson -InputObject $obj -Depth 50
    if ($orig -cne ($resorted + [Environment]::NewLine) -and $orig -cne $resorted) {
      $bad += $f
    }
  } catch { } # ignore files that are not valid JSON
}
if ($bad.Count) {
  Write-Error "Non-deterministic JSON detected: `n$($bad -join "`n")"; exit 1
}
'@

This check ensures developers keep using the deterministic serializer locally.

Editor and repo hygiene

  • .editorconfig: Encourage charset = utf-8 and a final newline to reduce cross-editor differences.
  • .gitattributes: Use * text=auto eol=lf or your team's preference for consistent line endings. Deterministic JSON is most effective when combined with consistent EOL settings.
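As a starting point, the relevant .editorconfig entries look like this (the indentation values are suggestions to adapt, not requirements):

```ini
# .editorconfig
root = true

[*.json]
charset = utf-8
insert_final_newline = true
indent_style = space
indent_size = 2
```

Pair it with the one-line .gitattributes rule above and editors, Git, and the serializer all agree on encoding, newlines, and ordering.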

Quick sanity tests

  1. Run the serializer twice; the output should be byte-for-byte identical.
  2. Change only a value; diffs should show a single-line change, not a reorder.
  3. Switch machines (different locales); results remain identical thanks to ordinal sorting.
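The first check can be scripted. Here is a self-contained sketch for a flat object (Get-SortedJson is a throwaway helper for this demo, not part of the serializer above):

```powershell
# Minimal determinism check: sort keys ordinally,
# serialize twice, and compare byte-for-byte
$data = @{ gamma = 3; alpha = 1; Beta = 2 }

function Get-SortedJson($h) {
  $keys = [string[]]$h.Keys
  [Array]::Sort($keys, [System.StringComparer]::Ordinal)
  $out = [ordered]@{}
  foreach ($k in $keys) { $out[$k] = $h[$k] }
  ConvertTo-Json -InputObject $out -Compress
}

$a = Get-SortedJson $data
$b = Get-SortedJson $data
$a -ceq $b   # True on every run, on every machine
$a           # {"Beta":2,"alpha":1,"gamma":3}
```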

Putting it all together

Here is a compact end-to-end example you can drop into a repository:

# Normalize, then overwrite the original file
$path = './settings.json'
$obj  = Get-Content $path -Raw | ConvertFrom-Json -Depth 50
$json = ConvertTo-DeterministicJson -InputObject $obj -Depth 50
[IO.File]::WriteAllText($path, $json + [Environment]::NewLine, [Text.UTF8Encoding]::new($false))

That's it: stable ordering, clean encoding, and predictable outputs. You'll review JSON like code, not noise.

Bonus reading: If you want to go deeper on PowerShell patterns for robust tooling, check out advanced scripting resources and cookbooks that cover idiomatic functions, parameter binding, and pipeline design.
