
MoppleIT Tech Blog


Culture-Safe CSV Handling in PowerShell: Predictable Imports, Stable Exports

CSV files look simple, but culture and locale differences can quietly break your pipelines. Commas vs semicolons, dots vs commas for decimals, and date formats that flip month and day will all cause headaches if you assume a single locale. In this post, you'll build a culture-safe CSV flow in PowerShell: detect the list separator from the current culture, parse numbers and dates consistently, and export with an explicit delimiter and UTF-8 for stable diffs in Git and CI/CD.

Why culture matters with CSVs

Modern teams and build agents run across multiple locales. Common pitfalls include:

  • List separator: Many European locales use semicolon (;) instead of comma (,) as the list separator when the decimal separator is a comma (1,23).
  • Decimal separator: 1,234 can mean one point two three four, not one thousand two hundred thirty-four.
  • Date formats: 01/02/2024 could be January 2nd or February 1st depending on culture.
  • Encoding and quoting: UTF-8 vs UTF-16, BOM vs no-BOM, and inconsistent quoting lead to noisy diffs and broken imports.
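To make the first two pitfalls concrete, here is a small illustrative snippet (exact separators come from .NET culture data and can vary slightly across OS and ICU versions):

```powershell
# The same literal text yields different numbers depending on culture
$de = [System.Globalization.CultureInfo]::GetCultureInfo('de-DE')
$us = [System.Globalization.CultureInfo]::GetCultureInfo('en-US')
$style = [System.Globalization.NumberStyles]::Number

[double]::Parse('1,234', $style, $de)   # 1.234 (comma is the decimal separator)
[double]::Parse('1,234', $style, $us)   # 1234  (comma is the thousands separator)

# The CSV list separator differs as well
$de.TextInfo.ListSeparator              # typically ';'
$us.TextInfo.ListSeparator              # typically ','
```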

The fix is straightforward: read using the current culture's rules, normalize to predictable representations, and write with explicit settings.

A culture-safe import-to-export pipeline

Start by detecting the current culture, then import using its list separator. Parse numbers and dates using that culture, normalize to stable representations (e.g., ISO 8601 for dates), and export with an explicit delimiter and UTF-8.

$culture = Get-Culture
$inSep   = $culture.TextInfo.ListSeparator

$data = Import-Csv -Path '.\sales.csv' -Delimiter $inSep -Encoding utf8 -ErrorAction Stop |
  ForEach-Object {
    [pscustomobject]@{
      Customer = $_.Customer.Trim()
      # Parse numbers using the current culture (supports decimal and thousands separators)
      Amount   = [double]::Parse($_.Amount, [System.Globalization.NumberStyles]::Number, $culture)
      # Parse local date, convert to UTC, and output ISO 8601 (round-trippable and sortable)
      DateUtc  = [datetime]::Parse($_.Date, $culture).ToUniversalTime().ToString('o')
    }
  }

# Normalize output: explicit delimiter and UTF-8 for stable diffs across environments
$outSep = ','
$data |
  Export-Csv -Path '.\sales.normalized.csv' -NoTypeInformation -Delimiter $outSep -Encoding utf8

Write-Host ("Rows: {0}  In='{1}' Out='{2}'" -f $data.Count, $inSep, $outSep)

This flow prevents locale-induced parsing bugs while giving you stable, consistent output suitable for Git, code review, and downstream systems.

Stronger parsing with TryParse and explicit formats

If you expect variations in input, use TryParse or ParseExact with allowed patterns. This lets you fail fast with a clear error when a value is malformed.

function ConvertTo-NormalizedRow {
  param(
    [Parameter(Mandatory)] $Row,
    [System.Globalization.CultureInfo] $Culture = (Get-Culture)
  )

  $amountStyles = [System.Globalization.NumberStyles]::Number
  $parsedAmount = 0.0
  if (-not [double]::TryParse($Row.Amount, $amountStyles, $Culture, [ref]$parsedAmount)) {
    throw "Invalid Amount '$($Row.Amount)' for culture $($Culture.Name)."
  }

  # Accept multiple date shapes, but still honor the input culture
  $datePatterns = @('yyyy-MM-dd', 'dd.MM.yyyy', 'M/d/yyyy', 'dd/MM/yyyy', 'yyyy-MM-ddTHH:mm:ss', 'o')
  $parsedDate = [datetime]::MinValue
  # Track success with a flag: on failure, TryParseExact sets the [ref] variable to
  # [datetime]::MinValue, which is truthy, so testing $parsedDate alone would mask errors.
  $dateParsed = $false
  foreach ($p in $datePatterns) {
    if ([datetime]::TryParseExact($Row.Date, $p, $Culture, [System.Globalization.DateTimeStyles]::AssumeLocal, [ref]$parsedDate)) { $dateParsed = $true; break }
  }
  if (-not $dateParsed) {
    # Fallback: general parse as a last resort
    if (-not [datetime]::TryParse($Row.Date, $Culture, [System.Globalization.DateTimeStyles]::AssumeLocal, [ref]$parsedDate)) {
      throw "Invalid Date '$($Row.Date)' for culture $($Culture.Name)."
    }
  }

  [pscustomobject]@{
    Customer = $Row.Customer.Trim()
    Amount   = $parsedAmount
    DateUtc  = $parsedDate.ToUniversalTime().ToString('o')
  }
}

$data = Import-Csv -Path '.\sales.csv' -Delimiter (Get-Culture).TextInfo.ListSeparator -Encoding utf8 |
  ForEach-Object { ConvertTo-NormalizedRow -Row $_ }

$data | Export-Csv '.\sales.normalized.csv' -NoTypeInformation -Delimiter ',' -Encoding utf8

Header hygiene and column validation

  • Trim headers and validate that the expected columns are present before processing:

$expected  = @('Customer','Amount','Date')
$firstLine = Get-Content '.\sales.csv' -Encoding utf8 | Select-Object -First 1
$sep       = (Get-Culture).TextInfo.ListSeparator
$headers   = $firstLine.Split($sep).ForEach({ $_.Trim() })
if (@(Compare-Object -ReferenceObject $expected -DifferenceObject $headers -SyncWindow 0).Count -gt 0) {
  throw "Unexpected columns. Expected: $($expected -join ', ') Actual: $($headers -join ', ')"
}

Encoding, delimiters, and stable diffs

For Git-friendly, reproducible artifacts:

  • Delimiter: Choose and declare one explicitly (commas are common for normalized output). Avoid relying on defaults.
  • UTF-8: Export using UTF-8 so editors and tools across OSes read it correctly.
  • BOM consistency: Prefer UTF-8 without BOM for clean diffs. In PowerShell 7+, -Encoding utf8 writes UTF-8 without a BOM; Windows PowerShell 5.1 still emits a BOM with the same setting.
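A quick way to verify what you actually produced is to inspect the file's first bytes. This small helper (the name Test-Utf8Bom is mine, not a built-in) reports whether a file starts with the UTF-8 BOM:

```powershell
# Illustrative helper: does the file start with the UTF-8 BOM (EF BB BF)?
function Test-Utf8Bom {
  param([Parameter(Mandatory)] [string] $Path)
  $bytes = [System.IO.File]::ReadAllBytes($Path)
  $bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF
}
```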

If you must force UTF-8 without BOM in older hosts, use a .NET StreamWriter and ConvertTo-Csv:

$out = '.\sales.normalized.csv'
$rows = $data | ConvertTo-Csv -NoTypeInformation -Delimiter ','
$utf8NoBom = New-Object System.Text.UTF8Encoding($false)
$sw = New-Object System.IO.StreamWriter($out, $false, $utf8NoBom)
try {
  foreach ($line in $rows) { $sw.WriteLine($line) }
}
finally {
  $sw.Dispose()
}

Tip: If your consumers are Excel users in locales where comma is the decimal separator, exporting with a semicolon delimiter can be friendlier for them. Keep your normalization rule documented and consistent across repos and pipelines.

Automation and CI/CD integration

Normalize at the edges

Normalize CSV files as soon as you ingest them. Place normalization scripts in dedicated modules or tools/ folders and call them from data pipelines, scheduled jobs, or CI. This yields:

  • Fewer "works on my machine" issues due to locale.
  • Smaller diffs and more stable pull requests.
  • Predictable downstream behavior (databases, analytics, APIs).
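One way to package this is a single entry point. The sketch below is illustrative: the function name Invoke-CsvNormalization is mine, and it assumes ConvertTo-NormalizedRow from earlier is in scope (e.g., dot-sourced from a module under tools/).

```powershell
# Hypothetical wrapper for a tools/ module: one entry point for pipelines and CI.
# Assumes ConvertTo-NormalizedRow (defined earlier in this post) is in scope.
function Invoke-CsvNormalization {
  param(
    [Parameter(Mandatory)] [string] $InPath,
    [Parameter(Mandatory)] [string] $OutPath,
    [System.Globalization.CultureInfo] $Culture = (Get-Culture)
  )
  Import-Csv -Path $InPath -Delimiter $Culture.TextInfo.ListSeparator -ErrorAction Stop |
    ForEach-Object { ConvertTo-NormalizedRow -Row $_ -Culture $Culture } |
    Export-Csv -Path $OutPath -NoTypeInformation -Delimiter ',' -Encoding utf8
}
```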

Lock culture in CI for reproducibility

You can set a predictable culture for the current process (useful in headless CI runners) while still handling per-file separators when needed.

# In CI bootstrap (per PowerShell process)
$c = [System.Globalization.CultureInfo]::GetCultureInfo('en-US')
[System.Globalization.CultureInfo]::DefaultThreadCurrentCulture = $c
[System.Globalization.CultureInfo]::DefaultThreadCurrentUICulture = $c

# Once the process culture is locked, Get-Culture reflects the locked culture.
# If a file was produced under a different locale, pass that culture explicitly
# (e.g., to -Culture on the normalization function) instead of relying on Get-Culture.
$inSep = (Get-Culture).TextInfo.ListSeparator

Add a normalization job in GitHub Actions (example). GitHub-hosted runners already include PowerShell, so shell: pwsh is enough; no extra setup actions or nested pwsh invocation are needed:

name: Normalize CSVs
on: [push, pull_request]
jobs:
  normalize:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Normalize
        shell: pwsh
        run: ./tools/Normalize-Sales.ps1
      - name: Verify clean
        run: |
          if [[ -n $(git status --porcelain) ]]; then echo 'CSV normalization changed files. Commit updated artifacts.' && git diff && exit 1; fi

Test what you parse

Use Pester to ensure your parser accepts valid values and rejects malformed ones across locales.

Describe 'CSV normalization' {
  It 'parses Amount with culture' {
    $c = [System.Globalization.CultureInfo]::GetCultureInfo('fr-FR')
    # fr-FR's group separator is a non-breaking space, so a literal '1 234,56'
    # typed with a plain space may not parse; omit the group separator in the fixture.
    $row = [pscustomobject]@{ Customer='Acme'; Amount='1234,56'; Date='01/02/2024' }
    $n = ConvertTo-NormalizedRow -Row $row -Culture $c
    $n.Amount | Should -Be 1234.56
  }

  It 'emits ISO 8601 UTC dates' {
    $row = [pscustomobject]@{ Customer='Acme'; Amount='10.00'; Date='2024-02-01' }
    # Pin the culture so the test is deterministic regardless of the runner's locale
    $n = ConvertTo-NormalizedRow -Row $row -Culture ([System.Globalization.CultureInfo]::InvariantCulture)
    $n.DateUtc | Should -Match '^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{7}Z$'
  }
}

Performance and security tips

Performance

  • Stream when possible: For very large files, prefer Get-Content -ReadCount with ConvertFrom-Csv to reduce peak memory usage.
  • Avoid repeated allocations: Reuse culture objects and precompiled patterns in tight loops.
  • Minimal string churn: Normalize once (e.g., to ISO 8601) and keep dates as strings if the target system expects strings; otherwise keep as DateTime until the final export.

$culture = Get-Culture
$sep = $culture.TextInfo.ListSeparator

# -ReadCount batches lines to reduce per-item pipeline overhead on large files
Get-Content '.\sales.csv' -Encoding utf8 -ReadCount 1000 |
  ConvertFrom-Csv -Delimiter $sep |
  ForEach-Object { ConvertTo-NormalizedRow -Row $_ -Culture $culture } |
  Export-Csv '.\sales.normalized.csv' -NoTypeInformation -Delimiter ',' -Encoding utf8

Security and data quality

  • Validate inputs: Check column counts, required fields, and allowed ranges for numeric values.
  • Reject unexpected columns: Unknown columns may indicate schema drift or malicious input.
  • Guard against formula injection: If CSVs will be opened in spreadsheet apps, consider escaping values starting with =, +, -, or @ for user-facing exports.
  • Log parse failures: Don't silently coerce; emit clear errors and capture samples for triage.
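For the formula-injection point, a minimal sketch (the helper name Protect-CsvValue is mine) that prefixes risky leading characters with a single quote so spreadsheet apps treat the cell as text:

```powershell
# Illustrative guard: prefix cells starting with =, +, - or @ with a single quote
function Protect-CsvValue {
  param([string] $Value)
  if ($Value -match '^[=+\-@]') { "'" + $Value } else { $Value }
}
```

Apply it per field before Export-Csv, and only on copies destined for spreadsheets: the prefix changes the stored value, so machine-to-machine exports should stay unescaped.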

Practical checklist

1. Detect the current culture and its list separator via Get-Culture.
2. Import with -Delimiter $inSep and an explicit -Encoding.
3. Parse numbers with [double]::Parse(..., NumberStyles::Number, $culture).
4. Parse dates with TryParseExact (preferred) or Parse using the culture; normalize to ISO 8601 UTC.
5. Validate headers and data early; fail fast.
6. Export with explicit delimiter and UTF-8 for stable diffs.
7. Automate normalization in CI and test with Pester.

By making CSV imports culture-aware and exports explicit, you eliminate a whole class of flaky bugs, simplify reviews with stable artifacts, and create clearer, safer pipelines that work consistently across developer machines, servers, and containers.

Further reading

Build culture-safe data flows in PowerShell. Read the PowerShell Advanced CookBook → https://www.amazon.com/PowerShell-Advanced-Cookbook-scripting-advanced-ebook/dp/B0D5CPP2CQ/
