
Safer CSV Parsing in PowerShell: Types, Culture, and Decimal-Accurate Totals

CSV files look simple, but they can quietly break your automations when data comes from different locales, editors, or systems. Decimal separators change, dates get ambiguous, encodings vary, and headers shift. In PowerShell, you can make CSV input predictable by locking down culture, validating headers, casting columns to concrete types, and computing money-like values with decimal precision. In this guide, you'll build a robust, production-ready parsing pattern that works reliably on your machine, in CI, and across your team's environments.

Why CSVs Break Across Locales (and How You Fix It)

Common sources of bugs

  • Culture differences: 10,50 vs 10.50 for decimals; is 01/02/2024 mm/dd or dd/mm? (see the demo after this list)
  • Delimiters vary: comma vs semicolon (common when Excel uses system list separator).
  • Encoding surprises: UTF-8 with/without BOM; stray non-ASCII characters.
  • Header drift: required columns missing or renamed.
  • Type ambiguity: everything is a string until you explicitly cast and validate.
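
To see the first bullet concretely, parse the same string under German and invariant cultures; note that the invariant parse does not fail, it silently reads the comma as a grouping separator:

$de = [Globalization.CultureInfo]::GetCultureInfo('de-DE')
$iv = [Globalization.CultureInfo]::InvariantCulture
[decimal]::Parse('10,50', $de)   # 10.50 (comma is the German decimal separator)
[decimal]::Parse('10,50', $iv)   # 1050  (comma is treated as a group separator)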

Design goals for dependable imports

  • Be explicit about delimiter and encoding.
  • Use InvariantCulture for parsing numbers and dates you control.
  • Validate headers before touching data.
  • Cast to concrete types and fail fast with actionable errors.
  • Do monetary math with decimal, not double (demonstrated after this list).
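
The last goal is easy to demonstrate: double accumulates binary rounding error that decimal avoids. A quick check you can run in any session:

$d = 0.0
foreach ($i in 1..10) { $d += 0.1 }    # [double] math
$d -eq 1.0                             # False (0.9999999999999999)

[decimal]$m = 0
foreach ($i in 1..10) { $m += 0.1d }   # [decimal] math ('d' suffix = decimal literal)
$m -eq 1.0                             # True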

A Predictable Import Recipe (Culture + Types + Totals)

Start with a clear baseline that locks down headers, culture, and types. The snippet below enforces required headers, parses using InvariantCulture, and computes a decimal-accurate Total.
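
If you want a file to follow along with, this writes a small hypothetical data.csv matching the expected headers (values invented for illustration):

$csv = @'
Date,Item,Qty,Price
2024-03-01,Widget,3,19.99
2024-03-02,Gadget,1,249.50
2024-03-03,Sprocket,12,0.75
'@
Set-Content -Path '.\data.csv' -Value $csv -Encoding utf8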

$path = '.\data.csv'
$rows = Import-Csv -Path $path -Delimiter ',' -Encoding utf8 -ErrorAction Stop

if (-not $rows) { throw 'Empty CSV' }
$required = @('Date','Item','Qty','Price')
$headers  = @($rows[0].PSObject.Properties.Name)
$missing  = $required | Where-Object { $headers -notcontains $_ }
if ($missing) { throw ('Missing headers: {0}' -f ($missing -join ', ')) }

$culture = [Globalization.CultureInfo]::InvariantCulture
$parsed = $rows | ForEach-Object {
  # Parse each value once, then reuse for the Total
  $qty   = [int]$_.Qty
  $price = [decimal]::Parse($_.Price, $culture)
  [pscustomobject]@{
    Date  = [datetime]::Parse($_.Date, $culture)
    Item  = $_.Item
    Qty   = $qty
    Price = $price
    Total = $qty * $price
  }
}

$parsed | Sort-Object Date | Select-Object -First 3 Item, Qty, Price, Total

This alone prevents many locale-related bugs. For data you don't fully control, harden further with validation and TryParse patterns.

Hardening the Parser for Real-World Data

A reusable, defensive function

Wrap parsing in a function that validates headers, trims input, uses TryParse/ParseExact with InvariantCulture by default, and emits clear errors with row indexes.

function Import-SafeCsv {
  [CmdletBinding()]
  param(
    [Parameter(Mandatory)]
    [string]$Path,

    [string[]]$RequiredHeaders = @('Date','Item','Qty','Price'),

    [string[]]$DateFormats = @('yyyy-MM-dd','MM/dd/yyyy','dd/MM/yyyy'),

    [char]$Delimiter = ',',

    # Note: binding a [System.Text.Encoding] object to -Encoding requires
    # PowerShell 7+; on Windows PowerShell 5.1, pass a string such as 'UTF8'.
    [System.Text.Encoding]$Encoding = [System.Text.Encoding]::UTF8,

    [System.Globalization.CultureInfo]$Culture = [System.Globalization.CultureInfo]::InvariantCulture
  )

  $ErrorActionPreference = 'Stop'
  if (-not (Test-Path -LiteralPath $Path)) { throw "File not found: $Path" }

  $rows = Import-Csv -Path $Path -Delimiter $Delimiter -Encoding $Encoding -ErrorAction Stop
  if (-not $rows) { throw 'Empty CSV' }

  # Header validation
  $headers = @($rows[0].PSObject.Properties.Name)
  $missing = $RequiredHeaders | Where-Object { $headers -notcontains $_ }
  if ($missing) { throw ('Missing headers: {0}' -f ($missing -join ', ')) }

  $dateStyles   = [System.Globalization.DateTimeStyles]::AssumeLocal
  $numberStyles = [System.Globalization.NumberStyles]::Number

  $index = 0
  $parsed = foreach ($r in $rows) {
    $index++
    # Normalize/trim values
    # [string] casts $null to '' (unlike -as), so .Trim() is always safe
    $rawDate  = ([string]$r.Date).Trim()
    $rawItem  = ([string]$r.Item).Trim()
    $rawQty   = ([string]$r.Qty).Trim()
    $rawPrice = ([string]$r.Price).Trim()

    if ([string]::IsNullOrWhiteSpace($rawItem)) { throw "Row $index: Item is required" }

    # Date
    [datetime]$date = [datetime]::MinValue   # $null cannot convert to [datetime]
    if (-not [datetime]::TryParseExact($rawDate, $DateFormats, $Culture, $dateStyles, [ref]$date)) {
      throw "Row $index: Invalid Date '$rawDate' (expected one of: $($DateFormats -join ', '))"
    }

    # Qty
    [int]$qty = 0
    if (-not [int]::TryParse($rawQty, [ref]$qty)) {
      throw "Row $index: Invalid Qty '$rawQty' (expected integer)"
    }

    # Price
    [decimal]$price = [decimal]::Zero
    if (-not [decimal]::TryParse($rawPrice, $numberStyles, $Culture, [ref]$price)) {
      throw "Row $index: Invalid Price '$rawPrice' (expected decimal with '.' as separator)"
    }

    [pscustomobject]@{
      Date  = $date
      Item  = $rawItem
      Qty   = $qty
      Price = $price
      Total = [decimal]($qty) * $price
      Row   = $index
    }
  }

  return $parsed
}

Usage:

$orders = Import-SafeCsv -Path '.\data.csv'

# Sort, take top 3, and display
$orders | Sort-Object Date | Select-Object -First 3 Item, Qty, Price, Total

# Accurate aggregation with decimal
[decimal]$grandTotal = 0
$orders | ForEach-Object { $grandTotal += $_.Total }
"Grand total: $grandTotal"

When to use system culture

  • Files produced by Excel on users' desktops may use the system list separator. In that case, read with the user's culture explicitly: -Delimiter ((Get-Culture).TextInfo.ListSeparator), as sketched after this list.
  • Prefer InvariantCulture inside your code paths and convert/normalize at the edges.
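
A minimal sketch of the Excel case (the file name is hypothetical):

# Read a desktop Excel export using the current user's list separator,
# then keep all downstream parsing on InvariantCulture as usual.
$sep  = (Get-Culture).TextInfo.ListSeparator
$rows = Import-Csv -Path '.\excel-export.csv' -Delimiter $sep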

Performance tips for large files

  • Stream instead of materializing: pipe Import-Csv straight into ForEach-Object so rows are processed one at a time, and project only the columns you need to reduce memory pressure.
  • Avoid Measure-Object for money totals if you need exact decimals; iterative accumulation in a [decimal] variable is both precise and fast (see the demo after this list).
  • Use -Encoding utf8 explicitly to avoid costly mis-encodings and re-reads.
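
The Measure-Object caveat is easy to verify: it sums in [double], so the drift from the earlier demo reappears even when the inputs are decimals:

$prices = foreach ($i in 1..10) { [decimal]0.1 }
($prices | Measure-Object -Sum).Sum -eq 1.0   # False: summed as [double]

[decimal]$sum = 0
foreach ($p in $prices) { $sum += $p }
$sum -eq 1.0                                  # True: exact [decimal] math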

Security and correctness notes

  • Fail fast: throw on missing headers, empty files, or bad values. Your CI should exit non-zero rather than silently producing wrong numbers.
  • Sanitize if you later export to CSV for Excel consumption; prefix fields that start with =, +, -, or @ with a single quote to mitigate CSV formula injection in spreadsheets (a minimal sketch follows this list).
  • Do not execute commands or expressions from CSV data. Treat all fields as untrusted input.
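
A minimal sanitizer sketch for the export case (the function name is mine, not a built-in):

function Protect-CsvCell {
    param([string]$Value)
    # A leading single quote makes spreadsheet apps treat the cell as text,
    # defusing payloads like =HYPERLINK(...) without altering normal values.
    if ($Value -match '^[=+\-@]') { "'" + $Value } else { $Value }
}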

CI/CD integration example (GitHub Actions)

Lock in predictable imports in your pipeline to catch issues before production.

- name: Validate and convert CSV
  shell: pwsh
  run: |
    # Dot-source to load the Import-SafeCsv function into the session
    . ./scripts/Import-SafeCsv.ps1
    $rows = Import-SafeCsv -Path 'data/sales.csv'
    if (-not $rows) { throw 'No rows parsed' }
    [decimal]$grand = 0
    $rows | ForEach-Object { $grand += $_.Total }
    Write-Host "Grand total: $grand"
    New-Item -ItemType Directory -Force -Path build | Out-Null
    $rows | ConvertTo-Json -Depth 3 | Set-Content -Path build/sales.json -Encoding utf8

Troubleshooting checklist

  • Numbers not parsing? Confirm the decimal separator in the source and enforce InvariantCulture with [decimal]::Parse(..., [CultureInfo]::InvariantCulture).
  • Dates ambiguous? Use ParseExact/TryParseExact with a small, explicit list of formats (see the example after this list).
  • Wrong delimiter? Don't rely on defaults; set -Delimiter every time.
  • Weird characters? Force -Encoding utf8 and ensure the producer writes UTF-8 as well.
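
The date bullet is worth seeing once: the same string yields two different dates depending on the declared format, which is exactly why guessing is never safe:

$iv = [Globalization.CultureInfo]::InvariantCulture
[datetime]::ParseExact('01/02/2024', 'MM/dd/yyyy', $iv)   # January 2, 2024
[datetime]::ParseExact('01/02/2024', 'dd/MM/yyyy', $iv)   # February 1, 2024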

By validating headers, fixing culture, and casting eagerly to the right types, you eliminate an entire class of flaky CSV bugs. Your scripts become deterministic across locales, your totals are penny-accurate, and your CI catches drift before it reaches production.
