Streaming Millions of CSV Rows in PowerShell with Import-Csv -ReadCount
When a CSV crosses the million-row mark, naive read-all-then-process patterns quickly exhaust memory and slow to a crawl. The fix is simple and powerful: stream the file in predictable chunks, parse with an invariant culture so numbers and dates are stable across locales, and append results as you go. In this post, you'll learn a production-ready pattern that keeps memory flat, improves throughput, and yields consistent output.
Stream in Chunks with -ReadCount
Batching is the difference between an OOM spike and a smooth, steady run. By setting -ReadCount, you receive arrays of records (chunks) from Import-Csv and process each batch before the next one arrives. That keeps the working set bounded while still moving quickly through the file.
The core pipeline
$path = './large.csv'
$out = './summary.csv'
$read = 1000
$fmt = [Globalization.CultureInfo]::InvariantCulture
if (Test-Path $out) { Remove-Item -Path $out -Force }
Import-Csv -Path $path -Delimiter ',' -Encoding UTF8 -ReadCount $read | ForEach-Object {
    # $_ is one batch; wrap the foreach statement in $() so its output can be piped
    $(foreach ($r in $_) {
        $amt  = if ([string]::IsNullOrWhiteSpace($r.Amount)) { 0 } else { [double]::Parse($r.Amount, $fmt) }
        $date = [datetime]::Parse($r.Date, $fmt)
        [pscustomobject]@{ Day = $date.ToString('yyyy-MM-dd'); Amount = [math]::Round($amt, 2); Account = $r.Account }
    }) | Export-Csv -Path $out -NoTypeInformation -Append -Encoding UTF8
}
Write-Host ("Saved -> {0}" -f (Resolve-Path -Path $out))
Why this scales:
- Bounded memory: Each batch is processed and released before the next one arrives.
- Continuous output: Export-Csv -Append writes as you go, so you don't hold an in-memory result set.
- Predictable CPU profile: Work per chunk is consistent, and the GC has less to do.
Choosing the right -ReadCount
- A value between 100 and 6,000 is a good starting window. Try 1000 or 5000 and benchmark (see the sketch below).
- Smaller batches reduce peak memory; larger batches reduce I/O flush frequency. Find the sweet spot for your disk and CPU.
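A simple way to benchmark is to time the whole run for a few candidate batch sizes. This is a minimal sketch, assuming the pipeline is saved as a script that accepts a -ReadCount parameter (the path ./process.ps1 is hypothetical; the end-to-end script later in this post fits this shape):
foreach ($n in 500, 1000, 2000, 5000) {
    # Measure-Command returns a TimeSpan for the wrapped script block
    $t = Measure-Command { & ./process.ps1 -ReadCount $n }
    Write-Host ("ReadCount {0,5}: {1:n1}s" -f $n, $t.TotalSeconds)
}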
Parse Numbers and Dates with Invariant Culture
CSV data often comes from multiple locales. Without an explicit culture, double and DateTime parsing can break on commas vs. dots or region-specific formats. Use [CultureInfo]::InvariantCulture and normalize early to keep your pipeline robust and testable.
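As a quick illustration of why this matters, the same string parses to very different values depending on the culture (de-DE is shown here purely as an example):
$de  = [Globalization.CultureInfo]::GetCultureInfo('de-DE')
$inv = [Globalization.CultureInfo]::InvariantCulture
[double]::Parse('1.234', $de)    # 1234  - the dot is a thousands separator in de-DE
[double]::Parse('1.234', $inv)   # 1.234 - the dot is the decimal separator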
Safe numeric parsing
Prefer TryParse for resilience. It avoids exceptions on bad rows and keeps your batch moving.
$fmt = [Globalization.CultureInfo]::InvariantCulture
function Parse-AmountOrDefault {
    param([string]$s, [double]$default = 0)
    if ([string]::IsNullOrWhiteSpace($s)) { return $default }
    [double]$val = 0
    if ([double]::TryParse($s, [Globalization.NumberStyles]::Float, $fmt, [ref]$val)) { return $val }
    return $default
}
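Usage looks like this (the 'N/A' input is just an illustrative bad value):
Parse-AmountOrDefault '19.95'            # 19.95
Parse-AmountOrDefault ''                 # 0  (empty -> default)
Parse-AmountOrDefault 'N/A' -default -1  # -1 (unparseable -> explicit default)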
Stable date handling
Apply consistent formats on output, even if the input is messy. If the input is guaranteed to be ISO 8601 (yyyy-MM-dd or yyyy-MM-ddTHH:mm:ssZ), parsing is cheap and reliable.
function Parse-DateStrict {
    param([string]$s)
    # Try the most common stable patterns first
    $styles = [Globalization.DateTimeStyles]::AssumeUniversal -bor [Globalization.DateTimeStyles]::AdjustToUniversal
    $d = [datetime]::MinValue
    $ok = [datetime]::TryParse($s, $fmt, $styles, [ref]$d)
    if (-not $ok) { throw "Invalid date: $s" }
    return $d
}
Tip: On output, format with $date.ToString('yyyy-MM-dd') for date-only values, or with the 'o' round-trip (ISO 8601) format for logging and re-ingestion.
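For example, with $fmt from above and a sample UTC timestamp (the value is illustrative):
$styles = [Globalization.DateTimeStyles]::AssumeUniversal -bor [Globalization.DateTimeStyles]::AdjustToUniversal
$date = [datetime]::Parse('2024-03-07T14:30:00Z', $fmt, $styles)
$date.ToString('yyyy-MM-dd')  # 2024-03-07
$date.ToString('o')           # 2024-03-07T14:30:00.0000000Z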
Write Efficiently and Operate at Scale
Appending per chunk is convenient and fast enough for most workloads, but a few tweaks can shave seconds off long runs and keep files clean.
Efficient output with a single header
- Delete any existing output file before starting so the header is written once.
- Use -NoTypeInformation to avoid the type line at the top.
- On PowerShell 7+, consider -UseQuotes AsNeeded with Export-Csv to reduce file size (see the sketch below).
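A version-guarded sketch of the -UseQuotes option; $records stands in for one processed batch and the sample values are made up:
$records = [pscustomobject]@{ Day = '2024-03-07'; Amount = 19.95; Account = 'A-100' }
if ($PSVersionTable.PSVersion.Major -ge 7) {
    # PowerShell 7+: quote fields only when they actually need quoting
    $records | Export-Csv -Path './summary.csv' -NoTypeInformation -Append -Encoding utf8 -UseQuotes AsNeeded
}
else {
    $records | Export-Csv -Path './summary.csv' -NoTypeInformation -Append -Encoding UTF8
}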
Improved end-to-end script
This version adds robust parsing, error handling, and lightweight progress without sacrificing streaming. It still keeps memory flat and throughput high.
param(
    [string]$Path = './large.csv',
    [string]$Out = './summary.csv',
    [int]$ReadCount = 2000
)
$fmt = [Globalization.CultureInfo]::InvariantCulture
function Parse-AmountOrDefault {
    param([string]$s, [double]$default = 0)
    if ([string]::IsNullOrWhiteSpace($s)) { return $default }
    [double]$val = 0
    if ([double]::TryParse($s, [Globalization.NumberStyles]::Float, $fmt, [ref]$val)) { return $val }
    return $default
}
function Parse-DateStrict {
    param([string]$s)
    $styles = [Globalization.DateTimeStyles]::AssumeUniversal -bor [Globalization.DateTimeStyles]::AdjustToUniversal
    $d = [datetime]::MinValue
    if ([datetime]::TryParse($s, $fmt, $styles, [ref]$d)) { return $d }
    throw "Invalid date: $s"
}
if (Test-Path $Out) { Remove-Item -Path $Out -Force }
$rows = 0
$sw = [System.Diagnostics.Stopwatch]::StartNew()
Import-Csv -Path $Path -Delimiter ',' -Encoding UTF8 -ReadCount $ReadCount |
    ForEach-Object {
        # $_ is the current batch (array of PSCustomObject)
        $batch = $_
        $rows += $batch.Count
        Write-Progress -Activity 'Processing CSV' -Status ("Rows: {0:n0}" -f $rows)
        $(foreach ($r in $batch) {
            try {
                $amt  = Parse-AmountOrDefault $r.Amount 0
                $date = Parse-DateStrict $r.Date
                [pscustomobject]@{
                    Day     = $date.ToString('yyyy-MM-dd')
                    Amount  = [math]::Round($amt, 2)
                    Account = $r.Account
                }
            }
            catch {
                # Option A: skip bad rows, or write to a side log (see the sketch after this script)
                # Write-Warning $_
            }
        }) | Export-Csv -Path $Out -NoTypeInformation -Append -Encoding UTF8
    }
$sw.Stop()
Write-Host ("Wrote {0:n0} rows in {1:n1}s -> {2}" -f $rows, $sw.Elapsed.TotalSeconds, (Resolve-Path -Path $Out))
Performance tweaks that often help
- Batch size: Increase -ReadCount until CPU stays busy and disk flushes aren't your bottleneck.
- Avoid string concatenation in loops: Build PSCustomObjects and let Export-Csv serialize (see the sketch after this list).
- Validate early: Parse and normalize values as soon as they enter the pipeline.
- Prefer TryParse: Exceptions across millions of rows compound quickly.
- Minimize per-row allocations: Reuse CultureInfo and format strings; avoid creating transient objects.
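To make the string-concatenation point concrete, here is a small before/after sketch with throwaway sample values:
$day = '2024-03-07'; $amount = 19.95
# Avoid: hand-built CSV lines mean manual quoting and no header management
# $line = '"{0}",{1}' -f $day, $amount; Add-Content -Path './example.csv' -Value $line
# Prefer: emit objects and let Export-Csv handle headers, quoting, and escaping
[pscustomobject]@{ Day = $day; Amount = $amount } | Export-Csv -Path './example.csv' -NoTypeInformation -Append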
Chunked aggregations without full in-memory group-by
Need summaries but can't afford a giant hashtable in RAM? Aggregate per chunk and fold the results into a compact running dictionary. You still keep memory bounded.
$totals = @{}
Import-Csv -Path $path -Delimiter ',' -Encoding UTF8 -ReadCount 5000 | ForEach-Object {
    foreach ($r in $_) {
        $d   = [datetime]::Parse($r.Date, $fmt).ToString('yyyy-MM-dd')
        $a   = [double]::Parse($r.Amount, $fmt)
        $key = '{0}|{1}' -f $d, $r.Account   # "${r.Account}" would not expand the property; use -f or $($r.Account)
        if (-not $totals.ContainsKey($key)) { $totals[$key] = 0.0 }
        $totals[$key] += $a
    }
}
# Stream the final totals out once
$totals.GetEnumerator() | ForEach-Object {
    $parts = $_.Key.Split('|', 2)
    [pscustomobject]@{ Day = $parts[0]; Account = $parts[1]; Amount = [math]::Round($_.Value, 2) }
} | Export-Csv -Path $out -NoTypeInformation -Encoding UTF8
If the number of unique groups is still huge, spill interim maps to disk by batch and merge in a second pass.
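A sketch of that second (merge) pass, assuming each batch's partial totals were exported to a folder of CSVs; the ./partials/*.csv layout with Day, Account, and Amount columns is an assumption, and $fmt and $out come from earlier:
$merged = @{}
Get-ChildItem -Path './partials/*.csv' | ForEach-Object {
    Import-Csv -Path $_.FullName | ForEach-Object {
        $key = '{0}|{1}' -f $_.Day, $_.Account
        # [double]$null is 0, so missing keys start from zero
        $merged[$key] = [double]$merged[$key] + [double]::Parse($_.Amount, $fmt)
    }
}
$merged.GetEnumerator() | ForEach-Object {
    $day, $acct = $_.Key.Split('|', 2)
    [pscustomobject]@{ Day = $day; Account = $acct; Amount = [math]::Round($_.Value, 2) }
} | Export-Csv -Path $out -NoTypeInformation -Encoding UTF8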
Operational tips and pitfalls
- Encoding: On PowerShell 7+, -Encoding utf8 writes UTF-8 without a BOM; Windows PowerShell 5.1 writes a BOM. Keep it consistent across systems that re-ingest the CSV.
- Delimiters and quotes: Always specify -Delimiter. If your input uses quotes heavily or has embedded delimiters, ensure the producer escapes them correctly; test with a small sample.
- Schema drift: Guard against missing columns (e.g., with $r.PSObject.Properties.Match('Amount')) and default gracefully (see the sketch after this list).
- Observability: Add periodic Write-Host or Write-Progress and capture metrics like rows/sec.
- Idempotency: Remove or rotate the output file at the start; append in deterministic order.
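A minimal guard for the schema-drift case, reusing Parse-AmountOrDefault from above; $batch stands in for one chunk:
foreach ($r in $batch) {
    # Fall back to '0' when the Amount column is missing from this file's header
    $amountRaw = if ($r.PSObject.Properties.Match('Amount').Count -gt 0) { $r.Amount } else { '0' }
    $amt = Parse-AmountOrDefault $amountRaw 0
    # ... continue building the output object as before ...
}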
What you get: lower memory usage, faster runs, safer parsing, and predictable output. These patterns also translate to logs, IoT telemetry, financial transaction streams, and any bulk import where you can't afford to keep the whole set in memory.
Process data at scale with practical patterns. Read the PowerShell Advanced Cookbook: https://www.amazon.com/PowerShell-Advanced-Cookbook-scripting-advanced-ebook/dp/B0D5CPP2CQ/