
MoppleIT Tech Blog

Welcome to my personal blog where I share thoughts, ideas, and experiences.

Predictable API Pagination in PowerShell: Link Headers, Next Cursors, Rate Limits, and Safe Stop Conditions

When you automate API data pulls, predictability beats speed. A predictable pagination loop ensures you don’t miss records, don’t hammer rate limits, and always know exactly where you stopped. In this post, you’ll build a resilient PowerShell pattern that follows next links from either Link headers (RFC 8288, which obsoletes RFC 5988) or response fields, caps pages, sleeps gently between calls, stops cleanly on empty results, and logs where you ended. What you get: no missing data, safer loops, clearer logs, and repeatable runs.

Principles for Predictable Pagination

  • Detect the next page reliably: Prefer the Link response header with rel="next". If unavailable, use a response field like next, next_url, or a cursor token (e.g., meta.next_token).
  • Cap total pages: Guard against infinite loops and bad server responses with a MaxPages limit.
  • Respect rate limits: Add a small sleep between requests and honor Retry-After when the server sends it, plus the widely used (but non-standard) X-RateLimit-Remaining and X-RateLimit-Reset headers when available.
  • Stop on empty pages: If the page has no items, stop. This avoids redundant calls and makes end-of-data behavior deterministic.
  • Log where you ended: Emit clear messages that include the page number, URI, and total items pulled. Optionally persist state so you can resume later.
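You can exercise the principles above without touching a real API by injecting the fetch step. The sketch below is hypothetical. Invoke-PagedLoop is not part of any module, and the page shape (a data array plus a next value) is an assumption. It caps pages, stops on an empty page or a missing next link, and reports why and where it stopped.

```powershell
# Hypothetical helper: the pagination loop with the HTTP call injected as a scriptblock,
# so the stop conditions can be tested offline. Pages are assumed to look like
# @{ data = @(...); next = '<uri or $null>' }.
function Invoke-PagedLoop {
  [CmdletBinding()]
  param(
    [Parameter(Mandatory)][string]$StartUri,
    [Parameter(Mandatory)][scriptblock]$FetchPage,   # receives a URI, returns one page object
    [int]$MaxPages = 20,
    [int]$SleepMs = 0
  )
  $items   = [System.Collections.Generic.List[object]]::new()
  $uri     = $StartUri
  $fetched = 0

  while ($true) {
    # Cap total pages before every request
    if ($fetched -ge $MaxPages) { $reason = 'MaxPages'; break }
    $pageObj = & $FetchPage $uri
    $fetched++

    # Stop on an empty page (missing or empty data array)
    $data = @($pageObj.data | Where-Object { $null -ne $_ })
    if ($data.Count -eq 0) { $reason = 'EmptyPage'; break }
    $items.AddRange($data)

    # Stop when the server offers no next page
    if (-not $pageObj.next) { $reason = 'NoNextLink'; break }
    $uri = $pageObj.next
    if ($SleepMs -gt 0) { Start-Sleep -Milliseconds $SleepMs }
  }

  # Log where the run ended; for MaxPages, $uri is the next unfetched page (a resume point)
  Write-Information ('Stopped ({0}) after {1} page(s) at {2}; {3} item(s)' -f $reason, $fetched, $uri, $items.Count)
  [pscustomobject]@{ Items = $items.ToArray(); StopReason = $reason; StoppedAt = $uri; Pages = $fetched }
}
```

Because the HTTP call is a scriptblock, you can pass { param($u) Invoke-RestMethod -Uri $u } for real runs and keep the stop logic unchanged. This assumes an API whose body carries data and an absolute next URL.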

A Solid Baseline Loop in PowerShell

Start with a clear, defensive pattern. The snippet below follows a next link from either a Link header or a response field, caps the pages, injects a small sleep, and stops on empty results.

$base  = 'https://api.example.com/items'
$limit = 100
$max   = 20
$all   = [System.Collections.Generic.List[object]]::new()
$page  = 1
$uri   = '{0}?page={1}&limit={2}' -f $base, $page, $limit

while ($uri) {
  if ($page -gt $max) { Write-Warning ('Reached MaxPages ({0}); next URI was {1}' -f $max, $uri); break }
  try {
    $r   = Invoke-WebRequest -Uri $uri -Method Get -ErrorAction Stop
    $obj = $r.Content | ConvertFrom-Json
    # Empty page: nothing left to pull
    if (-not $obj.data -or @($obj.data).Count -eq 0) { break }
    $all.AddRange(@($obj.data))

    # Next page: Link header with rel="next" first, then a 'next' field in the body
    $next = $null
    if ($r.Headers['Link']) {
      $next = $r.Headers['Link'] -split ',' |
        Where-Object { $_ -match 'rel="?next"?' } |
        ForEach-Object { ($_ -split ';')[0].Trim('<> ') } |
        Select-Object -First 1
    }
    if (-not $next -and $obj.next) { $next = $obj.next }

    if ($next) {
      # Follow the server's URL; relative links are resolved against the current URI
      $uri = [uri]::new([uri]$uri, [string]$next).AbsoluteUri
      $page++
      Start-Sleep -Milliseconds 150
    } else {
      $uri = $null   # no next link: done
    }
  } catch {
    Write-Warning ('Stop at page {0} ({1}): {2}' -f $page, $uri, $_.Exception.Message)
    break
  }
}

$all | Select-Object -First 3 Name, Id

This baseline already checks the essential boxes. From here, you can harden it for production and broader API shapes.
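A Link header can carry several relations at once (next, prev, first, last), and its URLs may be relative to the request. A small parser can map every rel to an absolute URL. This is a hedged sketch that assumes no commas inside the URLs, the same simplification the baseline makes. ConvertFrom-LinkHeader is a hypothetical name.

```powershell
# Hypothetical helper: parse a Link header (RFC 8288) into a rel -> absolute URL map.
# Assumes URLs contain no commas, the same simplification the baseline makes.
function ConvertFrom-LinkHeader {
  param(
    [Parameter(Mandatory)][string[]]$LinkHeader,   # PowerShell 7 returns header values as string[]
    [Parameter(Mandatory)][uri]$RequestUri         # the URI that was requested; used for relative links
  )
  $links = @{}
  foreach ($entry in ($LinkHeader -split ',')) {
    # <url> followed by ;-separated parameters such as rel="next"
    if ($entry -match '<(?<url>[^>]+)>(?<params>.*)') {
      $url    = $Matches['url']
      $params = $Matches['params']
      # rel may be quoted and may hold several space-separated values, e.g. rel="first prev"
      if ($params -match 'rel\s*=\s*"?(?<rel>[^";]+)"?') {
        foreach ($rel in ($Matches['rel'].Trim() -split '\s+')) {
          $links[$rel.ToLowerInvariant()] = [uri]::new($RequestUri, $url).AbsoluteUri
        }
      }
    }
  }
  $links
}
```

When an API provides a rel="last" entry, it also tells you up front how many pages to expect. That makes a useful sanity check against your MaxPages cap.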

Hardening for Production

1) Prefer Invoke-RestMethod + headers, parse Link safely

Invoke-RestMethod conveniently converts JSON to PowerShell objects. Use -ResponseHeadersVariable (PowerShell 6 and later) to still access the response headers. The helper below detects the next page using a Link header, a next field, or a common cursor token. It caps pages, handles 429s with Retry-After, sleeps politely, stops on empty pages, and logs where it ended. You can also persist state to resume later.

function Get-NextLinkFromHeader {
  param(
    [Parameter(Mandatory)][string[]]$LinkHeader
  )
  foreach ($h in $LinkHeader) {
    foreach ($part in ($h -split ',')) {
      # rel may come after other parameters, e.g. <url>; title="x"; rel="next"
      if ($part -match '<(?<url>[^>]+)>.*;\s*rel="?next"?') {
        return $Matches['url']
      }
    }
  }
  return $null
}

function Get-PaginatedItems {
  [CmdletBinding()]
  param(
    [Parameter(Mandatory)][string]$BaseUri,
    [int]$Limit = 100,
    [int]$MaxPages = 20,        # pages fetched in this run, independent of StartPage
    [int]$StartPage = 1,
    [int]$SleepMs = 150,
    [int]$MaxRetries = 5,       # cap on consecutive 429 retries
    [string]$NextField = 'next',
    [string]$StatePath
  )

  $all        = [System.Collections.Generic.List[object]]::new()
  $page       = $StartPage
  $fetched    = 0
  $retries    = 0
  $cursor     = $null
  $nextUri    = $null       # absolute URL from a Link header or a URL-valued next field
  $uri        = $null
  $stopReason = $null

  while ($true) {
    if ($fetched -ge $MaxPages) {
      $stopReason = "MaxPages ($MaxPages)"
      Write-Warning "Reached MaxPages ($MaxPages). Stopping."
      break
    }

    # Follow a server-provided URL when we have one; otherwise build the URI ourselves
    if ($nextUri) {
      $uri = $nextUri
    } elseif ($cursor) {
      $uri = '{0}?limit={1}&cursor={2}' -f $BaseUri, $Limit, [uri]::EscapeDataString($cursor)
    } else {
      $uri = '{0}?page={1}&limit={2}' -f $BaseUri, $page, $Limit
    }

    try {
      $respHeaders = $null
      $resp = Invoke-RestMethod -Uri $uri -Method Get -ErrorAction Stop -ResponseHeadersVariable respHeaders
      $retries = 0
    } catch {
      $errResp = $_.Exception.Response
      if ($errResp -and [int]$errResp.StatusCode -eq 429 -and $retries -lt $MaxRetries) {
        $retries++
        # HttpResponseMessage exposes Retry-After as a typed header; Delta is set for the seconds form
        $retryAfter = 1
        if ($errResp.Headers.RetryAfter -and $errResp.Headers.RetryAfter.Delta) {
          $retryAfter = [int][math]::Ceiling($errResp.Headers.RetryAfter.Delta.TotalSeconds)
        }
        Write-Warning ("429 received. Sleeping for {0}s before retry {1}/{2}..." -f $retryAfter, $retries, $MaxRetries)
        Start-Sleep -Seconds $retryAfter
        continue
      }
      $stopReason = "error: $($_.Exception.Message)"
      Write-Warning ("Stopping at page {0}. Error: {1}" -f $page, $_.Exception.Message)
      break
    }
    $fetched++

    # Normalize data array detection: a bare JSON array, or a 'data' / 'items' wrapper
    $data = $null
    if ($resp -is [System.Collections.IEnumerable] -and $resp -isnot [string]) {
      $data = $resp
    } elseif ($resp.PSObject.Properties.Name -contains 'data') {
      $data = $resp.data
    } elseif ($resp.PSObject.Properties.Name -contains 'items') {
      $data = $resp.items
    }

    if (-not $data -or @($data).Count -eq 0) {
      $stopReason = 'empty page'
      Write-Information ("Empty page at {0}. Stopping." -f $uri) -InformationAction Continue
      break
    }

    $all.AddRange(@($data))   # @() keeps a single-item page enumerable

    # 1) Link header with rel="next"
    $nextLink = $null
    if ($respHeaders -and $respHeaders['Link']) {
      $nextLink = Get-NextLinkFromHeader -LinkHeader $respHeaders['Link']
    }

    # 2) Explicit next field, or a common cursor token (meta.next_token)
    $nextFieldValue = $null
    if ($resp.PSObject.Properties.Name -contains $NextField) { $nextFieldValue = $resp.$NextField }
    elseif ($resp.PSObject.Properties.Name -contains 'meta' -and $resp.meta.PSObject.Properties.Name -contains 'next_token') { $nextFieldValue = $resp.meta.next_token }

    $nextUri = $null
    if ($nextLink) {
      $nextUri = [uri]::new([uri]$uri, $nextLink).AbsoluteUri   # resolve relative links
    } elseif ($nextFieldValue -is [string] -and $nextFieldValue -match '^https?://') {
      $nextUri = $nextFieldValue                                  # full URL: follow as-is
    } elseif ($nextFieldValue) {
      $cursor = [string]$nextFieldValue                           # opaque token: send back as ?cursor=
    } else {
      $stopReason = 'no next link'
      Write-Information ("No next link found at {0}. Stopping." -f $uri) -InformationAction Continue
      break
    }
    $page++

    # Respect a Retry-After hint on success when present; otherwise sleep politely
    $hint = if ($respHeaders) { $respHeaders['Retry-After'] | Select-Object -First 1 }
    if ($hint -as [int]) { Start-Sleep -Seconds ([int]$hint) } else { Start-Sleep -Milliseconds $SleepMs }
  }

  Write-Information ("Stopped ({0}) at page {1} after {2} page(s); {3} item(s); last URI {4}" -f $stopReason, $page, $fetched, $all.Count, $uri) -InformationAction Continue

  # Optional: persist state so a later run can resume
  if ($PSBoundParameters.ContainsKey('StatePath')) {
    [ordered]@{
      endedAt    = (Get-Date).ToString('o')
      baseUri    = $BaseUri
      lastPage   = $page
      nextUri    = $nextUri
      cursor     = $cursor
      stopReason = $stopReason
      totalItems = $all.Count
    } | ConvertTo-Json -Depth 5 | Set-Content -Path $StatePath -Encoding UTF8
  }

  return $all
}

# Example usage
$items = Get-PaginatedItems -BaseUri 'https://api.example.com/items' -Limit 100 -MaxPages 20 -SleepMs 150 -StatePath './last-run.json'
$items.Count
$items | Select-Object -First 3 Name, Id
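The helper above reacts to Retry-After but ignores quota headers such as X-RateLimit-Remaining and X-RateLimit-Reset. If your API sends them, you can compute the pause before the next call. This is a minimal sketch, and Get-RateLimitDelay is a hypothetical name. It assumes X-RateLimit-Reset is a Unix timestamp in seconds, as GitHub sends it; other APIs send seconds-until-reset instead.

```powershell
# Hypothetical helper: how many seconds to wait before the next request.
# Order: Retry-After (seconds or HTTP-date), then X-RateLimit-Remaining/-Reset, then a default.
# Assumption: X-RateLimit-Reset is a Unix timestamp in seconds; check your API's documentation.
function Get-RateLimitDelay {
  param(
    [System.Collections.IDictionary]$Headers,            # e.g. the -ResponseHeadersVariable result
    [double]$DefaultSeconds = 0.15,
    [datetimeoffset]$Now = [datetimeoffset]::UtcNow
  )
  # Copy into a case-insensitive hashtable; values may be string[] (PowerShell 7) or string
  $h = @{}
  if ($Headers) { foreach ($k in $Headers.Keys) { $h[[string]$k] = @($Headers[$k])[0] } }

  $retryAfter = $h['Retry-After']
  if ($retryAfter) {
    # delta-seconds form, e.g. "120"
    $seconds = 0
    if ([int]::TryParse([string]$retryAfter, [ref]$seconds)) { return [double][math]::Max(0, $seconds) }
    # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    $date  = [datetimeoffset]::MinValue
    $style = [System.Globalization.DateTimeStyles]::AssumeUniversal
    if ([datetimeoffset]::TryParseExact([string]$retryAfter, 'r', [cultureinfo]::InvariantCulture, $style, [ref]$date)) {
      return [math]::Max(0, ($date - $Now).TotalSeconds)
    }
  }

  # Quota exhausted: wait until the advertised reset time
  $remaining = $h['X-RateLimit-Remaining']
  $reset     = $h['X-RateLimit-Reset']
  if ($null -ne $remaining -and [int]$remaining -le 0 -and $reset) {
    $resetAt = [datetimeoffset]::FromUnixTimeSeconds([long]$reset)
    return [math]::Max(0, ($resetAt - $Now).TotalSeconds)
  }
  return $DefaultSeconds
}
```

Inside Get-PaginatedItems, you could replace the fixed pause with Start-Sleep -Milliseconds ([int](1000 * (Get-RateLimitDelay -Headers $respHeaders -DefaultSeconds ($SleepMs / 1000)))).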

2) Validate completeness and deduplicate

  • Cross-check counts: If the API exposes a total or count field, assert that your final item count matches it.
  • Deduplicate by ID: Some APIs can return overlapping windows. Use Group-Object Id to verify uniqueness or store to a hashtable keyed by ID.
$dupes = $items | Group-Object Id | Where-Object { $_.Count -gt 1 }
if ($dupes) { Write-Warning ("Found duplicate IDs: {0}" -f ($dupes.Name -join ', ')) }
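A dictionary keyed by ID handles both checks in a single pass. This is a minimal sketch. Test-PullCompleteness is hypothetical, and the expected total has to come from whatever count field your API exposes, if it has one.

```powershell
# Hypothetical helper: deduplicate by ID (first occurrence wins) and, when the API
# reports a total (an assumption; not every API does), check the unique count against it.
function Test-PullCompleteness {
  param(
    [Parameter(Mandatory)][AllowEmptyCollection()][object[]]$Items,
    [Nullable[int]]$ExpectedTotal,
    [string]$IdProperty = 'Id'
  )
  $byId       = [ordered]@{}                                   # keeps first-seen order
  $duplicates = [System.Collections.Generic.List[string]]::new()
  foreach ($item in $Items) {
    # String keys: an int key would hit the ordered dictionary's positional indexer
    $key = [string]$item.$IdProperty
    if ($byId.Contains($key)) { $duplicates.Add($key) } else { $byId[$key] = $item }
  }
  [pscustomobject]@{
    UniqueItems  = @($byId.Values)
    UniqueCount  = $byId.Count
    DuplicateIds = $duplicates.ToArray()
    MatchesTotal = ($null -eq $ExpectedTotal) -or ($byId.Count -eq $ExpectedTotal)
  }
}
```

Because the first occurrence wins, a later overlapping page cannot silently overwrite an item you already have.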

3) Handle cursor-based APIs gracefully

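Many APIs don’t use page numbers at all; they return a cursor or continuation token instead. You can adapt the loop by sending that token back as a cursor query parameter and stopping when the server no longer returns one. Keep a page cap here too, and stop if the server hands back a cursor you have already used, since that would otherwise loop forever.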

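$base   = 'https://api.example.com/items'
$limit  = 200
$max    = 50                                                    # page cap, as in the loops above
$cursor = $null
$all    = [System.Collections.Generic.List[object]]::new()
$seen   = [System.Collections.Generic.HashSet[string]]::new()   # cursors already used

for ($page = 1; $page -le $max; $page++) {
  $uri = if ($cursor) { '{0}?limit={1}&cursor={2}' -f $base, $limit, [uri]::EscapeDataString($cursor) } else { '{0}?limit={1}' -f $base, $limit }
  $r = Invoke-RestMethod -Uri $uri -Method Get -ErrorAction Stop
  if (-not $r.data -or @($r.data).Count -eq 0) { break }        # empty page: done
  $all.AddRange(@($r.data))
  $cursor = if ($r.PSObject.Properties.Name -contains 'next') { $r.next } elseif ($r.meta.next_token) { $r.meta.next_token } else { $null }
  if (-not $cursor) { break }                                    # no token: done
  if (-not $seen.Add([string]$cursor)) {                         # a repeated token would loop forever
    Write-Warning "Cursor repeated at page $page. Stopping."
    break
  }
  Start-Sleep -Milliseconds 150
}
if ($page -gt $max) { Write-Warning "Stopped at the $max-page cap; the next cursor is $cursor." }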
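The loops in this post build query strings with -f and a literal '?'. That produces a malformed URI when the base URI already carries a query string, such as an API version or tenant parameter. Below is a hedged sketch of a safer builder. It assumes the API accepts standard percent-encoded query parameters, and New-PageUri is a hypothetical name.

```powershell
# Hypothetical helper: append escaped query parameters to a base URI, keeping any
# query string it already has and skipping parameters whose value is null or empty.
function New-PageUri {
  param(
    [Parameter(Mandatory)][string]$BaseUri,
    [Parameter(Mandatory)][System.Collections.IDictionary]$Query   # use [ordered]@{} for a stable order
  )
  $pairs = @(foreach ($key in $Query.Keys) {
    $value = $Query[$key]
    # Omit unset parameters, e.g. no cursor on the first request
    if ($null -ne $value -and "$value" -ne '') {
      '{0}={1}' -f [uri]::EscapeDataString([string]$key), [uri]::EscapeDataString([string]$value)
    }
  })
  if ($pairs.Count -eq 0) { return $BaseUri }
  # Use '&' when the base URI already carries a query string, '?' otherwise
  $separator = if ($BaseUri.Contains('?')) { '&' } else { '?' }
  $BaseUri + $separator + ($pairs -join '&')
}
```

With this helper, $uri = New-PageUri -BaseUri $base -Query ([ordered]@{ limit = $limit; cursor = $cursor }) covers both the first request and every later one, because a null cursor is simply omitted.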

4) Observability: progress, logging, and resumability

  • Progress bars: Use Write-Progress with current page and total seen to make runs transparent.
  • Structured logs: Emit JSON lines with page, URI, count. Store the final state to a file to enable resumes.
  • Clear stop reasons: Always log whether you stopped due to MaxPages, empty data, or missing next token.
# Inside the loop, after each successful page:
Write-Progress -Activity 'Pulling items' -Status ('Page {0}, {1} items so far' -f $page, $all.Count) -PercentComplete ([math]::Min(100, [int](100 * $page / $MaxPages)))
Write-Information ('Fetched {0} items from {1}' -f @($data).Count, $uri) -InformationAction Continue
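The Structured logs bullet above can be made concrete with JSON Lines: one compact JSON object per line, appended after each page. The resulting file is both grep-able and machine-readable, and its last line tells a resumed run where to pick up. This is a minimal sketch. The helper names, the file path, and the field names (ts, kind, page, uri, itemCount) are assumptions.

```powershell
# Hypothetical helpers: append one compact JSON object per page (JSON Lines),
# then read the last entry back to see where the previous run stopped.
function Write-PageLog {
  param(
    [Parameter(Mandatory)][string]$Path,
    [Parameter(Mandatory)][int]$Page,
    [Parameter(Mandatory)][string]$Uri,      # redact tokens before logging
    [Parameter(Mandatory)][int]$Count,
    [string]$Kind = 'page'                   # e.g. 'page', 'stop:empty-page', 'stop:max-pages'
  )
  $line = [ordered]@{
    ts        = [datetimeoffset]::UtcNow.ToString('o')
    kind      = $Kind
    page      = $Page
    uri       = $Uri
    itemCount = $Count
  } | ConvertTo-Json -Compress
  Add-Content -LiteralPath $Path -Value $line -Encoding UTF8
}

function Get-LastPageLog {
  param([Parameter(Mandatory)][string]$Path)
  if (-not (Test-Path -LiteralPath $Path)) { return $null }
  $last = Get-Content -LiteralPath $Path -Tail 1
  if ($last) { $last | ConvertFrom-Json } else { $null }
}
```

On startup, a job can call Get-LastPageLog. If the last entry is not a stop record, the job can continue after that page instead of starting over.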

5) Defensive extras

  • Time-bound runs: Add a timeout guard using [Diagnostics.Stopwatch]::StartNew() and stop after a maximum runtime.
  • Memory footprint: Stream processing where possible (write to disk in chunks) instead of keeping entire datasets in memory for very large pulls.
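  • Security: Keep tokens out of logs. Read them from a secret store or an environment variable rather than hard-coding them, pass them via -Headers or, in PowerShell 6 and later, -Authentication Bearer with a SecureString -Token, and redact sensitive values before logging URIs.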
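Two of these extras fit in a few lines each. This is a hedged sketch. Test-RunBudget and Protect-UriForLog are hypothetical names, and the list of sensitive query keys is an assumption to adapt to your API.

```powershell
# Hypothetical helpers for two of the defensive extras above.

# 1) Time-bound runs: true once the stopwatch has exceeded the allowed runtime.
function Test-RunBudget {
  param(
    [Parameter(Mandatory)][System.Diagnostics.Stopwatch]$Stopwatch,
    [Parameter(Mandatory)][timespan]$MaxRuntime
  )
  $Stopwatch.Elapsed -ge $MaxRuntime
}

# 2) Security: mask sensitive query parameter values before a URI is logged.
# The default key list is an assumption; adjust it for your API.
function Protect-UriForLog {
  param(
    [Parameter(Mandatory)][string]$Uri,
    [string[]]$SensitiveKeys = @('access_token', 'api_key', 'token', 'sig')
  )
  $keys    = ($SensitiveKeys | ForEach-Object { [regex]::Escape($_) }) -join '|'
  # Match ?key= or &key= and everything up to the next & or # (the value)
  $pattern = '(?i)([?&](?:{0})=)[^&#]*' -f $keys
  $Uri -replace $pattern, '$1***'
}
```

Start the stopwatch with [System.Diagnostics.Stopwatch]::StartNew() before the loop, and check Test-RunBudget at the top of each iteration alongside the MaxPages cap. Pass every URI through Protect-UriForLog before it reaches Write-Information, Write-Warning, or a state file.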

By combining Link header parsing, flexible next-token handling, page caps, respectful sleeps, and explicit stop conditions, you’ll get deterministic pagination runs that are simple to reason about and easy to operate. Use the baseline snippet when you need a quick pull, and upgrade to the hardened helper for production-grade jobs and CI/CD pipelines.
