
MoppleIT Tech Blog

Welcome to my personal blog where I share thoughts, ideas, and experiences.

Safer Service Restarts in PowerShell: Timeouts, Deadlines, and Predictable Outcomes

Windows services are the backbone of many production workloads, but restarts that hang, mask errors, or race other automation can trigger outages. The cure is discipline: make every restart deliberate, bounded by time, and observable. In this post, you will learn a pattern for safer service restarts in PowerShell that combines polling with a Stopwatch, hard deadlines, and explicit logging, so that every restart either completes within its deadline or fails with a clear error.

The restart contract: timeouts, verification, and logs

A safe restart is not just a Stop followed by Start. It is a contract:

  • You stop the service.
  • You verify it reached the Stopped state within a hard deadline.
  • You start the service.
  • You verify it reached the Running state within a hard deadline.
  • You log each step and error out if the contract is violated.

Minimal, disciplined restart with polling and deadlines

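Here is a compact script that stops the service, verifies it stopped, starts it, and verifies it is running, with a hard deadline on each wait. It checks the status every 200 ms and uses [Diagnostics.Stopwatch] to measure elapsed time, so the polling loop gives up as soon as the deadline passes.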

param(
  [Parameter(Mandatory)]
  [string]$Name,
  [int]$TimeoutSec = 20
)

function Wait-Status {
  param([string]$Svc,[string]$Status,[int]$Timeout)
  $sw = [Diagnostics.Stopwatch]::StartNew()
  while ((Get-Service -Name $Svc).Status.ToString() -ne $Status -and $sw.Elapsed.TotalSeconds -lt $Timeout) {
    Start-Sleep -Milliseconds 200
  }
  if ((Get-Service -Name $Svc).Status.ToString() -ne $Status) {
    throw ('Timeout waiting for {0} -> {1}' -f $Svc, $Status)
  }
}

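try {
  $svc = Get-Service -Name $Name -ErrorAction Stop
  if ($svc.Status -eq 'Running') {
    if (-not $svc.CanStop) { throw ('{0} cannot be stopped (CanStop = False)' -f $Name) }
    # -NoWait returns at once, so Wait-Status (not the cmdlet) enforces the deadline
    Stop-Service -Name $Name -NoWait -ErrorAction Stop
  }
  Wait-Status -Svc $Name -Status 'Stopped' -Timeout $TimeoutSec
  # Start-Service keeps waiting while the service is StartPending; Start() only sends the request
  (Get-Service -Name $Name).Start()
  Wait-Status -Svc $Name -Status 'Running' -Timeout $TimeoutSec
  Write-Host ('Restarted: {0}' -f $Name)
} catch {
  Write-Warning ('Failed: {0}' -f $_.Exception.Message)
  exit 1  # non-zero exit code so schedulers and pipelines see the failure
}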

Why this works well:

  • Bounded polling: The Stopwatch gives each wait a hard cutoff, so the polling loop can never spin indefinitely.
  • Clear errors: When the service does not reach the expected state in time, Wait-Status throws a message naming the service and the target state.
  • Poll cadence: 200 ms keeps the load on the Service Control Manager low while still reacting quickly. Tune it for your environment.
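The polling loop is the part of this pattern most worth reusing and testing on its own. One way to do that, sketched below, is to pass the check in as a scriptblock, so the same deadline logic can wait on a service, a port, or a file, and can be exercised without touching real services. Wait-Until and its parameters are hypothetical names for illustration, not an existing cmdlet.

```powershell
function Wait-Until {
  # Hypothetical helper: polls $Condition until it returns $true or the deadline passes.
  param(
    [Parameter(Mandatory)][scriptblock]$Condition,
    [int]$TimeoutMs = 20000,
    [int]$PollMs = 200,
    [string]$Description = 'condition'
  )
  $sw = [Diagnostics.Stopwatch]::StartNew()
  while ($true) {
    # Check first, so an already-satisfied condition returns immediately
    if (& $Condition) { return $sw.Elapsed }
    if ($sw.ElapsedMilliseconds -ge $TimeoutMs) {
      throw ('Timeout after {0} ms waiting for {1}' -f $TimeoutMs, $Description)
    }
    # Never sleep past the deadline; the condition gets one last check after it
    $remaining = $TimeoutMs - $sw.ElapsedMilliseconds
    $sleep = if ($remaining -lt $PollMs) { $remaining } else { $PollMs }
    Start-Sleep -Milliseconds ([math]::Max(1, [int]$sleep))
  }
}
```

For a service, the condition could be `{ (Get-Service -Name 'Spooler').Status -eq 'Running' }`. Returning the elapsed TimeSpan on success gives you per-phase timings for free.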

For production, you will likely want richer logging, retries, dependent service handling, and -WhatIf/-Confirm support. Let’s harden it.

Production-ready function: Restart-ServiceSafe

The function below adds:

  • SupportsShouldProcess: Safe dry runs via -WhatIf and confirmation prompts via -Confirm.
  • Retries: Optional retry loop for transient SCM hiccups.
  • Dependent services: Optionally stop dependents first, then bring them back.
  • Timestamped logs: Optional file logging plus Write-Information, which keeps log lines out of the function's output objects.
  • Hard deadlines: Polling with Stopwatch to enforce strict state transitions.

function Restart-ServiceSafe {
  [CmdletBinding(SupportsShouldProcess=$true, ConfirmImpact='Medium')]
  param(
    [Parameter(Mandatory)][string]$Name,
    [int]$TimeoutSec = 30,
    [int]$PollMs = 200,
    [int]$Retries = 0,
    [switch]$IncludeDependents,
    [string]$LogPath
  )

  function Write-Log { param([string]$Msg,[string]$Level='INFO')
    $ts = (Get-Date).ToString('s')
    $line = '{0} [{1}] {2}' -f $ts,$Level,$Msg
    if ($LogPath) { Add-Content -Path $LogPath -Value $line }
    Write-Information $line
  }

  function Wait-Status { param([string]$Svc,[string]$Status,[int]$Timeout,[int]$Poll=$PollMs)
    $sw = [Diagnostics.Stopwatch]::StartNew()
    while ($sw.Elapsed.TotalSeconds -lt $Timeout) {
      $s = Get-Service -Name $Svc -ErrorAction Stop
      if ($s.Status.ToString() -eq $Status) { return }
      Start-Sleep -Milliseconds $Poll
    }
    throw ('Timeout waiting for {0} -> {1} in {2}s' -f $Svc,$Status,$Timeout)
  }

  # Remember which dependents were running, once, so a retry after a partial
  # failure still knows what to bring back (and stopped ones stay stopped)
  $dependents = @()
  if ($IncludeDependents) {
    $dependents = @((Get-Service -Name $Name -ErrorAction Stop).DependentServices |
      Where-Object { $_.Status -eq 'Running' })
  }

  # Ask once, before any attempt; with -WhatIf nothing below runs
  if (-not $PSCmdlet.ShouldProcess($Name, 'Restart service')) { return }

  $attempt = 0
  do {
    $attempt++
    $opSw = [Diagnostics.Stopwatch]::StartNew()
    try {
      $svc = Get-Service -Name $Name -ErrorAction Stop

      # Stop dependents one at a time, each within its own deadline.
      # -NoWait returns immediately, so Wait-Status (not the cmdlet) bounds the wait.
      foreach ($d in $dependents) {
        Write-Log -Msg ('Stopping dependent {0}' -f $d.Name)
        Stop-Service -Name $d.Name -NoWait -Confirm:$false -ErrorAction Stop
        Wait-Status -Svc $d.Name -Status 'Stopped' -Timeout $TimeoutSec
      }

      if ($svc.Status -eq 'Running') {
        if (-not $svc.CanStop) { throw 'Service cannot be stopped (CanStop = False).' }
        Write-Log -Msg ('Stopping {0} (attempt {1})' -f $Name,$attempt)
        Stop-Service -Name $Name -NoWait -Confirm:$false -ErrorAction Stop
        Write-Log -Msg 'Waiting for Stopped...'
        Wait-Status -Svc $Name -Status 'Stopped' -Timeout $TimeoutSec
      } else {
        Write-Log -Msg ('Already {0}' -f $svc.Status)
      }

      # Start-Service keeps waiting while the service is StartPending;
      # ServiceController.Start() only sends the request, leaving the deadline to Wait-Status
      Write-Log -Msg ('Starting {0}' -f $Name)
      (Get-Service -Name $Name -ErrorAction Stop).Start()
      Write-Log -Msg 'Waiting for Running...'
      Wait-Status -Svc $Name -Status 'Running' -Timeout $TimeoutSec

      foreach ($d in $dependents) {
        $ds = Get-Service -Name $d.Name -ErrorAction Stop
        # SCM may already have started this one as a prerequisite of another dependent
        if ($ds.Status.ToString() -notin @('Running','StartPending')) {
          Write-Log -Msg ('Starting dependent {0}' -f $d.Name)
          $ds.Start()
        }
        Wait-Status -Svc $d.Name -Status 'Running' -Timeout $TimeoutSec
      }

      $opSw.Stop()
      Write-Log -Msg ('Success: {0} restarted in {1:n2}s' -f $Name,$opSw.Elapsed.TotalSeconds)
      return [pscustomobject]@{
        Name        = $Name
        Attempt     = $attempt
        Status      = 'Running'
        DurationSec = [math]::Round($opSw.Elapsed.TotalSeconds,2)
        Timestamp   = Get-Date
      }
    } catch {
      Write-Log -Msg ('Attempt {0} failed: {1}' -f $attempt,$_.Exception.Message) -Level 'WARN'
      if ($attempt -le $Retries) { Start-Sleep -Seconds 1 } else { throw }
    }
  } while ($attempt -le $Retries)
}

Usage examples

# Dry run first
Restart-ServiceSafe -Name 'Spooler' -WhatIf

# Real restart with 30s deadlines and a single retry
Restart-ServiceSafe -Name 'Spooler' -TimeoutSec 30 -Retries 1 -InformationAction Continue

# Restart, handle dependent services, and log to a file
Restart-ServiceSafe -Name 'W32Time' -IncludeDependents -LogPath 'C:\Logs\service-restarts.log' -InformationAction Continue

# Batch restart with consistent behavior
'Spooler','W32Time' | ForEach-Object { Restart-ServiceSafe -Name $_ -TimeoutSec 45 -Retries 2 -InformationAction Continue }
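Because Restart-ServiceSafe rethrows after its last attempt, the batch one-liner above stops at the first service that fails. If you would rather attempt every service and report at the end, a wrapper along these lines collects one result object per name. Invoke-RestartBatch and its -Restart parameter are hypothetical; by default it calls Restart-ServiceSafe as defined above, and the scriptblock parameter exists so the loop can be tested with a stand-in.

```powershell
function Invoke-RestartBatch {
  # Hypothetical wrapper: restarts each service in turn and records the outcome
  # instead of stopping at the first exception.
  param(
    [Parameter(Mandatory)][string[]]$Name,
    [scriptblock]$Restart = { param($n) Restart-ServiceSafe -Name $n -TimeoutSec 45 -Retries 2 }
  )
  foreach ($n in $Name) {
    $sw = [Diagnostics.Stopwatch]::StartNew()
    try {
      $null = & $Restart $n
      [pscustomobject]@{ Name = $n; Succeeded = $true; Error = $null; DurationSec = [math]::Round($sw.Elapsed.TotalSeconds, 2) }
    } catch {
      # Record the failure and move on to the next service
      [pscustomobject]@{ Name = $n; Succeeded = $false; Error = $_.Exception.Message; DurationSec = [math]::Round($sw.Elapsed.TotalSeconds, 2) }
    }
  }
}

# Example: restart both, then fail the job if anything went wrong
# $results = Invoke-RestartBatch -Name 'Spooler','W32Time'
# $results | Format-Table
# if ($results.Succeeded -contains $false) { exit 1 }
```

Whether to stop at the first failure or push on is a policy choice: for a rolling restart of identical nodes, stopping early is usually safer.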

Operational tips and patterns

Make deadlines part of your reliability posture

  • Use hard deadlines per phase: Stop, verify Stopped, Start, verify Running. If any step exceeds its deadline, fail fast and surface the error.
  • Standardize poll cadence: 100–500 ms is usually sufficient. Extremely tight polling increases SCM chatter without benefit.
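  • Be deliberate about -Force: For Stop-Service, -Force does not kill a hung process; it lets the cmdlet stop a service that other services depend on, stopping those dependents as well. Prefer stopping dependents explicitly so you know what to bring back, and keep -Force behind a feature flag.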

Handle dependencies deliberately

  • Stop dependents first: Services that depend on the target should be stopped before the target itself. After the main service is Running, bring back only the dependents that were running beforehand.
  • Beware of disabled services: A Disabled service can still be stopped, but starting it fails, so a restart leaves it down. Detect Disabled up front (the StartType property from Get-Service, or StartMode on the CIM Win32_Service class) and temporarily switch it to Manual only if your change window and policy allow it.
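Detecting a Disabled start type, along with other conditions that would make a restart fail halfway through, can happen before anything is touched. The sketch below assumes you pass in an object with Status, StartType, and CanStop properties, which is the shape Get-Service returns on Windows; it emits one reason per blocking condition, so an empty result means go. Get-RestartBlockers is a hypothetical name, and if StartType is missing on your platform the Disabled check simply does not fire.

```powershell
function Get-RestartBlockers {
  # Hypothetical pre-flight check. $Service is any object with Status, StartType
  # and CanStop properties (for example the output of Get-Service on Windows).
  # Emits one string per problem; no output means the restart may proceed.
  param([Parameter(Mandatory)]$Service)
  $status = "$($Service.Status)"
  $startType = "$($Service.StartType)"
  # A Disabled service may stop, but the Start half of the restart will fail
  if ($startType -eq 'Disabled') { 'StartType is Disabled' }
  # A pending state means something else is already changing this service
  if ($status -like '*Pending') { "Service is in transition ($status)" }
  if ($status -eq 'Running' -and -not $Service.CanStop) { 'Service is Running but CanStop is False' }
}

# Example:
# $problems = @(Get-RestartBlockers -Service (Get-Service -Name 'Spooler'))
# if ($problems.Count -gt 0) { throw ('Pre-flight failed: ' + ($problems -join '; ')) }
```

Keeping the check free of Get-Service calls makes it easy to unit-test with plain objects.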

Observability and logging

  • Log each step: At minimum: attempting stop, reached Stopped, attempting start, reached Running, total duration, and any exception message.
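  • Use Write-Information: It writes to the information stream rather than the output stream, so log lines never mix with the objects a function returns, and callers control it with -InformationAction. For long-term storage, append to a file or send to your central log collector.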
  • Correlate actions: Include a correlation ID (e.g., deployment ID) in each message to tie restarts to rollouts.
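One way to apply the correlation tip is to stamp every log line with an ID supplied by the caller, in the same timestamp-and-level shape that Write-Log in Restart-ServiceSafe produces. Format-RestartLogLine and its -CorrelationId parameter are hypothetical; the ID itself would come from whatever your release pipeline exposes.

```powershell
function Format-RestartLogLine {
  # Hypothetical formatter: Write-Log's line shape plus a correlation ID
  param(
    [Parameter(Mandatory)][string]$Msg,
    [string]$Level = 'INFO',
    [string]$CorrelationId = 'none',
    [datetime]$Timestamp = (Get-Date)
  )
  # 's' is the sortable ISO 8601 format, e.g. 2024-05-01T13:45:00
  '{0} [{1}] [cid={2}] {3}' -f $Timestamp.ToString('s'), $Level, $CorrelationId, $Msg
}

# Example: tie every restart log line to one deployment
# $cid = 'deploy-1234'   # hypothetical ID from your release pipeline
# Format-RestartLogLine -Msg 'Stopping Spooler' -CorrelationId $cid |
#   Add-Content -Path 'C:\Logs\service-restarts.log'
```

A fixed `[cid=...]` token is easy to grep for and easy to extract in most log collectors.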

Automation and CI/CD integration

  • Pre-flight checks: Confirm the service exists, is not stuck in a pending state (such as StartPending or StopPending), is not Disabled, and that you have the rights to control it. Fail fast before you burn your change window.
  • Health checks around restarts: After the service reaches Running, validate application health (HTTP 200, TCP port open, or a custom readiness script) before proceeding.
  • Rollback criteria: If health checks fail after a bounded time, abort your pipeline and alert. Don’t keep retrying blind restarts.
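For the health-check step, a TCP connection is the simplest readiness signal. A sketch, assuming the service listens on a known port, retries a connection until a deadline and bounds each individual attempt as well. Test-TcpReady is a hypothetical name; an HTTP check (for example Invoke-WebRequest against a health endpoint) would follow the same loop.

```powershell
function Test-TcpReady {
  # Hypothetical readiness probe: returns $true once a TCP connection to
  # $ComputerName:$Port succeeds, or $false if the deadline passes first.
  param(
    [string]$ComputerName = 'localhost',
    [Parameter(Mandatory)][int]$Port,
    [int]$TimeoutSec = 30,
    [int]$PollMs = 500
  )
  $sw = [Diagnostics.Stopwatch]::StartNew()
  while ($sw.Elapsed.TotalSeconds -lt $TimeoutSec) {
    $client = [System.Net.Sockets.TcpClient]::new()
    try {
      # Bound each attempt too, so one hanging connect cannot eat the whole budget
      $task = $client.ConnectAsync($ComputerName, $Port)
      if ($task.Wait($PollMs) -and $client.Connected) { return $true }
    } catch {
      # Connection refused and similar errors just mean "not ready yet"
    } finally {
      $client.Dispose()
    }
    Start-Sleep -Milliseconds $PollMs
  }
  return $false
}
```

For example, after Restart-ServiceSafe returns, `if (-not (Test-TcpReady -Port 8080 -TimeoutSec 60)) { throw 'Health check failed' }` fails the pipeline instead of retrying blind; port 8080 is a placeholder.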

Security and safety

  • Least privilege: Run under an account restricted to the required services. Avoid full local admin for routine restarts.
  • Script hygiene: Sign your scripts, store them in source control, and code-review changes to restart logic.
  • Predictable output: Make the function return a simple object with name, status, attempt, and duration so other tooling can parse results reliably.

Performance and resilience tips

  • Backoff between retries: Add a short delay (e.g., 1–3 seconds). Consider exponential backoff for noisy neighbors or slow tear-downs.
  • Telemetry on timing: Track median and p95 of stop/start durations per service to catch regressions early.
  • Guard rails: In clustered or multi-instance apps, stagger restarts and enforce concurrency limits to keep capacity available.
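The backoff and timing tips both come down to small calculations. In the sketch below, Get-BackoffDelay doubles a base delay per retry up to a cap (plain exponential backoff, without jitter), and Get-DurationStats computes the median and a nearest-rank p95 from recorded durations, such as the DurationSec values Restart-ServiceSafe returns. Both names are hypothetical, and nearest-rank is only one of several percentile definitions; your metrics system may interpolate instead.

```powershell
function Get-BackoffDelay {
  # Hypothetical helper: exponential backoff, BaseSec * 2^(Attempt-1), capped at MaxSec
  param(
    [Parameter(Mandatory)][int]$Attempt,   # 1 for the first retry
    [double]$BaseSec = 1,
    [double]$MaxSec = 30
  )
  [math]::Min($MaxSec, $BaseSec * [math]::Pow(2, $Attempt - 1))
}

function Get-DurationStats {
  # Hypothetical helper: median and nearest-rank p95 of durations in seconds
  param([Parameter(Mandatory)][double[]]$Seconds)
  $sorted = @($Seconds | Sort-Object)
  $n = $sorted.Count
  $median = if ($n % 2) { $sorted[($n - 1) / 2] } else { ($sorted[$n / 2 - 1] + $sorted[$n / 2]) / 2 }
  # Nearest rank: the smallest sample with at least 95% of samples at or below it
  $rank = [int][math]::Ceiling(95 * $n / 100)
  [pscustomobject]@{ Count = $n; MedianSec = $median; P95Sec = $sorted[$rank - 1] }
}

# Example: wait before retry number $attempt inside a retry loop
# Start-Sleep -Seconds (Get-BackoffDelay -Attempt $attempt -BaseSec 1 -MaxSec 10)
```

Computing the rank with integer arithmetic (95 * n / 100) avoids floating-point surprises from multiplying by 0.95.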

By treating restarts as a contract with timeouts, verification, and explicit logging, you make it far less likely that a routine restart hangs or fails silently, you give incident responders a clear record of what happened, and your CI/CD pipelines get deterministic behavior. Start with the minimal script for ad-hoc use, then adopt the production-ready function in automation where predictability and telemetry matter most.

Further reading: Keep production stable with disciplined service handling. See the PowerShell Advanced Cookbook → https://www.amazon.com/PowerShell-Advanced-Cookbook-scripting-advanced-ebook/dp/B0D5CPP2CQ/
