Role-Based Historical Data Migrations with Progress Tracking

๐Ÿ“– 4 minutes read

Fixing historical data issues in production is nerve-wracking. You need to update thousands of records, but a single wrong JOIN can cascade the update to user data you shouldn’t touch.

Here’s a pattern I use: role-based filtering combined with progress tracking to make data migrations safer and more observable.

The Problem: Broad Updates Are Dangerous

Say you need to fix a file naming pattern in a generated_reports table. The naive approach:

public function up(): void
{
    DB::table('generated_reports')
        ->where('file_name', 'LIKE', 'Old_Format_%')
        ->update([
            'file_name' => DB::raw("REPLACE(file_name, 'Old_Format_', 'New_Format_')")
        ]);
}

But what if this table has reports generated by customers, internal admins, AND automated systems? You only want to fix admin reports, but this query hits everything.

The Pattern: Role-Based Scoping

Use JOINs to filter by user roles, then add progress tracking:

use Symfony\Component\Console\Output\ConsoleOutput;

public function up(): void
{
    $output = new ConsoleOutput();
    
    // Step 1: Find affected records with role filtering
    $affected = DB::table('generated_reports')
        ->join('users', 'generated_reports.user_id', '=', 'users.id')
        ->join('role_user', 'users.id', '=', 'role_user.user_id')
        ->join('roles', 'role_user.role_id', '=', 'roles.id')
        ->whereIn('roles.slug', ['admin', 'finance', 'manager'])
        ->where('generated_reports.file_name', 'LIKE', 'Old_Format_%')
        ->where('generated_reports.created_at', '>=', '2024-01-01')
        ->select('generated_reports.id')
        ->distinct()  // Prevent duplicate IDs from multiple role assignments
        ->get();
    
    $output->writeln("Found {$affected->count()} records to update");
    
    // Step 2: Update in chunks with progress indicator
    $affected->chunk(100)->each(function ($chunk) use ($output) {
        DB::table('generated_reports')
            ->whereIn('id', $chunk->pluck('id'))
            ->update([
                'file_name' => DB::raw("REPLACE(file_name, 'Old_Format_', 'New_Format_')")
            ]);
        
        $output->write('.');  // Progress indicator
    });
    
    $output->writeln("\nMigration complete: {$affected->count()} records updated");
}

Why This Matters

This pattern:

  • Scopes updates safely: JOINs to roles table ensure you only touch records created by specific user types
  • Prevents accidental cascade: If roles misconfigured, updates fail instead of hitting wrong records
  • Visible progress: ConsoleOutput lets you watch long-running migrations in real-time
  • Handles duplicates: distinct() prevents row multiplication from many-to-many role relationships
  • Testable in dev: Run on staging with --pretend to see affected IDs without changes

Key Components Explained

1. Role-Based JOIN Chain

->join('users', 'generated_reports.user_id', '=', 'users.id')
->join('role_user', 'users.id', '=', 'role_user.user_id')
->join('roles', 'role_user.role_id', '=', 'roles.id')
->whereIn('roles.slug', ['admin', 'finance', 'manager'])

This filters to records created by users with specific roles. If your app uses a different permission system (Spatie, custom), adjust accordingly.

2. ConsoleOutput for Progress

use Symfony\Component\Console\Output\ConsoleOutput;

$output = new ConsoleOutput();
$output->writeln("Found X records");  // Green text
$output->write('.');  // Progress dots

Works in migrations because they run via Artisan. You see progress live in your terminal.

3. distinct() to Avoid Duplicates

When a user has multiple roles, JOIN creates duplicate rows. distinct() collapses them:

->select('generated_reports.id')
->distinct()

Without this, the same report gets updated multiple times (harmless but wasteful).

When to Use This

  • Historical data fixes that should only affect specific user types (admins, internal users, etc.)
  • Migrations on large tables where you need to see progress
  • When UPDATE scope is safety-critical (don’t want to accidentally touch customer data)
  • When you need to generate an affected-IDs report before applying changes

Testing Before Running

Always verify scope on staging first:

# See the query without executing
php artisan migrate --pretend

# Or add a dry-run to your migration:
if (app()->environment('local')) {
    $output->writeln("DRY RUN - would update: " . $affected->pluck('id')->implode(', '));
    return;
}

Remember: Data migrations in production are one-way. Role-based filtering gives you an extra safety net to ensure you’re only touching the records you intend to fix.

Daryle De Silva

VP of Technology

11+ years building and scaling web applications. Writing about what I learn in the trenches.

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *