Table of Contents
Fixing historical data issues in production is nerve-wracking. You need to update thousands of records, but a single wrong JOIN can cascade the update to user data you shouldn’t touch.
Here’s a pattern I use: role-based filtering combined with progress tracking to make data migrations safer and more observable.
The Problem: Broad Updates Are Dangerous
Say you need to fix a file naming pattern in a generated_reports table. The naive approach:
public function up(): void
{
DB::table('generated_reports')
->where('file_name', 'LIKE', 'Old_Format_%')
->update([
'file_name' => DB::raw("REPLACE(file_name, 'Old_Format_', 'New_Format_')")
]);
}
But what if this table has reports generated by customers, internal admins, AND automated systems? You only want to fix admin reports, but this query hits everything.
The Pattern: Role-Based Scoping
Use JOINs to filter by user roles, then add progress tracking:
use Symfony\Component\Console\Output\ConsoleOutput;
public function up(): void
{
$output = new ConsoleOutput();
// Step 1: Find affected records with role filtering
$affected = DB::table('generated_reports')
->join('users', 'generated_reports.user_id', '=', 'users.id')
->join('role_user', 'users.id', '=', 'role_user.user_id')
->join('roles', 'role_user.role_id', '=', 'roles.id')
->whereIn('roles.slug', ['admin', 'finance', 'manager'])
->where('generated_reports.file_name', 'LIKE', 'Old_Format_%')
->where('generated_reports.created_at', '>=', '2024-01-01')
->select('generated_reports.id')
->distinct() // Prevent duplicate IDs from multiple role assignments
->get();
$output->writeln("Found {$affected->count()} records to update ");
// Step 2: Update in chunks with progress indicator
$affected->chunk(100)->each(function ($chunk) use ($output) {
DB::table('generated_reports')
->whereIn('id', $chunk->pluck('id'))
->update([
'file_name' => DB::raw("REPLACE(file_name, 'Old_Format_', 'New_Format_')")
]);
$output->write('.'); // Progress indicator
});
$output->writeln("\nMigration complete: {$affected->count()} records updated ");
}
Why This Matters
This pattern:
- Scopes updates safely: JOINs to
rolestable ensure you only touch records created by specific user types - Prevents accidental cascade: If roles misconfigured, updates fail instead of hitting wrong records
- Visible progress:
ConsoleOutputlets you watch long-running migrations in real-time - Handles duplicates:
distinct()prevents row multiplication from many-to-many role relationships - Testable in dev: Run on staging with
--pretendto see affected IDs without changes
Key Components Explained
1. Role-Based JOIN Chain
->join('users', 'generated_reports.user_id', '=', 'users.id')
->join('role_user', 'users.id', '=', 'role_user.user_id')
->join('roles', 'role_user.role_id', '=', 'roles.id')
->whereIn('roles.slug', ['admin', 'finance', 'manager'])
This filters to records created by users with specific roles. If your app uses a different permission system (Spatie, custom), adjust accordingly.
2. ConsoleOutput for Progress
use Symfony\Component\Console\Output\ConsoleOutput;
$output = new ConsoleOutput();
$output->writeln("Found X records "); // Green text
$output->write('.'); // Progress dots
Works in migrations because they run via Artisan. You see progress live in your terminal.
3. distinct() to Avoid Duplicates
When a user has multiple roles, JOIN creates duplicate rows. distinct() collapses them:
->select('generated_reports.id')
->distinct()
Without this, the same report gets updated multiple times (harmless but wasteful).
When to Use This
- Historical data fixes that should only affect specific user types (admins, internal users, etc.)
- Migrations on large tables where you need to see progress
- When UPDATE scope is safety-critical (don’t want to accidentally touch customer data)
- When you need to generate an affected-IDs report before applying changes
Testing Before Running
Always verify scope on staging first:
# See the query without executing
php artisan migrate --pretend
# Or add a dry-run to your migration:
if (app()->environment('local')) {
$output->writeln("DRY RUN - would update: " . $affected->pluck('id')->implode(', '));
return;
}
Remember: Data migrations in production are one-way. Role-based filtering gives you an extra safety net to ensure you’re only touching the records you intend to fix.
Leave a Reply