Laravel, segmentation faults and circular references

A curious bug report came in at work recently, where a process in a client's Laravel app had seemingly stopped processing midway through execution. Some models had been updated correctly, but subsequent models had not.

Strangely, there were no error reports in Sentry, no exceptions in the Forge logs, MySQL had not crashed, and the tests were passing. I then noticed that Forge's queue worker had restarted, and that its uptime corresponded with the updated_at timestamp of the last saved model.

The workers are managed by Supervisor, so I dug into the server's supervisord.log to see this:

825 INFO exited: worker-408103_00 (terminated by SIGSEGV (core dumped); not expected)

Debugging segmentation faults

SIGSEGV (short for segmentation violation) is a signal sent to a process when that process attempts to use memory in a way it is not allowed. Segmentation faults in PHP are often difficult to trace because the SIGSEGV signal causes the PHP process to immediately terminate, hence the empty error logs.

One option is to try a hack that uses PHP's register_tick_function to log the last successfully executed line. How well this works depends on the PHP version you're using.
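A minimal sketch of that tick hack, for illustration only. The log path and variable names are my own assumptions, and the declare(ticks=1) directive only affects the file it appears in, so in a real app it would need to sit in the file you suspect:

```php
<?php
// Ticks must be declared in each file whose statements you want traced.
declare(ticks=1);

// Hypothetical log location; use any path the worker process can write to.
$tickLog = sys_get_temp_dir() . '/last-tick.log';

register_tick_function(function () use ($tickLog): void {
    // The top stack frame records where the tick was raised from.
    $frame = debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS, 1)[0] ?? [];
    $location = ($frame['file'] ?? '?') . ':' . ($frame['line'] ?? '?');

    // Overwrite instead of appending, so the file only holds the latest location.
    file_put_contents($tickLog, $location . PHP_EOL);
});

// Stand-in for the code under suspicion.
$total = 0;
for ($i = 1; $i <= 3; $i++) {
    $total += $i;
}
```

If the process dies with a segfault, the file should still contain the last location written before the crash. Writing to disk on every tick is slow, so this is something to enable temporarily while debugging.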

Alternatively, you may be able to view the core dump of the process from the moment it was terminated. This is a snapshot of the process's memory at the time of the crash: detailed, but time-consuming to dig into. Forge doesn't have core dumps enabled by default, but you can enable them by setting process.dumpable = yes in /etc/php/{php_version}/fpm/pool.d/www.conf.

On this server the core dump files were stored in /var/crash as .crash files. In order to inspect the core dump, do the following:

$ apt install gdb
$ apport-unpack path_to_file.crash unpack_dir
$ cd unpack_dir
$ gdb `cat ExecutablePath` CoreDump

This will load the core dump into gdb, where you can run various commands to determine the cause of the segfault:

print (char *)(executor_globals.function_state_ptr->function)->common.function_name
print (char *)executor_globals.active_op_array->function_name
print (char *)executor_globals.active_op_array->filename

As you can see, delving into gdb requires some C knowledge. Luckily, the above commands gave me enough of a hint as to where the issue lay.

Eloquent & circular references

Back in our application, the process seemed to crash when looping through a collection of Setting models, which are loaded as relations of a Device:

Action.php
foreach ($device->settings as $setting) {
    // ...
    $setting->save();
}

When each setting is saved, the application observes Eloquent's "updated" event to perform additional tasks, which involve accessing the $device:

SettingObserver.php
if ($setting->device->hasSomeCondition()) {
    // ...
}

This would require a fresh SQL query to load the device relation for each setting. Since there are many settings, and the device is already loaded in memory, we optimise this by using Eloquent's setRelation() method:

Action.php
foreach ($device->settings as $setting) {
    // ...
    $setting->setRelation('device', $device);
    $setting->save();
}

No further queries are executed because the device relation is now present on each setting model. All unit & feature tests for this pass. So what's the problem?

Serializing job data

Under certain conditions, our application needs to sync device information with an external API. This is also handled by observing an Eloquent event and dispatching a job to the queue:

DeviceObserver.php
if ($device->hasSomeCondition()) {
    BackgroundJob::dispatch($device);
}

Again, in our unit & feature tests this works just fine; Laravel automatically stores the device's ID in the jobs table and retrieves the corresponding device when the job is run. This feature is described in the docs:

If your queued job accepts an Eloquent model in its constructor, only the identifier for the model will be serialized onto the queue.

However, immediately underneath is this (emphasis mine):

When the job is actually handled, the queue system will automatically re-retrieve the full model instance and its loaded relationships from the database.

So the first statement isn't entirely accurate: the model identifier and the names of its loaded relations are both serialized.

Looking back at our optimisation, we manually set the $device relation on each $setting. Behind the scenes, this creates a circular reference:

$device -> $settings[n] -> $device

Under a specific set of conditions, our job can accept a $device whose loaded relations contain a circular reference. Walking those relations to serialize them never terminates, and that runaway recursion is what triggers the segmentation fault.
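To make the cycle concrete, here is a rough sketch of how you might detect it before dispatching. It does not use Laravel's own code: FakeModel and findRelationCycle() are hypothetical stand-ins I made up so the example runs on plain PHP 7.4+. On real models, the relations array would come from Eloquent's getRelations().

```php
<?php
// Hypothetical stand-in for an Eloquent model: a name plus its loaded relations.
final class FakeModel
{
    /** @var string */
    public $name;

    /** @var array<string, FakeModel|FakeModel[]|null> */
    public $relations = [];

    public function __construct(string $name)
    {
        $this->name = $name;
    }
}

// Returns the dotted relation path that leads back to a model already on the
// current path, or null when the loaded relations form a tree.
function findRelationCycle(FakeModel $model, array $ancestors = [], string $path = ''): ?string
{
    // $ancestors is passed by value, so it only tracks the current path.
    $ancestors[spl_object_id($model)] = true;

    foreach ($model->relations as $name => $related) {
        $children = is_array($related) ? $related : [$related];

        foreach ($children as $child) {
            if ($child === null) {
                continue;
            }

            $childPath = ltrim($path . '.' . $name, '.');

            if (isset($ancestors[spl_object_id($child)])) {
                return $childPath; // we have looped back to an ancestor
            }

            $found = findRelationCycle($child, $ancestors, $childPath);
            if ($found !== null) {
                return $found;
            }
        }
    }

    return null;
}

$device = new FakeModel('device');
$setting = new FakeModel('setting');
$device->relations['settings'] = [$setting];

$before = findRelationCycle($device); // null: no cycle yet

// Rough equivalent of $setting->setRelation('device', $device).
$setting->relations['device'] = $device;

$after = findRelationCycle($device); // 'settings.device'
```

A walk without the $ancestors check would keep descending through device, settings, device and so on until the stack is exhausted, which is the failure mode described above.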

Preventing the issue

As it turns out, the Laravel team is aware of this; Dries Vints stated that "This indeed isn't supported. You should build your job in a way that the circular dependency is not present."

Personally I find it strange that the queue system retrieves the loaded relations; I can't see any particular benefit unless your application logic is built around which relations happen to be loaded at any given time (which is not a good idea).

Regardless, this was an easy fix; we could either remove the setRelation() call and deal with the redundant queries this would generate, or break the circular reference before dispatching the job:

BackgroundJob::dispatch($device->unsetRelations());

So there you have it - be careful of manually setting relations if you pass those models into your queues.