References

PHP references (in the sense of the & symbol) are mostly transparent to userland code, but require consistent special handling in the implementation. This chapter discusses how references are represented, and how internal code should deal with them.

Reference semantics

Before going into the internal representation of PHP references, it may be helpful to clarify some common misconceptions about the semantics of references in PHP. Consider this basic example:

$a = 0;
$b =& $a;
$a++;
$b++;
var_dump($a); // int(2)
var_dump($b); // int(2)

People will commonly say that “$b is a reference to $a”. However, this is not quite correct, in that references in PHP have no concept of directionality. After $b =& $a, both $a and $b reference a common value, and neither of the variables is privileged in any way.

This becomes particularly problematic when we consider the interaction of references and array copies:

$array = [0];
$ref =& $array[0];
$array2 = $array;
$array2[] = 42; // Triggering copy-on-write makes no difference here.
$ref++;
var_dump($array[0]); // int(1)
var_dump($array2[0]); // int(1)

The $ref =& $array[0] line creates a reference between $ref and $array[0]. When the array is subsequently copied, it becomes a reference between $ref, $array[0] and $array2[0], as the reference is also copied.

Intuitively this behavior is wrong. There’s two reasons why it happens: The first one is the aforementioned lack of directionality. This behavior would make sense if we had written $array[0] =& $ref. In this case it would be expected that a copy of $array2[0] also points to $ref. However, we cannot actually distinguish these two cases.

The second and more important reason is a more technical one: $array2 = $array only performs a refcount increment, which means we wouldn’t have a chance to drop the reference even if we wanted to.

Representation

References are represented using an IS_REFERENCE zval that points to a zend_reference structure:

struct _zend_reference {
    zend_refcounted_h              gc;
    zval                           val;
    zend_property_info_source_list sources;
};

Zvals themselves do not have a reference count, and cannot be shared. The zend_reference structure essentially represents a reference-counted zval that can be shared. Multiple zvals can point to the same zend_reference, and any change to the val it contains will be observable from all sources.

Type sources

Normally, PHP does not track who or what makes use of a given reference. The only knowledge that is stored is how many users there are (through the refcount), so that the reference may be destroyed in time.

However, due to the introduction of typed properties in PHP 7.4, we do need to track of which typed properties make use of a certain reference, in order to enforce property types for indirect modifications through references:

class Test {
    public int $prop = 42;
}
$test = new Test;
$ref =& $test->prop;
$ref = "string"; // TypeError

The sources member of zend_reference stores a list of zend_property_info pointers to track typed properties that use the reference. Macros like ZEND_REF_HAS_TYPE_SOURCES(), ZEND_REF_ADD_TYPE_SOURCE(), and ZEND_REF_DEL_TYPE_SOURCE() are used to manage this source list, but typically only engine code needs to deal with this.

Initializing references

Just like other zvals, references are initialized through a set of macros. The most basic one accepts an already created zend_reference pointer:

zval ref;
ZVAL_REF(ref, zend_reference_ptr);

To create a reference from scratch, ZVAL_NEW_REF() can be used:

zval ref;
zval initial_val;
ZVAL_STRING(initial_val, "test");
ZVAL_NEW_REF(&ref, &initial_val);

This macro accepts an initial value for the reference. Note that it is moved into the reference using ZVAL_COPY_VALUE, the refcount is not incremented. Alternatively, ZVAL_NEW_EMPTY_REF() leaves the value uninitialized:

zval ref;
ZVAL_NEW_EMPTY_REF(&ref);
ZVAL_STRING(Z_REFVAL(ref), "test");

Here we create an empty reference and then initialize the reference value Z_REFVAL(ref) directly. Finally, ZVAL_MAKE_REF() can be used to promote an existing zval into a reference:

zval *zv = /* ... */;
ZVAL_MAKE_REF(zv);

If zv was already a reference, this does nothing. It if wasn’t a reference yet, this will change zv into a reference and set its initial value to the old value of zv.

Dereferencing and unwrapping

Most code does not want to handle references in any special way, and simply want to look through to the underlying value:

zval *zv = /* ... */;
if (Z_ISREF_P(zv)) {
    zv = Z_REFVAL_P(zv);
}

If the value is a reference (Z_ISREF), we switch to looking at the value it contains. This operation is called “dereferencing” and is more compactly written as ZVAL_DEREF(zv). It is extremely common and should be applied essentially at any point where reference zvals might occur. For example, this is how a typical loop over an array might look like:

zval *val;
ZEND_HASH_FOREACH_VAL(ht, val) {
    ZVAL_DEREF(val);

    /* Do something with val, now a guaranteed non-reference. */
} ZEND_HASH_FOREACH_END();

The ZVAL_COPY_DEREF(target, source) macro is a combined form of ZVAL_COPY and ZVAL_DEREF. It copies the dereferenced value of source into target.

Dereferencing simply moves a pointer from the outer to the inner zval, without changing either. It is also possible to actually remove the reference wrapper by performing an unwrap. It is probably easiest to understand this operation by looking at its implementation:

static zend_always_inline void zend_unwrap_reference(zval *op) {
    if (Z_REFCOUNT_P(op) == 1) {
        ZVAL_UNREF(op);
    } else {
        Z_DELREF_P(op);
        ZVAL_COPY(op, Z_REFVAL_P(op));
    }
}

If the refcount is 1, then the inner value is moved into op and the reference wrapper is destroyed. This is what ZVAL_UNREF() does. If the refcount is greater than one, then we decrement the refcount of the reference wrapper, and copy (with refcount increase) the inner value into op. This means that an unwrap operation does not necessarily destroy the reference (if it has other users), but will remove one particular use.

Indirect zvals

Next to references, PHP also has a more direct mechanism to share zvals. The IS_INDIRECT type stores a direct pointer to another zval:

zval val1;
ZVAL_LONG(&val1, 42);

zval val2;
ZVAL_INDIRECT(&val2, &val1);

ZEND_ASSERT(Z_INDIRECT(val2) == &val1);

While there is some surface similarity to references, this mechanism is not generally usable, because nothing ensures that the pointed-to zval isn’t deallocated. For this reason, indirect zvals can only be used in controlled situations, for example to point from a property hash table to a property slot table. This is possible, because we know that the property slot table is not reallocated during the lifetime of an object, and the property hash table and property slot table are deallocated at the same time, so no dangling pointers are left behind.

As such, indirect zvals can only occur in specific situations, and cannot be stored in general-purpose userland-exposed zvals.