Memory management
To work with zvals correctly and efficiently, it is important to understand how their memory management works. Broadly, we can classify values into two categories: Simple values like integers, which are stored directly inside the zval, and complex values like strings, for which the zval only stores a pointer to a separate structure.
Reference-counted values
All complex values share a common header with the following structure:
typedef struct _zend_refcounted_h {
    uint32_t refcount;
    union {
        uint32_t type_info;
    } u;
} zend_refcounted_h;
This header stores a reference count, which tracks in how many places this structure is used. If the structure is used in a new zval, the refcount is incremented. If it stops being used, it is decremented. If the reference count reaches zero, we know that the structure is no longer used and can be freed. This is the core mechanism of PHP’s memory management.
The type_info field encodes additional information, such as the type of the structure, a number of type-specific flags, as well as a garbage collection root. We will discuss the purpose of this information later.
There are functions for creating different kinds of refcounted structures, which will create them with an initial refcount of one:
zend_string *str = zend_string_init("test", sizeof("test")-1, /* persistent */ 0); // refcount=1
zend_array *arr = zend_new_array(/* size hint */ 0); // refcount=1
// Do something with str and arr.
zend_string_release(str); // refcount=0 => destroy!
zend_array_release(arr); // refcount=0 => destroy!
The zend_string_release() and zend_array_release() functions will decrement the refcount of the string or array and, if it reaches zero, destroy it. For example, the following code is perfectly valid:
zend_string *str = zend_string_init("test", sizeof("test")-1, /* persistent */ 0); // refcount=1
zend_hash_add_empty_element(arr, str); // refcount=2
zend_string_release(str); // refcount=1
This adds an element with key str to an array and releases the string afterwards. However, the zend_hash_add_empty_element() function will have incremented the refcount of the string, so the zend_string_release() call will not destroy it. The string will only be destroyed once the array is destroyed as well and no references to the string remain.
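The ownership rule above can be illustrated with a self-contained toy model. This is only a sketch: toy_string, toy_holder and the toy_* functions are hypothetical names invented for this illustration, not part of the Zend API, and the holder merely stands in for an array that keeps its key alive.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a refcounted string; NOT the real zend_string. */
typedef struct toy_string {
    uint32_t refcount;
    int *destroyed;   /* set to 1 when the string is freed, for observation */
    char val[16];
} toy_string;

static toy_string *toy_string_init(const char *s, int *destroyed) {
    toy_string *str = malloc(sizeof(*str));
    if (!str) abort();
    str->refcount = 1;               /* the creator holds the first reference */
    str->destroyed = destroyed;
    strncpy(str->val, s, sizeof(str->val) - 1);
    str->val[sizeof(str->val) - 1] = '\0';
    *destroyed = 0;
    return str;
}

static void toy_string_release(toy_string *str) {
    assert(str->refcount > 0);
    if (--str->refcount == 0) {
        *str->destroyed = 1;
        free(str);
    }
}

/* Toy container that keeps one string alive, like an array keeping its key. */
typedef struct toy_holder { toy_string *key; } toy_holder;

static void toy_holder_set_key(toy_holder *h, toy_string *key) {
    key->refcount++;                 /* the holder takes its own reference */
    h->key = key;
}

static void toy_holder_destroy(toy_holder *h) {
    toy_string_release(h->key);
    h->key = NULL;
}

/* Returns 1 if the string survived the creator's release and died with the holder. */
static int toy_release_demo(void) {
    int destroyed;
    toy_holder holder = { NULL };
    toy_string *str = toy_string_init("test", &destroyed);  /* refcount=1 */
    toy_holder_set_key(&holder, str);                        /* refcount=2 */
    toy_string_release(str);                                 /* refcount=1 */
    int alive_after_release = !destroyed;
    toy_holder_destroy(&holder);                             /* refcount=0 => freed */
    return alive_after_release && destroyed;
}
```

In this model the creator and the holder each own exactly one reference, so each of them releases exactly once.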
Immutable values
While all complex structures share the zend_refcounted_h header, the refcount is not always actually used. Strings and arrays can be immutable, which means that the entire structure, including the reference count, must never be modified. Such structures can be reused without incrementing the reference count and are guaranteed to not be destroyed until (at least) the end of the request.
There are a number of reasons why immutable strings and arrays exist:
- Any structures stored in opcache shared memory are immutable, because they are shared across multiple processes. You can set the opcache.protect_memory=1 ini setting in order to enforce this through mprotect(). This will make most immutability violations result in crashes rather than misbehavior.
- The empty array is declared const and as such typically allocated in a read-only segment. Attempting to modify it will result in a crash.
- Persistent strings that are created outside a request but may be used inside it (such as ini values) must be immutable, because there may be multiple threads using them in parallel. As PHP’s reference counting is non-atomic, performing normal refcounting would not be safe.
Finally, while the above reasons make immutable structures a technical requirement, having them also serves as a performance optimization, as refcounting operations can be avoided in many common cases.
When working with higher-level APIs such as zend_string_copy() or ZVAL_COPY(), immutable structures will be correctly handled automatically. However, if you use lower-level APIs, you need to take them into account explicitly.
The low-level interface is provided primarily through the following macros:
| Macro | Description |
|---|---|
| GC_TYPE | Get type of the structure. |
| GC_FLAGS | Get flags. |
| GC_REFCOUNT | Get reference count. |
| GC_ADDREF | Increment refcount. Structure must be mutable. |
| GC_DELREF | Decrement refcount. Structure must be mutable. Does not release structure if refcount reaches zero. |
| GC_TRY_ADDREF | Increment refcount if mutable, otherwise do nothing. |
Immutable structures set the GC_IMMUTABLE flag (which has a number of aliases like IS_STR_INTERNED and IS_ARRAY_IMMUTABLE), which can be used to determine whether incrementing the refcount is safe:
zend_string *str = /* ... */;
if (!(GC_FLAGS(str) & GC_IMMUTABLE)) {
    GC_ADDREF(str);
}
// Same as:
GC_TRY_ADDREF(str);
// Same as (high-level API):
zend_string_addref(str);
Macros that have TRY in the name generally indicate that an operation should only be performed for mutable structures. You’ll encounter more examples like Z_TRY_ADDREF and GC_TRY_PROTECT_RECURSION where the meaning is the same.
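The difference between an unconditional and a TRY-style increment can be sketched with a small standalone model. The names below (toy_refcounted, TOY_IMMUTABLE, toy_addref, toy_try_addref) are hypothetical and only mirror the shape of the real macros; the flag bit is an arbitrary choice for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_IMMUTABLE (1u << 0)   /* hypothetical flag bit, modelled on GC_IMMUTABLE */

typedef struct toy_refcounted {
    uint32_t refcount;
    uint32_t flags;
} toy_refcounted;

/* Unconditional increment: only valid for mutable structures. */
static void toy_addref(toy_refcounted *rc) {
    assert(!(rc->flags & TOY_IMMUTABLE));
    rc->refcount++;
}

/* Conditional increment: does nothing for immutable structures. */
static void toy_try_addref(toy_refcounted *rc) {
    if (!(rc->flags & TOY_IMMUTABLE)) {
        toy_addref(rc);
    }
}

/* Helper for experiments: returns the refcount after one try_addref call. */
static uint32_t toy_try_addref_result(uint32_t initial, uint32_t flags) {
    toy_refcounted rc = { initial, flags };
    toy_try_addref(&rc);
    return rc.refcount;
}
```

With this model, a mutable structure goes from a refcount of 1 to 2, while an immutable one keeps whatever refcount it started with.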
Persistent structures
PHP makes use of two allocators: The per-request allocator, which releases all memory at the end of a request, and the persistent allocator, which retains allocations across multiple requests. The persistent allocator is effectively the same as the normal system allocator. See the PHP Lifecycle and Zend Memory Manager chapters for more information on PHP’s allocation management.
Many functions that create refcounted structures will accept a persistent flag to determine which allocator to use. An example of this is the last argument of zend_string_init(). If a function exposes no persistent flag, then a good default assumption is that the per-request (non-persistent) allocator is used. For example, the zend_new_array() function always creates a per-request array, while lower-level APIs have to be used to create a persistent array.
Persistent structures set the GC_PERSISTENT flag, and their destructors will automatically take care of using the correct allocator to free the memory. As such, you generally do not need to worry about this flag beyond using the correct allocator in the first place (usually the per-request one).
However, it is important to understand how persistent structures interact with code executed during a request: Persistent structures can potentially be used by multiple threads. As PHP’s reference counting is non-atomic, performing refcounting from multiple threads results in a data race (that will result in crashes).
As such, any persistent structure that is also used during the request must either be immutable or thread-local.
PHP can be compiled using CFLAGS="-DZEND_RC_DEBUG=1" to diagnose such issues automatically. This problem most typically affects strings, in which case they can be made immutable through interning. The GC_MAKE_PERSISTENT_LOCAL() macro is used to mark a persistent structure as thread-local. This macro doesn’t do anything beyond disabling the ZEND_RC_DEBUG verification.
Zval memory management
With the preliminaries out of the way, we can discuss how memory management interacts with zvals. Refcounted structures can be used independently, but storing them inside zvals is certainly one of the more common use-cases.
Zvals themselves are never individually heap-allocated. They are either allocated temporarily on the stack, or embedded as part of a larger heap-allocated structure.
This basic example shows the initialization of a stack-allocated zval, and its subsequent destruction:
zval str_val;
ZVAL_STRING(&str_val, "foo"); // Creates zend_string (refcount=1).
// ... Do something with str_val.
zval_ptr_dtor(&str_val); // Decrements to refcount=0, and destroys the string.
ZVAL_STRING() creates a string zval and zval_ptr_dtor() releases it. We’ll discuss different initialization macros and destructors in a moment.
A stack-allocated zval can only be used in the scope it was declared in. While it is technically possible to return a zval, you will find that PHP never passes or returns zvals by value. Instead zvals are always passed by pointer. In order to return a zval, an out-parameter needs to be passed to the function:
// retval is an output parameter.
void init_zval(zval *retval) {
    ZVAL_STRING(retval, "foo");
}
void some_other_function() {
    zval val;
    init_zval(&val);
    // ... Do something with val.
    zval_ptr_dtor(&val);
}
While zvals themselves are generally not shared, it’s possible to share the structures they point to using the refcounting mechanism. The Z_REFCOUNT, Z_ADDREF and Z_DELREF macros work the same way as the corresponding GC_* macros, but operate on zvals. Importantly, these macros can only be used if the zval does point to a refcounted structure, and the structure is not immutable. The IS_TYPE_REFCOUNTED type flag determines whether this is the case, and can be accessed through Z_REFCOUNTED:
void fill_array(zval *array) {
    zval val;
    init_zval(&val);
    // Manually check REFCOUNTED:
    if (Z_REFCOUNTED(val)) {
        Z_ADDREF(val);
    }
    add_index_zval(array, 0, &val);
    // Or use the TRY macro:
    Z_TRY_ADDREF(val);
    add_index_zval(array, 1, &val);
    zval_ptr_dtor(&val);
}
This example adds the same value to an array twice, which means the refcount has to be incremented twice. While it’s possible to manually check whether the zval is Z_REFCOUNTED, it is preferred to use Z_TRY_ADDREF instead, which only increments the refcount for refcounted structures.
Something to consider here is who is responsible for incrementing the refcount. In this example, the caller of add_index_zval() is responsible for the increment. Unfortunately, PHP APIs are not very consistent in this regard. As a very rough rule of thumb, array values expect the refcount to be incremented by the caller, while most other APIs will take care of it themselves.
Copying zvals
It is common that zvals need to be copied from one location to another. For this purpose, a number of copying macros are provided. The first is ZVAL_COPY_VALUE():
void init_zval_indirect(zval *retval) {
    zval val;
    init_zval(&val);
    ZVAL_COPY_VALUE(retval, &val);
}
This (rather silly) example initializes a stack zval and then moves the value over into the retval out parameter.
The ZVAL_COPY_VALUE macro performs a simple zval copy without incrementing the refcount. As such, its primary usage is to move a zval, which means that the original zval will no longer be used and, in particular, must not be destroyed. Sometimes, this macro is also used as an optimization to copy a zval that we know not to be refcounted.
The ZVAL_COPY_VALUE macro differs from a simple assignment (*retval = val) in that it only copies the zval value and type, but not its u2 member. As such, it is safe to ZVAL_COPY_VALUE into a zval whose u2 member is in use, as it will not be overwritten.
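The difference between a value-only copy and a plain struct assignment can be shown with a simplified layout. This is a sketch under stated assumptions: toy_zval is a hypothetical three-field stand-in (value, type information, extra slot) and does not reproduce the real zval definition; the type tags and numbers are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified zval-like layout; NOT the real zval definition. */
typedef struct toy_zval {
    union { long lval; void *ptr; } value;
    uint32_t type_info;   /* plays the role of the type information */
    uint32_t extra;       /* plays the role of the u2 member */
} toy_zval;

/* Copies value and type only, leaving dst->extra untouched. */
static void toy_copy_value(toy_zval *dst, const toy_zval *src) {
    dst->value = src->value;
    dst->type_info = src->type_info;
}

/* Returns the destination's extra slot after a value-only copy. */
static uint32_t toy_extra_after_copy_value(void) {
    toy_zval src = { { 42 }, 4, 111 };   /* arbitrary type tag and extra data */
    toy_zval dst = { { 0 }, 1, 999 };
    toy_copy_value(&dst, &src);
    assert(dst.value.lval == 42 && dst.type_info == 4);
    return dst.extra;
}

/* Returns the destination's extra slot after a whole-struct assignment. */
static uint32_t toy_extra_after_assignment(void) {
    toy_zval src = { { 42 }, 4, 111 };
    toy_zval dst = { { 0 }, 1, 999 };
    dst = src;                           /* overwrites every field, extra included */
    return dst.extra;
}
```

In this model the value-only copy leaves the destination's extra slot at 999, whereas the assignment replaces it with the source's 111.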
The second macro is ZVAL_COPY, which is an optimized combination of ZVAL_COPY_VALUE and Z_TRY_ADDREF:
void init_pair(zval *retval1, zval *retval2) {
    zval val;
    init_zval(&val); // refcount=1
    ZVAL_COPY(retval1, &val); // refcount=2
    ZVAL_COPY(retval2, &val); // refcount=3
    zval_ptr_dtor(&val); // refcount=2
}
This example copies the value twice, incrementing the refcount (if it has one) twice. A different, and slightly more efficient way to write this function would be:
void init_pair(zval *retval1, zval *retval2) {
    zval val;
    init_zval(&val); // refcount=1
    ZVAL_COPY(retval1, &val); // refcount=2
    ZVAL_COPY_VALUE(retval2, &val); // refcount=2
}
This copies the value once into retval1, and then performs a move into retval2, saving a redundant refcount increment and decrement. Finally, the way we would probably write this code in practice is this:
void init_pair(zval *retval1, zval *retval2) {
    init_zval(retval1); // refcount=1
    ZVAL_COPY(retval2, retval1); // refcount=2
}
Here, the value is directly initialized into retval1 and then copied into retval2. This version is both the simplest and the most efficient.
The ZVAL_DUP macro is similar to ZVAL_COPY, but will duplicate arrays, rather than just incrementing their refcount. If you are using this macro, you are almost certainly doing something very wrong.
Finally, ZVAL_COPY_OR_DUP is a very specialized copy macro that can be used when copying from a potentially persistent zval during the request. As mentioned before, incrementing the refcount is illegal in this case, because it would not be thread-safe. This macro will increment the refcount on non-persistent values, but perform a full string/array duplication for persistent values.
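The copy-or-duplicate decision can be sketched in a standalone model. All names here (toy_value, TOY_PERSISTENT, toy_copy_or_dup) are hypothetical and chosen for this illustration; the model only captures the branching described above, not the real macro's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_PERSISTENT (1u << 0)  /* hypothetical flag bit, modelled on GC_PERSISTENT */

typedef struct toy_value {
    uint32_t refcount;
    uint32_t flags;
    int payload;
} toy_value;

static toy_value *toy_value_new(int payload, uint32_t flags) {
    toy_value *v = malloc(sizeof(*v));
    if (!v) abort();
    v->refcount = 1;
    v->flags = flags;
    v->payload = payload;
    return v;
}

/* Share non-persistent values; duplicate persistent ones into a fresh local copy. */
static toy_value *toy_copy_or_dup(toy_value *src) {
    assert(src != NULL);
    if (src->flags & TOY_PERSISTENT) {
        return toy_value_new(src->payload, 0);  /* source refcount is not touched */
    }
    src->refcount++;
    return src;
}

/* Returns 1 if both branches behave as described. */
static int toy_copy_or_dup_demo(void) {
    toy_value *local = toy_value_new(1, 0);
    toy_value *persistent = toy_value_new(2, TOY_PERSISTENT);

    toy_value *a = toy_copy_or_dup(local);       /* shared: same pointer */
    toy_value *b = toy_copy_or_dup(persistent);  /* duplicated: new pointer */

    int ok = (a == local) && (local->refcount == 2)
          && (b != persistent) && (persistent->refcount == 1)
          && (b->payload == 2) && (b->refcount == 1);

    free(b);
    free(persistent);
    free(local);   /* a aliases local, so it is freed only once */
    return ok;
}
```

The point of the persistent branch is that the shared structure is only read, never written, so no refcount update can race with another thread.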
Destroying zvals
The above examples have already been making use of zval_ptr_dtor() to destroy zvals. If the value is refcounted, this function decrements the refcount and destroys the value when it reaches zero.
However, there is one subtlety here: Reference counting is not sufficient to detect unused values that are part of cycles. For this reason, PHP employs an additional mark and sweep style circular garbage collector (GC). When the refcount is decremented but does not reach zero, and the structure is marked as potentially circular (the GC_NOT_COLLECTABLE flag is not set), then PHP will add the structure to the GC root buffer.
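The decision taken on each decrement can be sketched as follows. This is a deliberately simplified toy model with hypothetical names (toy_gc, toy_ptr_dtor, TOY_NOT_COLLECTABLE); it only records possible roots in a fixed-size buffer and leaves out everything the real collector does beyond that, such as avoiding duplicate entries or running a collection.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_NOT_COLLECTABLE (1u << 0)  /* hypothetical bit, modelled on GC_NOT_COLLECTABLE */
#define TOY_MAX_ROOTS 8

typedef struct toy_refcounted {
    uint32_t refcount;
    uint32_t flags;
} toy_refcounted;

typedef struct toy_gc {
    toy_refcounted *roots[TOY_MAX_ROOTS];  /* possible roots of garbage cycles */
    int num_roots;
    int num_freed;
} toy_gc;

static toy_refcounted *toy_new(uint32_t refcount, uint32_t flags) {
    toy_refcounted *rc = malloc(sizeof(*rc));
    if (!rc) abort();
    rc->refcount = refcount;
    rc->flags = flags;
    return rc;
}

/* Decrement; free at zero, otherwise remember potentially circular structures. */
static void toy_ptr_dtor(toy_gc *gc, toy_refcounted *rc) {
    assert(rc->refcount > 0);
    if (--rc->refcount == 0) {
        gc->num_freed++;
        free(rc);
    } else if (!(rc->flags & TOY_NOT_COLLECTABLE) && gc->num_roots < TOY_MAX_ROOTS) {
        gc->roots[gc->num_roots++] = rc;
    }
}

/* Returns 1 if only the collectable, still-referenced structure became a root. */
static int toy_gc_demo(void) {
    toy_gc gc = { { NULL }, 0, 0 };
    toy_refcounted *array_like = toy_new(2, 0);
    toy_refcounted *string_like = toy_new(2, TOY_NOT_COLLECTABLE);
    toy_refcounted *unique = toy_new(1, 0);

    toy_ptr_dtor(&gc, array_like);   /* 2 -> 1: recorded as a possible root */
    toy_ptr_dtor(&gc, string_like);  /* 2 -> 1: cannot be circular, not recorded */
    toy_ptr_dtor(&gc, unique);       /* 1 -> 0: freed immediately */

    int ok = gc.num_roots == 1 && gc.roots[0] == array_like && gc.num_freed == 1;
    free(array_like);
    free(string_like);
    return ok;
}
```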
The zval_ptr_dtor_nogc() function is a variant that does not perform GC root buffer checks, and is only safe to use if you know that the destroyed data is non-circular. zval_dtor() is a legacy alias for the same function.
Another variant that can be encountered in internal code is i_zval_ptr_dtor(), which is the same as zval_ptr_dtor() but using an inlined implementation. The i_ prefix is a general convention for functions that have both inlined and outlined variants.
Initializing zvals
Until now, we have been using an abstract init_zval() function that somehow initializes a zval. It will not come as a surprise that PHP handles zval initialization using a plethora of macros. The initialization of simple types is especially straightforward:
zval val;
ZVAL_UNDEF(&val);
zval val;
ZVAL_NULL(&val);
zval val;
ZVAL_FALSE(&val);
zval val;
ZVAL_TRUE(&val);
zval val;
ZVAL_BOOL(&val, zero_or_one);
zval val;
ZVAL_LONG(&val, 42);
zval val;
ZVAL_DOUBLE(&val, 3.141);
For strings, there are quite a few initialization options. The most fundamental is the ZVAL_STR() macro, which takes an already constructed zend_string*:
zval val;
ZVAL_STR(&val, zend_string_init("test", sizeof("test")-1, 0));
As creating a zend_string from a string literal or an existing string is so common, there are two convenience wrappers:
zval val;
ZVAL_STRINGL(&val, "test", sizeof("test")-1);
zval val;
ZVAL_STRING(&val, "test"); // Uses strlen() for length.
The ZVAL_STR macro will set the IS_TYPE_REFCOUNTED flag based on whether the string is immutable or not.
There are two optimized variants that can be used if it is known in advance whether the string is interned:
// This string is definitely not interned/immutable.
zval val;
ZVAL_NEW_STR(&val, zend_string_init("test", sizeof("test")-1, 0));
// This string is definitely interned.
zval val;
ZVAL_INTERNED_STR(&val, ZSTR_CHAR('a'));
Empty strings have a separate helper:
zval val;
ZVAL_EMPTY_STRING(&val);
The ZVAL_STRINGL_FAST macro can be used to avoid a zend_string allocation if the string is empty or has a single character, as such strings always have interned variants that can be fetched quickly:
zval val;
ZVAL_STRINGL_FAST(&val, str, len);
Finally, the ZVAL_STR_COPY macro is a combination of ZVAL_STR and zend_string_copy, where the latter increments the refcount of the string:
zval val;
ZVAL_STR_COPY(&val, zstr); // Refcount will be incremented.
// More efficient/compact version of:
ZVAL_STR(&val, zend_string_copy(zstr));
For arrays, we thankfully only have to consider two initialization macros:
zval val;
ZVAL_ARR(&val, zend_new_array(/* size_hint */ 0));
zval val;
ZVAL_EMPTY_ARRAY(&val);
The first one initializes an array zval to an existing zend_array* structure, while the latter initializes an empty array in particular. Note that while both of the above examples initialize an empty array, they are not the same. ZVAL_EMPTY_ARRAY() uses an immutable shared empty array, while zend_new_array() creates a new one. If you plan to modify the array directly afterwards, you should be using the zend_new_array() variant.
Object zvals are initialized using ZVAL_OBJ:
zval val;
ZVAL_OBJ(&val, obj_ptr);
zval val;
ZVAL_OBJ_COPY(&val, obj_ptr); // Increments refcount
While these are somewhat common when dealing with already existing objects, object_init_ex() is the typical way to create an object from scratch. This will be covered in a later chapter on objects.
Finally, resources are initialized using ZVAL_RES:
zval val;
ZVAL_RES(&val, zend_register_resource(ptr, le_resource_type));
Separating zvals
In PHP, all values follow by-value semantics by default. This means that if you write $a = $b, then modification of $a will have no effect on $b and vice versa. At the same time, $a = $b is essentially implemented as:
zval_ptr_dtor(a);
ZVAL_COPY(a, b);
That is, $a and $b will both point to the same structure with an incremented refcount. This means that a naive modification of $a would also modify $b.
This is where the copy-on-write concept comes in: You are only permitted to modify structures that you exclusively own, which means that they must have a refcount of one. If a structure has a refcount greater than one, it needs to be separated first. Separation is just a fancy word for duplicating the structure.
In practice “structure” can be replaced with “array”. While in theory the concept also applies to strings, strings are almost never mutated after construction in PHP. As such SEPARATE_ARRAY() is the main separation macro, which can only be applied to IS_ARRAY zvals:
zval a, b;
ZVAL_ARR(&b, zend_new_array(0));
ZVAL_COPY(&a, &b);
SEPARATE_ARRAY(&b); // b now holds a separate copy of the array.
// Modification of b will no longer affect a.
The SEPARATE_ARRAY() macro takes care not only of shared arrays, but also of immutable ones:
zval val;
ZVAL_EMPTY_ARRAY(&val); // Immutable empty array.
SEPARATE_ARRAY(&val); // Mutable copy of empty array.
The SEPARATE_ZVAL_NOREF() macro separates a generic zval, but is only rarely useful, as separation typically directly precedes a modification, and you need to know the zval type to perform any meaningful modification anyway.
Objects and resources do not require separation, as they have reference-like semantics.
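The copy-on-write rule for arrays can be modelled in a few lines of standalone C. This is only a sketch: toy_array and toy_separate are hypothetical names, the array is a fixed-size integer buffer, and immutable arrays are not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy refcounted array; NOT the real zend_array. */
typedef struct toy_array {
    uint32_t refcount;
    int items[4];
} toy_array;

static toy_array *toy_array_new(void) {
    toy_array *arr = calloc(1, sizeof(*arr));
    if (!arr) abort();
    arr->refcount = 1;
    return arr;
}

static void toy_array_release(toy_array *arr) {
    assert(arr->refcount > 0);
    if (--arr->refcount == 0) {
        free(arr);
    }
}

/* Ensure *slot points to an array that is owned exclusively by this slot. */
static void toy_separate(toy_array **slot) {
    toy_array *arr = *slot;
    if (arr->refcount > 1) {
        toy_array *dup = toy_array_new();
        memcpy(dup->items, arr->items, sizeof(arr->items));
        arr->refcount--;   /* this slot gives up its share of the original */
        *slot = dup;
    }
}

/* Returns 1 if a write after separation leaves the other owner untouched. */
static int toy_separation_demo(void) {
    toy_array *b = toy_array_new();
    b->items[0] = 1;
    toy_array *a = b;                 /* models $a = $b: share the structure */
    a->refcount++;                    /* refcount=2 */

    toy_separate(&a);                 /* a now holds a private copy */
    a->items[0] = 2;                  /* modify a only */

    toy_array *before = a;
    toy_separate(&a);                 /* refcount is 1: nothing to do */

    int ok = (a == before) && (a != b)
          && (a->items[0] == 2) && (b->items[0] == 1)
          && (a->refcount == 1) && (b->refcount == 1);
    toy_array_release(a);
    toy_array_release(b);
    return ok;
}
```

Separation is a no-op when the refcount is already one, which is why it is cheap to call defensively before every write.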