Basic structure¶
A zval (short for “Zend value”) represents an arbitrary PHP value. As such it is likely the most important structure in all of PHP and you’ll be working with it a lot. This section describes the basic concepts behind zvals and their use.
Types and values¶
Among other things, every zval stores some value and the type this value has. This is necessary because PHP is a dynamically typed language and as such variable types are only known at run-time and not at compile-time. Furthermore, the type can change during the life of a zval, so if the zval previously stored an integer it may contain a string at a later point in time.
The type is stored as an integer tag, which can take one of several values. Some values correspond to the eight
types available in PHP, others are used for internal engine purposes only. These values are referred to using constants
of the form IS_TYPE
. E.g. IS_NULL
corresponds to the null type and IS_STRING
corresponds to the string type.
The actual value is stored in a union, which is defined as follows:
typedef union _zend_value {
zend_long lval; // For IS_LONG
double dval; // For IS_DOUBLE
zend_refcounted *counted;
zend_string *str; // For IS_STRING
zend_array *arr; // For IS_ARRAY
zend_object *obj; // For IS_OBJECT
zend_resource *res; // For IS_RESOURCE
zend_reference *ref; // For IS_REFERENCE
zend_ast_ref *ast; // For IS_CONSTANT_AST (special)
zval *zv; // For IS_INDIRECT (special)
void *ptr;
zend_class_entry *ce;
zend_function *func;
struct {
uint32_t w1;
uint32_t w2;
} ww;
} zend_value;
To those not familiar with the concept of unions: A union defines multiple members of different types, but only one of
them can ever be used at a time. E.g. if the value.lval
member was set, then you also need to look up the value
using value.lval
and not one of the other members (doing so would violate “strict aliasing” guarantees and lead to
undefined behaviour). The reason is that unions store all their members at the same memory location and just interpret
the value located there differently depending on which member you access. The size of the union is the size of its
largest member.
When working with zvals the type tag is used to find out which of the union’s member is currently in use. Before having a look at the APIs used to do so, let’s walk through the different types PHP supports and how they are stored:
The simplest type is IS_NULL
: It doesn’t need to actually store any value, because there is just one null
value.
Booleans use either the IS_TRUE
or IS_FALSE
types and don’t need to store a value either. PHP internally
represents true and false as separate types for efficiency reasons, even though these are considered values from a
user perspective. There also exists an _IS_BOOL
type, however it is never used as a zval type. It is used
internally to indicate casts to boolean and similar purposes.
For storing numbers PHP provides the types IS_LONG
and IS_DOUBLE
, which make use of the zend_long lval
and
double dval
members respectively. The former is used to store integers, whereas the latter stores floating point
numbers.
There are some things that one should be aware of about the zend_long
type: Firstly, this is a signed integer type,
i.e. it can store both positive and negative integers, but is commonly not well suited for doing bitwise operations.
Secondly, zend_long
is not the same as long
, because it abstracts away platform differences. zend_long
is always 4 bytes large on 32-bit platorms and 8 bytes large on 64-bit platforms, even if the long
type may have
a different size.
For this reason, it is important to use macros written specifically for use with zend_long
, such as
SIZEOF_ZEND_LONG
or ZEND_LONG_MAX
. You can find more relevant macros in
Zend/zend_long.h.
The double
type used to store floating point numbers is an 8-byte value following the IEEE-754 specification.
The details of this format won’t be discussed here, but you should at least be aware of the fact that this type has
limited precision and commonly doesn’t store the exact value you want.
The remaining four types will only be mentioned here quickly and discussed in greater detail in their own chapters:
Strings (IS_STRING
) are stored in a zend_string
structure, which combines the string length and the string
contents in a single allocation. You will find more information about the zend_string
structure and its
dedicated API in the string chapter.
Arrays use the IS_ARRAY
type tag and are stored in the zend_array *arr
member. How the HashTable
structure
works will be discussed in the Hashtables chapter.
Objects (IS_OBJECT
) use the zend_object *obj
member. PHP’s class and object system will be described in the
objects chapter.
Resources (IS_RESOURCE
) are use the zend_resource *res
member. Resources are covered in the
Resources chapter.
To summarize, here’s a table with all the available “normal” type tags and the corresponding storage location for their values:
Type tag |
Storage location |
---|---|
|
none |
|
none |
|
|
|
|
|
|
|
|
|
|
|
|
Special types¶
There are a number of additional types that do not have a directly corresponding userland type, and are only used
internally. Of these, IS_UNDEF
and IS_REFERENCE
are the only types you will encounter routinely.
The IS_UNDEF
type is used to indicate an uninitialized zval. This type tag has a value of zero, so zeroing out
a zval using memset
will result in an UNDEF
zval. The exact meaning of IS_UNDEF
depends on the context,
for example it can indicate an unintialized/unset object property, or an unused hashtable bucket.
The IS_REFERENCE
type in conjunction with the zend_reference *ref
member is used to represent a PHP
reference. While from a userland perspective references are not a separate type, internally references are represented
as a wrapper around another zval, that can be shared by multiple places.
The zend_refcounted *counted
member accesses a common header for all reference-counted types, including strings,
arrays, objects, resources and references. How this works is discussed in the memory management chapter.
The IS_CONSTANT_AST
type and zend_ast_ref *ast
member are used to store unevaluated constant expression abstract syntax trees (ASTs). It can occur only in specific places, such as property default values. ASTs will be discussed
in the compiler chapter.
The IS_INDIRECT
type and zval *zv
member are used to store a direct pointer to another zval. This is used
primarily for symbol types and dynamic property tables, in order to point to an actual value stored elsewhere.
The IS_PTR
type together with the void *ptr
field are used to store an arbitrary pointer. In C, any pointer
type can be converted into void *
and the other way around. This is used to store pointers in places that normally
only accept zvals, such as hashtable values.
The zend_class_entry *ce
and zend_function *func
members just specify a more precise type, but otherwise
serve the same purpose as ptr
.
The zval struct¶
Let’s now have a look at how the zval
struct actually looks like:
struct _zval_struct {
zend_value value;
union {
uint32_t type_info;
struct {
ZEND_ENDIAN_LOHI_3(
zend_uchar type,
zend_uchar type_flags,
union {
uint16_t extra;
} u)
} v;
} u1;
union {
uint32_t next; /* hash collision chain */
uint32_t cache_slot; /* cache slot (for RECV_INIT) */
uint32_t opline_num; /* opline number (for FAST_CALL) */
uint32_t lineno; /* line number (for ast nodes) */
uint32_t num_args; /* arguments number for EX(This) */
uint32_t fe_pos; /* foreach position */
uint32_t fe_iter_idx; /* foreach iterator index */
uint32_t access_flags; /* class constant access flags */
uint32_t property_guard; /* single property guard */
uint32_t constant_flags; /* constant flags */
uint32_t extra; /* not further specified */
} u2;
};
This structure looks a bit more complicated than it really is. At its core, it stores an 8 byte value
and a
single byte type
tag, both of which we have already discussed above.
This would theoretically leave us with a zval size of 9 bytes. However, to allow efficient access, it is necessary to align the structure size of an 8 byte boundary, such that the total size becomes 16 bytes. As the additional space will be used anyway, PHP makes some use of the “wasted” space:
The type
tag is part of a larger type_info
structure, which additionally stores type_flags
. As of PHP 7.4
there are only two type flags: IS_TYPE_REFCOUNTED
indicates that the value is reference-counted, while
IS_TYPE_COLLECTABLE
indicates that it participates in circular garbage collection. We will discuss both of these
in the future.
The u2
member is a 32-bit space to store arbitrary data, and is used for different purposes depending on context.
Hashtables use it to store the collision resolution chain, but as the above comments indicate, there are many other
usages as well. It should be noted that standard zval macros will never modify or copy the u2
field.
The u1.v.u.extra
field that is part of the type is very rarely used to also store additional information.
However, use of this field is only possible in very specific circumstances, as PHP will usually assume that it is
zero.
Access macros¶
Knowing the zval structure you can now write code making use of it:
zval *zv_ptr = /* ... get zval from somewhere */;
if (zv_ptr->u1.v.type == IS_LONG) {
php_printf("Zval is a long with value " ZEND_LONG_FMT "\n", zv_ptr->value.lval);
} else /* ... handle other types */
While the above code works, this is not the idiomatic way to write it. It directly accesses the zval members rather than using a special set of access macros for this purpose:
zval *zv_ptr = /* ... */;
if (Z_TYPE_P(zv_ptr) == IS_LONG) {
php_printf("Zval is a long with value " ZEND_LONG_FMT "\n", Z_LVAL_P(zv_ptr));
} else /* ... */
The above code uses the Z_TYPE_P()
macro for retrieving the type tag and Z_LVAL_P()
to get the long (integer)
value. All the access macros have variants with a _P
(for “pointer”) suffix or no suffix at all. Which one you
use depends on whether you are working on a zval
or a zval*
zval zv;
zval *zv_ptr;
Z_TYPE(zv); // Same as Z_TYPE_P(&zv).
Z_TYPE_P(zv_ptr); // Same as Z_TYPE(*zv_ptr).
Similarly to Z_LVAL
there are also macros for fetching values of all the other types. To demonstrate their usage
we’ll create a simple function for dumping a zval:
PHP_FUNCTION(dump)
{
zval *zv_ptr;
if (zend_parse_parameters(ZEND_NUM_ARGS(), "z", &zv_ptr) == FAILURE) {
return;
}
try_again:
switch (Z_TYPE_P(zv_ptr)) {
case IS_NULL:
php_printf("NULL: null\n");
break;
case IS_TRUE:
php_printf("BOOL: true\n");
break;
case IS_FALSE:
php_printf("BOOL: false\n");
break;
case IS_LONG:
php_printf("LONG: %ld\n", Z_LVAL_P(zv_ptr));
break;
case IS_DOUBLE:
php_printf("DOUBLE: %g\n", Z_DVAL_P(zv_ptr));
break;
case IS_STRING:
php_printf("STRING: value=\"");
PHPWRITE(Z_STRVAL_P(zv_ptr), Z_STRLEN_P(zv_ptr));
php_printf("\", length=%zd\n", Z_STRLEN_P(zv_ptr));
break;
case IS_RESOURCE:
php_printf("RESOURCE: id=%d\n", Z_RES_HANDLE_P(zv_ptr));
break;
case IS_ARRAY:
php_printf("ARRAY: hashtable=%p\n", Z_ARRVAL_P(zv_ptr));
break;
case IS_OBJECT:
php_printf("OBJECT: object=%p\n", Z_OBJ_P(zv_ptr));
break;
case IS_REFERENCE:
// For references, remove the reference wrapper and try again.
// Yes, you are allowed to use goto for this purpose!
php_printf("REFERENCE: ");
zv_ptr = Z_REFVAL_P(zv_ptr);
goto try_again;
EMPTY_SWITCH_DEFAULT_CASE() // Assert that all types are handled.
}
}
Lets try it out:
dump(null); // NULL: null
dump(true); // BOOL: true
dump(false); // BOOL: false
dump(42); // LONG: 42
dump(4.2); // DOUBLE: 4.2
dump("foo"); // STRING: value="foo", length=3
dump(fopen(__FILE__, "r")); // RESOURCE: id=???
dump(array(1, 2, 3)); // ARRAY: hashtable=0x???
dump(new stdClass); // OBJECT: object=0x???
The following table summarizes the most commonly used accessor macros, though there are quite a few more than that.
Macro |
Returned type |
Required zval type |
Description |
---|---|---|---|
|
|
Type of the zval. One of the |
|
|
|
|
Integer value. |
|
|
|
Floating-point value. |
|
|
|
Pointer to full |
|
|
|
String contents of the |
|
|
|
String length of the |
|
|
|
Pointer to |
|
|
|
Alias of |
|
|
|
Pointer to |
|
|
|
Class entry of the object. |
|
|
|
Pointer to |
|
|
|
Pointer to |
|
|
|
Pointer to the zval the reference wraps. |
When you want to access the contents of a zval, you should always go through these macros, rather than directly accessing its members. This maintains a level of abstraction and will, to some degree, insulate you from changes in the implementation.