Basic structure¶
A zval (short for “Zend value”) represents an arbitrary PHP value. As such it is likely the most important structure in all of PHP and you’ll be working with it a lot. This section describes the basic concepts behind zvals and their use.
Types and values¶
Among other things, every zval stores some value and the type this value has. This is necessary because PHP is a dynamically typed language and as such variable types are only known at run-time and not at compile-time. Furthermore the type can change during the life of a zval, so if the zval previously stored an integer it may contain a string at a later point in time.
The type is stored as an integer tag (an unsigned char). It can be one of eight values, which correspond to the eight
types available in PHP. These values are referred to using constants of the form IS_TYPE
. E.g. IS_NULL
corresponds to the null type and IS_STRING
corresponds to the string type.
The actual value is stored in a union, which is defined as follows:
typedef union _zvalue_value {
long lval;
double dval;
struct {
char *val;
int len;
} str;
HashTable *ht;
zend_object_value obj;
} zvalue_value;
To those not familiar with the concept of unions: A union defines multiple members of different types, but only one of
them can ever be used at a time. E.g. if the value.lval
member was set, then you also need to look up the value
using value.lval
and not one of the other members (doing so would violate “strict aliasing” guarantees and lead to
undefined behaviour). The reason is that unions store all their members at the same memory location and just interpret
the value located there differently depending on which member you access. The size of the union is the size of its
largest member.
When working with zvals the type tag is used to find out which of the union’s member is currently in use. Before having a look at the APIs used to do so, let’s walk through the different types PHP supports and how they are stored:
The simplest type is IS_NULL
: It doesn’t need to actually store any value, because there is just one null
value.
For storing numbers PHP provides the types IS_LONG
and IS_DOUBLE
, which make use of the long lval
and
double dval
members respectively. The former is used to store integers, whereas the latter stores floating point
numbers.
There are some things that one should be aware of about the long
type: Firstly, this is a signed integer type, i.e.
it can store both positive and negative integers, but is commonly not well suited for doing bitwise operations.
Secondly, long
has different sizes on different platforms: On 32bit systems it is 32 bits / 4 bytes large, but on
64bit systems it’s size will be either 4 or 8 bytes. In particular 64bit Unix systems typically have 8 byte longs,
whereas 64bit Windows uses only 4 bytes.
For this reason you shouldn’t rely on any particular size for the long
type. The minimum and maximum values a
long
can store are available via LONG_MIN
and LONG_MAX
and the size of the type can be accessed using
SIZEOF_LONG
(unlike sizeof(long)
this is also usable in #if
directives).
The double
type used to store floating point numbers is (typically) an 8-byte value following the IEEE-754
specification. The details of this format won’t be discussed here, but you should at least be aware of the fact that
this type has limited precision and commonly doesn’t store the exact value you want.
Booleans use the IS_BOOL
flag and are stored in the long lval
member as values 0 (for false) and 1 (for true).
As there are only these two values, one could theoretically use some smaller type instead (like zend_bool
, which is
an unsigned char), but as the zvalue_value
union has the size of its largest member this would not actually result
in any memory savings. As such the lval
member is reused.
Strings (IS_STRING
) are stored in struct { char *val; int len; } str
, i.e. they consist of a char*
string
and an int
length. PHP strings need to store an explicit length in order to allow use of NUL bytes ('\0'
) in
them (“binary safety”). Regardless of this, the strings used by PHP are still NUL-terminated to ease interoperability
with library functions which don’t take length arguments and expect NUL-terminated strings instead. Of course in this
case the strings won’t be binary safe anymore and will be cut off at the first NUL byte they contain. For example many
filesystem related functions behave like this, as well as most libc string functions.
The length of a string is in bytes (not Unicode code points) and does not include the terminating NUL byte: The
length of the string "foo"
is 3, even though it is actually stored using 4 bytes. If you determine the length of a
constant string using sizeof
you need to make sure to subtract one: strlen("foo") == sizeof("foo") - 1
Furthermore it’s important to realize that the string length is stored in an int
and not a long
or some other
type. This is an unfortunate historical artifact, which limits the length of strings to 2147483647 bytes. Strings larger
than this would cause an overflow (thus making the length negative).
The remaining three types will only be mentioned here quickly and discussed in greater detail in their own chapters:
Arrays use the IS_ARRAY
type tag and are stored in the HashTable *ht
member. How the HashTable
structure
works will be discussed in the Hashtables chapter.
Objects (IS_OBJECT
) use the zend_object_value obj
member, which consists of an “object handle”, which is an
integer ID used to look up the actual data of the object, and a set of “object handlers”, which define how the object
behaves. PHP’s class and object system will be described in the Classes and objects chapter.
Resources (IS_RESOURCE
) are similar to objects in that they also store a unique ID that can be used to look up the
actual value. This ID is stored in the long lval
member. Resources are covered in the Resources chapter (which
doesn’t exist yet).
To summarize, here’s a table with all the available type tags and the corresponding storage location for their values:
Type tag |
Storage location |
---|---|
|
none |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Access macros¶
Lets now have a look at how the zval
struct actually looks like:
typedef struct _zval_struct {
zvalue_value value;
zend_uint refcount__gc;
zend_uchar type;
zend_uchar is_ref__gc;
} zval;
As already mentioned, the zval has members to store a value
and its type
. The value is stored in the
zvalue_value
union discussed above and the type tag is held in a zend_uchar
. Additionally the structure has two
properties ending in __gc
, which are used for the garbage collection mechanism PHP employs. We’ll ignore them for
now and discuss their function in the next section.
Knowing the zval structure you can now write code making use of it:
zval *zv_ptr = /* ... get zval from somewhere */;
if (zv_ptr->type == IS_LONG) {
php_printf("Zval is a long with value %ld\n", zv_ptr->value.lval);
} else /* ... handle other types */
While the above code works, this is not the idiomatic way to write it. It directly accesses the zval members rather than using a special set of access macros for this purpose:
zval *zv_ptr = /* ... */;
if (Z_TYPE_P(zv_ptr) == IS_LONG) {
php_printf("Zval is a long with value %ld\n", Z_LVAL_P(zv_ptr));
} else /* ... */
The above code uses the Z_TYPE_P()
macro for retrieving the type tag and Z_LVAL_P()
to get the long (integer)
value. All the access macros have variants with a _P
suffix, a _PP
suffix or no suffix at all. Which one you
use depends on whether you are working on a zval
, a zval*
or a zval**
:
zval zv;
zval *zv_ptr;
zval **zv_ptr_ptr;
zval ***zv_ptr_ptr_ptr;
Z_TYPE(zv); // = zv.type
Z_TYPE_P(zv_ptr); // = zv_ptr->type
Z_TYPE_PP(zv_ptr_ptr); // = (*zv_ptr_ptr)->type
Z_TYPE_PP(*zv_ptr_ptr_ptr); // = (**zv_ptr_ptr_ptr)->type
Basically the number of P
s should be the same as the number of *
s of the type. This only works until
zval**
, i.e. there are no special macros for working with zval***
as this is rarely necessary in practice
(you’ll just have to dereference the value first using the *
operator).
Similarly to Z_LVAL
there are also macros for fetching values of all the other types. To demonstrate their usage
we’ll create a simple function for dumping a zval:
PHP_FUNCTION(dump)
{
zval *zv_ptr;
if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "z", &zv_ptr) == FAILURE) {
return;
}
switch (Z_TYPE_P(zv_ptr)) {
case IS_NULL:
php_printf("NULL: null\n");
break;
case IS_BOOL:
if (Z_BVAL_P(zv_ptr)) {
php_printf("BOOL: true\n");
} else {
php_printf("BOOL: false\n");
}
break;
case IS_LONG:
php_printf("LONG: %ld\n", Z_LVAL_P(zv_ptr));
break;
case IS_DOUBLE:
php_printf("DOUBLE: %g\n", Z_DVAL_P(zv_ptr));
break;
case IS_STRING:
php_printf("STRING: value=\"");
PHPWRITE(Z_STRVAL_P(zv_ptr), Z_STRLEN_P(zv_ptr));
php_printf("\", length=%d\n", Z_STRLEN_P(zv_ptr));
break;
case IS_RESOURCE:
php_printf("RESOURCE: id=%ld\n", Z_RESVAL_P(zv_ptr));
break;
case IS_ARRAY:
php_printf("ARRAY: hashtable=%p\n", Z_ARRVAL_P(zv_ptr));
break;
case IS_OBJECT:
php_printf("OBJECT: ???\n");
break;
}
}
const zend_function_entry funcs[] = {
PHP_FE(dump, NULL)
PHP_FE_END
};
Lets try it out:
dump(null); // NULL: null
dump(true); // BOOL: true
dump(false); // BOOL: false
dump(42); // LONG: 42
dump(4.2); // DOUBLE: 4.2
dump("foo"); // STRING: value="foo", length=3
dump(fopen(__FILE__, "r")); // RESOURCE: id=???
dump(array(1, 2, 3)); // ARRAY: hashtable=0x???
dump(new stdClass); // OBJECT: ???
The macros for accessing the values are pretty straightforward: Z_BVAL
for bools, Z_LVAL
for longs, Z_DVAL
for doubles. For strings Z_STRVAL
returns the actual char*
string, whereas Z_STRLEN
provides us with the
length. The resource ID can be fetched using Z_RESVAL
and the HashTable*
of an array is accessed with
Z_ARRVAL
. How object values are accessed will not be covered here as it requires some more background knowledge.
When you want to access the contents of a zval you should always go through these macros, rather than directly accessing
its members. This maintains a level of abstraction and makes the intention clearer: For example, if you directly
accessed the lval
member you could either be fetching the bool value, the long value or the resource ID. Using
Z_BVAL
, Z_LVAL
and Z_RESVAL
instead makes the intention unambiguous. Using the macros also serves as a
protection against changes to the internal zval representation in future PHP versions.
Setting the value¶
Most of the macros introduced above just access some member of the zval structure and as such you can use them both to read and to write the respective values. As an example consider the following function, which simply returns the string “hello world!”:
PHP_FUNCTION(hello_world) {
Z_TYPE_P(return_value) = IS_STRING;
Z_STRVAL_P(return_value) = estrdup("hello world!");
Z_STRLEN_P(return_value) = strlen("hello world!");
};
/* ... */
PHP_FE(hello_world, NULL)
/* ... */
Running php -r "echo hello_world();"
should now print hello world!
to the terminal.
In the above example we set the return_value
variable, which is a zval*
provided by the PHP_FUNCTION
macro.
We’ll look at this variable in more detail in the next chapter, for now it should suffice to know that the value of this
variable will be the return value of the function. By default it is initialized to have type IS_NULL
.
Setting a zval value using the access macros is really straightforward, but there are some things one should keep in
mind: First of all you need to remember that the type tag determines the type of a zval. It doesn’t suffice to just set
the value (via Z_STRVAL
and Z_STRLEN
here), you always need to set the type tag as well.
Furthermore you need to be aware of the fact that in most cases the zval “owns” its value and that the zval will have a longer life-time than the scope in which you set its value. Sometimes this doesn’t apply when dealing with temporary zvals, but in most cases it’s true.
Using the above example this means that the return_value
will live on after our function body leaves (which is quite
obvious, otherwise nobody could use the return value), so it can’t make use of any temporary values of the function.
E.g. just writing Z_STRVAL_P(return_value) = "hello world!"
would be invalid, because the string literal
"hello world!"
ceases to exist after the body is left (which is true for every stack allocated value in C).
Because of this we need to copy the string using estrdup()
. This will create a separate copy of the string on the
heap. Because the zval “owns” its value, it will make sure to free this copy when the zval is destroyed. This also
applies to any other “complex” value of the zval. E.g. if you set the HashTable*
for an array, the zval will take
ownership of it and free it when the zval is destroyed. When using primitive types like integers or doubles you
obviously don’t need to care about this, as they are always copied.
Lastly, it should be pointed out that not all of the access macros directly return a member. The Z_BVAL
macro for
example is defined as follows:
#define Z_BVAL(zval) ((zend_bool)(zval).value.lval)
Because this macro contains a cast you will not be able to write Z_BVAL_P(return_value) = 1
. Apart from some of the
object-related macros this is the only exception though. All the other access macros can be used to set values.
In practice you won’t have to worry about the last bit though: As setting the zval value is such a common task, PHP provides another set of macros for this purpose. They allow you to set the type tag and the value at the same time. Rewriting the previous example using such a macro yields:
PHP_FUNCTION(hello_world) {
ZVAL_STRINGL(return_value, estrdup("hello world!"), strlen("hello world!"), 0);
}
As it is very common that the string has to be copied when assigning to the zval, the last (boolean) parameter of the
ZVAL_STRINGL
macro can handle this for you. If you pass 0
the string is used as is, but if you pass 1
it
will be copied using estrndup()
. Thus our example can be rewritten as:
PHP_FUNCTION(hello_world) {
ZVAL_STRINGL(return_value, "hello world!", strlen("hello world!"), 1);
}
Furthermore we don’t need to manually compute the strlen
and can use the ZVAL_STRING
macro (without the L
at
the end) instead:
PHP_FUNCTION(hello_world) {
ZVAL_STRING(return_value, "hello world!", 1);
}
If you know the length of the string (because it was passed to you in some way) you should always make use of it via the
ZVAL_STRINGL
macro in order to preserve binary-safety. If you don’t know the length (or know that the string doesn’t
contain NUL bytes, as is usually the case with literals) you can use ZVAL_STRING
instead.
Apart from ZVAL_STRING(L)
there are a few more macros for setting values, which are listed in the following
example:
ZVAL_NULL(return_value);
ZVAL_BOOL(return_value, 0);
ZVAL_BOOL(return_value, 1);
/* or better */
ZVAL_FALSE(return_value);
ZVAL_TRUE(return_value);
ZVAL_LONG(return_value, 42);
ZVAL_DOUBLE(return_value, 4.2);
ZVAL_RESOURCE(return_value, resource_id);
ZVAL_EMPTY_STRING(return_value);
/* = ZVAL_STRING(return_value, "", 1); */
ZVAL_STRING(return_value, "string", 1);
/* = ZVAL_STRING(return_value, estrdup("string"), 0); */
ZVAL_STRINGL(return_value, "nul\0string", 10, 1);
/* = ZVAL_STRINGL(return_value, estrndup("nul\0string", 10), 10, 0); */
Note that these macros will set the value, but not destroy any value that the zval might have previously held. For the
return_value
zval this doesn’t matter because it was initialized to IS_NULL
(which has no value that needs to be
freed), but in other cases you’ll have to destroy the old value first using the functions described in the following
section.