Serialization
In this section we’ll have a look at PHP’s serialization format and the different mechanisms PHP provides to serialize object data. As usual we’ll use the typed arrays implementation as an example.
PHP’s serialization format
You probably already know roughly what the output of serialize() looks like: it has some kind of type specifier (like s or i), followed by a colon, followed by the actual data, followed by a semicolon. As such the serialization format for the “simple” types looks as follows:
NULL:         N;
true:         b:1;
false:        b:0;
42:           i:42;

42.3789:      d:42.378900000000002;
                ^-- Precision controlled by serialize_precision ini setting (default 17)

"foobar":     s:6:"foobar";
                ^-- strlen("foobar")

resource:     i:0;
                ^-- Resources can't really be serialized, so they just get the value int(0)
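To make the scalar rules concrete, here is a minimal standalone C sketch that formats a few values the way the listing above shows. The helper names (ser_null, ser_bool, ser_long, ser_string) are invented for this illustration and are not part of PHP’s API; an extension would normally go through php_var_serialize instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers for illustration only; not PHP API.
 * Each writes one serialized value into out (at most cap bytes,
 * NUL-terminated when cap > 0) and returns snprintf's result. */

static int ser_null(char *out, size_t cap)
{
    return snprintf(out, cap, "N;");
}

static int ser_bool(char *out, size_t cap, int value)
{
    return snprintf(out, cap, "b:%d;", value ? 1 : 0);
}

static int ser_long(char *out, size_t cap, long value)
{
    return snprintf(out, cap, "i:%ld;", value);
}

/* The number after "s:" is the byte length of the payload.
 * This simplified version only handles NUL-terminated input. */
static int ser_string(char *out, size_t cap, const char *str)
{
    assert(out != NULL && str != NULL);
    return snprintf(out, cap, "s:%zu:\"%s\";", strlen(str), str);
}
```

Because the string length is written explicitly, a reader of the format never has to scan for a closing quote, which is why quotes inside the payload need no escaping.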
For arrays a list of key-value pairs is contained in curly braces:
[10, 11, 12]:     a:3:{i:0;i:10;i:1;i:11;i:2;i:12;}
                    ^-- count([10, 11, 12])

                                                 v-- key   v-- value
["foo" => 4, "bar" => 2]:     a:2:{s:3:"foo";i:4;s:3:"bar";i:2;}
                                   ^-- key   ^-- value
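The array layout can be sketched in the same standalone style. The function below, whose name ser_long_list is made up for this example, writes the element count followed by alternating key and value entries for a packed list of integers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper for illustration only; not PHP API.
 * Serializes a packed list of integers as a:<count>:{<key><value>...}.
 * Returns 0 on success, -1 if out is too small. */
static int ser_long_list(char *out, size_t cap, const long *values, size_t count)
{
    size_t used;
    int n;

    assert(out != NULL);

    n = snprintf(out, cap, "a:%zu:{", count);
    if (n < 0 || (size_t) n >= cap) {
        return -1;
    }
    used = (size_t) n;

    for (size_t i = 0; i < count; i++) {
        /* Key (the index) followed by the value, both as i:<number>; */
        n = snprintf(out + used, cap - used, "i:%zu;i:%ld;", i, values[i]);
        if (n < 0 || (size_t) n >= cap - used) {
            return -1;
        }
        used += (size_t) n;
    }

    if (cap - used < 2) {
        return -1;
    }
    memcpy(out + used, "}", 2);   /* closing brace plus NUL terminator */
    return 0;
}
```

Note that the closing brace is not followed by a semicolon, unlike the scalar values inside it.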
For objects there are two serialization mechanisms: The first one simply serializes the object properties just like it is done for arrays. This mechanism uses O as the type specifier.
Consider the following class:
class Test {
    public $public = 1;
    protected $protected = 2;
    private $private = 3;
}
This is serialized as follows:
  v-- strlen("Test")           v-- property          v-- value
O:4:"Test":3:{s:6:"public";i:1;s:12:"\0*\0protected";i:2;s:13:"\0Test\0private";i:3;}
              ^-- property ^-- value                     ^-- property           ^-- value
The \0 sequences in the above serialization string are NUL bytes. As you can see, private and protected members are serialized with rather peculiar names: private properties are prefixed with \0ClassName\0 and protected properties with \0*\0. These names are the result of name mangling, which is something we’ll cover in a later section.
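The lengths 12 and 13 in the example only make sense once the NUL bytes are counted. The following standalone C sketch builds such a prefixed name and returns its byte length. The function mangle_name is a hypothetical stand-in written for this illustration; it is not the routine PHP itself uses.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper for illustration only.
 * Builds "\0<scope>\0<prop>" into out and returns its length in bytes,
 * or 0 if out is too small. scope is the class name for private
 * properties and "*" for protected ones. */
static size_t mangle_name(char *out, size_t cap, const char *scope, const char *prop)
{
    size_t scope_len, prop_len, total;

    assert(out != NULL && scope != NULL && prop != NULL);

    scope_len = strlen(scope);
    prop_len = strlen(prop);
    total = 1 + scope_len + 1 + prop_len;
    if (total + 1 > cap) {
        return 0;
    }

    out[0] = '\0';
    memcpy(out + 1, scope, scope_len);
    out[1 + scope_len] = '\0';
    memcpy(out + 2 + scope_len, prop, prop_len);
    out[total] = '\0';   /* trailing terminator, not counted in total */
    return total;
}
```

Since the result contains embedded NUL bytes, it has to be handled with an explicit length (memcpy, memcmp) rather than with strlen or strcmp.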
The second mechanism allows for custom serialization formats. It delegates the actual serialization to the serialize method of the Serializable interface and uses the C type specifier. For example consider this class:
class Test2 implements Serializable {
    public function serialize() {
        return "foobar";
    }

    public function unserialize($str) {
        // ...
    }
}
It will be serialized as follows:
C:5:"Test2":6:{foobar}
            ^-- strlen("foobar")
In this case PHP will just put the result of the Serializable::serialize() call inside the curly braces.
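The envelope around the custom payload is simple enough to sketch in standalone C. The function name ser_custom is an assumption made for this example; the point is only that the payload is copied verbatim and described by its byte length.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper for illustration only; not PHP API.
 * Wraps an opaque payload in the C envelope:
 *   C:<strlen(class)>:"<class>":<payload_len>:{<payload>}
 * Returns the total length written, or 0 if out is too small. */
static size_t ser_custom(char *out, size_t cap, const char *class_name,
                         const char *payload, size_t payload_len)
{
    int head;
    size_t used;

    assert(out != NULL && class_name != NULL && payload != NULL);

    head = snprintf(out, cap, "C:%zu:\"%s\":%zu:{",
                    strlen(class_name), class_name, payload_len);
    if (head < 0 || (size_t) head >= cap) {
        return 0;
    }
    used = (size_t) head;

    /* Need room for the payload, the closing brace and a NUL terminator. */
    if (cap - used < 2 || payload_len > cap - used - 2) {
        return 0;
    }
    memcpy(out + used, payload, payload_len);   /* copied verbatim */
    used += payload_len;
    out[used++] = '}';
    out[used] = '\0';
    return used;
}
```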
Another feature of PHP’s serialization format is that it will properly preserve references:
$a = ["foo"];
$a[1] =& $a[0];
a:2:{i:0;s:3:"foo";i:1;R:2;}
The important part here is the R:2; element. It means “reference to the second value”. What is the second value? The whole array is the first value, the first element (s:3:"foo") is the second value, so that’s what is referenced.
As objects in PHP exhibit a reference-like behavior, serialize also makes sure that the same object occurring twice will really be the same object on unserialization:
$o = new stdClass;
$o->foo = $o;
O:8:"stdClass":1:{s:3:"foo";r:1;}
As you can see it works the same way as with references, just using the small r instead of R.
Serializing internal objects
As internal objects don’t store their data in ordinary properties PHP’s default serialization mechanism will not work.
For example, if you try to serialize an ArrayBuffer all you’ll get is this:
O:11:"ArrayBuffer":0:{}
Thus we’ll have to write a custom handler for serialization. As mentioned above there are two ways in which objects can be serialized (O and C). I’ll demonstrate how to use both, starting with the C format that uses the Serializable interface. For this method we’ll create our own serialization format based on the primitives that are provided by serialize. In order to do so we need to include two headers:
#include "ext/standard/php_var.h"
#include "ext/standard/php_smart_str.h"
The php_var.h header exports some serialization functions, the php_smart_str.h header contains PHP’s smart_str API. This API provides a dynamically resized string structure that allows us to easily create strings without concerning ourselves with allocation.
Now let’s see what the serialize method for an ArrayBuffer could look like:
PHP_METHOD(ArrayBuffer, serialize)
{
    buffer_object *intern;
    smart_str buf = {0};
    php_serialize_data_t var_hash;
    zval zv, *zv_ptr = &zv;

    if (zend_parse_parameters_none() == FAILURE) {
        return;
    }

    intern = zend_object_store_get_object(getThis() TSRMLS_CC);
    if (!intern->buffer) {
        return;
    }

    PHP_VAR_SERIALIZE_INIT(var_hash);

    INIT_PZVAL(zv_ptr);

    /* Serialize buffer as string */
    ZVAL_STRINGL(zv_ptr, (char *) intern->buffer, (int) intern->length, 0);
    php_var_serialize(&buf, &zv_ptr, &var_hash TSRMLS_CC);

    /* Serialize properties as array */
    Z_ARRVAL_P(zv_ptr) = zend_std_get_properties(getThis() TSRMLS_CC);
    Z_TYPE_P(zv_ptr) = IS_ARRAY;
    php_var_serialize(&buf, &zv_ptr, &var_hash TSRMLS_CC);

    PHP_VAR_SERIALIZE_DESTROY(var_hash);

    if (buf.c) {
        RETURN_STRINGL(buf.c, buf.len, 0);
    }
}
Apart from the usual boilerplate this method contains a few interesting elements: Firstly, we declared a php_serialize_data_t var_hash variable, which is initialized with PHP_VAR_SERIALIZE_INIT and destroyed with PHP_VAR_SERIALIZE_DESTROY. This variable is really of type HashTable* and is used to remember the serialized values for the R/r reference preservation mechanism.
Furthermore we create a smart string using smart_str buf = {0}. The = {0} initializes all members of the struct with zero. This struct looks as follows:
typedef struct {
    char *c;
    size_t len;
    size_t a;
} smart_str;
c is the buffer of the string, len the currently used length and a the size of the current allocation (as this is a smart string this doesn’t necessarily match len).
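To illustrate why len and a can differ, here is a simplified standalone stand-in for such a growable string. The field names mirror smart_str, but the type name growbuf, the function names and the doubling growth policy are assumptions made for this sketch and say nothing about how PHP’s own implementation allocates.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for illustration; not the real smart_str API. */
typedef struct {
    char *c;      /* buffer */
    size_t len;   /* bytes in use */
    size_t a;     /* bytes allocated, always >= len */
} growbuf;

/* Appends n bytes, growing the allocation by doubling when needed.
 * Returns 0 on success, -1 if memory could not be obtained. */
static int growbuf_append(growbuf *buf, const char *data, size_t n)
{
    assert(buf != NULL && data != NULL);

    if (n == 0) {
        return 0;
    }
    if (n > buf->a - buf->len) {
        size_t new_a = buf->a ? buf->a : 16;
        char *new_c;

        while (new_a - buf->len < n) {
            if (new_a > SIZE_MAX / 2) {
                return -1;   /* doubling would overflow */
            }
            new_a *= 2;
        }
        new_c = realloc(buf->c, new_a);
        if (!new_c) {
            return -1;
        }
        buf->c = new_c;
        buf->a = new_a;
    }
    memcpy(buf->c + buf->len, data, n);
    buf->len += n;
    return 0;
}

static void growbuf_free(growbuf *buf)
{
    free(buf->c);
    buf->c = NULL;
    buf->len = buf->a = 0;
}
```

Starting from a zero-initialized struct works here for the same reason as with smart_str buf = {0}: a NULL buffer with zero capacity is a valid empty state, and the first append performs the initial allocation.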
The serialization itself happens by using a dummy zval (zv_ptr). We first write a value into it and then call php_var_serialize. The first serialized value is the actual buffer (as a string), the second value is the set of properties (as an array).
A bit more complicated is the unserialize method:
PHP_METHOD(ArrayBuffer, unserialize)
{
    buffer_object *intern;
    char *str;
    int str_len;
    php_unserialize_data_t var_hash;
    const unsigned char *p, *max;
    zval zv, *zv_ptr = &zv;

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &str, &str_len) == FAILURE) {
        return;
    }

    intern = zend_object_store_get_object(getThis() TSRMLS_CC);

    if (intern->buffer) {
        zend_throw_exception(
            NULL, "Cannot call unserialize() on an already constructed object", 0 TSRMLS_CC
        );
        return;
    }

    PHP_VAR_UNSERIALIZE_INIT(var_hash);

    p = (unsigned char *) str;
    max = (unsigned char *) str + str_len;

    INIT_ZVAL(zv);
    if (!php_var_unserialize(&zv_ptr, &p, max, &var_hash TSRMLS_CC)
        || Z_TYPE_P(zv_ptr) != IS_STRING || Z_STRLEN_P(zv_ptr) == 0) {
        zend_throw_exception(NULL, "Could not unserialize buffer", 0 TSRMLS_CC);
        goto exit;
    }

    intern->buffer = Z_STRVAL_P(zv_ptr);
    intern->length = Z_STRLEN_P(zv_ptr);

    INIT_ZVAL(zv);
    if (!php_var_unserialize(&zv_ptr, &p, max, &var_hash TSRMLS_CC)
        || Z_TYPE_P(zv_ptr) != IS_ARRAY) {
        zend_throw_exception(NULL, "Could not unserialize properties", 0 TSRMLS_CC);
        goto exit;
    }

    if (zend_hash_num_elements(Z_ARRVAL_P(zv_ptr)) != 0) {
        zend_hash_copy(
            zend_std_get_properties(getThis() TSRMLS_CC), Z_ARRVAL_P(zv_ptr),
            (copy_ctor_func_t) zval_add_ref, NULL, sizeof(zval *)
        );
    }

exit:
    zval_dtor(zv_ptr);
    PHP_VAR_UNSERIALIZE_DESTROY(var_hash);
}
The unserialize method again declares a var_hash variable, this time of type php_unserialize_data_t, initialized with PHP_VAR_UNSERIALIZE_INIT and destructed with PHP_VAR_UNSERIALIZE_DESTROY. It has pretty much the same function as its serialize equivalent: storing variables for R/r.
In order to use the php_var_unserialize function we need two pointers to the serialized string: The first one is p, which is the current position in the string. The second one is max and points to the end of the string. The p position is passed to php_var_unserialize by-reference and will be modified to point to the start of the next value that is to be unserialized.
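The cursor convention described above (a position pointer passed by reference plus an end pointer) can be shown with a small standalone reader. The function read_long is hypothetical and only understands the i:<digits>; form; it is meant to show how the position advances on success and stays put on failure, not to reproduce php_var_unserialize.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical reader for illustration only; not PHP API.
 * Parses one "i:<digits>;" value starting at *p without reading at or
 * past max. On success stores the number, advances *p to the byte after
 * the ';' and returns 1. On failure returns 0 and leaves *p unchanged. */
static int read_long(const unsigned char **p, const unsigned char *max, long *value)
{
    const unsigned char *cur;
    long result = 0;
    int negative = 0;

    assert(p != NULL && *p != NULL && max != NULL && value != NULL);
    cur = *p;

    /* The shortest valid input is "i:0;", i.e. four bytes. */
    if (max - cur < 4 || cur[0] != 'i' || cur[1] != ':') {
        return 0;
    }
    cur += 2;
    if (*cur == '-') {
        negative = 1;
        cur++;
    }
    if (cur >= max || *cur < '0' || *cur > '9') {
        return 0;
    }
    while (cur < max && *cur >= '0' && *cur <= '9') {
        int digit = *cur - '0';
        if (result > (LONG_MAX - digit) / 10) {
            return 0;   /* would overflow */
        }
        result = result * 10 + digit;
        cur++;
    }
    if (cur >= max || *cur != ';') {
        return 0;
    }
    *value = negative ? -result : result;
    *p = cur + 1;
    return 1;
}
```

Calling such a reader repeatedly with the same position pointer walks through consecutive values, which is the pattern the unserialize method uses to read the buffer first and the properties second.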
The first unserialization reads the buffer, the second the properties. The largest part of the code is error handling. PHP has a long history of serialization-related crashes (and security issues), so one should be careful to ensure all the data is valid. You should also not forget that methods like unserialize, even though they have a special meaning, can still be called as normal methods. In order to prevent such calls the above method aborts if intern->buffer is already set.
Now let’s look at the second serialization mechanism, which will be used for the buffer views. In order to implement the O serialization we’ll need a custom get_properties handler (which returns the “properties” to serialize) and a __wakeup method (which restores the state from the serialized properties).
The get_properties handler allows you to fetch the properties of an object as a hashtable. The engine does this in various places, one of them being O serialization. Thus we can use this handler to return the view’s buffer object, offset and length as properties, which will then be serialized just like any other property:
static HashTable *array_buffer_view_get_properties(zval *obj TSRMLS_DC)
{
    buffer_view_object *intern = zend_object_store_get_object(obj TSRMLS_CC);
    HashTable *ht = zend_std_get_properties(obj TSRMLS_CC);
    zval *zv;

    if (!intern->buffer_zval) {
        return ht;
    }

    Z_ADDREF_P(intern->buffer_zval);
    zend_hash_update(ht, "buffer", sizeof("buffer"), &intern->buffer_zval, sizeof(zval *), NULL);

    MAKE_STD_ZVAL(zv);
    ZVAL_LONG(zv, intern->offset);
    zend_hash_update(ht, "offset", sizeof("offset"), &zv, sizeof(zval *), NULL);

    MAKE_STD_ZVAL(zv);
    ZVAL_LONG(zv, intern->length);
    zend_hash_update(ht, "length", sizeof("length"), &zv, sizeof(zval *), NULL);

    return ht;
}
Note that these magic properties will now also turn up in the debugging output, which in this case is probably a good idea. Also the properties will be accessible as “normal” properties, but only after this handler has been called. E.g. you would be able to access the $view->buffer property after serializing the object. We can’t really do anything about this side effect (other than using the other serialization method).
In order to restore the state after unserialization we implement the __wakeup magic method. This method is called right after unserialization and allows you to read the object properties and reconstruct the internal state from them:
PHP_FUNCTION(array_buffer_view_wakeup)
{
    buffer_view_object *intern;
    HashTable *props;
    zval **buffer_zv, **offset_zv, **length_zv;

    if (zend_parse_parameters_none() == FAILURE) {
        return;
    }

    intern = zend_object_store_get_object(getThis() TSRMLS_CC);

    if (intern->buffer_zval) {
        zend_throw_exception(
            NULL, "Cannot call __wakeup() on an already constructed object", 0 TSRMLS_CC
        );
        return;
    }

    props = zend_std_get_properties(getThis() TSRMLS_CC);

    if (zend_hash_find(props, "buffer", sizeof("buffer"), (void **) &buffer_zv) == SUCCESS
     && zend_hash_find(props, "offset", sizeof("offset"), (void **) &offset_zv) == SUCCESS
     && zend_hash_find(props, "length", sizeof("length"), (void **) &length_zv) == SUCCESS
     && Z_TYPE_PP(buffer_zv) == IS_OBJECT
     && Z_TYPE_PP(offset_zv) == IS_LONG && Z_LVAL_PP(offset_zv) >= 0
     && Z_TYPE_PP(length_zv) == IS_LONG && Z_LVAL_PP(length_zv) > 0
     && instanceof_function(Z_OBJCE_PP(buffer_zv), array_buffer_ce TSRMLS_CC)
    ) {
        buffer_object *buffer_intern = zend_object_store_get_object(*buffer_zv TSRMLS_CC);
        size_t offset = Z_LVAL_PP(offset_zv), length = Z_LVAL_PP(length_zv);
        size_t bytes_per_element = buffer_view_get_bytes_per_element(intern);
        size_t max_length = (buffer_intern->length - offset) / bytes_per_element;

        if (offset < buffer_intern->length && length <= max_length) {
            Z_ADDREF_PP(buffer_zv);
            intern->buffer_zval = *buffer_zv;

            intern->offset = offset;
            intern->length = length;

            intern->buf.as_int8 = buffer_intern->buffer;
            intern->buf.as_int8 += offset;

            return;
        }
    }

    zend_throw_exception(
        NULL, "Invalid serialization data", 0 TSRMLS_CC
    );
}
The method is more or less pure error-checking boilerplate (as is usual when dealing with serialization). The only thing it really does is to fetch the three magic properties using zend_hash_find, check their validity and then initialize the internal object from them.
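The range check at the heart of this validation can be isolated into a standalone C sketch. The function view_fits is hypothetical and not taken from the extension; it follows the same arithmetic as the code above, with the offset test moved in front of the subtraction so that the intermediate value never wraps.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check for illustration only.
 * Returns 1 if a view of `length` elements of `bytes_per_element` bytes,
 * starting at byte `offset`, fits inside a buffer of `buffer_length`
 * bytes, and 0 otherwise. */
static int view_fits(size_t buffer_length, size_t offset,
                     size_t length, size_t bytes_per_element)
{
    size_t max_length;

    assert(bytes_per_element > 0);

    /* Check the offset first so the subtraction below cannot wrap. */
    if (offset >= buffer_length || length == 0) {
        return 0;
    }

    /* Dividing the remaining bytes avoids computing
     * length * bytes_per_element, which could overflow for hostile input. */
    max_length = (buffer_length - offset) / bytes_per_element;
    return length <= max_length;
}
```

For a 16-byte buffer and 4-byte elements, a view at offset 4 can hold at most three elements, so a length of 3 is accepted and a length of 4 is rejected.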
Denying serialization
Sometimes objects can’t be reasonably serialized. In this case you can deny serialization by assigning special serialization handlers:
ce->serialize = zend_class_serialize_deny;
ce->unserialize = zend_class_unserialize_deny;
The serialize and unserialize class handlers are used to implement the Serializable interface, i.e. the C serialization. As such, assigning to them will deny serialization and C unserialization, but will still allow O unserialization. To disallow that case too, simply throw an error from __wakeup:
PHP_METHOD(SomeClass, __wakeup)
{
    if (zend_parse_parameters_none() == FAILURE) {
        return;
    }

    zend_throw_exception(NULL, "Unserialization of SomeClass is not allowed", 0 TSRMLS_CC);
}
And with this we leave the array buffers behind and turn towards magic interfaces as the next topic.