Iterators¶
In the last section we implemented a few object handlers to improve integration of typed arrays into the language. One aspect is still missing though: Iteration. In this section we’ll look at how iterators are implemented internally and how we can make use of them. Once again typed arrays will serve as the example.
The get_iterator
handler¶
Internally iteration works very similar to the userland IteratorAggregate
interface. The class has a
get_iterator
handler that returns a zend_object_iterator*
, which looks as follows:
struct _zend_object_iterator {
void *data;
zend_object_iterator_funcs *funcs;
ulong index; /* private to fe_reset/fe_fetch opcodes */
};
The index
member is used internally by the foreach
implementation. It is incremented on each iteration and is
used for the keys if you don’t specify a custom key function. The funcs
member contains handlers for the different
iteration actions:
typedef struct _zend_object_iterator_funcs {
/* release all resources associated with this iterator instance */
void (*dtor)(zend_object_iterator *iter TSRMLS_DC);
/* check for end of iteration (FAILURE or SUCCESS if data is valid) */
int (*valid)(zend_object_iterator *iter TSRMLS_DC);
/* fetch the item data for the current element */
void (*get_current_data)(zend_object_iterator *iter, zval ***data TSRMLS_DC);
/* fetch the key for the current element (optional, may be NULL) */
void (*get_current_key)(zend_object_iterator *iter, zval *key TSRMLS_DC);
/* step forwards to next element */
void (*move_forward)(zend_object_iterator *iter TSRMLS_DC);
/* rewind to start of data (optional, may be NULL) */
void (*rewind)(zend_object_iterator *iter TSRMLS_DC);
/* invalidate current value/key (optional, may be NULL) */
void (*invalidate_current)(zend_object_iterator *iter TSRMLS_DC);
} zend_object_iterator_funcs;
The handlers are pretty similar to the Iterator
interface, only with slightly different names. The only handler
that has no correspondence in userland is invalidate_current
, which can be used to destroy the current key/value.
The handler is largely unused though, in particular foreach
won’t even call it.
The last member in the struct is data
, which can be used to carry around some custom data. Usually this one slot
isn’t enough though, so instead of the structure is extended, similarly to what you have already seen with
zend_object
.
In order to iterate typed arrays we’ll have to store a few things: First of all, we need to hold a reference to the
buffer view object (otherwise it may be destroyed during iteration). We can store this in the data
member.
Furthermore we should keep around the buffer_view_object
so we don’t have to refetch it on every handler call.
Additionally we’ll have to store the current iteration offset
and the zval*
of the current element (you’ll see
a bit later why we need to do this):
typedef struct _buffer_view_iterator {
zend_object_iterator intern;
buffer_view_object *view;
size_t offset;
zval *current;
} buffer_view_iterator;
Let’s also declare a dummy zend_object_iterator_funcs
structure so we have something to work on:
static zend_object_iterator_funcs buffer_view_iterator_funcs = {
buffer_view_iterator_dtor,
buffer_view_iterator_valid,
buffer_view_iterator_get_current_data,
buffer_view_iterator_get_current_key,
buffer_view_iterator_move_forward,
buffer_view_iterator_rewind
};
Now we can implement the get_iterator
handler. This handler receives the class entry, the object and whether the
iteration is done by reference and returns a zend_object_iterator*
. All we have to do is allocate the iterator and
set the respective members:
zend_object_iterator *buffer_view_get_iterator(
zend_class_entry *ce, zval *object, int by_ref TSRMLS_DC
) {
buffer_view_iterator *iter;
if (by_ref) {
zend_throw_exception(NULL, "Cannot iterate buffer view by reference", 0 TSRMLS_CC);
return NULL;
}
iter = emalloc(sizeof(buffer_view_iterator));
iter->intern.funcs = &buffer_view_iterator_funcs;
iter->intern.data = object;
Z_ADDREF_P(object);
iter->view = zend_object_store_get_object(object TSRMLS_CC);
iter->offset = 0;
iter->current = NULL;
return (zend_object_iterator *) iter;
}
Finally we have to adjust the macro for registering buffer view classes:
#define DEFINE_ARRAY_BUFFER_VIEW_CLASS(class_name, type) \
INIT_CLASS_ENTRY(tmp_ce, #class_name, array_buffer_view_functions); \
type##_array_ce = zend_register_internal_class(&tmp_ce TSRMLS_CC); \
type##_array_ce->create_object = array_buffer_view_create_object; \
type##_array_ce->get_iterator = buffer_view_get_iterator; \
type##_array_ce->iterator_funcs.funcs = &buffer_view_iterator_funcs; \
zend_class_implements(type##_array_ce TSRMLS_CC, 2, \
zend_ce_arrayaccess, zend_ce_traversable);
The new things are the assignment to the get_iterator
and iterator_funcs.funcs
as well as the implementation
of the Traversable
interface.
Iterator functions¶
Now let’s actually implement the buffer_view_iterator_funcs
that we specified above:
static void buffer_view_iterator_dtor(zend_object_iterator *intern TSRMLS_DC)
{
buffer_view_iterator *iter = (buffer_view_iterator *) intern;
if (iter->current) {
zval_ptr_dtor(&iter->current);
}
zval_ptr_dtor((zval **) &intern->data);
efree(iter);
}
static int buffer_view_iterator_valid(zend_object_iterator *intern TSRMLS_DC)
{
buffer_view_iterator *iter = (buffer_view_iterator *) intern;
return iter->offset < iter->view->length ? SUCCESS : FAILURE;
}
static void buffer_view_iterator_get_current_data(
zend_object_iterator *intern, zval ***data TSRMLS_DC
) {
buffer_view_iterator *iter = (buffer_view_iterator *) intern;
if (iter->current) {
zval_ptr_dtor(&iter->current);
}
if (iter->offset < iter->view->length) {
iter->current = buffer_view_offset_get(iter->view, iter->offset);
*data = &iter->current;
} else {
*data = NULL;
}
}
#if ZEND_MODULE_API_NO >= 20121212
static void buffer_view_iterator_get_current_key(
zend_object_iterator *intern, zval *key TSRMLS_DC
) {
buffer_view_iterator *iter = (buffer_view_iterator *) intern;
ZVAL_LONG(key, iter->offset);
}
#else
static int buffer_view_iterator_get_current_key(
zend_object_iterator *intern, char **str_key, uint *str_key_len, ulong *int_key TSRMLS_DC
) {
buffer_view_iterator *iter = (buffer_view_iterator *) intern;
*int_key = (ulong) iter->offset;
return HASH_KEY_IS_LONG;
}
#endif
static void buffer_view_iterator_move_forward(zend_object_iterator *intern TSRMLS_DC)
{
buffer_view_iterator *iter = (buffer_view_iterator *) intern;
iter->offset++;
}
static void buffer_view_iterator_rewind(zend_object_iterator *intern TSRMLS_DC)
{
buffer_view_iterator *iter = (buffer_view_iterator *) intern;
iter->offset = 0;
iter->current = NULL;
}
The functions should be rather straightforward, so only a few comments:
get_current_data
gets a zval*** data
as the parameter and expects us to write a zval**
into it using
*data = ...
. The zval**
is required because iteration can also happen by reference, in which case zval*
won’t suffice. The zval**
is the reason why we have to store the current zval*
in the iterator.
How the get_current_key
handler looks like depends on the PHP version: With PHP 5.5 you simply have to write the
key into the passed key
variable using one of the ZVAL_*
macros.
On older versions of PHP the get_current_key
handler takes three parameters that can be set depending on which key
type is returned. If you return HASH_KEY_NON_EXISTENT
the resulting key will be null
and you don’t have to set
any of them. For HASH_KEY_IS_LONG
you set the int_key
argument. For HASH_KEY_IS_STRING
you have to set
str_key
and str_key_len
. Note that here str_key_len
is the string length plus one (similar to how it is done
in the zend_hash
APIs).
Honoring inheritance¶
Once again we need to consider what happens when the user extends the class and wants to change the iteration behavior. Right now he would have to reimplement the iteration mechanism manually, because the individual iteration handlers are not exposed to userland (only through foreach).
As already with the object handlers we’ll solve this by also implementing the normal Iterator
interface. This time
we won’t need special handling to ensure that PHP actually calls the overridden methods: PHP will automatically use the
fast internal handlers when the class is used directly, but will use the Iterator
methods if the class is extended.
In order to implement the Iterator
methods we have to add a new size_t current_offset
member to
buffer_view_object
, which stores the current offset for the iteration methods (and is completely separate from the
iteration state used by get_iterator
-style iterators). The methods itself are to the most part just argument
checking boilerplate:
PHP_FUNCTION(array_buffer_view_rewind)
{
buffer_view_object *intern;
if (zend_parse_parameters_none() == FAILURE) {
return;
}
intern = zend_object_store_get_object(getThis() TSRMLS_CC);
intern->current_offset = 0;
}
PHP_FUNCTION(array_buffer_view_next)
{
buffer_view_object *intern;
if (zend_parse_parameters_none() == FAILURE) {
return;
}
intern = zend_object_store_get_object(getThis() TSRMLS_CC);
intern->current_offset++;
}
PHP_FUNCTION(array_buffer_view_valid)
{
buffer_view_object *intern;
if (zend_parse_parameters_none() == FAILURE) {
return;
}
intern = zend_object_store_get_object(getThis() TSRMLS_CC);
RETURN_BOOL(intern->current_offset < intern->length);
}
PHP_FUNCTION(array_buffer_view_key)
{
buffer_view_object *intern;
if (zend_parse_parameters_none() == FAILURE) {
return;
}
intern = zend_object_store_get_object(getThis() TSRMLS_CC);
RETURN_LONG((long) intern->current_offset);
}
PHP_FUNCTION(array_buffer_view_current)
{
buffer_view_object *intern;
zval *value;
if (zend_parse_parameters_none() == FAILURE) {
return;
}
intern = zend_object_store_get_object(getThis() TSRMLS_CC);
value = buffer_view_offset_get(intern, intern->current_offset);
RETURN_ZVAL(value, 1, 1);
}
/* ... */
ZEND_BEGIN_ARG_INFO_EX(arginfo_buffer_view_void, 0, 0, 0)
ZEND_END_ARG_INFO()
/* ... */
PHP_ME_MAPPING(rewind, array_buffer_view_rewind, arginfo_buffer_view_void, ZEND_ACC_PUBLIC)
PHP_ME_MAPPING(next, array_buffer_view_next, arginfo_buffer_view_void, ZEND_ACC_PUBLIC)
PHP_ME_MAPPING(valid, array_buffer_view_valid, arginfo_buffer_view_void, ZEND_ACC_PUBLIC)
PHP_ME_MAPPING(key, array_buffer_view_key, arginfo_buffer_view_void, ZEND_ACC_PUBLIC)
PHP_ME_MAPPING(current, array_buffer_view_current, arginfo_buffer_view_void, ZEND_ACC_PUBLIC)
Obviously we now should also implement Iterator
rather than Traversable
:
#define DEFINE_ARRAY_BUFFER_VIEW_CLASS(class_name, type) \
INIT_CLASS_ENTRY(tmp_ce, #class_name, array_buffer_view_functions); \
type##_array_ce = zend_register_internal_class(&tmp_ce TSRMLS_CC); \
type##_array_ce->create_object = array_buffer_view_create_object; \
type##_array_ce->get_iterator = buffer_view_get_iterator; \
type##_array_ce->iterator_funcs.funcs = &buffer_view_iterator_funcs; \
zend_class_implements(type##_array_ce TSRMLS_CC, 2, \
zend_ce_arrayaccess, zend_ce_iterator);
One last consideration regarding this: In general it is always better to implement IteratorAggregate
rather than
Iterator
, because IteratorAggregate
decouples the iterator state from the main object. This is obviously simply
better design, but also allows things like independent nested iteration. I still chose to implement Iterator
here,
because aggregates have a higher implementational overhead (as they require a separate class that has to interact with
an independent object).