Perl's aggregate data types, arrays and hashes, allow you to store scalars indexed by integer or string keys. Perl 5's references allow you to access aggregate data types through special scalars. Nested data structures in Perl, such as an array of arrays or a hash of hashes, are possible through the use of references.
Use the anonymous reference declaration syntax to declare a nested data structure:
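As a minimal sketch of such a declaration (the variable names and contents here are illustrative assumptions, not part of any API), square brackets create anonymous array references and curly braces create anonymous hash references:

```perl
use strict;
use warnings;

# Array of arrays: every [ ... ] creates an anonymous array reference.
my @famous_triplets = (
    [qw( eenie miney moe   )],
    [qw( huey  dewey louie )],
    [qw( duck  duck  goose )],
);

# Hash of hashes: every { ... } creates an anonymous hash reference.
my %meals = (
    breakfast => { entree => 'eggs',   side => 'hash browns'   },
    lunch     => { entree => 'panini', side => 'apple'         },
    dinner    => { entree => 'steak',  side => 'avocado salad' },
);

printf "%d triplets, %d meals\n", scalar @famous_triplets, scalar keys %meals;
```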
Use Perl's reference syntax to access elements in nested data structures. The sigil denotes the amount of data to retrieve, and the dereferencing arrow indicates that the value of one portion of the data structure is a reference:
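A short sketch of this access syntax, using illustrative data (the names are assumptions made for the example):

```perl
use strict;
use warnings;

my @famous_triplets = ( [qw( eenie miney moe )], [qw( huey dewey louie )] );
my %meals           = ( breakfast => { entree => 'eggs', side => 'hash browns' } );

# The $ sigil requests a single scalar. Each -> dereferences the
# reference stored at the previous level of the structure.
my $last_nephew = $famous_triplets[1]->[2];
my $breaky_side = $meals{breakfast}->{side};

print "$last_nephew, $breaky_side\n";
```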
The only way to nest a multi-level data structure is through references, so the arrow between adjacent subscripts is superfluous. You may omit it for clarity; keep it when invoking function references, where it marks the call:
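The following sketch compares both spellings. The data and the `%actions` dispatch table are hypothetical; the arrow is kept on the call to the function reference:

```perl
use strict;
use warnings;

my @famous_triplets = ( [qw( eenie miney moe )], [qw( huey dewey louie )] );

# Hypothetical dispatch table holding a function reference.
my %actions = ( greet => { english => sub { return "Hello, $_[0]!" } } );

# Equivalent lookups, with and without the arrow between subscripts.
my $with_arrow    = $famous_triplets[1]->[2];
my $without_arrow = $famous_triplets[1][2];

# The arrow before the argument list invokes the function reference.
my $greeting = $actions{greet}{english}->('Alice');

print "$with_arrow, $without_arrow, $greeting\n";
```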
Use disambiguation blocks to access components of nested data structures as if they were first-class arrays or hashes:
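For instance, a sketch with illustrative data in which the block `@{ ... }` or `%{ ... }` wraps the expression that yields the reference:

```perl
use strict;
use warnings;

my @famous_triplets = ( [qw( eenie miney moe )], [qw( huey dewey louie )] );
my %meals           = ( dinner => { entree => 'steak', side => 'avocado salad' } );

# An array in scalar context evaluates to its number of elements.
my $nephew_count = @{ $famous_triplets[1] };

# keys in scalar context evaluates to the number of keys.
my $dinner_courses = keys %{ $meals{dinner} };

print "$nephew_count nephews, $dinner_courses courses\n";
```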
... or to slice a nested data structure:
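A hash slice through a nested reference might look like this sketch (the data is illustrative):

```perl
use strict;
use warnings;

my %meals = ( breakfast => { entree => 'eggs', side => 'hash browns' } );

# The leading @ asks for a list of values; the block yields the
# hash reference; the trailing {...} lists the keys to slice.
my ( $entree, $side ) = @{ $meals{breakfast} }{qw( entree side )};

print "$entree with $side\n";
```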
Whitespace helps, but does not entirely eliminate the noise of this construct. Use temporary variables to clarify:
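One way to sketch the temporary-variable approach, assuming the illustrative name `$meal_ref`:

```perl
use strict;
use warnings;

my %meals = ( breakfast => { entree => 'eggs', side => 'hash browns' } );

# Name the inner hash reference first, then slice through the name.
my $meal_ref = $meals{breakfast};
my ( $entree, $side ) = @$meal_ref{qw( entree side )};

print "$entree with $side\n";
```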
... or use for's implicit aliasing to $_ to avoid the use of an intermediate reference:
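A sketch of the aliasing approach follows, with illustrative data. The variables are declared on their own line because perlsyn documents the behavior of `my` combined with a statement modifier as undefined:

```perl
use strict;
use warnings;

my %meals = ( breakfast => { entree => 'eggs', side => 'hash browns' } );

# Declare first: "my ... for ..." in one statement is documented
# as undefined behavior, so the assignment stands alone.
my ( $entree, $side );

# for aliases $_ to the inner hash reference for this one statement.
( $entree, $side ) = @{$_}{qw( entree side )} for $meals{breakfast};

print "$entree with $side\n";
```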
perldoc perldsc, the data structures cookbook, gives copious examples of how to use Perl's various data structures.
When you attempt to write to a component of a nested data structure, Perl will create the path through the data structure to the destination as necessary:
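A minimal sketch of this behavior with nested arrays (the variable name is illustrative):

```perl
use strict;
use warnings;

my @aoaoaoa;
$aoaoaoa[0][0][0][0] = 'nested deeply';

# Perl created every intermediate array reference on the way down.
print ref( $aoaoaoa[0] ), "\n";
print scalar @{ $aoaoaoa[0][0][0] }, "\n";
```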
After the second line of code, this array of arrays of arrays of arrays contains an array reference in an array reference in an array reference in an array reference. Each array reference contains one element. Similarly, treating an undefined value as if it were a hash reference in a nested data structure will make it so:
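The hash equivalent can be sketched the same way (names and values are illustrative):

```perl
use strict;
use warnings;

my %hohoh;
$hohoh{Robot}{Santa}{Claus} = 'mostly harmful';

# Both intermediate levels now exist as hash references.
print ref( $hohoh{Robot} ), "\n";
print ref( $hohoh{Robot}{Santa} ), "\n";
```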
This useful behavior is autovivification. While it reduces the initialization code of nested data structures, it cannot distinguish between the honest intent to create missing elements in nested data structures and typos. The autovivification pragma from the CPAN lets you disable autovivification in a lexical scope for specific types of operations.
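The sketch below shows how a typo can silently create data using only core Perl; the `%config` hash and its misspelled key are hypothetical. With the CPAN pragma installed, placing `no autovivification;` in the enclosing scope should prevent the stray entry, though this example does not load it:

```perl
use strict;
use warnings;

my %config = ( database => { host => 'localhost' } );

# Typo: 'databse' instead of 'database'. Merely testing a nested
# key vivifies the intermediate hash reference.
my $has_port = exists $config{databse}{port};

my $typo_created = exists $config{databse} ? 'yes' : 'no';
print "typo key created: $typo_created\n";
```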
You may wonder at the contradiction between taking advantage of autovivification while enabling strictures. The question is one of balance. Is it more convenient to catch errors which change the behavior of your program at the expense of disabling error checks for a few well-encapsulated symbolic references? Is it more convenient to allow data structures to grow rather than specifying their size and allowed keys?
The answers depend on your project. During early development, allow yourself the freedom to experiment. While testing and deploying, consider an increase of strictness to prevent unwanted side effects. Thanks to the lexical scoping of the strict and autovivification pragmas, you can enable these behaviors where and as necessary.
You can verify your expectations before dereferencing each level of a complex data structure, but the resulting code is often lengthy and tedious. It's better to avoid deeply nested data structures by revising your data model to provide better encapsulation.
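To give a sense of how tedious that verification becomes, here is a sketch for a single level of nesting. The `side_for` helper is hypothetical, written only for this illustration:

```perl
use strict;
use warnings;

my %meals = ( breakfast => { entree => 'eggs', side => 'hash browns' } );

# Hypothetical helper: check each level before dereferencing it,
# so that a missing meal is never autovivified.
sub side_for {
    my ( $meals, $name ) = @_;

    return unless exists $meals->{$name};

    my $meal = $meals->{$name};
    return unless ref($meal) eq 'HASH';

    return $meal->{side};
}

my $breakfast_side = side_for( \%meals, 'breakfast' );
my $brunch_side    = side_for( \%meals, 'brunch' );

print "breakfast side: $breakfast_side\n";
print "brunch is still absent\n" unless exists $meals{brunch};
```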
The complexity of Perl 5's dereferencing syntax combined with the potential for confusion with multiple levels of references can make debugging nested data structures difficult. Two good visualization tools exist.
The core module Data::Dumper converts values of arbitrary complexity into strings of Perl 5 code:
This is useful for identifying what a data structure contains, what you should access, and what you accessed instead. Data::Dumper can dump objects as well as function references (if you set $Data::Dumper::Deparse to a true value).
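A brief sketch of dumping a mixed structure, function reference included. The structure itself is illustrative, and `$Data::Dumper::Sortkeys` is set only to keep the key order stable between runs:

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # stable key order
$Data::Dumper::Deparse  = 1;    # dump the source of function references

my $complex = {
    numbers => [ 1 .. 3 ],
    letters => { a => 'alpha' },
    code    => sub { return 42 },
};

my $dump = Dumper($complex);
print $dump;
```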
While Data::Dumper is a core module and prints Perl 5 code, its output is verbose. Some developers prefer the use of the YAML::XS or JSON modules for debugging. They do not produce Perl 5 code, but their outputs can be much clearer to read and to understand.
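As a sketch of the JSON approach, the example below substitutes the core module JSON::PP (bundled with Perl since 5.14) for the CPAN JSON module, so that it runs without extra installation. The data is illustrative:

```perl
use strict;
use warnings;
use JSON::PP;

my $complex = {
    numbers => [ 1 .. 3 ],
    letters => { a => 'alpha' },
};

# pretty() indents the output; canonical() sorts hash keys.
my $json = JSON::PP->new->pretty->canonical;
my $text = $json->encode($complex);

print $text;
```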
Perl 5's memory management system of reference counting has one drawback apparent to user code. Two references which eventually point to each other form a circular reference that Perl cannot destroy on its own. Consider a biological model, where each entity has two parents and zero or more children:
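A sketch of such a model, assuming plain hash references with illustrative `mother`, `father`, and `children` keys:

```perl
use strict;
use warnings;

my $alice  = { mother => '', father => '', children => [] };
my $robert = { mother => '', father => '', children => [] };

my $cianne = { mother => $alice, father => $robert, children => [] };

# Each parent now refers to the child, and the child refers back
# to each parent: a reference cycle.
push @{ $alice->{children} },  $cianne;
push @{ $robert->{children} }, $cianne;

print "cycle formed\n" if $cianne->{mother}{children}[0] == $cianne;
```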
Both $alice and $robert contain an array reference which contains $cianne. Because $cianne is a hash reference which contains $alice and $robert, Perl can never decrease the reference count of any of these three people to zero. It doesn't recognize that these circular references exist, and it can't manage the lifespan of these entities.
Either break the circular reference manually (by clearing the children of $alice and $robert or the parents of $cianne), or use weak references. A weak reference is a reference which does not increase the reference count of its referent. Weak references are available through the core module Scalar::Util. Its weaken() function prevents a reference count from increasing:
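A sketch of weakening the child's links to its parents, using the same illustrative hash layout as the family model described earlier. `isweak()`, also from Scalar::Util, is used here only to confirm the result:

```perl
use strict;
use warnings;
use Scalar::Util qw( weaken isweak );

my $alice  = { mother => '', father => '', children => [] };
my $robert = { mother => '', father => '', children => [] };

my $cianne = { mother => $alice, father => $robert, children => [] };

push @{ $alice->{children} },  $cianne;
push @{ $robert->{children} }, $cianne;

# The child's references to its parents no longer count toward
# the parents' reference counts.
weaken( $cianne->{mother} );
weaken( $cianne->{father} );

print "parent links are weak\n"
    if isweak( $cianne->{mother} ) && isweak( $cianne->{father} );
```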
Now $cianne will retain references to $alice and $robert, but those references will not by themselves prevent Perl's garbage collector from destroying those data structures. Most data structures do not need weak references, but when they're necessary, they're invaluable.
While Perl is content to process data structures nested as deeply as you can imagine, the human cost of understanding these data structures and their relationships, to say nothing of the complex syntax, is high. Beyond two or three levels of nesting, consider whether modeling various components of your system with classes and objects will allow for clearer code.