TITLE

Synopsis 7: Lists and Iteration

VERSION

Created: 07 September 2015

Last Modified: 07 September 2015
Version: 1

Design Overview

Perl 6 provides a wide array of list-related features, including eager, lazy, and parallel evaluation of sequences of values, and arrays offering compact and multi-dimensional storage. Laziness in particular needs careful handling, so as to provide the power advanced users desire while not creating surprises for typical language users who have the (reasonable) expectation that an assignment into an array will have immediate effect. Additionally, it is important to give the programmer control of when values will and won't be retained. Finally, all of this needs to be done in a way that provides the convenience that a Perl is expected to provide, while still having a model that can be understood through understanding a small number of rules.

Sequences vs. Lists

In Perl 6, we use the term "sequence" to refer to something that can, when requested, produce a sequence of values. Of note, it can only be asked to produce them once. We use the term "list" to refer to things that hold (and so remember) values. There are various concrete types that represent various kinds of list and sequence with different semantics:

(1, 2, 3)       # a List, the simplest kind of list
[1, 2, 3]       # an Array, a list of (assignable) Scalar containers
|(1, 2)         # a Slip, a list which flattens into a surrounding List
$*IN.lines      # a Seq, a sequence that can be processed serially
(^1000).race    # a HyperSeq, a sequence that can be processed in parallel

The single argument rule

The @ sigil in Perl indicates "these", while $ indicates "the". This kind of plural/single distinction shows up in various places in the language, and much convenience in Perl comes from it. Flattening is the idea that an @-like thing will, in certain contexts, have its values automatically incorporated into the surrounding list. Traditionally this has been a source of both great power and great confusion in Perl. Perl 6 has been through a number of models relating to flattening during its evolution, before settling on a straightforward one known as the "single argument rule".

The single argument rule is best understood by considering the number of iterations that a for loop will do. The thing to iterate over is always treated as a single argument to the for loop, thus the name of the rule.

for 1, 2, 3 { }         # List of 3 things; 3 iterations
for (1, 2, 3) { }       # List of 3 things; 3 iterations
for [1, 2, 3] { }       # Array of 3 things (put in Scalars); 3 iterations
for @a, @b { }          # List of 2 things; 2 iterations
for (@a,) { }           # List of 1 thing; 1 iteration
for (@a) { }            # List of @a.elems things; @a.elems iterations
for @a { }              # List of @a.elems things; @a.elems iterations

The first two are equivalent because parentheses do not actually construct a list, but only group. It is the infix:<,> operator that forms a list. The third also performs three iterations, since in Perl 6 [...] constructs an Array but does not wrap it into a Scalar container. The fourth will do two iterations, since the argument is a list of two things; that they both happen to have the @-sigil does not, alone, lead to any kind of flattening. The same goes for the fifth; infix:<,> will happily form a list of one thing.

The single argument rule does respect Scalar containers. Therefore:

for $(1, 2, 3) { }      # List in a Scalar; 1 iteration
for $[1, 2, 3] { }      # Array in a Scalar; 1 iteration
for $@a { }             # Array in a Scalar; 1 iteration
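The iteration counts above can be checked directly. The following is a minimal sketch; the counter variables are illustrative names, and the statement-modifier form of for is assumed to follow the same rule as the block form.

```raku
my @a = 1, 2, 3;
my @b = 4, 5;

my ($plain, $pair, $one, $item) = 0, 0, 0, 0;

$plain++ for @a;        # @a itself is the single argument: 3 iterations
$pair++  for @a, @b;    # a List of two Arrays: 2 iterations
$one++   for (@a,);     # a List of one Array: 1 iteration
$item++  for $@a;       # an Array in a Scalar container: 1 iteration

say "$plain $pair $one $item";   # 3 2 1 1
```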

The single argument rule is implemented consistently throughout the language. For example, consider the append method:

@a.append: 1, 2, 3;       # appends 3 values to @a
@a.append: [1, 2, 3];     # appends 3 values to @a
@a.append: @b;            # appends @b.elems values to @a
@a.append: @b,;           # same, trailing comma doesn't make > 1 argument
@a.append: $(1, 2, 3);    # appends 1 value (a List) to @a
@a.append: $[1, 2, 3];    # appends 1 value (an Array) to @a

Additionally, the list constructor (the infix:<,> operator) and the array composer (the [...] circumfix) follow the rule:

[1, 2, 3]               # Array of 3 elements
[@a, @b]                # Array of 2 elements
[@a, 1..10]             # Array of 2 elements
[@a]                    # Array with the elements of @a copied into it
[1..10]                 # Array with 10 elements
[$@a]                   # Array with 1 element (@a)
[@a,]                   # Array with 1 element (@a)
[[1]]                   # Same as [1]
[[1],]                  # Array with a single element that is [1]
[$[1]]                  # Array with a single element that is [1]

The only one of these that is likely to provide a surprise is [[1]], but it is deemed sufficiently rare that it does not warrant an exception to the very general single argument rule.
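A short sketch of the array composer cases, including the [[1]] surprise, may make the distinction concrete. It only inspects element counts and the type of the first element.

```raku
my @a = 1, 2, 3;

say [@a].elems;          # 3: @a is the single argument, so its elements are copied
say [@a,].elems;         # 1: the comma makes a List of one thing
say [$@a].elems;         # 1: the Scalar container is respected

say [[1]][0].^name;      # Int: the inner [1] is the single argument, so this is [1]
say [[1],][0].^name;     # Array: the comma keeps the inner Array as one element
```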

User-level Types

List

A List is an immutable, potentially infinite, list of values. The simplest way to form a List is with the infix:<,> operator:

1, 2, 3

A List can be indexed and, provided it is finite, asked for its number of elements:

say (1, 2, 3)[1];       # 2
say (1, 2, 3).elems;    # 3

As it is immutable, it is not possible to push, pop, shift, unshift, or splice a List. The reverse and rotate operations return new Lists.

While a List itself is immutable, it may freely contain mutable things, including Scalar containers. Thus:

my $a = 2;
my $b = 4;
($a, $b)[0]++;
($a, $b)[1] *= 2;
say $a;             # 3
say $b;             # 8

Trying to assign to an immutable value in a List is an error, however.

(1, 2, 3)[0]++;     # Dies: Cannot assign to an immutable value

Slip

The Slip type is a subclass of List. A Slip will have its values incorporated into a surrounding List.

(1, (2, 3), 4).elems        # 3
(1, slip(2, 3), 4).elems    # 4

It is possible to coerce a List to a Slip, so the above can also be written as:

(1, (2, 3).Slip, 4).elems   # 4

This is a common way to get flattening in places it will not magically take place:

my @a = 1, 2, 3;
my @b = 4, 5;
for @a.Slip, @b.Slip { }    # 5 iterations

It's also a bit verbose, which is why the prefix:<|> operator will, anywhere other than a function call argument list, do a Slip coercion:

my @a = 1, 2, 3;
my @b = 4, 5;
for |@a, |@b { }            # 5 iterations

It can also be useful in forms such as:

my @prefixed-values = 0, |@values;

Where the single argument rule would otherwise make @prefixed-values have two elements, the zero and @values.

The Slip type is also respected by map, gather/take, and lazy loops. It is the way a map can place multiple values into its result stream:

my @a = 1, 2;
say @a.map({ $_ xx 2 }).elems;      # 2
say @a.map({ |($_ xx 2) }).elems;   # 4
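As a further sketch of the same idea, a Slip lets a map block contribute zero or several values per input, and gather/take treats a taken Slip the same way. The variable names here are illustrative; Empty is the empty Slip.

```raku
my @a = 1, 2, 3, 4;

# Keep only even values, emitting each one twice; odd values contribute nothing.
my @evens = @a.map({ $_ %% 2 ?? slip($_, $_) !! Empty });
say @evens.elems;        # 4

# A taken Slip contributes each of its values; a taken List contributes one value.
my @taken = gather {
    take 1;
    take slip(2, 3);
    take (4, 5);
};
say @taken.elems;        # 4
```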

Array

An Array is a subclass of List that places values assigned to it into Scalar containers, meaning they can be mutated. It is the default type that an @-sigil variable gets.

my @a = 1, 2, 3;
say @a.WHAT;        # (Array)
@a[1]++;
say @a;             # 1 3 3

In the absence of a shape specification, it will grow automatically.

my @a;
@a[5] = 42;
say @a.elems;       # 6

An Array supports push, pop, shift, unshift, and splice.

Assignment to an array is eager by default, and creates a new set of Scalar containers:

my @a = 1, 2, 3;
my @b = @a;
@a[1]++;
say @b;             # 1 2 3

Note that the [...] Array constructor is equivalent to creating and then assigning to an anonymous Array, and so has the same semantics with regard to eagerness and fresh containers.
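The following sketch illustrates both properties for the [...] composer: the new Array has its own containers, and the composer evaluates its argument eagerly. The names $copy, $calls and $doubled are illustrative.

```raku
my @a = 1, 2, 3;
my $copy = [@a];          # single argument rule: the 3 elements of @a are copied

@a[0] = 99;               # changing @a ...
say $copy[0];             # 1 (... does not affect the copy)

$copy[1] = 42;            # and the copy's containers are its own
say @a[1];                # 2

# Eagerness: the map block has run for every element by the time
# the composer has produced the Array.
my $calls = 0;
my $doubled = [(1, 2, 3).map({ $calls++; $_ * 2 })];
say $calls;               # 3
```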

Seq

A Seq is a one-shot producer of values. Most list processing operations return a Seq, as do most synchronous sources of multiple values.

say (1, 2, 3).map(* + 1).^name;  # Seq
say (1, 2 Z 'a', 'b').^name;     # Seq
say (1, 1, * + * ... *).^name;   # Seq
say $*IN.lines.^name;            # Seq

Since a Seq will not by default remember its values, it can only be consumed once. For example, if a Seq is stored:

my \seq = (1, 2, 3).map(* + 1);

Then only one attempt to iterate it will work; subsequent attempts will die as the values have already been consumed:

for seq { .say }    # 2\n3\n4\n
for seq { .say }    # Dies: This Seq has already been iterated

This means you can be confident that a loop going over a file's lines:

for open('data').lines {
    .say if /beer/;
}

Will not be retaining the lines of the file in memory. Additionally, it is easy to set up processing pipelines that also will not retain all of the lines in memory:

my \lines = open('products').lines;
my \beer = lines.grep(/beer/);
my \excited = beer.map(&uc);
.say for excited;

However, any attempt to re-use lines, beer, or excited will result in an error. This program is equivalent in performance to:

.say for open('products').lines.grep(/beer/).map(&uc);

But provides a chance to name the stages. Note that it's possible to use Scalar variables instead, but the single argument rule means that the final loop would have to be:

.say for |$excited;

Assigning a Seq to an Array will - so long as the sequence is not marked lazy - eagerly perform the operation and store the results into the Array. Therefore, there are no surprises to anyone writing:

my @lines = open('products').lines;
my @beer = @lines.grep(/beer/);
my @excited = @beer.map(&uc);
.say for @excited;

Re-using any of these arrays will work out fine. Of course, the memory behavior of this program is radically different, and it will be slower due to all of the extra Scalar containers created (resulting in extra garbage collection) and poor locality of reference (we have to talk about the same string many times over the program's lifetime).

Occasionally it can be useful to request that a Seq cache itself. This can be done by calling the cache method on a Seq, which makes a lazy List from the Seq and returns it. Subsequent calls to cache will return the same lazy List. Note that the first call to cache counts as consuming the Seq, so it will fail if any prior iteration has taken place, and any later attempt to iterate the Seq after calling cache will also fail. Only cache itself may be called more than once.
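A minimal sketch of caching, re-using the seq example from earlier in this section. The name cached is illustrative. The sketch only shows that the List returned by cache can be indexed and traversed repeatedly.

```raku
my \seq    = (1, 2, 3).map(* + 1);
my \cached = seq.cache;      # consumes the Seq and returns a List

say cached.^name;            # List
say cached[0];               # 2

# The cached List can be traversed more than once.
my $total = 0;
$total += $_ for cached;
$total += $_ for cached;
say $total;                  # 18
```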

Unlike a List, a Seq does not do the Positional role. Therefore, it cannot be bound to an @-sigil variable:

my @lines := $*IN.lines;    # Dies

One consequence of this is that, naively, you could not pass a Seq as an argument to be bound to an @-sigil parameter:

sub process(@data) {
}
process($*IN.lines);

This would be rather too inconvenient. Therefore, the signature binder (which actually uses ::= assignment semantics rather than :=) will spot failure to bind the @-sigil parameter, and then check if the argument does the PositionalBindFailover role. If it does, then it will call the cache method on the argument and bind the result of that instead.
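The effect of the failover can be sketched as follows. The body given to process and the name doubled are illustrative. The point is only that the @-sigil parameter can be traversed twice even though the argument was a Seq.

```raku
sub process(@data) {
    # Two passes over @data work because it is bound to the cached List.
    @data.elems + @data.sum
}

my \doubled = (1, 2, 3).map(* * 2);
say doubled ~~ PositionalBindFailover;   # True
say process(doubled);                    # 15
```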

Iterable

Both Seq and List, along with various other types in Perl 6, do the Iterable role. The primary purpose of this role is to promise that an iterator method is available. The average Perl 6 user will rarely need to care about the iterator method and what it returns.

The secondary purpose of Iterable is to mark out things that will flatten on demand, when the flat method or function is used on them.

my @a = 1, 2, 3;
my @b = 4, 5;
for flat @a, @b { }         # 5 iterations
say [flat @a, @b].elems;    # 5

Another use of flat is to flatten nested List structure. For example, the Z (zip) operator produces a Seq of Lists:

say (1, 2 Z 'a', 'b').perl; # ((1, "a"), (2, "b")).Seq

A flat can be used to flatten these, which is useful in conjunction with a for loop using a pointy block with multiple parameters:

for flat 1, 2 Z 'a', 'b' -> $num, $letter { }

Note that flat respects Scalar containers, and so:

for flat $(1, 2) { }

Will only do one iteration. Remember that an Array stores everything in a Scalar container, and so flat on an Array - short of weird tricks with binding - will always be the same as iterating over the Array itself. In fact, the flat method on an Array returns identity.
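The following sketch contrasts the cases using the flat method. The array @nested is an illustrative name. It assumes the default flat behavior described above, with no additional arguments.

```raku
my @a = 1, 2, 3;
my @b = 4, 5;

say (@a, @b).flat.elems;                 # 5
say ((1, 2), (3, (4, 5))).flat.elems;    # 5: nested Lists flatten at every level
say ($(1, 2), 3).flat.elems;             # 2: the itemized List stays whole

my @nested = (1, 2), (3, 4);
say @nested.flat.elems;                  # 2: each element sits in a Scalar container
```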

HyperSeq

array

The Iterator API and Implementation Types

The Iterator role

The IterationBuffer class

Parallelism

AUTHORS

Jonathan Worthington <jnthn@jnthn.net>
