diff --git a/MANIFEST b/MANIFEST index 949ed2a0d618..193d703ec67e 100644 --- a/MANIFEST +++ b/MANIFEST @@ -5355,6 +5355,7 @@ pod/perlbot.pod pod/perlcall.pod Perl calling conventions from C pod/perlcheat.pod Perl cheat sheet pod/perlclass.pod Perl class syntax +pod/perlclassguts.pod Internals of class syntax pod/perlclib.pod Internal replacements for standard C library functions pod/perlcommunity.pod Perl community information pod/perldata.pod Perl data structures diff --git a/pod/perl.pod b/pod/perl.pod index f75ec3b81e60..58fa27a0cad9 100644 --- a/pod/perl.pod +++ b/pod/perl.pod @@ -161,6 +161,7 @@ aux h2ph h2xs perlbug pl2pm pod2html pod2man splain xsubpp perlmroapi Perl method resolution plugin interface perlreapi Perl regular expression plugin interface perlreguts Perl regular expression engine internals + perlclassguts Internals of class syntax perlapi Perl API listing (autogenerated) perlintern Perl internal functions (autogenerated) diff --git a/pod/perlclass.pod b/pod/perlclass.pod index b990f665c405..ad035c921897 100644 --- a/pod/perlclass.pod +++ b/pod/perlclass.pod @@ -74,6 +74,12 @@ Additionally, in the class BLOCK you are allowed to declare fields and methods. field VARIABLE_NAME; + field VARIABLE_NAME = EXPR; + + field VARIABLE_NAME : ATTRIBUTES; + + field VARIABLE_NAME : ATTRIBUTES = EXPR; + Fields are variables which are visible in the scope of the class - more specifically within L</method> and C<ADJUST> blocks. Each class instance get their own storage of fields, independent of each other. @@ -84,17 +90,28 @@ accessible from the outside). The main difference is that different instances access different values in the same scope.
class WithFields { - field $scalar; - field @array; - field %hash; + field $scalar = 42; + field @array = qw(this is just an array); + field %hash = (species => 'Martian', planet => 'Mars'); + } - ADJUST { - $scalar = 42; - @array = qw(this is just an array); - %hash = (species => 'Marsian', planet => 'Mars'); - } +Fields may optionally have initializing expressions. If present, the expression +will be evaluated within the constructor of each object instance. During each +evaluation, the expression can use the value of any previously-set field, as +well as see any other variables in scope. + + class WithACounter { + my $next_count = 1; + field $count = $next_count++; } +When combined with the C<:param> field attribute, the defaulting expression can +use any of the C<=>, C<//=> or C<||=> operators. Expressions using C<=> will +apply whenever the caller did not pass the corresponding parameter to the +constructor at all. Expressions using C<//=> will also apply if the caller did +pass the parameter but the value was undefined, and expressions using C<||=> +will apply if the value was false. + =head2 method method METHOD_NAME SIGNATURE BLOCK @@ -167,7 +184,19 @@ already loaded. =head2 Field attributes -None yet. +=head3 :param + +A scalar field with a C<:param> attribute will take its value from a named +parameter passed to the constructor. By default the parameter will have the +same name as the field (minus its leading C<$> sigil), but a different name +can be specified in the attribute. + + field $x :param; + field $y :param(the_y_value); + +If there is no defaulting expression then the parameter is required by the +constructor; the caller must pass it or an exception is thrown. With a +defaulting expression this becomes optional. =head2 Method attributes @@ -209,6 +238,69 @@ C<'OBJECT'>. Just like with other references, when object reference count reaches zero it will automatically be destroyed. +=head1 TODO + +This feature is still experimental and very incomplete.
The following list +gives some overview of the kinds of work still to be added or changed: + +=over 4 + +=item * Roles + +Some syntax for declaring a role (likely a C<role> keyword), and for consuming +a role into a class (likely a C<:does()> attribute). + +=item * Parameters to ADJUST blocks + +Some syntax for declaring that an C<ADJUST> block can consume named +parameters, which become part of the class constructor's API. This might be +inspired by a similar plan to add named arguments to subroutine signatures. + + class X { + ADJUST (:$alpha, :$beta = 123) { + ... + } + } + + my $obj = X->new(alpha => 456); + +=item * ADJUST blocks as true blocks + +Currently, every ADJUST block is wrapped in its own CV that gets invoked with +the full ENTERSUB overhead. It should be possible to use the same mechanism +that makes all field initializer expressions appear within the same CV on +ADJUST blocks as well, merging them all into a single CV per class. This will +make it faster to invoke if a class has more than one of them. + +=item * Accessor generator attributes + +Attributes to request that accessor methods be generated for fields. Likely +C<:reader> and C<:writer>. + + class X { + field $name :reader; + } + +Equivalent to + + class X { + field $name; + method name { return $name; } + } + +=item * Metaprogramming + +An extension of the metaprogramming API (currently proposed by +L) which adds knowledge of +classes, methods, fields, ADJUST blocks, and other such class-related details. + +=item * Extension Customisation + +Ways in which out-of-core modules can interact with the class system, +including an ability for them to provide new class or field attributes.
+ +=back + =head1 AUTHORS Paul Evans diff --git a/pod/perlclassguts.pod b/pod/perlclassguts.pod new file mode 100644 index 000000000000..acecc815cfc8 --- /dev/null +++ b/pod/perlclassguts.pod @@ -0,0 +1,409 @@ +=head1 NAME + +perlclassguts - Internals of how C<feature 'class'> and class syntax works + +=head1 DESCRIPTION + +This document provides in-depth information about the way in which the perl +interpreter implements the C<feature 'class'> syntax and overall behaviour. +It is not intended as an end-user guide on how to use the feature. For that, +see L<perlclass>. + +The reader is assumed to be generally familiar with the perl interpreter +internals overall. For a more general overview of these details, see also +L<perlguts>. + +=head1 DATA STORAGE + +=head2 Classes + +A class is fundamentally a package, and exists in the symbol table as an HV +with an aux structure in exactly the same way as a non-class package. It is +distinguished from a non-class package by the fact that the +C<HvSTASH_IS_CLASS()> macro will return true on it. + +Extra information relating to it being a class is stored in the +C<struct xpvhv_aux> structure attached to the stash, in the following fields: + + HV *xhv_class_superclass; + CV *xhv_class_initfields_cv; + AV *xhv_class_adjust_blocks; + PADNAMELIST *xhv_class_fields; + PADOFFSET xhv_class_next_fieldix; + HV *xhv_class_param_map; + +=over 4 + +=item * + +C<xhv_class_superclass> will be C<NULL> for a class with no superclass. It +will point directly to the stash of the parent class if one has been set with +the C<:isa()> class attribute. + +=item * + +C<xhv_class_initfields_cv> will contain a C<CV *> pointing to a function to be +invoked as part of the constructor of this class or any subclass thereof. This +CV is responsible for initializing all the fields defined by this class for a +new instance. This CV will be an anonymous real function - i.e. while it has no +name and no GV, it is I<not> a protosub and may be directly invoked. + +=item * + +C<xhv_class_adjust_blocks> may point to an AV containing CV pointers to each of +the C<ADJUST> blocks defined on the class.
If the class has a superclass, this +array will additionally contain duplicate pointers to the CVs of its parent +class. The AV is created lazily the first time an element is pushed to it; it +is valid for there not to be one, and this pointer will be C<NULL> in that +case. + +The CVs are stored directly, not via RVs. Each CV will be an anonymous real +function. + +=item * + +C<xhv_class_fields> will point to a C<PADNAMELIST> containing C<PADNAME>s, +each being one defined field of the class. They are stored in order of +declaration. Note however, that the index into this array will not necessarily +be equal to the C<fieldix> of each field, because in the case of a subclass, +the array will begin at zero but the index of the first field in it will be +non-zero if its parent class contains any fields at all. + +For more information on how individual fields are represented, see L</Fields>. + +=item * + +C<xhv_class_next_fieldix> gives the field index that will be assigned to the +next field to be added to the class. It is only useful at compile-time. + +=item * + +C<xhv_class_param_map> may point to an HV which maps field C<:param> attribute +names to the field index of the field with that name. This mapping is copied +from parent classes; each class will contain the sum total of all its parents +in addition to its own. + +=back + +=head2 Fields + +A field is still fundamentally a lexical variable declared in a scope, and +exists in the C<PADNAMELIST> of its corresponding CV. Methods and other +method-like CVs can still capture them exactly as they can with regular +lexicals. A field is distinguished from other kinds of pad entry in that the +C<PadnameIsFIELD()> macro will return true on it. + +Extra information relating to it being a field is stored in an additional +structure accessible via the C<PadnameFIELDINFO()> macro on the padname. This
This +structure has the following fields: + + PADOFFSET fieldix; + HV *fieldstash; + OP *defop; + SV *paramname; + bool def_if_undef; + bool def_if_false; + +=over 4 + +=item * + +C stores the "field index" of the field; that is, the index into the +instance field array where this field's value will be stored. Note that the +first index in the array is not specially reserved. The first field in a class +will start from field index 0. + +=item * + +C stores a pointer to the stash of the class that defined this +field. This is necessary in case there are multiple classes defined within the +same scope; it is used to disambiguate the fields of each. + + { + class C1; field $x; + class C2; field $x; + } + +=item * + +C may store a pointer to a defaulting expression optree for this field. +Defaulting expressions are optional; this field may be C. + +=item * + +C may point to a regular string SV containing the C<:param> name +attribute given to the field. If none, it will be C. + +=item * + +One of C and C will be true if the defaulting +expression was set using the C or C<||=> operators respectively. + +=back + +=head2 Methods + +A method is still fundamentally a CV, and has the same basic representation as +one. It has an optree and a pad, and is stored via a GV in the stash of its +containing package. It is distinguished from a non-method CV by the fact that +the C macro will return true on it. + +(Note: This macro should not be confused with the one that was previously +called C. That one does not relate to the class system, and was +renamed to C to avoid this confusion.) + +There is currently no extra information that needs to be stored about a method +CV, so the structure does not add any new fields. + +=head2 Instances + +Object instances are represented by an entirely new SV type, whose base type +is C. This should still be blessed into its class stash and wrapped +in an RV in the usual manner for classical object. 
+ +As these are their own unique container type, distinct from hashes or arrays, +the core C<builtin::reftype> function returns a new value when asked about +these. That value is C<"OBJECT">. + +Internally, such an object is an array of SV pointers whose size is fixed at +creation time (because the number of fields in a class is known after +compilation). An object instance stores the max field index within it (for +basic error-checking on access), and a fixed-size array of SV pointers storing +the individual field values. + +Fields of array and hash type directly store AV or HV pointers into the array; +they are not stored via an intervening RV. + +=head1 API + +The data structures described above are supported by the following API +functions. + +=head2 Class Manipulation + +=head3 class_setup_stash + + void class_setup_stash(HV *stash); + +Called by the parser on encountering the C<class> keyword. It upgrades the +stash into being a class and prepares it for receiving class-specific items +like methods and fields. + +=head3 class_seal_stash + + void class_seal_stash(HV *stash); + +Called by the parser at the end of a C<class> block, or for unit classes its +containing scope. This function performs various finalisation activities that +are required before instances of the class can be constructed, but could not +have been done until all the information about the members of the class is +known. + +Any additions to or modifications of the class under compilation must be +performed between these two function calls. Classes cannot be modified once +they have been sealed. + +=head3 class_add_field + + void class_add_field(HV *stash, PADNAME *pn); + +Called by F<pad.c> as part of defining a new field name in the current pad. +Note that this function does I<not> create the padname; that must already be +done by F<pad.c>. This API function simply informs the class that the new +field name has been created and is now available for it.
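The setup, add-field and seal lifecycle described above can be modelled in a few lines of standalone C. This is only an illustrative sketch under simplifying assumptions: C<toy_class>, C<toy_setup>, C<toy_add_field> and C<toy_seal> are hypothetical names invented for this example and are not part of the perl API. The real functions operate on stashes and padnames, and the real superclass is attached later by the C<:isa()> attribute rather than at setup time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_FIELDS 16

/* Hypothetical stand-in for the class data kept on a stash. */
typedef struct toy_class {
    const struct toy_class *superclass;
    const char *field_names[TOY_MAX_FIELDS]; /* this class's own fields only */
    size_t n_own_fields;
    size_t next_fieldix;  /* index the next added field will receive */
    bool sealed;
} toy_class;

/* Loose analogue of class_setup_stash(): begin compiling a class. */
static void toy_setup(toy_class *cls, const toy_class *superclass)
{
    assert(cls != NULL);
    cls->superclass = superclass;
    cls->n_own_fields = 0;
    /* A subclass continues numbering where its parent stopped, so its
     * first field may sit at array position 0 but have a non-zero index. */
    cls->next_fieldix = superclass ? superclass->next_fieldix : 0;
    cls->sealed = false;
}

/* Loose analogue of class_add_field(): returns the assigned field index,
 * or -1 if the class is already sealed or the toy array is full. */
static long toy_add_field(toy_class *cls, const char *name)
{
    assert(cls != NULL);
    if (cls->sealed || cls->n_own_fields >= TOY_MAX_FIELDS)
        return -1;
    cls->field_names[cls->n_own_fields++] = name;
    return (long)cls->next_fieldix++;
}

/* Loose analogue of class_seal_stash(): no modifications after this. */
static void toy_seal(toy_class *cls)
{
    assert(cls != NULL);
    cls->sealed = true;
}
```

In this model a subclass set up after its parent has two fields would hand out index 2 to its own first field, and any attempt to add a field after sealing is refused.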
+ +=head3 class_add_ADJUST + + void class_add_ADJUST(HV *stash, CV *cv); + +Called by the parser once it has parsed and constructed a CV for a new +C<ADJUST> block. This gets added to the list stored by the class. + +=head2 Field Manipulation + +=head3 class_prepare_initfield_parse + + void class_prepare_initfield_parse(); + +Called by the parser just before parsing an initializing expression for a +field variable. This makes use of a suspended compcv to combine all the field +initializing expressions into the same CV. + +=head3 class_set_field_defop + + void class_set_field_defop(PADNAME *pn, OPCODE defmode, OP *defop); + +Called by the parser after it has parsed an initializing expression for the +field. Sets the defaulting expression and mode of application. C<defmode> +should either be zero, or one of C<OP_ORASSIGN> or C<OP_DORASSIGN> depending +on the defaulting mode. + +=head3 padadd_FIELD + + #define padadd_FIELD + +This flag constant tells the C<pad_add_name_*> family of functions that the +new name should be added as a field. There is no need to call +C<class_add_field()>; this will be done automatically. + +=head2 Method Manipulation + +=head3 class_prepare_method_parse + + void class_prepare_method_parse(CV *cv); + +Called by the parser after C<start_subparse()> but immediately before doing +anything else. This prepares the C<PL_compcv> for parsing a method; arranging +for the C<CvIsMETHOD> test to be true, adding the C<$self> lexical, and any +other activities that may be required. + +=head3 class_wrap_method_body + + OP *class_wrap_method_body(OP *o); + +Called by the parser at the end of parsing a method body into an optree but +just before wrapping it in the eventual CV. This function inserts extra ops +into the optree to make the method work correctly. + +=head2 Object Instances + +=head3 SVt_PVOBJ + + #define SVt_PVOBJ + +An SV type constant used for comparison with the C<SvTYPE()> macro. + +=head3 ObjectMAXFIELD + + SSize_t ObjectMAXFIELD(sv); + +A function-like macro that obtains the maximum valid field index that can be +accessed from the C<ObjectFIELDS> array.
+ +=head3 ObjectFIELDS + + SV **ObjectFIELDS(sv); + +A function-like macro that obtains the fields array directly out of an object +instance. Fields can be accessed by their field index, from 0 up to the maximum +valid index given by C<ObjectMAXFIELD>. + +=head1 OPCODES + +=head2 OP_METHSTART + + newUNOP_AUX(OP_METHSTART, ...); + +An C<OP_METHSTART> is an C<UNOP_AUX> which must be present at the start of a +method CV in order to make it work properly. This is inserted by +C<class_wrap_method_body()>, and even appears before any optree fragment +associated with signature argument checking or extraction. + +This op is responsible for shifting the value of C<$self> out of the arguments +list and binding any field variables that the method requires access to into +the pad. The AUX vector will contain details of the field/pad index pairings +required. + +This op also performs sanity checking on the invocant value. It checks that it +is definitely an object reference of a compatible class type. If not, an +exception is thrown. + +If the C<op_private> field includes the C<OPpINITFIELDS> flag, this indicates +that the op begins the special C<xhv_class_initfields_cv> CV. In this case it +should additionally take the second value from the arguments list, which +should be a plain HV pointer (I<directly>, not via RV), and bind it to the +second pad slot, where the generated optree will expect to find it. + +=head2 OP_INITFIELD + +An C<OP_INITFIELD> is only invoked as part of the C<xhv_class_initfields_cv> +CV during the construction phase of an instance. This is the time that the +individual SVs that make up the mutable fields of the instance (including AVs +and HVs) are actually assigned into the C<ObjectFIELDS> array. The +C<OPpINITFIELD_AV> and C<OPpINITFIELD_HV> private flags indicate whether it is +creating an AV or HV; if neither is set then an SV is created. + +If the op has the C<OPf_STACKED> flag it expects to find an initializing value +on the stack. For SVs this is the topmost SV on the data stack. For AVs and +HVs it expects a marked list.
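The instance layout described under Object Instances above (a fixed-size array of value pointers plus a stored maximum index used for basic error-checking) can be sketched in standalone C. Everything here is hypothetical and named only for illustration: C<toy_object>, C<toy_object_new>, C<toy_object_slot> and C<toy_object_free> are not perl API, the slots hold C<void *> rather than C<SV *>, and the choice of -1 as the maximum index of an empty object is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of an object instance: a fixed-size array of field
 * slots plus the highest valid index into it. */
typedef struct toy_object {
    long max_field;  /* highest valid index; -1 here when there are none */
    void **fields;   /* one slot per field, indexed by field index */
} toy_object;

/* Allocate an instance whose size is fixed at creation time. */
static toy_object *toy_object_new(size_t n_fields)
{
    toy_object *obj = malloc(sizeof *obj);
    if (obj == NULL)
        return NULL;
    obj->fields = n_fields ? calloc(n_fields, sizeof *obj->fields) : NULL;
    if (n_fields && obj->fields == NULL) {
        free(obj);
        return NULL;
    }
    obj->max_field = (long)n_fields - 1;
    return obj;
}

/* Bounds-checked slot lookup: returns NULL for an invalid field index. */
static void **toy_object_slot(toy_object *obj, long fieldix)
{
    assert(obj != NULL);
    if (fieldix < 0 || fieldix > obj->max_field)
        return NULL;
    return &obj->fields[fieldix];
}

/* Release the instance; the slots themselves are not owned in this toy. */
static void toy_object_free(toy_object *obj)
{
    if (obj != NULL) {
        free(obj->fields);
        free(obj);
    }
}
```

The lookup helper plays the role that comparing an index against C<ObjectMAXFIELD> plays before indexing into C<ObjectFIELDS>.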
+ +=head1 COMPILE-TIME BEHAVIOUR + +=head2 C<ADJUST> Phasers + +During compile time, parsing of an C<ADJUST> phaser is handled in a +fundamentally different way to the existing perl phasers (C<BEGIN>, etc...) + +Rather than taking the usual route, the tokenizer recognises that the +C<ADJUST> keyword introduces a phaser block. The parser then parses the body +of this block similarly to how it would parse an (anonymous) method body, +creating a CV that has no name GV. This is then inserted directly into the +class information by calling C<class_add_ADJUST()>, entirely bypassing the +symbol table. + +=head2 Attributes + +During compilation, attributes of both classes and fields are handled in a +different way to existing perl attributes on subroutines and lexical +variables. + +The parser still forms an C<OP_LIST> optree of C<OP_CONST> nodes, but these +are passed to the C<class_apply_attributes()> or C<class_apply_field_attributes()> +functions. Rather than using a class lookup for a method in the class being +parsed, a fixed internal list of known attributes is used to find functions to +apply the attribute to the class or field. In future this may support +user-supplied extension attributes, though at present it only recognises ones +defined by the core itself. + +=head2 Field Initializing Expressions + +During compilation, the parser makes use of a suspended compcv when parsing +the defaulting expression for a field. All the expressions for all the fields +in the class share the same suspended compcv, which is then compiled up into +the same internal CV called by the constructor to initialize all the fields +provided by that class. + +=head1 RUNTIME BEHAVIOUR + +=head2 Constructor + +The generated constructor for a class itself is an XSUB which performs three +tasks in order: it creates the instance SV itself, invokes the field +initializers, then invokes the ADJUST block CVs. The constructor for any class +is always the same basic shape, regardless of whether the class has a +superclass or not.
+ +The field initializers are collected into a generated optree-based CV called +the field initializer CV. This is the CV which contains all the optree +fragments for the field initializing expressions. When invoked, the field +initializer CV might make a chained call to the superclass initializer if one +exists, before invoking all of the individual field initialization ops. The +field initializer CV is invoked with two items on the stack: the +instance SV and a direct HV containing the constructor parameters. Note +carefully: this HV is passed I<directly>, not via an RV reference. This is +permitted because both the caller and the callee are directly generated code +and not arbitrary pure-perl subroutines. + +The ADJUST block CVs are all collected into a single flat list, merging all of +the ones defined by the superclass as well. They are all invoked in order, +after the field initializer CV. + +=head2 C<$self> Access During Methods + +When C<class_prepare_method_parse()> is called, it arranges that the pad of +the new CV body will begin with a lexical called C<$self>. Because the pad +should be freshly-created at this point, this will have the pad index of 1. +The function checks this and aborts if that is not true. + +Because of this fact, code within the body of a method or method-like CV can +reliably use pad index 1 to obtain the invocant reference. The C<OP_METHSTART> +opcode also relies on this fact. + +In similar fashion, during the C<xhv_class_initfields_cv> the next pad slot is +relied on to store the constructor parameters HV, at pad index 2.
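The three-step constructor sequence described under Constructor above can be modelled with plain function pointers. This is a rough sketch under stated assumptions, not the real implementation: C<toy_klass>, C<toy_instance>, C<toy_construct> and the sample step functions are hypothetical names, the log string exists only to make the ordering visible, and the superclass chaining is done here by a helper function whereas the text says the field initializer CV itself makes the chained call.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ADJUST 8

/* Hypothetical instance: just records the order in which steps ran. */
typedef struct toy_instance {
    char log[64];
    size_t len;
} toy_instance;

typedef void (*toy_step)(toy_instance *);

/* Hypothetical class record. */
typedef struct toy_klass {
    const struct toy_klass *superclass;
    toy_step initfields;              /* may be NULL */
    toy_step adjust[TOY_MAX_ADJUST];  /* flat list, inherited entries first */
    size_t n_adjust;
} toy_klass;

static void toy_log(toy_instance *self, char c)
{
    if (self->len + 1 < sizeof self->log) {
        self->log[self->len++] = c;
        self->log[self->len] = '\0';
    }
}

/* Run the superclass initializer first, then this class's own. */
static void toy_run_initfields(const toy_klass *cls, toy_instance *self)
{
    if (cls->superclass)
        toy_run_initfields(cls->superclass, self);
    if (cls->initfields)
        cls->initfields(self);
}

/* Same shape for every class: create, init fields, run ADJUST list. */
static void toy_construct(const toy_klass *cls, toy_instance *self)
{
    assert(cls != NULL && self != NULL);
    self->len = 0;                    /* step 1: "create" the instance */
    self->log[0] = '\0';
    toy_run_initfields(cls, self);    /* step 2: field initializers */
    for (size_t i = 0; i < cls->n_adjust; i++)
        cls->adjust[i](self);         /* step 3: ADJUST blocks, in order */
}

/* Sample steps for a base class and a derived class. */
static void base_init(toy_instance *s)      { toy_log(s, 'b'); }
static void derived_init(toy_instance *s)   { toy_log(s, 'd'); }
static void base_adjust(toy_instance *s)    { toy_log(s, 'B'); }
static void derived_adjust(toy_instance *s) { toy_log(s, 'D'); }
```

Note that the derived class's flat ADJUST list has to contain the base class's entry itself; the constructor never walks the superclass chain for ADJUST blocks in this model.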
+ +=head1 AUTHORS + +Paul Evans + +=cut diff --git a/win32/pod.mak b/win32/pod.mak index 597e688aa299..f5b9830aaac9 100644 --- a/win32/pod.mak +++ b/win32/pod.mak @@ -98,6 +98,7 @@ POD = perl.pod \ perlcall.pod \ perlcheat.pod \ perlclass.pod \ + perlclassguts.pod \ perlclib.pod \ perlcommunity.pod \ perldata.pod \ @@ -274,6 +275,7 @@ MAN = perl.man \ perlcall.man \ perlcheat.man \ perlclass.man \ + perlclassguts.man \ perlclib.man \ perlcommunity.man \ perldata.man \ @@ -450,6 +452,7 @@ HTML = perl.html \ perlcall.html \ perlcheat.html \ perlclass.html \ + perlclassguts.html \ perlclib.html \ perlcommunity.html \ perldata.html \ @@ -626,6 +629,7 @@ TEX = perl.tex \ perlcall.tex \ perlcheat.tex \ perlclass.tex \ + perlclassguts.tex \ perlclib.tex \ perlcommunity.tex \ perldata.tex \