Perl – Usage Of Sigils And ()/[]/{}

Anyone interested in using Perl quickly grasps the difference between what a scalar, array and a hash are (signified by $, @ and % respectively). However what is often less clear is why $ is used when accessing arrays and hashes. Also when does one use (), [] and {}, and what are the differences? Then there are typeglobs (*)…

Hopefully this post will help to demystify the above. Oh and in case you were wondering, the term sigil refers to $, @, %, & and *, i.e. punctuation characters that signify specific things.


Sigil Context

Ok so the following:

    my $password;
    my @lines_of_text;
    my %account_details;
    

Declares three variables: a scalar, an array and then lastly a hash (in that order). So far so good.

Scalars are simple variables. They can contain numbers (both floating point and integer), strings, file handles, arbitrary binary data or references to other data items.

Arrays and hashes on the other hand are used to store scalars in groups or containers and allow for their easy retrieval. One might first think that only being able to store scalars in a container would be limiting. However one can store references to other Perl variables in containers as references are themselves just another form of scalar.
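
As a rough sketch of storing references in a container, the following puts an array and a hash inside another hash. The data and the key names ("admins", "limits") are made up purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical data, for illustration only.
my @admins = ("alice", "bob");
my %limits = ("disk" => 100, "cpu" => 4);

# \@admins and \%limits are references, and references are scalars,
# so they can be stored as hash values like any other scalar.
my %account_details = (
    "name"   => "John Smith",
    "admins" => \@admins,
    "limits" => \%limits,
);

# Follow the references with -> to reach the nested elements.
my $first_admin = $account_details{"admins"}->[0];
my $disk_limit  = $account_details{"limits"}->{"disk"};

print "$first_admin $disk_limit\n";
```

Each value in %account_details is still a single scalar. It just happens that two of them point at other containers.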

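Just a quick reminder: arrays allow one to access items via a numeric index that starts at zero, like a simple numbered list (for this reason they are often loosely called lists, although strictly speaking a list is a sequence of values whereas an array is a variable that holds one). Hashes on the other hand allow one to access items via an arbitrary string key. Hashes are equivalent to dictionaries in Python and roughly equivalent to maps in C++ (std::unordered_map being the closer match, as Perl hashes are unordered). They are also sometimes called associative arrays.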

Anyway back to our variables.

When referring to the $password scalar in code all seems to make sense. But then you come across stuff like this:

    $lines_of_text[1] = "This is a line of text";
    $account_details{"John Smith"} = "overdrawn:closed";
    

These are arrays and hashes, surely I should be using @ and % respectively?

In response to that question I would ask what is being accessed inside that array or hash? A scalar, and so hence the use of the $. Or to put it another way what is immediately to the right of the $ is an expression that yields a scalar value and so that is denoted by the $.

Ok so I use $ when accessing single elements in a container. When would I use @ or % with a container?

Well for arrays (@) typically you would use them when dealing with whole arrays or slices of those arrays (where you refer to more than one array element at a time). For example:

    @weekday = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday");
    @weekend = ("Saturday", "Sunday");
    @week = (@weekday, @weekend);
    @copy_of_week = @week;
    $day = $weekday[2];
    @three_day_week = @weekday[0, 1, 2];
    

In assigning stuff to @week we are creating a list made up from joining together the lists @weekday and @weekend. Likewise @copy_of_week is a simple copy of one list to another. In both cases we are referring to the whole container which in turn contains multiple scalars. Incidentally if a container only contains one element then you still refer to the whole container with @ and not $. Making you do otherwise would be sadistic!

The assignment to $day uses $weekday as we are only accessing one scalar element.

However in the assignment to @three_day_week we use @ as the subsequent index expression specifies a list of elements to retrieve and so the whole expression yields a list of scalar values and not just one scalar.

Ok so what about a hash slice? For example:

    %item_cost = ("soap" => 50, "beans" => 90, "bread" => 85);
    @costs = @item_cost{"soap", "bread"};
    

Keep calm, breathe deeply. It does actually make sense. Firstly, what do you think $item_cost{"soap"} would give you? The scalar value of 50. I.e. you gave your hash a key and got back its corresponding value. Ok so what do you think would happen if you gave a hash a list of keys to look up? You would expect to get back a value for each key specified. In fact you get back a list of the corresponding values, one for each key given and in the same order as the keys. Sounds very reasonable. But remember it is a list that you get back and not a hash nor a scalar, hence the use of @.

The assignment to @costs could have been done like this:

    @costs = ($item_cost{"soap"}, $item_cost{"bread"});
    

However this is rather long winded.

So to sum up:

When accessing a container, the sigil used relates to what is returned from that container and not to the type of container being accessed.

So if you are using a simple index or key that refers to only one element then use $ as only one value will be returned. If you are using a list slice or multiple keys then use @ as you will get a list of matching values returned. If no index or keys are given then the whole container will be returned and so one must use @ or % depending upon the type of container being accessed.
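
To put the rule in one place, here is a small sketch that reuses the variables from the earlier examples. The only additions are the my declarations (needed under use strict, which the snippets above leave out) and the names @two_days, @all_days and %all_costs, which are mine.

```perl
use strict;
use warnings;

my @weekday   = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday");
my %item_cost = ("soap" => 50, "beans" => 90, "bread" => 85);

my $day       = $weekday[2];                  # one array element:  $
my @two_days  = @weekday[0, 1];               # array slice:        @
my $cost      = $item_cost{"soap"};           # one hash value:     $
my @costs     = @item_cost{"soap", "bread"};  # hash slice:         @
my @all_days  = @weekday;                     # whole array:        @
my %all_costs = %item_cost;                   # whole hash:         %

print "$day, @two_days, $cost, @costs\n";
```

In every case the sigil on the right hand side matches the sigil of the variable that receives the result.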


Use of (), [] and {}

In short one uses () to specify list literals. [] are used to index into lists and to specify a reference to a list literal. Likewise {} are used to index into hashes and to specify a reference to a hash literal.

()

If you want to assign a literal value to a list you can do:

    @weekday = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday");
    

The parentheses are simply used to delimit a list literal.

If you want to assign a literal value to a hash then again you use a list literal. Although a hash is a collection of key-value pairs, this is represented as a list with the key-value pairs listed in sequence. For example:

    %item_cost = ("soap", 50, "beans", 90, "bread", 85);
    %item_cost = ("soap" => 50, "beans" => 90, "bread" => 85);
    

The above two assignments are equivalent. One can think of => as being an alias for , (the one difference being that => also quotes a bareword identifier on its left hand side). I think most of you would agree that the second form makes it much clearer as to what is going on.
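
One practical difference between the two is that => treats a simple bareword on its left as a string, so the quotes around the keys can be dropped. A small sketch of this, where %same_cost and $match are names I have made up for the comparison:

```perl
use strict;
use warnings;

# With =>, a bareword on the left is treated as a string.
my %item_cost = (soap => 50, beans => 90, bread => 85);

# With a plain comma the keys have to be quoted under "use strict".
my %same_cost = ("soap", 50, "beans", 90, "bread", 85);

# Check that both hashes ended up with the same keys and values.
my $match = (keys(%item_cost) == keys(%same_cost)) ? 1 : 0;
for my $item (keys %item_cost) {
    $match = 0
        unless exists $same_cost{$item}
            && $same_cost{$item} == $item_cost{$item};
}

print $match ? "same\n" : "different\n";
```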

[]

Square brackets are used to specify indices when fetching elements out of a list. For example:

    $a_day = $weekday[2];
    @days = @weekday[0 .. 2];
    @more_days = @weekday[0, 2, 3];
    

However they are also used to denote a reference to a list literal. For example:

    $weekday = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"];
    

This may seem to be completely at odds with what I have said before. Why do I not assign to @weekday instead of $weekday? The clue is in the square brackets and my mention of references.

In Perl references simply refer to other Perl data items; be they scalars, arrays, hashes, subroutines or typeglobs. You could think of a reference as a pointer if you like, but a pointer that is also reference counted (like everything else in Perl). So probably the best analogy would be a C++ reference counting smart pointer.

A single reference refers to one thing at a time and is stored as a scalar entity. When you use [] that basically creates an anonymous array from your literal, much as assigning () to an array variable would, but then it takes a reference to that array and returns the reference instead of the array itself. Hence the assignment to $weekday instead of @weekday.
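
To get at the data behind such a reference you dereference it. The following is only a sketch of the common forms, and the variable names other than $weekday are mine:

```perl
use strict;
use warnings;

my $weekday = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"];

my $kind      = ref($weekday);        # the type of thing referred to
my $a_day     = $weekday->[2];        # one element through the reference
my @all_days  = @{$weekday};          # the whole array
my @two_days  = @{$weekday}[0, 1];    # a slice through the reference
my $day_count = scalar @{$weekday};   # number of elements

print "$kind $a_day $day_count\n";
```

Note that the sigil rule still holds: $ in front when one element comes back and @ when a list comes back.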

{}

Curly braces are used to specify keys when fetching values out of a hash. For example:

    $cost = $item_cost{"soap"};
    @costs = @item_cost{"beans", "bread"};
    

Like [], {} are also used to denote a reference to a hash literal. For example:

    $item_cost = {"soap" => 50, "beans" => 90, "bread" => 85};
    

This does the same thing as using [] with lists. It creates a hash using the specified literal and then returns a reference to it instead of the hash itself.

Ok so why not just achieve the above by doing:

    $item_cost = ["soap" => 50, "beans" => 90, "bread" => 85];
    

After all I said that one assigns a hash literal by doing:

    %item_cost = ("soap" => 50, "beans" => 90, "bread" => 85);
    

The answer is that with the above assignment to the hash, Perl can determine the type of the target container and can construct the hash from the literal accordingly. When assigning a reference to a scalar it has no such clue, and so [] would simply give you a reference to an array of six elements rather than a reference to a hash. Hence the use of {} to make this unambiguous.

When wishing to create a reference to a literal list or hash, simply use the brackets that would be used to index or key into that type of container.
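
A quick way to see the difference is to ask Perl what each reference points at using ref(). This is just a sketch, and $as_hash and $as_array are names chosen for the illustration:

```perl
use strict;
use warnings;

my $as_hash  = {"soap" => 50, "beans" => 90, "bread" => 85};
my $as_array = ["soap" => 50, "beans" => 90, "bread" => 85];

my $hash_kind  = ref($as_hash);         # "HASH"
my $array_kind = ref($as_array);        # "ARRAY"
my $soap_cost  = $as_hash->{"soap"};    # look up by key
my $flat_count = scalar @{$as_array};   # keys and values as one flat list
my $second     = $as_array->[1];        # the value that followed "soap"

print "$hash_kind $array_kind $soap_cost $flat_count $second\n";
```

The array version keeps the data but loses the ability to look anything up by key.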


Typeglobs

Typeglobs are denoted by the * sigil. These are basically global symbol table entries and hold scalar, array, hash, subroutine, format and I/O handle information for a given symbol name.

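So for instance, if you had two package (i.e. global) variables named $name and @name, although these are separate variables, they would be stored under the same typeglob, *name: one in its scalar slot and the other in its array slot. Lexical variables declared with my do not live in the symbol table and so have no typeglob.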
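
The individual slots of a typeglob can be fetched as references using the *name{SCALAR} style of syntax. A minimal sketch, assuming two package variables that share the name "name":

```perl
use strict;
use warnings;

# "our" declares package variables, which live in the symbol table.
our $name = "John Smith";
our @name = ("John", "Smith");

# Both variables hang off the one typeglob, *name, in different slots.
my $scalar_ref = *name{SCALAR};
my $array_ref  = *name{ARRAY};

print "${$scalar_ref}\n";
print "@{$array_ref}\n";
```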

Typeglobs were commonly used when dealing with old style file handles as in:

    open(FILE_HANDLE, "/etc/passwd") or die "Cannot open /etc/passwd: $!";
    read(FILE_HANDLE, $buffer, 10);
    close(FILE_HANDLE);
    

However thankfully we do not have to do that any more as we can just use an ordinary scalar as a file handle.
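
For comparison, here is a sketch of the same sort of read using a scalar file handle. So that it does not depend on any particular file existing, it reads from an in-memory file handle over some made up data rather than from /etc/passwd.

```perl
use strict;
use warnings;

# Made up sample data, read through an in-memory file handle.
my $data = "root:x:0:0:root:/root:/bin/sh\n";

# The file handle is an ordinary lexical scalar.
open(my $file_handle, "<", \$data)
    or die "Cannot open in-memory file: $!";
my $buffer;
my $bytes_read = read($file_handle, $buffer, 10);
close($file_handle);

print "$bytes_read bytes read: $buffer\n";
```

Being an ordinary scalar, $file_handle can be passed to subroutines or stored in a container just like any other value.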

Most of the time you will have no use for typeglobs, but you can do neat things with them. For instance quite often when writing a class you may have several constructors and then want to alias one of them to new(). You can easily do this with typeglobs. For example:

    sub new_with_file_name($$);
    sub new_with_raw_data($$);
    *new = *new_with_file_name;
    

This defines an alias for new_with_file_name() called new(). In actual fact it defines an alias for anything named new_with_file_name (scalar, array, hash and so on, not just the subroutine) but it is unlikely that there is going to be another global identifier in the same package with that name and so we can get away with it. However if you just wanted to alias the function name and nothing more then this would be a more precise way to do it:

    *new = \&new_with_file_name;
    

Incidentally, if you did not know, & is the sigil for subroutines. Normally you do not need to specify & unless you are taking a reference.
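
Putting the aliasing trick and the \& form together, a minimal sketch might look like the following. The Document class and its file_name field are hypothetical and only there to give the constructor something to do.

```perl
use strict;
use warnings;

package Document;   # hypothetical class name

sub new_with_file_name {
    my ($class, $file_name) = @_;
    return bless {file_name => $file_name}, $class;
}

# Alias only the subroutine slot of *new. The "no warnings" line is
# there to silence a possible "used only once" warning.
{
    no warnings 'once';
    *new = \&new_with_file_name;
}

package main;

# Calling new() now runs new_with_file_name().
my $doc = Document->new("notes.txt");
print ref($doc), " ", $doc->{file_name}, "\n";
```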

For the most part you will not need to worry about typeglobs. They are the sort of thing that does not make a lot of sense until you come across a case where you really need to use one. I have used them when dealing with the IPC::Open3 module, introspection and for doing the above subroutine aliasing trick.

Happy coding :-).
