A Perl Tutorial: Super-Basics
That First Line, And Other Details
Grossly oversimplifying: A Perl program belongs in a file (which you have made executable with some command like
chmod +x filename
) in your bin directory, where the first line of the file is
#!/usr/bin/perl
Everything after that first line is Perl, except for lines beginning with #, which are comments. Every command in Perl should end in a semi-colon; missing semi-colons account for 90% of novice user errors, and 80% of expert user errors. So Perl programs look like this:
#!/usr/bin/perl
command;
command;
# this is a comment, you can say whatever you want here
command;
Once you've written your Perl program, you run it by typing the name of the file. If you've made some mistake in your "code" (that's geek for "program," as if "program" weren't geeky enough) Perl will refuse to run it, spitting out some usually-informative error message instead.
Enough abstraction; on to some examples.
The Print Statement
The most basic thing you're going to want a Perl script to do is tell you things--that is, usually, send output to the terminal. Here's a fully functioning one-line Perl script:
#!/usr/bin/perl
print "1 + 2 = 3\n";
Run it, and here's what it prints on your screen:
1 + 2 = 3
The \n is the end-of-line (newline) character; you'll spend a lot of time in Perl sticking those in and taking them out, unless you want all of your output to come all mushed together on one line and generally misbehaving.
Variables
You can store data in "variables", so that you can tinker with them and use them at will. Here's the above program using variables:
#!/usr/bin/perl
$x = 1;
$y = 2;
$z = 3;
print "$x + $y = $z\n";
Any variables in double quotes will be interpolated--that is, they'll be translated into the data they contain. So when you run that program, you still get
1 + 2 = 3
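Interpolation only happens inside double quotes; single quotes leave the text exactly as you typed it. Here's a small sketch of the difference (the names $double and $single are made up for this illustration):

```perl
#!/usr/bin/perl
$x = 1;
$y = 2;
$z = 3;
# Double quotes: each variable is replaced by its contents.
$double = "$x + $y = $z";
# Single quotes: nothing is replaced; the dollar signs stay put.
$single = '$x + $y = $z';
print "$double\n";   # prints: 1 + 2 = 3
print "$single\n";   # prints: $x + $y = $z
```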
Assigning to Variables
The print statement above cheated, though; it didn't really add the numbers. You can add $x and $y with a line like this:
$z = $x + $y;
That line really performs the math, and stores the result in $z. So a program like this
#!/usr/bin/perl
$x = 1;
$y = 2;
$z = $x + $y;
print "$x + $y = $z\n";
will once again print out
1 + 2 = 3
Getting Input From the User
Variables give your program flexibility. You can now change that program so it adds any two numbers you type in. The way to get input from the keyboard into a running program is with a line like this:
$x = <>;
When your program encounters a line like that, it will stop and wait for numbers (or letters, or anything else) to be typed in at the keyboard, until RETURN is hit. Then it will assign everything you just typed into $x, and the program can then play with it. So this program
#!/usr/bin/perl
print "Type in a number: ";
$x = <>;
chop($x);
print "Type in a number: ";
$y = <>;
chop($y);
$z = $x + $y;
print "$x + $y = $z\n";
will wait for two numbers, add them up, and then print the result.
The one strange thing in that program is the chop command. When the value of $x comes in from the keyboard, it's stored along with the \n character from hitting RETURN. So if you typed 35 and then hit RETURN, the value of $x became
35\n
Chop just chops the last character off of a variable. Its most common use is for getting rid of that nasty "\n"--which, believe it or not, counts as one character. (Even more to the point: it counts as a character, which means "35\n" is not the same string as "35". Perl will still do arithmetic with it, but you can't print it without getting an unwanted line break, and it won't match "35" in a string comparison. Chopping is good Perl practice.)
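One thing to watch: chop removes the last character whatever that character is, newline or not. Perl 5 also has a chomp command that removes only a trailing newline; it's shown here purely for comparison. A quick sketch, with made-up values and variable names:

```perl
#!/usr/bin/perl
$typed = "35\n";     # what arrives from the keyboard
chop($typed);        # removes the "\n", leaving "35"

$clean = "35";       # no newline on the end this time
chop($clean);        # chop still removes a character, leaving "3"

$safe = "35";
chomp($safe);        # chomp only removes a trailing newline, so "35" stays "35"

print "$typed $clean $safe\n";   # prints: 35 3 35
```

If there's any chance the value has no newline on the end, chomp is the safer of the two.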
Subroutines
See how, in the above program, the first six lines basically do the same thing twice? Computer programs are designed to perform repetitive tasks for you; if you find yourself typing the same thing over and over again in a program, it's a good clue that you should condense your program with a subroutine.
A subroutine is like a command you define yourself. So let's make a version of the program that uses a subroutine called getnumber that gets a number from the keyboard:
#!/usr/bin/perl
$x = &getnumber;
$y = &getnumber;
$z = $x + $y;
print "$x + $y = $z\n";
# The program ends there...
# the subroutine is just tacked on at the end:
sub getnumber {
print "Type in a number: ";
$number = <>;
chop($number);
$number;
}
Everything inside the curly brackets is the subroutine. Now that section of the program will be run every time you use the command
&getnumber;
in your program. In this case, the subroutine "getnumber" returns a value which is then assigned to $x and $y. Not all subroutines are designed to return values. This one returns the contents of $number because of the line
$number;
So the command
$x = &getnumber;
first runs the subroutine, and then, when it sees the line $number; it exits the subroutine, and gives $x the value of $number.
But like I said, not all subroutines are designed to return a value. For instance, you could have a subroutine that just printed out a standard warning:
sub warning {
print "WARNING: The program is about to ask you to type in a number!\n";
}
So the line
&warning;
will now print out
WARNING: The program is about to ask you to type in a number!
every time you use it.
You can stick the code for subroutines anywhere you want in your program, but the traditional place is to group them all together at the end of the program, after the main section.
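What a subroutine hands back is the value of the last thing it evaluates; Perl also has a return command that says the same thing out loud. Here's a small sketch with two made-up subroutines (implicit_seven and explicit_seven) that don't need any keyboard input:

```perl
#!/usr/bin/perl
$first  = &implicit_seven;
$second = &explicit_seven;
print "$first $second\n";   # prints: 7 7

# Returns 7 because $number; is the last line that runs.
sub implicit_seven {
    $number = 3 + 4;
    $number;
}

# Does the same thing, but spells it out with return.
sub explicit_seven {
    $number = 3 + 4;
    return $number;
}
```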
While
Right now the program just executes once and then kicks you out. Many times you'll want your program to do something over and over again; you can do this with a control structure that creates a "loop". There are many types of control structures; we'll start you off with a simple one called while. It basically means: do this set of commands while this statement is true. For instance, here's a code fragment that loops over and over as long as you type in the number "15":
$x = 15;
while($x == 15) {
$x = &getnumber;
}
Two things to note: 1) $x == 15 means "$x is equal to 15". It's a comparative statement; it should never be confused with $x = 15, which assigns the value 15 to $x. Confusing == and = is one of the most common novice errors; in this case, it would create an infinite loop.
2) The first line of the code fragment sets the value of $x to 15. If you didn't do that, $x wouldn't equal 15 the first time around, so the program would never enter the loop at all, and would never ask for a number.
Here's our adding program, with a loop so you can use it over and over:
#!/usr/bin/perl
$doagain = "yes";
while($doagain eq "yes") {
$x = &getnumber;
$y = &getnumber;
$z = $x + $y;
print "$x + $y = $z\n";
print "Do it again? (yes or no) ";
$doagain = <>;
chop($doagain);
}
(I left out the subroutine to save space here, but you'd have to include it if you wanted that program to run.) Note that the comparison this time uses "eq" rather than "==". == compares numerical values; eq is for values made up of letters, numbers, and other characters (called "strings").
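Here's a rough sketch of how == and eq can disagree about the same two values. It uses if, which runs its block once when the test is true; the variable names are made up for the example:

```perl
#!/usr/bin/perl
$one   = "1.0";
$other = "1";

$numeric = "no";
$string  = "no";
if ($one == $other) { $numeric = "yes"; }   # 1.0 and 1 are the same number
if ($one eq $other) { $string  = "yes"; }   # "1.0" and "1" are different strings

print "== says $numeric, eq says $string\n";   # prints: == says yes, eq says no

# Using == on words is a trap: both words count as 0, so they look "equal".
$trap = "no";
if ("yes" == "no") { $trap = "yes"; }
print "Are the two words equal according to ==? $trap\n";   # prints: ... yes
```

That last result is why the loop above tests $doagain with eq.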
Pattern Matching and Regular Expressions
If you were paying attention, you noticed a huge loophole in the programs above: there's nothing to prevent you from typing in a string variable when you're supposed to be typing in a number. You can type in "dog" and "cat", and the program will try to add "dog" and "cat" (which, if you're curious, gives a result of zero.) You need some way to check to make sure that the person actually typed in numbers; then, if they didn't, you can ask them again (with a looping control structure), until they get it right.
Welcome to the concepts of pattern matching and regular expressions, two of Perl's powerful text-processing tools. Let's start with a simple pattern first: one letter. If you want to test a variable to see if it contains the (lower-case) letter "z", use this syntax:
if ($x =~ /z/) {
print "$x has a z in it!\n";
}
Let's take that apart: if is just like while, except it only checks once (that is, it won't loop around again and again.) Like while, it will execute every command inside the curly brackets if the statement inside the parentheses is true.
The statement inside the parentheses works like this: =~ makes a comparison between $x and whatever's between the two slashes; in this case, if there's a z anywhere inside $x, then the statement is true.
Let's up the ante, and match only if $x begins with the letter z:
if ($x =~ /^z/) {
print "$x begins with a z!\n";
}
^z is a regular expression; the caret (^) stands for the beginning of the string. Thus, the matching statement has to find a z immediately following the beginning of the string in order to be true.
How about words that begin with z and end with e? Use the regexp
/^z.*e$/
The $ stands for the end of the string; the period stands for "any character whatsoever"; combined with the asterisk, it means "zero or more characters." Without the asterisk,
/^z.e$/
would mean "z followed by one character followed by e."
There are lots of different regular expressions. For instance,
/^z.+e$/
means "z followed by at least one character, followed by e."
/^z\w*e$/
means "z followed by zero or more word characters followed by e"--that is, "z!e" wouldn't match.
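To see those patterns disagree with each other, here's a small sketch that tries them on two made-up test words, "ze" and "z!e":

```perl
#!/usr/bin/perl
# "ze" has nothing between the z and the e.
$word = "ze";
$star = "no";
$plus = "no";
if ($word =~ /^z.*e$/) { $star = "yes"; }   # .* accepts zero characters
if ($word =~ /^z.+e$/) { $plus = "yes"; }   # .+ insists on at least one
print "$word: star $star, plus $plus\n";    # prints: ze: star yes, plus no

# "z!e" has a non-word character in the middle.
$word = "z!e";
$dot      = "no";
$wordchar = "no";
if ($word =~ /^z.e$/)   { $dot = "yes"; }        # . accepts any character
if ($word =~ /^z\w*e$/) { $wordchar = "yes"; }   # \w rejects the "!"
print "$word: dot $dot, wordchar $wordchar\n";   # prints: z!e: dot yes, wordchar no
```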
So to make sure that somebody's typing in numbers in our adding program, and not words, make the subroutine getnumber look like this:
sub getnumber {
$number = "blah";
while($number =~ /\D/){
print "Enter a number ";
$number = <>;
chop($number);
}
$number;
}
"\D" is the regular expression for non-digits; if any character in $number is not 0-9, the expression will match, the while loop goes around again, and you'll get asked to enter a number again.
Note how we had to set $number to include a non-digit ($number = "blah") to get inside the loop the first time around.
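One gap worth knowing about: /\D/ only asks whether there's a non-digit somewhere, so an empty answer (just hitting RETURN) has no non-digits and slips through. A stricter test, offered here as an assumption rather than as the only way to do it, is /^\d+$/, which asks for one or more digits and nothing else:

```perl
#!/usr/bin/perl
# The empty string is what is left after chopping a bare RETURN.
$number = "";

$loose = "accepted";
if ($number =~ /\D/) { $loose = "rejected"; }        # no non-digit found, so not rejected

$strict = "rejected";
if ($number =~ /^\d+$/) { $strict = "accepted"; }    # no digits found, so not accepted

print "loose test: $loose, strict test: $strict\n";
# prints: loose test: accepted, strict test: rejected
```

To use the strict test in getnumber, the loop condition could become while($number !~ /^\d+$/), where !~ means "does not match".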
Substitution
You can transform variables at will or whim using regular expressions, by use of the substitution command. This command will change the letters "dog" to "cat" if they occur in the variable $x:
$x =~ s/dog/cat/;
Actually, that will only change the first occurrence (so "dogdog" would become "catdog"). To make the change "global", add a g at the end:
$x =~ s/dog/cat/g;
This will change "dig" and "dog" (but not "doug") to "cat":
$x =~ s/d.g/cat/g;
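Here are the three substitutions side by side, as a quick sketch on made-up strings. Without the g, only the first match is replaced:

```perl
#!/usr/bin/perl
$once = "dogdog";
$once =~ s/dog/cat/;       # first "dog" only

$all = "dogdog";
$all =~ s/dog/cat/g;       # every "dog"

$any = "dig dog doug";
$any =~ s/d.g/cat/g;       # "dig" and "dog" fit d.g; "doug" does not

print "$once\n";   # prints: catdog
print "$all\n";    # prints: catcat
print "$any\n";    # prints: cat cat doug
```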
A Little About Arrays
A single number or string can be assigned to a scalar variable, in the form
$x = 45;
If you have a bunch of variables and want to store them together, that's an array. Here's how you assign the numbers 3, 5, 7, and 9 into an array called @dog:
@dog = (3, 5, 7, 9);
Although the entire array is referred to as @dog, an individual element--let's say, the first one--is $dog[0]. So $dog[0] equals 3, and $dog[1] equals 5. (Throughout history, programmers have begun their lists with the number zero, and throughout history, this has caused nothing but grief and turmoil.)
Note that the variable $dog and the array elements $dog[0]...$dog[3] are totally unrelated. $dog could be "zebra" or 87000, and it would never affect any of the elements of @dog.
That's just the barest hint of what arrays can do. I'm only bringing them up now so you won't freak out if I use them in an example.
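A short sketch of the points above, using the same @dog array (the names $first and $last are made up for the example):

```perl
#!/usr/bin/perl
@dog = (3, 5, 7, 9);
$dog = "zebra";          # a separate scalar that just happens to share the name

$first = $dog[0];        # 3
$last  = $dog[3];        # 9
$dog[1] = 50;            # change one element; the others are untouched

# An array inside double quotes prints its elements separated by spaces.
print "$dog: @dog\n";    # prints: zebra: 3 50 7 9
```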
More Control Structures
If/Then/Elsif/Else
You can use if all by itself, in lines like
print "You're 28!\n" if ($age == 28);
That's really shorthand for
if ($age == 28) {
print "You're 28!\n";
}
which means, "if $age equals 28, then do everything inside the curly brackets." The if/then control structure can be extended with elsif and else:
if ($age == 28) {
print "You're 28!\n";
} elsif ($sex eq "f") {
print "You're a female!\n";
} elsif ($sex eq "m") {
print "You're a male!\n";
} else {
print "You're a mystery to me...";
}
Only the first true statement will match. So if ($age == 28) and ($sex eq "f"), you'll get the output
You're 28!
else is the default; its commands will be executed only if none of the preceding statements in the control structure were true.
For
If you want to do something X number of times, the traditional method is with a for loop. For instance, to print the numbers 1 through 10:
#!/usr/bin/perl
for($x=1; $x<=10; $x++) {
print "$x\n";
}
The three parts inside the for() translate as: a) set the counter, $x, to equal one; b) check to see if $x is less than or equal to ten, and if it is, execute all the commands inside the curly brackets; c) once you've executed the commands inside the curly brackets, add 1 to $x ($x++ means "add one to x") and then go back to b).
There's fancier things you can do with a for loop, but we'll leave it at that for now. Suffice it to say that the three statements inside the parens take the form:
Opening statement (execute this command the first time through the loop)
Check statement (exit the loop when this is no longer true)
Command (execute this command after each time through the loop)
As with the while control structure, the check statement has to be true the first time it is encountered, or the commands inside the curly brackets won't be executed even once.
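As a small sketch of those three parts at work, here's a loop that adds up the numbers 1 through 10 instead of printing them ($sum is a made-up name):

```perl
#!/usr/bin/perl
$sum = 0;
for($x=1; $x<=10; $x++) {
    $sum = $sum + $x;
}
print "The sum is $sum\n";          # prints: The sum is 55
print "The counter ended at $x\n";  # prints: The counter ended at 11
```

The counter ends at 11 because that's the first value for which the check statement is no longer true.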
Unless
Unless is the opposite of if, and you can use it just like if:
print "Dog isn't cat!\n" unless ($dog eq $cat);
or
unless ($dog eq $cat) {
block of commands
...
}
Until
Until is the opposite of while; the loop executes until the statement becomes true. Like:
until($x == 15){
print "You have to type in 15!\n";
$x = &getnumber;
}
Foreach
OK, remember how arrays work? I explained earlier that they're a bunch of variables stored together, like @array = (1,5,7,9); but I didn't explain what advantage you might possibly get by storing them together. Well, here's one: foreach is a cool control structure that can access the elements of an array, one by one, in order:
$seven = 7;
@array = (1,5,$seven,"catapult");
foreach $element (@array) {
print "$element\n";
}
So, for instance, if your array "@movies" was loaded up with the names of the 6000 movies in your collection of videotapes, you could
foreach $title (@movies){
print "$title\n" if ($title =~ /Tonight/);
}
to find out which of your movies have the word "Tonight" in the title.
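If you'd rather count the matching titles than print them, a counter inside the loop will do it. Here's a minimal sketch; the three titles and the name $count are made up, standing in for a real collection:

```perl
#!/usr/bin/perl
# Hypothetical titles, just enough to try the loop out.
@movies = ("Tonight or Never", "Casablanca", "Not Tonight, Dear");
$count = 0;
foreach $title (@movies){
    $count++ if ($title =~ /Tonight/);
}
print "$count of your movies have Tonight in the title.\n";
# prints: 2 of your movies have Tonight in the title.
```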
Fun stuff like that can keep a Perl programmer up all night, I guarantee. Zzzzz!
File Manipulation
Filehandles
You can write to a file as easily as you can write to the terminal. The first step (almost the only step) is to open the file with the open command. Here's an example:
open(DOG,">/home/scotty/data/dogs");
DOG is the "filehandle"--the name by which you'll refer to the open file from now on. It's customary to use all caps for filehandles. The other thing inside the parens is the full pathname of the file; it's prefixed with a > so you can write to it. Be warned that > creates the file if it doesn't exist, and wipes out its old contents if it does. (Without the > you could only read from it--we'll talk about that in a second.)
Now to write a line of text to the file, just do it like this:
print DOG "This line goes into the file and not to the screen.\n";
Pretty easy, huh? I love Perl.
Since failure to successfully open a file can cause your program to go batty, it's a good idea to have the program exit gracefully if it fails to open the file. To do that, use this syntax for the open command:
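open(DOG,">/home/scotty/data/dogs") || die "Couldn't open DOG: $!\n";
Now it will either open the file, or quit the program with an explanation why. (The special variable $! holds the system's reason for the failure, such as "Permission denied".)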
To read from a file, open it without the >:
open(DOG,"/home/scotty/data/dogs") || die "Couldn't open DOG.\n";
(If the file doesn't exist, the program will quit.) Once the file is open, there are two common ways to get the information out. You can do it one line at a time, with lines like this:
$x = <DOG>;
That copies the first line from DOG (or the next line, if you've already taken some lines out) and assigns it to $x. The syntax is just like the <> we used earlier to get input from the keyboard; sticking DOG in there just tells Perl to get the input from the open file instead.
You can also do it in a loop. This program prints out all the contents of DOG:
#!/usr/bin/perl
open(DOG,"/home/scotty/data/dogs") || die "Couldn't open DOG.\n";
while(<DOG>) {
print;
}
Notice how we didn't specify a variable to store each line from DOG in, and we didn't specify anything for the print command to print? This is a really important concept I should have introduced earlier: Perl features a default variable called $_. Basically, if you don't specify which variable you want to use, or if you use a command like print by itself, Perl assumes you want to use $_.
Some other common commands that can assume you're talking about $_:
s/dog/cat/g;
That's a valid line all by itself; it means "substitute all occurrences of 'dog' in the variable $_ with 'cat'." Another popular type of construction is
print if (/dog/);
That means "print $_ if $_ has 'dog' in it."
$_ shows up everywhere in Perl, just to make your life easier. For instance, foreach will store its elements in $_ if you're too lazy to name a variable yourself:
foreach (@array) {
print if (/Tonight/);
}
You can assign from $_ or manipulate it just like any other variable, with commands like
$_++;
$x = $_;
You just have to do it before the next time you overwrite $_ with a command like
while(<DOG>)
--it's a very temporary storage space.
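One handy consequence, sketched here with a made-up @pets array: inside foreach, $_ is the current element itself rather than a copy, so a bare s/// changes the array.

```perl
#!/usr/bin/perl
@pets = ("dog", "hotdog", "bird");
foreach (@pets) {
    s/dog/cat/g;      # works on $_, which stands for the current element
}
print "@pets\n";      # prints: cat hotcat bird

$found = 0;
foreach (@pets) {
    $found++ if (/cat/);   # the bare match also looks at $_
}
print "$found\n";     # prints: 2
```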
When you're done with a file, don't forget to close it with the close command:
close(DOG);
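Putting the pieces together, here's a rough sketch that writes two lines to a file, reads them back both ways, and cleans up after itself. The path /tmp/dogs_demo.txt is an assumption; use any file you're allowed to write to. The unlink command, which deletes a file, hasn't come up before.

```perl
#!/usr/bin/perl
# /tmp/dogs_demo.txt is a made-up scratch file for this example.
$path = "/tmp/dogs_demo.txt";

open(DOG,">$path") || die "Couldn't open $path for writing.\n";
print DOG "Rex\n";
print DOG "Fido\n";
close(DOG);

open(DOG,$path) || die "Couldn't open $path for reading.\n";
$first = <DOG>;          # one line at a time: "Rex\n"
$count = 1;
while(<DOG>) {           # the loop picks up where the last read left off
    $count++;
}
close(DOG);

chop($first);
print "First dog: $first, total dogs: $count\n";   # prints: First dog: Rex, total dogs: 2
unlink($path);           # delete the scratch file again
```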
Filename Globbing
Perl can read all the filenames in a directory (/home/scotty/bin in the following example) with this syntax:
while($x = </home/scotty/bin/*>) {
...
}
One obvious and powerful use of this "filename globbing" is a loop like this:
while($x = </home/scotty/bin/*>) {
open(FILE,"$x") || die "Couldn't open $x for reading.\n";
...
}
Thus, the following simple program will print all lines containing the word "dog" (along with the names of the files they came from) in the /home/scotty/bin directory:
#!/usr/bin/perl
while($x = </home/scotty/bin/*>) {
open(FILE,"$x") || die "Couldn't open $x for reading.\n";
while(<FILE>){
if(/dog/) {
print "$x: $_";
}
}
}
Opendir
Here's a key point of Perl philosophy: Perl tries to never limit you. Arrays can be as big as you want, strings as long as you want, and strings can contain anything. If you try to open a file, and the file doesn't open, the program hums right along; if a line is expecting three variables back from a subroutine, as in
($one, $two, $three) = &some_routine;
and it only gets back one, well, it won't care. It'll just leave the other variables empty and move along.
Anyways, Perl doesn't have limits, but Unix sometimes does. A key example is the difference between filename globbing, which in older versions of Perl relied on the Unix shell's built-in functions, and Perl's own opendir function.
On those older versions, try to open a directory with 3000 files using filename globbing, and your program will probably crash. But you can open a directory of 100,000 files with opendir (as long as your machine can handle it...)
Anyways, here it is:
opendir(DIR,"/tmp");
while($file = readdir(DIR)){
....
}
Note that readdir hands back every name in the directory, including the enigmatic . and .. entries (which stand for the directory itself and its parent). It also gives you bare names, not full paths, so put the directory name back on the front before you try to open anything. Here's a variation that skips any name beginning with a dot:
opendir(DIR,"/tmp");
while($file = readdir(DIR)){
next if ($file =~ /^\./);
....
}
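To compare the two approaches, here's a sketch that builds a small scratch directory and lists it both ways. The directory name, the file names, and the counters are all made up for the example. It uses glob(), which is the function form of the <...> globbing syntax, plus mkdir, closedir, unlink and rmdir, which haven't come up before.

```perl
#!/usr/bin/perl
# Build a made-up scratch directory holding two ordinary files and one dot file.
$dir = "/tmp/listing_demo_$$";      # $$ is the process ID, to keep the name unique
mkdir($dir, 0755) || die "Couldn't create $dir.\n";
foreach $name ("a.txt", "b.txt", ".hidden") {
    open(FILE,">$dir/$name") || die "Couldn't create $dir/$name.\n";
    close(FILE);
}

# Globbing returns full paths and skips names that begin with a dot.
@globbed = glob("$dir/*");

# readdir returns bare names, including . and .. and the dot file.
opendir(DIR,$dir) || die "Couldn't open $dir.\n";
$all = 0;
$visible = 0;
while($file = readdir(DIR)){
    $all++;
    next if ($file =~ /^\./);
    $visible++;
}
closedir(DIR);

$globcount = @globbed;              # an array assigned to a scalar gives its size
print "glob: $globcount, readdir: $all, without dots: $visible\n";
# prints: glob: 2, readdir: 5, without dots: 2

# Clean up the scratch files and the directory.
unlink("$dir/a.txt", "$dir/b.txt", "$dir/.hidden");
rmdir($dir);
```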
Using Unix from Within Perl
Writing to a Process
Watch this: this is so cool:
open(MAIL,"|/usr/lib/sendmail [email protected]");
print MAIL "Subject: This is a subject line\n\n";
print MAIL "This is an empty body.\n";
close(MAIL);
What did that do? It wrote to a process rather than a file. That little pipe (|) on the first line tells any output to the filehandle MAIL to go to a sendmail process; when you close up the process, with the close statement--presto, you've sent yourself a mail message.
This is easily one of Perl's most powerful features. For instance, if you have a Perl script that generates a list of old files (at work, say) and the Internet addresses of the co-workers who own the files, you can send a message to every single one of them:
while(....){
....
open(MAIL,"|/usr/lib/sendmail $owner");
print MAIL "Subject: Notification of Ancient Fileness\n\n";
print MAIL "You own an ancient file named $ancientfile.\n";
print MAIL "Would you like me to do anything about it?\n";
close(MAIL);
}
Yup: variables.
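If you'd like to try writing to a process without mailing anybody, here's a sketch that pipes a few lines into the Unix sort command instead. It assumes sort is installed, and /tmp/sorted_demo.txt is a made-up scratch file:

```perl
#!/usr/bin/perl
# /tmp/sorted_demo.txt is a made-up scratch file for the sorted output.
$out = "/tmp/sorted_demo.txt";

# Anything printed to SORT is fed to the Unix sort command.
open(SORT,"|sort > $out") || die "Couldn't start sort.\n";
print SORT "zebra\n";
print SORT "aardvark\n";
print SORT "moose\n";
close(SORT);             # closing waits for sort to finish its work

open(RESULT,$out) || die "Couldn't open $out.\n";
@sorted = <RESULT>;      # assigned to an array, <RESULT> reads every line at once
close(RESULT);
print @sorted;           # prints aardvark, moose, zebra, one per line
unlink($out);            # delete the scratch file again
```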
System Calls
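A lot of the time, Unix can do something faster or easier than your Perl scripts can. Or maybe you just don't want to rewrite an entire program that Unix already provides... Perl enables you to execute a shell command from within a Perl program via the system() function. For instance, to allow a user to edit a file directly:
system("/usr/bin/emacs $file") == 0 || die "Couldn't run emacs.\n";
The == 0 is there because system() hands back the command's exit status, and on Unix an exit status of 0 means success; a plain system(...) || die would quit precisely when the command worked.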
If you don't know what Emacs is, or aren't that familiar with Unix programs, then just forget you ever saw this section. Mostly it would just lead you to do extremely lazy shortcut things anyways.
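For the curious, here's a tiny sketch of what system() hands back. It assumes the standard Unix commands true and false are available; they do nothing except succeed and fail, respectively.

```perl
#!/usr/bin/perl
$good = system("true");     # 0 means the command succeeded
$bad  = system("false");    # anything else means it failed

print "true gave $good\n";  # prints: true gave 0
if ($bad != 0) {
    print "false reported a failure\n";
}
```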
Backticks
You can also fire up a shell process with the so-called 'backticks':
@lines = `/usr/local/bin/lynx -source http://www.some-web-site.com`;
Now you have the front page of some website in an array, one line neatly stored in each array element. Imagine what you can do with that...
@lines = `/usr/local/bin/lynx -source http://www.some-web-site.com`;
foreach (@lines) {
print if (/href/i);
}
(And yes, you can write perfectly wonderful spiders, robots and web catalogs with Perl... Just, please, BE RESPONSIBLE. Read up on robot ethics before you even think about it.)
One important difference between a system() call and backticks: system() lets the program's output go straight to your screen and gives you back only its exit status, while backticks swallow the output and hand it back to you. So a line like this
`/home/scotty/bin/some_program`;
runs some_program but throws away everything it printed. Either way, your original program waits obediently until some_program is finished before hastening on to the next line. If you want to see what the program said, store the output:
$result = `/home/scotty/bin/some_program`;
$result now holds everything some_program printed. If you don't care about the output at all, system() is the clearer choice.
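Here's a small sketch of backticks capturing output, using the Unix echo command as a harmless stand-in for a real program (the variable names are made up):

```perl
#!/usr/bin/perl
$line = `echo hello`;             # everything echo printed, newline included
chop($line);                      # trim the newline, as with keyboard input

@lines = `echo one; echo two`;    # assigned to an array: one element per line
$count = @lines;                  # an array assigned to a scalar gives its size

print "$line, $count lines\n";    # prints: hello, 2 lines
```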