Web Client Programming with Perl-Chapter 6: Example LWP Programs-P1

41 469 0
Web Client Programming with Perl-Chapter 6: Example LWP Programs-P1

Đang tải... (xem toàn văn)

Tài liệu hạn chế xem trước, để xem đầy đủ mời bạn chọn Tải xuống

Thông tin tài liệu

Chapter 6: Example LWP Programs-P1 This chapter presents LWP programs that are more robust and feature-rich than the examples shown in previous chapters While Chapter 5, The LWP Library, focused on teaching LWP and explained how LWP objects fit together, this chapter shows you some sample LWP programs with more user-friendly options and features We present three broad categories of web client programs:  Simple clients programs that perform actions for users in real time, usually with a finite list of URLs to act upon In this section, we present LWP versions of the hcat and hgrepurl programs that were presented in Chapter 4, The Socket Library  Periodic clients robots that perform a request repeatedly, with some delay between each request Periodic clients typically request the same resource over and over, or a different resource in a predictable manner For example, a client may request 0100.gif at a.m., 0200.gif at a.m, etc A periodic client might check some data and perform action when a condition is met In this section, we present a program that periodically checks the status of a Federal Express document  Recursive clients robots that follow hyperlinks or other references on an HTML page In this section, we present a program that looks for bad links in a web site The boundaries between these categories are not set in stone It is possible to write a periodic client that also happens to be a recursive client Or a simple client might become periodic if the document indicates that the page should be refreshed every 15 minutes We're not trying to classify all programs into one category or another; these categories are given as a way to identify distinct behaviors that a client may exhibit The examples in this chapter all use a simple command-line interface In Chapter 7, Graphical Examples with Perl/Tk, we have some additional examples with a graphical interface using the Tk extension to Perl Simple Clients Simple clients are programs that perform actions for users in real time, usually with a finite list of URLs to act upon In this section, we'll show LWP versions of the socket-based hcat and hgrepurl programs that were presented in Chapter Hypertext UNIX cat Revisited As you might recall, the sockets version of hcat used the open_TCP( ) function to establish a connection to a web server, and then issued an HTTP request, like "GET / HTTP/1.0" In LWP, many of the details are hidden from the programmer Instead of this: open_TCP(F, $the_url[1], $the_url[2]) print F "GET $the_url[3] HTTP/1.0\n"; print F "Accept: */*\n"; print F "User-Agent: hcat/1.0\n\n"; in LWP, it can be written like this: my $ua = new LWP::UserAgent; $ua->agent("hcat/1.0"); my $request = new HTTP::Request("GET", $path); my $response = $ua->request($request); They both the same thing; they request a document from a user-specified web server and identify themselves in the User-Agent header But one looks a lot cleaner than the other Instead of using the nitty-gritty socket code that talks directly to the web server, you merely describe to LWP what the action should be LWP handles it for you Many things, like handling URL redirection or handling HTTP version differences, will be handled automatically by LWP Also, the following lines in the sockets version of hcat can be replaced: # print out server's response # get the HTTP response line $the_response=; print $the_response if ($all || $response); # get the header data while(=~ m/^(\S+):\s+(.+)/) { print "$1: $2\n" if ($all || $header); } # get the entity body if ($all || $data) { print while (); } In LWP, these lines can be written as: my $code=$response->code; my $desc = HTTP::Status::status_message($code); my $headers=$response->headers_as_string; my $body = $response->content; if ($opt_r || $all) { print "HTTP/1.0 $code $desc\n"; } if ($opt_H || $all) { print "$headers\n"; } if ($opt_d || $all) { print $body; } In addition, we've added proxy support, since it's trivial in LWP: my $ua = new LWP::UserAgent; $ua->agent("hcat/1.0"); # If proxy server specified, define it in the User Agent object if (defined $proxy) { my $url = new URI::URL $path; my $scheme = $url->scheme; $ua->proxy($scheme, $proxy); } The source in its entirety looks like this: #!/usr/local/bin/perl -w use strict; use HTTP::Status; use HTTP::Response; use LWP::UserAgent; use URI::URL; use vars qw($opt_h $opt_r $opt_H $opt_d $opt_p); use Getopt::Std; my $url; my $goterr; After calling all the necessary Perl modules and declaring variables, we process command-line arguments: getopts('hrHdp:'); my $all = !($opt_r || $opt_H || $opt_d); when -r -H -d not set # all=1 if ($opt_h || $#ARGV==-1) { # print help text when -h or no args print_help( ); exit(0); } Then, for any string that remains as a command-like parameter, we treat it as a URL, process it, and print out the result: my $goterr = 0; # make sure we clear the error flag while ($url = shift @ARGV) { my ($code, $desc, $headers, $body)=simple_get('GET', $url, $opt_p); if ($opt_r || $all) { print "HTTP/1.0 $code $desc\n"; } if ($opt_H || $all) { print "$headers\n"; } if ($opt_d || $all) { print $body; } $goterr |= HTTP::Status::is_error($code); } exit($goterr); The print-help( ) routine just prints out a range line and a list of commandline options: sub print_help { print scheme; $ua->proxy($scheme, $proxy); } # Ask the User Agent object to request a URL # Results go into the response object (HTTP::Reponse) my $request = new HTTP::Request($method, $path); my $response = $ua->request($request); # Parse/convert the response object for "easier reading" my $code=$response->code; my $desc = HTTP::Status::status_message($code); my $headers=$response->headers_as_string; if ($response->content =~ /$START(.*?)$END/s) { $self->{'error_as_HTML'} = $1; } else { # couldn't get explanation, use generic one $self->{'error_as_HTML'} = 'Unexpected HTML response from FedEx'; } # couldn't get error explanation And then there are cases when the HTTP response didn't result in a status code of 200 (OK): $self->{'error_as_HTML'} = $response>error_as_HTML; That just about wraps up the FedEx package For the sake of being a good Object-Oriented citizen, a public interface to the FedEx object's settings are available The retrieve_okay( ) method returns true when the HTTP response code was 200 The delivered( ) method returns true if the document was delivered The who_got_it( ) method returns the name of the recipient of a delivered package Finally, the error_info( ) method prints out an HTML error message Now that we've reviewed the important parts of the FedEx package, let's take a look at the complete example Note how one creates a FedEx object and calls it We'll come back to this example and redo it as a graphical client in Chapter 7: #!/usr/local/bin/perl -w use strict; use HTML::FormatText; use HTML::Parse; use vars qw($opt_h $opt_a $opt_e $opt_d $opt_c $opt_p); use Getopt::Std; # URL that handles our FedEx query my $cgi = 'http://www.fedex.com/cgi-bin/track_it'; getopts('ha:e:d:c:p:'); # print help upon request or when arguments are missing if ($opt_h || !($opt_a && $opt_e && $opt_d && $opt_c )) { print_help( ); exit(0); } # my $tracker = new FedEx $cgi, $opt_e, $opt_ p; my $keep_checking = 1; First, we declare local variables, call all necessary modules, get commandline options, etc The body of the program is just a loop that keeps checking the FedEx site until the package is delivered or an error is found: while ($keep_checking) { $tracker->check($opt_a, $opt_c, $opt_d); if ($tracker->retrieve_okay) { if ($tracker->delivered) { print "Tracking number $opt_a was delivered to: ", $tracker->who_got_it, "\n"; $keep_checking = 0; } else { # request was okay, but not delivered wait sleep (60 * 30); } } # sleep 30 minutes Let's else { # request not successful my $html_error_message = $tracker->error_info; my $parsed = parse_html($html_error_message); my $converter = new HTML::FormatText; print $converter->format($parsed); $keep_checking = 0; } } The print_help( ) routine prints a help message, as always: sub print_help { print {'robot'} = new LWP::RobotUA $user_agent_name, $email; $self->{'robot'}->delay(0); requests by hand # we'll delay if ($proxy) { $self->{'robot'}->proxy('http', $proxy); } $self; } sub check { my ($self, $track_num, $country, $date) = @_; $self->{'url'}>query("trk_num=$track_num&dest_cntry=" "$country&ship_date=$date"); my $request = new HTTP::Request 'GET', $self>{'url'}; my $response = $self->{'robot'}>request($request); $self->{'status'} = $response->code( ); if ($response->code == RC_OK) { if ($response->content =~ /Delivered To : (\w.*)/) { # package delivered $self->{'who_got_it'} = $1; $self->{'delivered'} = 1; } # Odd cases when package is delivered but "Delivered To" is blank # Check for delivery time instead elsif ($response->content =~ /Delivery Time : \w.*/) { # package delivered $self->{'who_got_it'} = 'left blank by FedEx computer'; $self->{'delivered'} = 1; } else { # package wasn't delivered $self->{'delivered'} = 0; # if there isn't a "Delivered To : " field, something's wrong # error messages seen between HTML comments if ($response->content !~ /Delivered To : /) { $self->{'status'} = RC_BAD_REQUEST; # get explanation from HTML response my $START = ''; my $END = ''; if ($response->content =~ /$START(.*?)$END/s) { $self->{'error_as_HTML'} = $1; } else { # couldn't get explanation, use generic one $self->{'error_as_HTML'} = 'Unexpected HTML response from FedEx'; } } } } # couldn't get error explanation # unexpected reply # not delivered yet # if HTTP response of RC_OK (200) else { $self->{'error_as_HTML'} = $response>error_as_HTML; } } sub retrieve_okay { my $self = shift; if ($self->{'status'} != RC_OK) {return 0;} 1; } sub delivered { my $self = shift; $self->{'delivered'}; } sub who_got_it { my $self = shift; $self->{'who_got_it'}; } sub error_info { my $self = shift; $self->{'error_as_HTML'}; } Recursive Clients Recursive clients are robots that follow hyperlinks or other references on an HTML page In this section, we present a program that looks for bad links in a web site I've created a package called CheckSite that follows links within HTML and reports various properties of each page The constructor accepts the email address, delay time between requests, maximum number of requests, verbose flag, and optional proxy URL as parameters As in the FedEx example, this creates an LWP::RobotUA object inside the CheckSite package package CheckSite; sub new { my ($class, $email, $delay, $max, $verbose, $proxy) = @_; my $self = {}; bless $self, $class; # Create a User Agent object, give it a name, set delay between requests $self->{'ua'} = new LWP::RobotUA 'ORA_checksite/1.0', $email; if (defined $delay) {$self->{'ua'}>delay($delay);} # If proxy server specified, define it in the User Agent object if (defined $proxy) { $self->{'ua'}->proxy('http', $proxy); } $self->{'max'} = $max; $self->{'verbose'} = $verbose; $self; } ... behaviors that a client may exhibit The examples in this chapter all use a simple command-line interface In Chapter 7, Graphical Examples with Perl/Tk, we have some additional examples with a graphical... extension to Perl Simple Clients Simple clients are programs that perform actions for users in real time, usually with a finite list of URLs to act upon In this section, we''ll show LWP versions of the... categories are not set in stone It is possible to write a periodic client that also happens to be a recursive client Or a simple client might become periodic if the document indicates that the page

Ngày đăng: 28/10/2013, 15:15

Từ khóa liên quan

Tài liệu cùng người dùng

Tài liệu liên quan