
Sunday, July 6, 2008

Using the .NET WebClient to Scrape Web Pages

.NET comes with a nifty little class called System.Net.WebClient that lets you easily interact with a web page.

To play with it, I decided to scrape a page that generates Shakespearean insults and pull just the insult out of the HTML, giving me easy command-line access to random Shakespearean insults (something I often find myself in need of, to be sure).

# Retrieves a random Shakespearean insult from the Internet.
#
# Author: Tojo2000 <tojo2000@tojo2000.com>
# (c)2008 All Rights Reserved
#
# Usage: get-insult.ps1

$regex = New-Object System.Text.RegularExpressions.Regex('\n([^<>]+)\n');

$web_client = New-Object System.Net.WebClient;
$web_client.Headers.Add("user-agent", "PowerWeb");

$data = $web_client.DownloadString("http://www.pangloss.com/seidel/Shaker/index.html");

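$match = $regex.Match($data);

# Match() always returns a Match object, so test Success rather than the object itself.
if ($match.Success) {
  echo $match.Groups[1].Value;
}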
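
To see what the regex actually captures without hitting the site, here is a minimal sketch that runs the same pattern against a hard-coded string. The markup and the insult in $sample are placeholders I made up for illustration; the real page's HTML may well look different.

```powershell
# Hypothetical sample markup (an assumption, not the real page's HTML).
# `n inside a double-quoted string is PowerShell's newline escape.
$sample = "<html><body><font size='+2'>`nThou art a boil, a plague sore.`n</font></body></html>"

# Same pattern as the script: a newline, a run of characters that contains
# no angle brackets (captured as group 1), then another newline.
$regex = New-Object System.Text.RegularExpressions.Regex('\n([^<>]+)\n')
$match = $regex.Match($sample)

if ($match.Success) {
  $match.Groups[1].Value   # Thou art a boil, a plague sore.
}
```

The pattern only matches text that sits on its own line between tags, which is why it skips the markup here. If the page served Windows-style line endings, the capture would pick up a trailing carriage return, so calling .Trim() on the value would be a cheap safeguard.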

Note: I'm not affiliated with this website, so obviously don't abuse it. It's just an example.