Hemenway K., Calishain T. — Spidering Hacks

Title: Spidering Hacks

Authors: Hemenway K., Calishain T.

Abstract:

The Internet, with its profusion of information, has made us hungry for ever more, ever better data. Out of necessity, many of us have become pretty adept with search engine queries, but there are times when even the most powerful search engines aren't enough. If you've ever wanted your data in a different form than it's presented, or wanted to collect data from several sites and see it side-by-side without the constraints of a browser, then Spidering Hacks is for you.

Spidering Hacks takes you to the next level in Internet data retrieval, beyond search engines, by showing you how to create spiders and bots to retrieve information from your favorite sites and data sources. You'll no longer feel constrained by the way host sites think you want to see their data presented: you'll learn how to scrape and repurpose raw data so you can view it in a way that's meaningful to you. Written for developers, researchers, technical assistants, librarians, and power users, Spidering Hacks provides expert tips on spidering and scraping methodologies.

You'll begin with a crash course in spidering concepts, tools (Perl, LWP, out-of-the-box utilities), and ethics (how to know when you've gone too far: what's acceptable and unacceptable). Next, you'll collect media files and data from databases. Then you'll learn how to interpret and understand the data, repurpose it for use in other applications, and even build authorized interfaces to integrate the data into your own content. By the time you finish Spidering Hacks, you'll be able to:

* Aggregate and associate data from disparate locations, then store and manipulate the data as you like
* Gain a competitive edge in business by knowing when competitors' products are on sale, and comparing sales ranks and product placement on e-commerce sites
* Integrate third-party data into your own applications or web sites
* Make your own site easier to scrape and more usable to others
* Keep up-to-date with your favorite comic strips, news stories, stock tips, and more without visiting the site every day


Language: en

Category: Technology

Subject index status: Index with page numbers is ready

Year of publication: 2003

Number of pages: 424

Added to catalog: 15.06.2007

Subject index
Hack #70, Using the Link Cosmos of Technorati      
Hack #71, Finding Related RSS Feeds      
Hack #72, Automatically Finding Blogs of Interest      
Hack #73, Scraping TV Listings      
Hack #74, What's Your Visitor's Weather Like?      
Hack #75, Trendspotting with Geotargeting      
Hack #76, Getting the Best Travel Route by Train      
Hack #77, Geographic Distance and Back Again      
Hack #78, Super Word Lookup      
Hack #79, Word Associations with Lexical Freenet      
Hack #80, Reformatting Bugtraq Reports      
Hack #81, Keeping Tabs on the Web via Email      
Hack #82, Publish IE's Favorites to Your Web Site      
Hack #84, Bargain Hunting with PHP      
Hack #85, Aggregating Multiple Search Engine Results      
Hack #9, Simply Fetching with LWP::Simple      
Hack #92, Mirroring Web Sites with wget and rsync      
Hack #93, Accumulating Search Results Over Time      
Hack #94, Using XML::RSS to Repurpose Data      
Hack #95, Placing RSS Headlines on Your Site      
Hack #96, Making Your Resources Scrapable with Regular Expressions      
Hack #97, Making Your Resources Scrapable with a REST Interface      
Hack #98, Making Your Resources Scrapable with XML-RPC      
Hack #99, Creating an IM Interface      
Hammersley, Ben (contributor)      
Harvard Weblogs      
hash keys and      
hash keys and Technorati      
HEAD request      
Headers      
headlines, placing on your site      
health inspections      
Hemenway, Kevin (author)      
Hindenburg, Kurt V.      
horoscopes      
horoscopes, saving on iPod      
HTML files      
HTML::Diff      
HTML::Diff module      
HTML::Element nodes      
HTML::Element nodes module      
HTML::Entities      
HTML::Entities module      
HTML::LinkExtor      
HTML::LinkExtor module      
HTML::RSSAutodiscovery      
HTML::RSSAutodiscovery module      
HTML::TableExtract      
HTML::TableExtract module      
HTML::TokeParser      
HTML::TokeParser module      2nd
HTML::TreeBuilder      [See HTML::TreeBuilder module]
HTML::TreeBuilder and      
HTML::TreeBuilder module      
HTTP authentication      
HTTP headers      
HTTP POST request      
HTTP::Headers module      
HTTP::Message      
HTTP::Message module      
HTTP::Response      [See HTTP::Response module]
HTTP::Response module      2nd
HTTPS      
HTTPS support      
HTTP_PROXY environment variable      
hypermail      
Identifiers      
identifying based on content      
identifying documents across collections      
identifying what to scrape      
IE Favorites      
If-Modified-Since HTTP header      
iFilm      
IM interface, building      
Image::Size      
Image::Size module      
Images      
indexes, directory      
information collections, gleaning data from      
Installation      
Installing      
Instant messaging      
intellectual property      
interesting use of      
INTERFACE      
International Standard Serial Number Register      
Internet APIs      
IP-to-Country database      
iPods      
ISBN (International Standard Book Number)      
Joseph, Martyn      
JungleScan      
karaoke      
keeping tabs on web through      
Kennedy, Niall (contributor)      
Key      
Landgren, David (contributor)      
Language Tools      
languages, geotargeting local      
Last-Modified HTTP header      
latitude and longitude      
latitude/longitude position      
Lawrence Lessig's weblog      
lawsuits, Bidder's Edge sued by eBay      
leeching      
Legal issues      [See also lawsuits]
Lester, Andy (contributor)      
Lexical Freenet      
Library      
library classification systems      
Library of Congress      
library popularity, mapping O'Reilly best sellers to      
libwww-perl      
LibXML xmllint      
libxml2 library      
Limewire      
limitations      
Linden, James (contributor)      
Link Cosmos      
link count      
List of User-Agents web site      
LOC call numbers      
low-vision people      
LWP      
LWP::Parallel::UserAgent      
LWP::Parallel::UserAgent module      
LWP::Simple      [See LWP::Simple module]
LWP::Simple module      
LWP::UserAgent      [See LWP::UserAgent module]
LWP::UserAgent module      2nd
Lynx browser      2nd 3rd
LyricsFreak.com      
Mac OS X      
Mac OS X installation of Perl      
mail( ) function      
mailing lists      
MailTools      
making information available      
making your own resources scrapable      
management systems      
MapBlast!      
MapPoint      
MapPoint and      
MapQuest      
mb2md      
Mech      
Mechanisms      
media files      
Medico, Andrew      
Meerkat      
Memoize module      2nd
meta-search engine, building      
midshare, calculating      
MIME::Lite      
MIME::Lite module      
MIME::Parser      
MIME::Parser module      
Mirroring      
mirroring web sites      
mirroring web sites with      
misbehaving      
most-mentioned lists      
Movable Type      
movies from Library of Congress      
MP3s      
Multiple      
MUSIC      [See CDs karaoke MP3s]
naming      
naturalvoices.com      
Navigating      
navigation tools      
Net::AIM library      
Net::Blogger      
Net::Blogger module      
Net::Google      
Net::Google module      
Net::ICQ      
Net::ICQ module      
Net::Jabber      
Net::Jabber module      
Net::POP3      
Net::POP3 module      
NetNewsWire      
Newgrounds      
news aggregators      
news from an AP Wire feed      
news photo archive, scraping      
News Wallpaper      
NewsIsFree      
NewsMonster      
new_abs method      
nget, downloading from Usenet with      
O'Reilly best sellers, mapping to library popularity      
object attributes      
object's cookie_jar attribute      
of interest, finding      
Online documentation      
Open Directory Project      
Open Directory Project (DMOZ)      
optimizing      
orchard, l.m. (contributor)      
ordered lists      
ordered lists in HTML files      
Origins of American Animation site      
output to RSS file      
output, monitoring      
outputting RSS in      
outputting while in PHP      
overscraping      
overview      
Pacheco, Ron (contributor)      
packages, tracking with FedEx      
painless      
Palmer, Sean B. (contributor)      
parental control ratings for sites      
Parsing      
parsing with      
patterns of      
perl Makefile.PL phase of installation      
Perl modules      
Perl Monks      2nd
Perl, emulating shell scripts with      
Perl4Lib      
Perlmonk Snippets Index      
personal book lists      
Peters, Dean (contributor)      
Philosophy      
phonebook: syntax      
PHP      
PHP with      [See PHP scraping]
PHPNuke      
Picks of the Day      
Pilgrim, Mark      
pipes, using to chain commands      
PKP (Polskie Koleje Panstwowe, or Polish State Railways) server      
Places      
PlanetFeedback.com      
Pod2Go      
PodNotes      
POP3 email attachments, saving      
popularity, comparing in different locations      
Portal      
portal spider      
POST, simulating within LWP      
posting data with      
posting entries from multiple RSS feeds      
PPM      
presenting arguments for your      
Presenting results      
printers, watching      
Prisma      
problems with      
problems with use of data      
product reviews      
progress bars, adding to scripts      
Project Gutenberg      
Proxies      
publishing to web site      
push @ attribute      
Queries      2nd
Quicken's QIF format      
quotes      
Radio Userland      2nd
read by low-vision people      
reading and downloading a list of links      
reasons      
recommendations, sorting by rating      
redesigns      
Referer header      
registering      
registering spiders      
Regular expressions      
reinventing the wheel      
Related Artists link      
related products      
related searches      
Related searches feature      
relative and absolute      
relative URLs      
respecting bandwidth      
Representational State Transfer      [See REST]
repurposing data      
Requests      
ResearchBuzz      
respecting      
respecting scrapee's      
REST (Representational State Transfer)      
REST interface and      
restaurant inspections      
retrieving recommendations      
robot karaoke      
Robots Exclusion Protocol      
robots.txt file      
Rochester Institute of Technology's library search interface      
Rose, Richard (contributor)      
rotating cursors      
round robin      
RRDtool (Round Robin Database Tool)      
rrdtool update command      
RSS      [See also XML::RSS module]
© Electronic Library of the Board of Trustees of the MSU Faculty of Mechanics and Mathematics, 2004-2024