Hemenway K., Calishain T. — Spidering Hacks
Title: Spidering Hacks

Authors: Hemenway K., Calishain T.

Abstract:

The Internet, with its profusion of information, has made us hungry for ever more, ever better data. Out of necessity, many of us have become pretty adept with search engine queries, but there are times when even the most powerful search engines aren't enough. If you've ever wanted your data in a different form than it's presented, or wanted to collect data from several sites and see it side-by-side without the constraints of a browser, then Spidering Hacks is for you. Spidering Hacks takes you to the next level in Internet data retrieval, beyond search engines, by showing you how to create spiders and bots to retrieve information from your favorite sites and data sources. You'll no longer feel constrained by the way host sites think you want to see their data presented: you'll learn how to scrape and repurpose raw data so you can view it in a way that's meaningful to you. Written for developers, researchers, technical assistants, librarians, and power users, Spidering Hacks provides expert tips on spidering and scraping methodologies. You'll begin with a crash course in spidering concepts, tools (Perl, LWP, out-of-the-box utilities), and ethics (how to know when you've gone too far: what's acceptable and unacceptable). Next, you'll collect media files and data from databases. Then you'll learn how to interpret and understand the data, repurpose it for use in other applications, and even build authorized interfaces to integrate the data into your own content. By the time you finish Spidering Hacks, you'll be able to:

* Aggregate and associate data from disparate locations, then store and manipulate the data as you like
* Gain a competitive edge in business by knowing when competitors' products are on sale, and comparing sales ranks and product placement on e-commerce sites
* Integrate third-party data into your own applications or web sites
* Make your own site easier to scrape and more usable to others
* Keep up-to-date with your favorite comic strips, news stories, stock tips, and more without visiting the site every day


Language: en

Category: Technology/

Subject index status: Index with page numbers is ready

Year of publication: 2003

Number of pages: 424

Added to catalog: 15.06.2007

Subject index
$browser object      
.m3u files
.txt files      
Aas, Gisle      
AbleShoppers      
absolute URLs      
Accept-Charset headers      
Accept-Encoding HTTP header      
Accept-Language headers      
Acceptable Use Policies      
Acceptable use policy (AUP)
accessing particular
accessing particular URLs      
account      
across multiple domains using Google      
across multiple sites for authors      
ActiveState's ActivePerl      
adding to request
advanced applications and wget utility
advanced techniques
advertisers and      
advertisers and geotargeting      
aggregating      [See aggregating data]
aggregating data      2nd
aggregating entries from multiple      
aggregating from multiple engines      
aggregators      
AIM (AOL Instant Messenger)      
alert for new Amazon.com product reviews      
Alexa      
All Consuming      2nd
AlltheWeb.com      2nd 3rd
AlltheWeb.com sample      
AltaVista      
Amazon.com      
America at Work, America at Leisure project
America at Work, America at Leisure project (Library of Congress)      
Ampache      
AmphetaDesk      2nd 3rd
anatomy of      
Andromeda      2nd
announcing to world      
AOL Instant Messenger (AIM)      
Apache      
Apache::MP3      [See Apache::MP3 module]
Apache::MP3 module      2nd
API      
API developer's key      
APIs
arbitrary      
arbitrary classification systems      
architectural style
archiving messages
archiving with yahoo2mbox      
archiving Yahoo! Groups messages
Artymiak, Jacek (contributor)
as Perl module      
ASIN      2nd
ASIN (Amazon.com Standard Identification Number)      2nd
Associates account
Associates sales statistics, publishing      
associative data      
attachments, saving only POP3      
Audioscrobbler
AUP (Acceptable Use Policy)
Authentication
authors, searching across multiple sites for      
automating
automating tasks
Ball, Chris (contributor)      
Bandwidth      
banking online
Bausch, Paul (contributor)      
BBC's Radio Times      
beginning process      
Ben's Bargains      
Benson, Erik (contributor)      
Berkman Center for Internet & Society at Harvard Law School
Best practices      
best practices for spidering      
Better Business Bureau      
Bidder's Edge sued by eBay      
Biddle, Daniel (contributor)      
bio      
Blagg      
Blawg Search      
blog neighborhoods      
Blogger      2nd
blogrolls      
Blogs      [See also weblogs]
Blosxom
book metadata and weblog mentions      
BOTs      [See spiders]
Boundary data
branding another site's data      
Bregenzer, Adam (contributor)
browser attributes
Buffy the Vampire Slayer      
Bugtraq reports, reformatting
Burke, Sean (contributor)      
by keyword
CAIDA project      2nd
calculating distance      
calculating mindshare      
Calishain, Tara (author)
cd-discid program
CDs
chaining commands      
change notification through email      
characters, special      
checking for new comments      
checks on keywords      
clarifying      
classification numbers, unique      
classification system      
classification systems      
clustered and related results      
clustered search results
Code      
Combined Log Format
combining information from FreeDB and      
combining related information with other      
comics      
comics, downloading      
Competitive intelligence      
Compress::Zlib      
Compress::Zlib module
Compressed
compressed data      
consequences of violating      
considering
contacting sites about your spider
Content      [See also data]
cookie jar
cookies, enabling
Copyfight, the Politics of IP web site
copyright, violating      
Cosmos      [See Link Cosmos]
Cozens, Simon (contributor)      
CPAN (Comprehensive Perl Archive Network)      2nd 3rd
CPAN module (Perl)      
creating web site for
cron
Crone      
cURL utility
cursors, rotating
customer advice      
customer advice, scraping      
dailystrips      
DATA      
databases      
daterange: syntax      
DaylightStation.com
Daypop      
Developer Wiki
Developer's Token      
Dewey Decimal system      
dict protocol      
DICT.org server      
Dictionaries      
diff utility (GNU)
difference between scrapers and      
Directi      
directories      
directory indexes
directory, calculating mindshare
disc ID
discussion groups      [See also Yahoo! Groups] 2nd
disobey.com
distance calculating, geographic      
Dive Into Mark      
DMOZ (Open Directory Project)      
DNS lookup      
Dornfest, Rael (contributor)      
downloading      
downloading images from Webshots      
downloading movies from      
Dynamic MP3 Lister      
Eastler, William (contributor)      
eBay's lawsuit against Bidder's Edge      
EchoCloud      2nd
Edna      
EIN (Employer Identification Number)      
Electronic Freedom Foundation      
Electronic Frontier Foundation      
Email      
email alert for new      
error checking      
ETag HTTP header      
European train connections, finding faster      
example      
example template      
Fake Cron      
Faking a Referer      
Fallin, Scott (contributor)      
Fark      
Farscape      
Fastcron      
Favorites tree      
FedEx, tracking packages with      
feed ID      2nd
FeedDemon      
Feeds      
fetching      
fetching with      
File::Spec      
File::Spec module      
Files      2nd
Filtering      
Finance::Bank::HSBC      
Finance::Bank::HSBC module
Finance::QIF      
Finance::QIF module      
Finance::Quote      
Finance::Quote module      
finding related sites using      
finding related sites using RSS feeds      
FireWire HD and      
FireWire HD and iPods      
FishHoo! fishing search engine      
Folder
fopen( ) function      
for requesting the hourly and weekly most-mentioned lists      
for retrieving categorized books      
for spidering
form data, posting with LWP
form, posting with LWP
Forums      
Framing      
framing data      
FreeDB project
freshmeat.net
freshmeat.net sample
friends and recommendations
from Usenet with nget
from webcams
from Webshots      
functionalities, combining      
gaining access to CD device      
Gamegrene.com      
GameStop.com prices      
GameStop.com, spidering prices
gathering
gathering tools
Geo::Distance      
Geo::Distance module
geographic distance calculating
geotargeting      
Getopt::Std      
Getopt::Std module      
getting for each book
gleaning data from
gleaning data from databases and information collections
GNUMP3d      
Google      
Google API sample
grabbing      [See fetching]
graphing
graphing data with      
graphing data with RRDTOOL      
graphing Sales Rank      
grep command      
GuideStar      
Hack #10, More Involved Requests with LWP::UserAgent      
Hack #11, Adding HTTP Headers to Your Request      
Hack #12, Posting Form Data with LWP      
Hack #13, Authentication, Cookies, and Proxies
Hack #14, Handling Relative and Absolute URLs
Hack #15, Secured Access and Browser Attributes      
Hack #17, Respecting robots.txt      
Hack #19, Scraping with HTML::TreeBuilder      
Hack #20, Parsing with HTML::TokeParser      
Hack #21, WWW::Mechanize 101      
Hack #22, Scraping with WWW::Mechanize      
Hack #23, In Praise of Regular Expressions      
Hack #24, Painless RSS with Template::Extract
Hack #25, A Quick Introduction to XPath      
Hack #27, More Advanced wget Techniques      
Hack #28, Using Pipes to Chain Commands      
Hack #29, Running Multiple Utilities at Once      
Hack #30, Utilizing the Web Scraping Proxy
Hack #36, Downloading Images from Webshots      
Hack #38, Archiving Your Favorite Webcams      
Hack #44, Archiving Yahoo! Groups Messages with WWW::Yahoo::Groups
Hack #45, Gleaning Buzz from Yahoo!      
Hack #46, Spidering the Yahoo! Catalog      
Hack #50, Weblog-Free Google Results      
Hack #52, Scraping Amazon.com Product Reviews      
Hack #53, Receive an Email Alert for Newly Added Amazon.com Reviews      
Hack #54, Scraping Amazon.com Customer Advice      
Hack #55, Publishing Amazon.com Associates Statistics      
Hack #56, Sorting Amazon.com Recommendations by Rating      
Hack #57, Related Amazon.com Products with Alexa      
Hack #58, Scraping Alexa's Competitive Data with Java      
Hack #59, Finding Album Information with FreeDB and Amazon.com      
Hack #60, Expanding Your Musical Tastes      
Hack #62, Graphing Data with RRDTOOL      
Hack #63, Stocking Up on Financial Quotes      
Hack #64, Super Author Searching      
Hack #66, Using All Consuming to Get Book Lists      
© Electronic Library of the Board of Trustees of the Faculty of Mechanics and Mathematics, Moscow State University, 2004-2025