Hemenway K., Calishain T. — Spidering Hacks
Title: Spidering Hacks

Authors: Hemenway K., Calishain T.

Abstract:

The Internet, with its profusion of information, has made us hungry for ever more, ever better data. Out of necessity, many of us have become pretty adept with search engine queries, but there are times when even the most powerful search engines aren't enough. If you've ever wanted your data in a different form than it's presented, or wanted to collect data from several sites and see it side by side without the constraints of a browser, then Spidering Hacks is for you.

Spidering Hacks takes you to the next level in Internet data retrieval, beyond search engines, by showing you how to create spiders and bots to retrieve information from your favorite sites and data sources. You'll no longer feel constrained by the way host sites think you want to see their data presented: you'll learn how to scrape and repurpose raw data so you can view it in a way that's meaningful to you.

Written for developers, researchers, technical assistants, librarians, and power users, Spidering Hacks provides expert tips on spidering and scraping methodologies. You'll begin with a crash course in spidering concepts, tools (Perl, LWP, out-of-the-box utilities), and ethics (how to know when you've gone too far: what's acceptable and unacceptable). Next, you'll collect media files and data from databases. Then you'll learn how to interpret and understand the data, repurpose it for use in other applications, and even build authorized interfaces to integrate the data into your own content. By the time you finish Spidering Hacks, you'll be able to:

* Aggregate and associate data from disparate locations, then store and manipulate the data as you like
* Gain a competitive edge in business by knowing when competitors' products are on sale, and comparing sales ranks and product placement on e-commerce sites
* Integrate third-party data into your own applications or web sites
* Make your own site easier to scrape and more usable to others
* Keep up to date with your favorite comic strips, news stories, stock tips, and more without visiting the site every day


Language: en

Category: Technology/

Subject index status: Index with page numbers is ready


Year of publication: 2003

Number of pages: 424

Added to catalog: 15.06.2007

Subject index
$browser object      
.m3u files      
.txt files      
Aas, Gisle      
AbleShoppers      
absolute URLs      
Accept-Charset headers      
Accept-Encoding HTTP header      
Accept-Language headers      
Acceptable Use Policies      
Acceptable use policy (AUP)      
accessing particular      
accessing particular URLs      
account      
across multiple domains using Google      
across multiple sites for authors      
ActiveState's ActivePerl      
adding to request      
advanced applications and wget utility      
advanced techniques      
advertisers and      
advertisers and geotargeting      
aggregating      [See aggregating data]
aggregating data
aggregating entries from multiple      
aggregating from multiple engines      
aggregators      
AIM (AOL Instant Messenger)      
alert for new Amazon.com product reviews      
Alexa      
All Consuming
AlltheWeb.com
AlltheWeb.com sample      
AltaVista      
Amazon.com      
America at Work, America at Leisure project      
America at Work, America at Leisure project (Library of Congress)      
Ampache      
AmphetaDesk
anatomy of      
Andromeda
announcing to world      
AOL Instant Messenger (AIM)      
Apache      
Apache::MP3      [See Apache::MP3 module]
Apache::MP3 module
API      
API developer's key      
Apis      
arbitrary      
arbitrary classification systems      
architectural style      
archiving messages      
archiving with yahoo2mbox      
archiving Yahoo! Groups messages      
Artymiak, Jacek (contributor)      
as Perl module      
ASIN
ASIN (Amazon.com Standard Identification Number)
Associates account      
Associates sales statistics, publishing      
associative data      
attachments, saving only POP3      
Audioscrobbler      
AUP (Acceptable Use Policy)      
Authentication      
authors, searching across multiple sites for      
automating      
automating tasks      
Ball, Chris (contributor)      
Bandwidth      
banking online      
Bausch, Paul (contributor)      
BBC's Radio Times      
beginning process      
Ben's Bargains      
Benson, Erik (contributor)      
Berkman Center for Internet & Society at Harvard Law School
Best practices      
best practices for spidering      
Better Business Bureau      
Bidder's Edge sued by eBay      
Biddle, Daniel (contributor)      
bio      
Blagg      
Blawg Search      
blog neighborhoods      
Blogger
blogrolls      
Blogs      [See also weblogs]
Blosxom      
book metadata and weblog mentions      
BOTs      [See spiders]
Boundary data      
branding another site's data      
Bregenzer, Adam (contributor)      
browser attributes      
Buffy the Vampire Slayer      
Bugtraq reports, reformatting      
Burke, Sean (contributor)      
by keyword      
CAIDA project
calculating distance      
calculating mindshare      
Calishain, Tara (author)      
cd-discid program      
CdS      
chaining commands      
change notification through email      
characters, special      
checking for new comments      
checks on keywords      
clarifying      
classification numbers, unique      
classification system      
classification systems      
clustered and related results      
clustered search results      
Code      
Combined Log Format      
combining information from FreeDB and      
combining related information with other      
comics      
comics, downloading      
Competitive intelligence      
Compress::Zlib      
Compress::Zlib module      
Compressed      
compressed data      
consequences of violating      
considering      
contacting sites about your spider      
Content      [See also data]
cookie jar      
cookies, enabling      
Copyfight, the Politics of IP web site      
copyright, violating      
Cosmos      [See Link Cosmos]
Cozens, Simon (contributor)      
CPAN (Comprehensive Perl Archive Network)
CPAN module (Perl)      
creating web site for      
cron      
Crone      
cURL utility      
cursors, rotating      
customer advice      
customer advice, scraping      
dailystrips      
DATA      
databases      
daterange: syntax      
DaylightStation.com      
Daypop      
Developer Wiki      
Developer's Token      
Dewey Decimal system      
dict protocol      
DICT.org server      
Dictionaries      
diff utility (GNU)      
difference between scrapers and      
Directi      
directories      
directory indexes      
directory, calculating mindshare      
disc ID      
discussion groups      [See also Yahoo! Groups]
disobey.com      
distance calculating, geographic      
Dive Into Mark      
DMOZ (Open Directory Project)      
DNS lookup      
Dornfest, Rael (contributor)      
downloading      
downloading images from Webshots      
downloading movies from      
Dynamic MP3 Lister      
Eastler, William (contributor)      
eBay's lawsuit against Bidder's Edge      
EchoCloud
Edna      
EIN (Employer Identification Number)      
Electronic Freedom Foundation      
Electronic Frontier Foundation      
Email      
email alert for new      
error checking      
ETag HTTP header      
European train connections, finding faster      
example      
example template      
Fake Cron      
Faking a Referer      
Fallin, Scott (contributor)      
Fark      
Farscape      
Fastcron      
Favorites tree      
FedEx, tracking packages with      
feed ID
FeedDemon      
Feeds      
fetching      
fetching with      
File::Spec      
File::Spec module      
Files
Filtering      
Finance::Bank::HSBC      
Finance::Bank::HSBC module      
Finance::QIF      
Finance::QIF module      
Finance::Quote      
Finance::Quote module      
finding related sites using      
finding related sites using RSS feeds      
FireWire HD and      
FireWire HD and iPods      
FishHoo! fishing search engine      
Folder      
fopen( ) function      
for requesting the hourly and weekly most-mentioned lists      
for retrieving categorized books      
for spidering      
form data, posting with LWP      
form, posting with LWP      
Forums      
Framing      
framing data      
FreeDB project      
freshmeat.net      
freshmeat.net sample      
friends and recommendations      
from Usenet with nget      
from webcams      
from Webshots      
functionalities, combining      
gaining access to CD device      
Gamegrene.com      
GameStop.com prices      
GameStop.com, spidering prices      
gathering      
gathering tools      
Geo::Distance      
Geo::Distance module      
geographic distance calculating      
geotargeting      
Getopt::Std      
Getopt::Std module      
getting for each book      
gleaning data from      
gleaning data from databases and information collections      
GNUMP3d      
Google      
Google API sample      
grabbing      [See fetching]
graphing      
graphing data with      
graphing data with RRDTOOL      
graphing Sales Rank      
grep command      
GuideStar      
Hack #10, More Involved Requests with LWP::UserAgent      
Hack #11, Adding HTTP Headers to Your Request      
Hack #12, Posting Form Data with LWP      
Hack #13, Authentication, Cookies, and Proxies      
Hack #14, Handling Relative and Absolute URLs      
Hack #15, Secured Access and Browser Attributes      
Hack #17, Respecting robots.txt      
Hack #19, Scraping with HTML::TreeBuilder      
Hack #20, Parsing with HTML::TokeParser      
Hack #21, WWW::Mechanize 101      
Hack #22, Scraping with WWW::Mechanize      
Hack #23, In Praise of Regular Expressions      
Hack #24, Painless RSS with Template::Extract      
Hack #25, A Quick Introduction to XPath      
Hack #27, More Advanced wget Techniques      
Hack #28, Using Pipes to Chain Commands      
Hack #29, Running Multiple Utilities at Once      
Hack #30, Utilizing the Web Scraping Proxy      
Hack #36, Downloading Images from Webshots      
Hack #38, Archiving Your Favorite Webcams      
Hack #44, Archiving Yahoo! Groups Messages with WWW::Yahoo::Groups      
Hack #45, Gleaning Buzz from Yahoo!      
Hack #46, Spidering the Yahoo! Catalog      
Hack #50, Weblog-Free Google Results      
Hack #52, Scraping Amazon.com Product Reviews      
Hack #53, Receive an Email Alert for Newly Added Amazon.com Reviews      
Hack #54, Scraping Amazon.com Customer Advice      
Hack #55, Publishing Amazon.com Associates Statistics      
Hack #56, Sorting Amazon.com Recommendations by Rating      
Hack #57, Related Amazon.com Products with Alexa      
Hack #58, Scraping Alexa's Competitive Data with Java      
Hack #59, Finding Album Information with FreeDB and Amazon.com      
Hack #60, Expanding Your Musical Tastes      
Hack #62, Graphing Data with RRDTOOL      
Hack #63, Stocking Up on Financial Quotes      
Hack #64, Super Author Searching      
Hack #66, Using All Consuming to Get Book Lists      
© Electronic Library of the Board of Trustees of the MSU Faculty of Mechanics and Mathematics, 2004-2017