Hemenway K., Calishain T. — Spidering Hacks
Title: Spidering Hacks

Authors: Hemenway K., Calishain T.

Abstract:

The Internet, with its profusion of information, has made us hungry for ever more, ever better data. Out of necessity, many of us have become pretty adept with search engine queries, but there are times when even the most powerful search engines aren't enough. If you've ever wanted your data in a different form than it's presented, or wanted to collect data from several sites and see it side-by-side without the constraints of a browser, then Spidering Hacks is for you. Spidering Hacks takes you to the next level in Internet data retrieval, beyond search engines, by showing you how to create spiders and bots to retrieve information from your favorite sites and data sources. You'll no longer feel constrained by the way host sites think you want to see their data presented: you'll learn how to scrape and repurpose raw data so you can view it in a way that's meaningful to you. Written for developers, researchers, technical assistants, librarians, and power users, Spidering Hacks provides expert tips on spidering and scraping methodologies. You'll begin with a crash course in spidering concepts, tools (Perl, LWP, out-of-the-box utilities), and ethics (how to know when you've gone too far: what's acceptable and unacceptable). Next, you'll collect media files and data from databases. Then you'll learn how to interpret and understand the data, repurpose it for use in other applications, and even build authorized interfaces to integrate the data into your own content. By the time you finish Spidering Hacks, you'll be able to:

* Aggregate and associate data from disparate locations, then store and manipulate the data as you like
* Gain a competitive edge in business by knowing when competitors' products are on sale, and comparing sales ranks and product placement on e-commerce sites
* Integrate third-party data into your own applications or web sites
* Make your own site easier to scrape and more usable to others
* Keep up-to-date with your favorite comic strips, news stories, stock tips, and more without visiting the site every day


Language: en

Category: Technology/

Subject index status: Index with page numbers is ready


Year of publication: 2003

Number of pages: 424

Added to catalog: 15.06.2007

Subject index
rsync, mirroring web sites with      
SafeSearch filtering mechanism      
sales statistics, publishing Amazon.com Associates      
Sample      
saving daily horoscopes on      
scattersearching      
scheduling tasks without      
scheduling tasks without cron      
scrapers and spiders, difference between      
Scraping      
scraping competitive data      
scraping with      
Script Schedule      
scripts, adding progress bars to      
Search engine robots web site      
search form      
search request program      
search request to      
search results      
Searching      
searching code      
searching for authors      
searching instead of ISBN      
searching LOC call numbers instead of      
Seattle's King County database of restaurant inspections      
secured access and browser attributes      
sending      
Shared RRD module      
shell scripts      
Sifry, Dave      
signatures, software      
simulating a POST      
Six Degrees of Kevin Bacon      
Slashcode      
Slashdot      
sleep statement (Perl)      
SMIL (Synchronized Multimedia Integration Language) files      
SOAP-based Google Web Services API      
SOAP::Lite package      
software for      
software packages      
Sort::Array module      
spaces      
specific information, locating and gathering      
Spider-Man theme song      
spidering      
Spiders      
sprintf      
stock prices, collecting      
structure of      
Synchronized Multimedia Integration Language (SMIL) files      
Syndic8      
syndicated news feeds      
syndication      
Tang, Autrijus      
Technorati      
Technorati and      
Template Toolkit      
Template::Extract      
Template::Extract module      
Template::Generate      
Template::Generate module      
Templates      
Term::ProgressBar      
Term::ProgressBar module      
Terms of Service (TOS)      
Terms of Use (TOU)      
Testing      
text      [See content]
Text::Diff      
Text::Diff module      
Text::Template      
Text::Template module      
that track legitimate spiders      
Thesaurus      
Thesaurus.com      
Time::JulianDay      
Time::JulianDay module      
titles      
to find data including friends or recommendations      
to get book metadata and weblog mentions      
Toftum, Mads (contributor)      
tools, using correct      
Top 20 searching
Top 20 searching on Google
TOS (Terms of Service)      
TOU (Terms of Use)      
Tracking      [See Link Cosmos]
tracking additions to      
tracking packages with FedEx      
tracking search results      
traffic statistics, aggregating
train connections, finding faster      
TREE      [See Favorites tree]
Trees      
trendspotting with geotargeting      
Truskett, Iain (contributor)      
turning into positions      
TV Guide Online      
TV listings      
TV listings, scraping      
tvlisting      
U.S. Census web site      
Udell, Jon      
Unicode.org      
United States Post Office      
UNIX      
Unix and Mac OS X installation      
Unix and Mac OS X installation of Perl      
URI      
URI module
URI::Escape      
URI::Escape module      
URLs      
Usenet, downloading from      
User Agent Database web site      
using established universal taxonomy      
using existing programs      
using good      
using regular expressions      
using specific      
using to automate tasks      
using to find related web sites      
using to repurpose data      
using to scrape across multiple domains      
utilities      
utilities, running multiple      
VersionTracker      
virtual browsers      
visual indicators      
visual indicators when downloading      
Vitiello, Eric (contributor)      
VoodooPad      
watching printers      
Weather Underground      
weather, identifying visitor's      
Weather::Underground      
Weather::Underground module      
Web Robots Database web site      
Web Scraping Proxy      
Web Services      
Web Services API, SOAP-based      
Web Services ASIN query      
Web sites      
web sites having problems with use of      
web sites that track legitimate      
webcams, archiving      
weblog      
weblog-free Google results      
weblog-free results      
weblogs      [See also blogs]
Webmaster World
Webshots      
wget utility      
what is indexed      
What's New page      
who may not want to be spidered      
why use this technology      
Win32::Sound      
Win32::Sound module      
Windows      
Windows and Perl      
Wired Bots      
with cron      
with LWP::Simple      
with REST interface      
with XML-RPC      
word lookup      
WWW::Mechanize      [See WWW::Mechanize module]
WWW::Mechanize module
WWW::Yahoo::Groups      
WWW::Yahoo::Groups module      
XBox games      
Xerces for Java      
XHTML (Extensible Hypertext Markup Language) files      
XML      
XML (Extensible Markup Language) files      
XML and      
XML-RPC      
XML::LibXML      
XML::LibXML module      
XML::RSS      [See XML::RSS module]
XML::RSS module      
XMLRPC::Lite      [See XMLRPC::Lite module]
XMLRPC::Lite module
XPath      
Yahoo!      
Yahoo! Buzz      
Yahoo! Catalog, spidering      
Yahoo! Groups      
Yahoo!'s news photo archive      
yahoo2mbox      
Zeitgeist page      
Zeitlin, Vadim      
Zip codes      
Zope      
© Electronic Library of the Board of Trustees of the MSU Faculty of Mechanics and Mathematics, 2004-2024