Armenian Knowledge Base  

Old 08.05.2002, 11:03   #1
Guest
 
AntiGrabbing

Folks, I have a question that I think will interest many of you.

As we know, sites are constantly being grabbed by various site grabbers such as WebZIP and the like. This worries me a lot, because it means extra traffic and therefore extra server costs.

So if anyone knows, please suggest ways to prevent such downloads, or at least partially prevent them.

Thanks in advance
Old 08.05.2002, 13:02   #2
Профессор
 
Join Date: 01 2002
Location: New York, USA

Read up on the "robots.txt" specification.

Most offline browsers obey it...
Old 09.05.2002, 10:23   #3
Guest
 

Hmm, yes, I figured robots.txt would have something to do with this, honestly.

Something like:
User-agent: WebZIP
Disallow: /

or something along those lines.

groul, could you give me a good link where I can read about this, please?
Old 09.05.2002, 10:32   #4
Moderator
 
acid's Avatar
 
Join Date: 09 2001
Location: South Korea, Gumi

Every downloader lets you change the user agent and set it to, say, IE5. I have used almost all of the well-known ones.
Old 09.05.2002, 10:40   #5
Guest
 

Quote:
Originally posted by acid:
Every downloader lets you change the user agent and set it to, say, IE5. I have used almost all of the well-known ones.
That is true, of course, and I have seen it myself, but at least we can count on some percentage of the grabbing being cut.
Old 09.05.2002, 17:51   #6
Kooper
 
Kooper_26's Avatar
 
Join Date: 05 2002
Location: Hay.am Portal
Age: 41

If you add a robots tag, no decent search engine will find you. Better to use CGI, which helps more or less, and host your pictures on other servers so that you don't run up a lot of traffic.
Old 09.05.2002, 22:47   #7
Профессор
 
Join Date: 01 2002
Location: New York, USA

2 Yervand

http://www.robotstxt.org/wc/norobots.html

found via this link:
http://www.google.com/search?hl=en&s...=Google+Search

))

2 Kooper
> If you add a robots tag, no decent search engine will find you

Sorry, but I think that is fundamentally wrong.
First, you can disallow only the folder where the images are stored.
Second, every serious ("decent") search engine has its own user agent, so you can allow access for, say, Googlebot or Slurp while denying it to something like WebZIP.
By the way, Google even has a special tag that lets you exclude just the images (the ones indexed for Google Image Search) from its index!

For example

a.

User-agent: *
Disallow: /img/

b.

User-Agent: Googlebot-Image
Disallow: /
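One way to sanity-check rules like these before putting them on a server is Python's standard `urllib.robotparser`. The sketch below is only an illustration: it assumes a combined robots.txt built from examples a. and b. plus a WebZIP entry, and the sample paths are made up. It shows how a well-behaved client would read the file; as noted earlier in the thread, a grabber can ignore robots.txt or fake its user agent.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt combining the examples above with a WebZIP entry.
ROBOTS_TXT = """\
User-agent: WebZIP
Disallow: /

User-agent: Googlebot-Image
Disallow: /

User-agent: *
Disallow: /img/
"""


def build_parser(text):
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def may_fetch(parser, user_agent, path):
    """Return True if a client honouring robots.txt may request the path."""
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for agent, path in [
        ("WebZIP", "/index.html"),
        ("Googlebot", "/index.html"),
        ("Googlebot", "/img/photo.gif"),
        ("Googlebot-Image", "/index.html"),
    ]:
        print(agent, path, may_fetch(rules, agent, path))
```

The search engine crawler keeps access to the pages while the image folder and the named grabber are excluded, which is the point made against the "no search engine will find you" objection.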
__________________
Karen Vrtanesyan, աջակցող

ArmenianHouse.org - Armenian Library and Forum.
Literary Cafe - Young Armenian writers and poets
Old 10.05.2002, 00:02   #8
Консервативн
 
VX's Avatar
 
Join Date: 01 2002
Location: Кавказская Албания

Guys, robots.txt is not a very interesting solution because it may not work; for example, I know downloaders in which robots.txt support can be switched off. The problem can be solved another way: by writing a program (in PHP, for example) that monitors the interval between recent requests, and so on.
I'll write one for big money
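The idea of watching the interval between recent requests can be sketched as a sliding-window counter per host. This is a minimal in-memory illustration in Python rather than PHP; the class name, the default limit and the window length are placeholders, not values recommended in the thread, and a real deployment would need shared storage and eviction of idle hosts.

```python
import time
from collections import defaultdict, deque


class RequestRateLimiter:
    """Allow at most max_requests per host within a sliding time window."""

    def __init__(self, max_requests=120, window_seconds=60.0):
        # Both defaults are placeholder values chosen for illustration.
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # host -> timestamps of accepted requests

    def allow(self, host, now=None):
        """Record a request from host and return True if it is within the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[host]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: the caller can answer with 403 or 429
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = RequestRateLimiter(max_requests=3, window_seconds=60.0)
    for second in (0, 1, 2, 3, 61):
        print(second, limiter.allow("10.0.0.1", now=second))
```

Rejected requests are not counted, so a host that backs off regains access once its earlier requests age out of the window.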
Old 10.05.2002, 00:44   #9
Главный Лысый
 
Pascal's Avatar
 
Join Date: 10 2001
Location: AM
Age: 39

2Yervand
I have worked on this problem.
And I came to realize that it is not realistically possible to guarantee protection against downloading.

Still, I can list a few mechanisms that I liked.
1. Using mod_rewrite. This is a VERY flexible URI-rewriting mechanism. If the site runs some internal hit-counting mechanism, you can combine an external program for mod_rewrite with that counter and deny requests from any host that exceeds some limit (for example, > 500 requests/min). Downsides: heavy CPU load on the server and heavy load on the DB if the counter uses a DB. And mod_rewrite itself is too resource-hungry.
==Apache docs==
RewriteCond %{REMOTE_HOST} ^host1.* [OR]
RewriteCond %{REMOTE_HOST} ^host2.* [OR]
RewriteCond %{REMOTE_HOST} ^host3.*
RewriteRule ...some special stuff for any of these hosts...
==skip==
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*
RewriteRule ^/$ /homepage.max.html [L]

RewriteCond %{HTTP_USER_AGENT} ^Lynx.*
RewriteRule ^/$ /homepage.min.html [L]

RewriteRule ^/$ /homepage.std.html [L]
==end of Apache docs===

2. Serve content that is constantly updated and quickly goes out of date. Of course, this approach suits only a limited number of sites (news sites, for example), but if you think in this direction you can come up with something.

3. Make the navigation a little tricky. Namely, let the user get the same information in different presentations. That came out vague, so here is an example:
http://news.panarmenian.net/rus/headlines/
Each news item is reachable in 2 variants: in a new window and in the same window.
Besides user convenience, the point of all this is that some Teleport Pro will follow both the new-window link and the same-window link. It gets a hell of a lot of links thrown at it, and the download drags on forever.

4. Reduce the size of the pages while increasing their number. It is unclear how much this helps readers take in the information, but if you find the optimal balance, this method combined with 3 gives fantastic results. These downloaders usually save all files into one directory, and the more files there are in the working directory, the slower the process runs. This holds for UNIX with its wonderful wget as well as for Windows. Once the number of files goes past 5-6K, the process slows down and stops being efficient. On top of that, the downloaded site takes up a horrifying amount of disk space, and the download takes a very long time.
JFYI, W2000 (PII, 256 RAM) took about 3 hours to delete a directory with 37000 files.....

5. Forbid downloading files unless the request arrives with your own referer. This may not be quite on topic, but it may give you ideas. It gave me some.
SetEnvIfNoCase Referer lib\.ru internal_referer
SetEnvIfNoCase User-Agent Teleport internal_referer
SetEnvIfNoCase User-Agent Vampire internal_referer
SetEnvIfNoCase User-Agent ReGet internal_referer
SetEnvIfNoCase User-Agent GetRight internal_referer
SetEnvIfNoCase User-Agent Wget internal_referer
<Files ~ "\.zip$">
ErrorDocument 403 http://lib.ru/books/index.htm
order deny,allow
deny from all
allow from env=internal_referer
</Files>
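Read as a decision rule, the fragment above serves a .zip file only when the internal_referer variable is set, that is, when the Referer contains lib.ru or the User-Agent names one of the listed download tools. The Python sketch below mirrors that logic for illustration only; the function name and the sample requests are assumptions, and both headers are supplied by the client, so they can be faked.

```python
import re

# Patterns mirror the SetEnvIfNoCase lines above (case-insensitive, unanchored).
REFERER_PATTERN = re.compile(r"lib\.ru", re.IGNORECASE)
DOWNLOADER_PATTERN = re.compile(r"Teleport|Vampire|ReGet|GetRight|Wget", re.IGNORECASE)
# Mirrors <Files ~ "\.zip$">: only zip files are protected.
PROTECTED_PATTERN = re.compile(r"\.zip$")


def is_allowed(path, referer="", user_agent=""):
    """Return True if the request would be served under the rules above."""
    if not PROTECTED_PATTERN.search(path):
        return True  # other files are not covered by the <Files> block
    # Equivalent of "allow from env=internal_referer".
    return bool(REFERER_PATTERN.search(referer) or DOWNLOADER_PATTERN.search(user_agent))


if __name__ == "__main__":
    print(is_allowed("/books/a.zip", referer="http://lib.ru/books/index.htm"))
    print(is_allowed("/books/a.zip", referer="http://example.com/page.html"))
    print(is_allowed("/books/a.zip", user_agent="Wget/1.8"))
```

Requests that fail the check get the 403 handler, which in the fragment redirects to the index page.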

All configuration examples are for apache-1.3.xx

References:
http://httpd.apache.org/docs/
http://library.web.am/win/WEBMASTER/sowetywww2.txt

That is all. If anything is unclear, I will explain in more detail.

Regards
__________________
Ruben Muradyan
Technical Director
PanARMENIAN Network: Armenian News

----------------------------------------------------
A bald spot is a clearing trampled down by thoughts.
----------------------------------------------------
Old 10.05.2002, 00:47   #10
Главный Лысый
 
Pascal's Avatar
 
Join Date: 10 2001
Location: AM
Age: 39

Acid jan.
Please take a look at the profanity filter. It turned the completely harmless word U_M_E_N_'_S_H_I_T_' (Russian for "to reduce") into UMEN****'
Old 11.05.2002, 02:35   #11
Главный Лысый
 
Pascal's Avatar
 
Join Date: 10 2001
Location: AM
Age: 39

By the way, take a look at this URL:
http://www.pics.am/animals/birds/images/

This happens because the web server is not configured quite correctly.
Most likely the Apache conf file contains something like
<Directory "/SOMEDIR/">
Options Indexes MultiViews
AllowOverride None
Order allow,deny
Allow from all
</Directory>
What you need is Options set to anything you like except Indexes and MultiViews.
By the way, your server is indeed running Apache.

This trick works on many sites, including ones built by quite professional studios.
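A configuration file can be checked for this problem with a few lines of script. The Python sketch below is a rough illustration, not a full Apache config parser: the function name is made up, it looks only at Options lines, and it treats Indexes, +Indexes and All as enabling directory listings.

```python
def options_enabling_indexes(config_text):
    """Return (line_number, line) pairs whose Options directive turns on Indexes."""
    found = []
    for number, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if parts[0].lower() != "options":
            continue
        flags = {flag.lower() for flag in parts[1:]}
        # "All" is included because it switches on every option except MultiViews.
        if flags & {"indexes", "+indexes", "all"}:
            found.append((number, line))
    return found


if __name__ == "__main__":
    sample = (
        '<Directory "/SOMEDIR/">\n'
        "Options Indexes MultiViews\n"
        "AllowOverride None\n"
        "</Directory>\n"
    )
    for number, line in options_enabling_indexes(sample):
        print(number, line)
```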
Old 11.05.2002, 08:14   #12
Профессор
 
Join Date: 01 2002
Location: New York, USA

Another idea: how about free membership?

By the way:

http://zmey.com.ru/cgi-bin/nph-down.pl/0/picpmp16.zip

Picture Pump 1.6
one of Yervand's nightmares
Old 11.05.2002, 19:35   #13
Guest
 

Quote:
Originally posted by groul:
Another idea: how about free membership?

By the way:

http://zmey.com.ru/cgi-bin/nph-down.pl/0/picpmp16.zip

Picture Pump 1.6
one of Yervand's nightmares
What kind of membership???

And the program really is straight out of a nightmare, lol
Old 13.05.2002, 11:22   #14
Профессор
 
Join Date: 01 2002
Location: New York, USA

This kind:
Have the user log in, and you track the session through a cookie.

Do grabbers support cookies?

And one more interesting discussion on the topic:

http://anfrax.ru/?c=2002_05_10_12n
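The login idea could look roughly like this: after login the server hands out a signed session cookie, and any request that does not return a valid one is refused. The Python sketch below uses only the standard library; the secret key, the cookie name and the user id are placeholders, and it leaves out expiry and storage, so treat it as an outline rather than a finished scheme. A grabber that stores and resends cookies would still get through.

```python
import hashlib
import hmac
from http.cookies import SimpleCookie

SECRET_KEY = b"change-me"  # placeholder: a real site keeps this secret and random


def _sign(user_id):
    """Return a hex HMAC-SHA256 signature for the user id."""
    return hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def make_session_cookie(user_id):
    """Build the Set-Cookie value issued after a successful login."""
    cookie = SimpleCookie()
    cookie["session"] = f"{user_id}.{_sign(user_id)}"
    cookie["session"]["httponly"] = True
    return cookie["session"].OutputString()


def user_from_cookie_header(header):
    """Return the user id if the Cookie header carries a valid session, else None."""
    cookie = SimpleCookie()
    cookie.load(header)
    morsel = cookie.get("session")
    if morsel is None:
        return None  # no cookie at all, which is typical for a simple grabber
    user_id, _, signature = morsel.value.rpartition(".")
    expected = _sign(user_id)
    if user_id and hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return user_id
    return None


if __name__ == "__main__":
    issued = make_session_cookie("yervand")
    sent_back = issued.split(";")[0]  # the browser returns only name=value
    print(user_from_cookie_header(sent_back))
    print(user_from_cookie_header(""))
```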

All times are GMT.

