The following measures have been taken to avoid problems for server providers:
HTML only: Only HTML pages are loaded. Images and file attachments are ignored. This reduces the system load for the page provider.
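A minimal sketch of this idea (not the actual enoola code): the Content-Type header is checked before the body is downloaded, so non-HTML resources are never fetched. The function name and timeout are illustrative.

```python
# Sketch: skip everything that is not HTML by checking Content-Type first.
import urllib.request

def fetch_if_html(url):
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head, timeout=10) as resp:
        content_type = resp.headers.get("Content-Type", "")
    if "text/html" not in content_type:
        return None  # images, PDFs and other attachments are ignored
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()
```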
Breadth-first search: We do not poll a single server continuously; because of the breadth-first order, each external server is contacted only once every few minutes. This rules out overloading a server.
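The following sketch illustrates how a breadth-first queue combined with a per-host delay keeps any single server from being contacted more often than once every few minutes. The delay value and data structures are assumptions, not the project's actual implementation.

```python
# Sketch: breadth-first queue plus a per-host delay.
import time
from collections import deque
from urllib.parse import urlsplit

MIN_DELAY = 180            # assumed: seconds between two requests to the same host
queue = deque()            # URLs in breadth-first order
last_request = {}          # host -> timestamp of the last request

def next_url():
    for _ in range(len(queue)):
        url = queue.popleft()
        host = urlsplit(url).hostname
        if time.time() - last_request.get(host, 0) >= MIN_DELAY:
            last_request[host] = time.time()
            return url
        queue.append(url)  # host was contacted too recently, try it again later
    return None
```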
One-time attempt: Each page is requested only once. Defective URLs are likewise requested only once; repeated attempts on defective URLs are avoided.
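A hedged sketch of the one-time-attempt rule: a URL is marked as seen before the request is made, so even a failed request is never repeated. Names are illustrative.

```python
# Sketch: every URL, including defective ones, is recorded after the first attempt.
seen = set()

def request_once(url, fetch):
    if url in seen:
        return None
    seen.add(url)          # marked before fetching, so failures are not retried
    try:
        return fetch(url)
    except Exception:
        return None        # defective URL stays in `seen`: no repeated attempt
```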
Use of robots.txt: robots.txt is used in line with the specification on www.robotstxt.org. It allows every server administrator to protect the content and functions of a site. The globally accepted rules for search engines are adhered to.
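A short sketch using Python's standard robots.txt parser. The user agent string "enoola" is taken from the project description; the URLs are placeholders, and this is not the project's own code.

```python
# Sketch: check a site's robots.txt before fetching a page.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://www.example.org/robots.txt")
rp.read()

if rp.can_fetch("enoola", "https://www.example.org/some/page.html"):
    pass  # fetching this page is allowed by the site's robots.txt
```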
Explicit exclusion of sites: Sites can be deactivated completely. If a page provider wishes, whole sites can be excluded, independent of robots.txt.
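One way to picture this, as a sketch under the assumption that exclusions are kept in a simple host blocklist maintained on request; the entries shown are hypothetical.

```python
# Sketch: a provider-requested blocklist, checked independently of robots.txt.
from urllib.parse import urlsplit

EXCLUDED_HOSTS = {"example.org"}  # hypothetical entries, added on request

def is_excluded(url):
    return urlsplit(url).hostname in EXCLUDED_HOSTS
```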
Name provision: To spare administrators having to identify the crawler by its IP address when problems occur, we use the project name "enoola" together with a link to www.enoola.com. This allows administrators to point out any new problems quickly.
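This is typically done via the User-Agent header, as in the sketch below. The exact user agent string is an assumption based on the project name and URL given above.

```python
# Sketch: identify the crawler by name and project URL instead of a bare IP.
import urllib.request

USER_AGENT = "enoola (+http://www.enoola.com)"  # assumed format of the string

req = urllib.request.Request(
    "https://www.example.org/",
    headers={"User-Agent": USER_AGENT},
)
```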
Registration at robotstxt.org: Enoola is registered at robotstxt.org, which makes it even easier for administrators to find the necessary contact data.
Test environments: The Crawler is tested in dedicated test environments, during which it writes a log, so that what actually happens can be followed up.
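A minimal sketch of such a test log, assuming Python's standard logging module; the file name and log format are illustrative.

```python
# Sketch: every request made during a test run is written to a log file.
import logging

logging.basicConfig(
    filename="crawler-test.log",   # hypothetical file name
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logging.info("fetched %s (status %s)", "https://www.example.org/", 200)
```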
Different types of link: We differentiate between a-href links, area-href links, form links, input links and img links. Only a-href and area-href links are evaluated.
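The sketch below shows this distinction with the standard-library HTML parser: only href attributes of <a> and <area> tags are collected, while form, input and img links are ignored. It is an illustration, not the project's parser.

```python
# Sketch: collect only a-href and area-href links.
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "area"):
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

extractor = LinkExtractor()
extractor.feed('<a href="/page.html">x</a><img src="pic.png">')
print(extractor.links)  # ['/page.html']
```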
Immediate stop: Crawler activities are stopped immediately as soon as a problem occurs. Evaluation continues only when the problem has been replicated and solved.
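As a sketch of this fail-stop behaviour, assuming a simple fetch loop: any unexpected error aborts the whole crawl at once, and the loop is only restarted manually after the problem has been replicated and solved.

```python
# Sketch: any unexpected problem stops the crawler immediately.
import logging

def crawl(queue, fetch):
    while queue:
        url = queue.popleft()
        try:
            fetch(url)
        except Exception as exc:
            logging.error("stopping crawler: %s raised %s", url, exc)
            raise SystemExit(1)   # immediate stop, no further requests
```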