The following measures have been taken to avoid problems for server providers:
HTML only: Only HTML pages are loaded. Images and file attachments are ignored. This reduces the system load for the page provider.
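A minimal sketch of this idea (not the actual enoola code): the Content-Type header is checked before the body is downloaded, so non-HTML resources are never fetched. The function name and timeout are illustrative.

```python
# Sketch: skip everything that is not HTML by checking Content-Type first.
import urllib.request

def fetch_if_html(url):
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head, timeout=10) as resp:
        content_type = resp.headers.get("Content-Type", "")
    if "text/html" not in content_type:
        return None  # images, PDFs and other attachments are ignored
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()
```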
Breadth-first search: We do not poll a single server continuously; because of the breadth-first order, each external server is contacted only once every few minutes. This rules out overloading a server.
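The following sketch illustrates how a breadth-first queue combined with a per-host delay keeps any single server from being contacted more often than once every few minutes. The delay value and data structures are assumptions, not the project's actual implementation.

```python
# Sketch: breadth-first queue plus a per-host delay.
import time
from collections import deque
from urllib.parse import urlsplit

MIN_DELAY = 180            # assumed: seconds between two requests to the same host
queue = deque()            # URLs in breadth-first order
last_request = {}          # host -> timestamp of the last request

def next_url():
    for _ in range(len(queue)):
        url = queue.popleft()
        host = urlsplit(url).hostname
        if time.time() - last_request.get(host, 0) >= MIN_DELAY:
            last_request[host] = time.time()
            return url
        queue.append(url)  # host was contacted too recently, try it again later
    return None
```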
One-time attempt: Each page is requested only once. Defective URLs are likewise requested only once; repeated attempts on defective URLs are avoided.
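A hedged sketch of the one-time-attempt rule: a URL is marked as seen before the request is made, so even a failed request is never repeated. Names are illustrative.

```python
# Sketch: every URL, including defective ones, is recorded after the first attempt.
seen = set()

def request_once(url, fetch):
    if url in seen:
        return None
    seen.add(url)          # marked before fetching, so failures are not retried
    try:
        return fetch(url)
    except Exception:
        return None        # defective URL stays in `seen`: no repeated attempt
```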
Use of robots.txt: robots.txt is used in line with the specification on www.robotstxt.org. It allows every server administrator to protect the content and functions of a site. The globally accepted rules for search engines are adhered to.
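A short sketch using Python's standard robots.txt parser. The user agent string "enoola" is taken from the project description; the URLs are placeholders, and this is not the project's own code.

```python
# Sketch: check a site's robots.txt before fetching a page.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://www.example.org/robots.txt")
rp.read()

if rp.can_fetch("enoola", "https://www.example.org/some/page.html"):
    pass  # fetching this page is allowed by the site's robots.txt
```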
Explicit exclusion of sites: Sites can be deactivated completely. If a page provider wishes, whole sites can be excluded, independent of robots.txt.
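One way to picture this, as a sketch under the assumption that exclusions are kept in a simple host blocklist maintained on request; the entries shown are hypothetical.

```python
# Sketch: a provider-requested blocklist, checked independently of robots.txt.
from urllib.parse import urlsplit

EXCLUDED_HOSTS = {"example.org"}  # hypothetical entries, added on request

def is_excluded(url):
    return urlsplit(url).hostname in EXCLUDED_HOSTS
```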
Name provision: To spare administrators having to identify the crawler by its IP address when problems occur, we use the project name "enoola" together with a link to www.enoola.com. This allows administrators to point out any new problems quickly.
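This is typically done via the User-Agent header, as in the sketch below. The exact user agent string is an assumption based on the project name and URL given above.

```python
# Sketch: identify the crawler by name and project URL instead of a bare IP.
import urllib.request

USER_AGENT = "enoola (+http://www.enoola.com)"  # assumed format of the string

req = urllib.request.Request(
    "https://www.example.org/",
    headers={"User-Agent": USER_AGENT},
)
```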
Registration at robotstxt.org: Enoola is registered at robotstxt.org, which makes it even easier for administrators to find the necessary contact data.
Test environments: The Crawler is tested in dedicated test environments, during which it writes a log, so that what actually happens can be followed up.
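A minimal sketch of such a test log, assuming Python's standard logging module; the file name and log format are illustrative.

```python
# Sketch: every request made during a test run is written to a log file.
import logging

logging.basicConfig(
    filename="crawler-test.log",   # hypothetical file name
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logging.info("fetched %s (status %s)", "https://www.example.org/", 200)
```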
Different types of link: We differentiate between a-href links, area-href links, form links, input links and img links. Only a-href and area-href links are evaluated.
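The sketch below shows this distinction with the standard-library HTML parser: only href attributes of <a> and <area> tags are collected, while form, input and img links are ignored. It is an illustration, not the project's parser.

```python
# Sketch: collect only a-href and area-href links.
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "area"):
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

extractor = LinkExtractor()
extractor.feed('<a href="/page.html">x</a><img src="pic.png">')
print(extractor.links)  # ['/page.html']
```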
Immediate stop: Crawler activities are stopped immediately as soon as a problem occurs. Evaluation continues only when the problem has been replicated and solved.
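As a sketch of this fail-stop behaviour, assuming a simple fetch loop: any unexpected error aborts the whole crawl at once, and the loop is only restarted manually after the problem has been replicated and solved.

```python
# Sketch: any unexpected problem stops the crawler immediately.
import logging

def crawl(queue, fetch):
    while queue:
        url = queue.popleft()
        try:
            fetch(url)
        except Exception as exc:
            logging.error("stopping crawler: %s raised %s", url, exc)
            raise SystemExit(1)   # immediate stop, no further requests
```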