How the dtSearch Spider Works

To index a Web site, select "Add web" in the dialog box below.

Enter the name of the Web site, then select the crawl depth. The crawl depth is the number of link levels into the Web site that dtSearch will follow when looking for pages. For example, a crawl depth of 1 reaches only the pages on the site that are linked directly from the home page, while a crawl depth of 4 reaches four levels deep into the site.
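
To make the idea of crawl depth concrete, here is a minimal sketch of a depth-limited, breadth-first crawl. It is an illustration only, not dtSearch's implementation: the function name crawl, the SITE link map, and the page paths are all hypothetical, and the site is modeled as an in-memory dictionary so the example runs without network access. The home page is treated as depth 0, so that a crawl depth of 1 reaches only the pages linked directly from it.

```python
from collections import deque
from typing import Dict, List


def crawl(links: Dict[str, List[str]], start: str, max_depth: int) -> Dict[str, int]:
    """Return every page reached from `start`, mapped to the depth at which it was first found.

    `links` is a hypothetical in-memory stand-in for a Web site:
    each key is a page, each value is the list of pages it links to.
    """
    depth_of = {start: 0}          # the home page is depth 0
    queue = deque([start])
    while queue:
        page = queue.popleft()
        depth = depth_of[page]
        if depth == max_depth:
            continue               # do not follow links beyond the crawl depth
        for target in links.get(page, []):
            if target not in depth_of:   # skip pages that were already reached
                depth_of[target] = depth + 1
                queue.append(target)
    return depth_of


# Hypothetical site layout used only for illustration.
SITE = {
    "/": ["/products", "/support"],
    "/products": ["/products/spider"],
    "/products/spider": ["/products/spider/faq"],
    "/support": ["/"],
}

if __name__ == "__main__":
    print(sorted(crawl(SITE, "/", 1)))   # home page plus directly linked pages
    print(sorted(crawl(SITE, "/", 4)))   # every page in this small site
```

With this layout, a crawl depth of 1 stops at /products and /support, while a crawl depth of 4 also reaches /products/spider and /products/spider/faq.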

Display

After a search, dtSearch Spider will display retrieved HTML or PDF files with hit highlighting, and all links and images intact. The result looks and acts just like the original Web page, but with highlighted hits and additional navigation options ("next hit," "previous document," "next documents," etc.).

HTML file retrieved by dtSearch Spider

dtSearch uses built-in HTML file converters to convert other text formats, such as word processor and spreadsheet files, to HTML for display with highlighted hits. See Fields for special XML search options.

Online Demo

For a Spider demo operating through dtSearch Web, click here. (The www.dtsearch.com spidered site is hosted on a completely different hosting system and physical location from the site that is running the Search Site demo.)

Technical Note

The dtSearch Spider does not "capture" an indexed Web site. To display a file indexed with the dtSearch Spider, dtSearch will return to the Web site to access the document.
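
The distinction between indexing a site and capturing it can be sketched as follows. This is a simplified illustration under stated assumptions, not the Spider's actual design: build_index, display, and the fetch callback are hypothetical names, the pages are plain text rather than HTML, and hits are wrapped in mark tags purely for demonstration. The index records only which URL contains which word; the page content is requested again when a result is displayed.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set


def build_index(urls: Iterable[str], fetch: Callable[[str], str]) -> Dict[str, Set[str]]:
    """Map each lowercased word to the URLs that contain it.

    Only words and URLs are kept; the page text itself is not stored.
    """
    index = defaultdict(set)
    for url in urls:
        for word in re.findall(r"\w+", fetch(url).lower()):
            index[word].add(url)
    return dict(index)


def display(url: str, term: str, fetch: Callable[[str], str]) -> str:
    """Fetch the page again and mark each whole-word occurrence of `term`."""
    text = fetch(url)  # content comes from the site, not from the index
    pattern = rf"\b({re.escape(term)})\b"
    return re.sub(pattern, r"<mark>\1</mark>", text, flags=re.IGNORECASE)


if __name__ == "__main__":
    # Hypothetical one-page site held in memory in place of real HTTP requests.
    site = {"http://example.test/a": "The Spider indexes pages."}
    index = build_index(site, site.__getitem__)
    for hit_url in index["spider"]:
        print(display(hit_url, "spider", site.__getitem__))
```

One consequence of this arrangement is that a displayed document reflects the page as it is at display time, so a page that changed after indexing will show its current content.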

In addition to searching publicly available Web sites, the Spider also supports indexing and searching of secure content (HTTPS) sites and password-accessible sites.

For information on searching ASP, please see this FAQ article: How to use dtSearch Web with dynamically-generated content.

 
The dtSearch product line can instantly search terabytes of text across a desktop, network, Internet or Intranet site.
dtSearch products also serve as tools for publishing large document collections, with instant text searching, to Web sites or CD/DVDs.
over two dozen indexed, unindexed, fielded and full-text search options
highlights hits in HTML, XML and PDF, while displaying embedded links, formatting and images
converts other file types — word processor, database, spreadsheet, email and full-text of email attachments, ZIP, Unicode, etc. — to HTML for display with highlighted hits
built-in Spider adds a third-party or other Web site (public, secure content, password accessible, etc.) to your searchable database
Spider supports Web-based content (HTML, PDF, XML, etc.) as well as dynamically-generated content (ASP.NET, MS CMS, SharePoint, etc.)