Search Features — International Languages

 
Unicode Support
Unicode support allows for indexing and searching of non-English text, including every character set supported by the Unicode standard.
In addition to Unicode support, dtSearch offers extensive alphabet customization options.
See the Unicode FAQ for more technical information.
For a general overview of Unicode, see the Unicode and Text Retrieval white paper.
Language-Neutral Search Options
The following search options work automatically on text in any language: fuzzy (adjustable from 0 to 10); natural language with automatic relevancy-ranking; variable term weighting; phrase; boolean (and/or/not); proximity and directed proximity; wildcard; macro; numeric range; and fielded data (alone or combined with full-text searching).
See also forensics search options.
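Options such as proximity work on word positions rather than on language-specific rules, which is why they apply to any language. The following minimal Python sketch checks whether two words occur within N words of each other. The `tokenize` and `within` helpers are hypothetical. The assumption that "within N words" means the two positions differ by at most N is for illustration only; this is not dtSearch's query syntax or implementation.

```python
import re

def tokenize(text: str) -> list[str]:
    # In Python 3, \w matches Unicode letters and digits, so this splits
    # space-delimited text in many languages without per-language rules.
    return re.findall(r"\w+", text.lower())

def positions(tokens: list[str], term: str) -> list[int]:
    """Word positions at which `term` occurs."""
    return [i for i, tok in enumerate(tokens) if tok == term]

def within(tokens: list[str], a: str, b: str, n: int, directed: bool = False) -> bool:
    """True if `a` and `b` occur within `n` words of each other.

    With directed=True, `b` must follow `a` (a sketch of directed proximity).
    """
    for i in positions(tokens, a):
        for j in positions(tokens, b):
            distance = j - i if directed else abs(j - i)
            if 0 < distance <= n:
                return True
    return False

tokens = tokenize("Der schnelle braune Fuchs springt über den faulen Hund")
print(within(tokens, "fuchs", "hund", 5))                 # True
print(within(tokens, "hund", "fuchs", 5, directed=True))  # False
```

The same position arithmetic works whether the words are German, English, or any other language whose words are separated by spaces.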
Language Extension Packs
The dtSearch product line includes an English noise word list and stemming rules (to find words such as learn, learned, learns, learning, etc. that are linguistically related).
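Noise word removal and stemming can be sketched in a few lines of Python. The noise word list and suffix rules below are hypothetical and much simpler than dtSearch's actual rules. They only show how learn, learned, learns, and learning can be reduced to a common stem, so that a single search can match all of them.

```python
# Hypothetical, tiny noise word list (dtSearch ships its own English list).
NOISE_WORDS = {"a", "an", "and", "of", "the", "to"}

# Hypothetical suffix rules, checked longest first.
SUFFIXES = ("ing", "ed", "s")

def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_terms(text: str) -> list[str]:
    """Lowercase the text, drop noise words, and stem what remains."""
    return [stem(w) for w in text.lower().split() if w not in NOISE_WORDS]

for w in ("learn", "learned", "learns", "learning"):
    print(w, "->", stem(w))  # every form reduces to "learn"
print(index_terms("The student learned to learn"))  # ['student', 'learn', 'learn']
```

Suffix rules like these only make sense for one language, so each language needs its own noise word list and stemming rules.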
dtSearch's UK distributor offers pre-packaged sets of noise word lists and stemming rules, called Language Extension Packs, covering a wide variety of European languages.
The Western European group includes (in addition to English): Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Spanish and Swedish.
The Eastern European group includes: Belarusian, Bosnian, Bulgarian, Croatian, Czech, Estonian, Greek, Hungarian, Latvian, Lithuanian, Polish, Romanian, Russian, Serbian, Slovak, Slovenian, Turkish and Ukrainian. See also the Cyrillic article.
Licensing: dtSearch Corp. can add either the Western European group or the Eastern European group to a signed dtSearch developer license. Please contact dtSearch for details. Both packages may also be licensed directly from www.dtsearch.co.uk.
More information on the Language Extension Packs
Request a trial version
Visit the distributor's site in English, French or German
Chinese, Japanese and Korean Text With No Word Breaks
Some Chinese, Japanese, and Korean text does not include word breaks. Instead, the text appears as lines of characters with no spaces between the words.
Because there are no spaces separating the words on each line, dtSearch sees each line of text as a single long word.
To make this type of text searchable, enable automatic insertion of word breaks around Chinese, Japanese, and Korean characters, so that each character is treated as a single word.
dtSearch Desktop/Network: In Options > Preferences > Letters and Words, check the box to “Insert word breaks between Chinese, Japanese, and Korean characters in text.”
dtSearch Developer API: set dtsoTfAutoBreakCJK in Options.TextFlags.
Language Analyzer API Integration
In addition to the extensive alphabet customization options available across the dtSearch product line, the dtSearch Engine also includes a Language Analyzer API that can be used to integrate morphological analyzers and custom or dictionary-based word breakers into the dtSearch Engine indexing process.
The dtSearch Engine also includes an API for substituting a non-English language thesaurus for the existing English-language one.
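Conceptually, substituting a thesaurus means running the same expansion step against a different word list. In this minimal Python sketch, the thesaurus is a mapping passed in as a parameter. Swapping English for German changes the synonyms without changing the expansion code. The mappings and the `expand` function are hypothetical and do not reflect the dtSearch thesaurus API or file format.

```python
from typing import Mapping, Sequence

# Hypothetical thesauri: each maps a word to a list of synonyms.
ENGLISH_THESAURUS = {"car": ["automobile", "vehicle"]}
GERMAN_THESAURUS = {"auto": ["wagen", "fahrzeug"]}

def expand(terms: Sequence[str], thesaurus: Mapping[str, Sequence[str]]) -> list[str]:
    """Return each query term followed by its synonyms from `thesaurus`."""
    expanded: list[str] = []
    for term in terms:
        expanded.append(term)
        expanded.extend(thesaurus.get(term, []))
    return expanded

print(expand(["auto"], ENGLISH_THESAURUS))  # ['auto']
print(expand(["auto"], GERMAN_THESAURUS))   # ['auto', 'wagen', 'fahrzeug']
```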
Basis Technology's Rosette® Linguistics Platform Integration
The Rosette Linguistics Platform helps unlock the meaning of unstructured text by determining its language and identifying its basic linguistic features and structure. Because it relies on code specific to each language, Rosette produces highly accurate morphological analysis of Chinese, Japanese, Korean, and other international languages.
The Rosette Linguistics Platform integrates with dtSearch search functionality through the dtSearch Engine’s Language Analyzer API. Essentially, the dtSearch Engine API passes blocks of Unicode text to the Rosette Linguistics Platform and receives back the words to index.
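This division of labor can be sketched as a pluggable callback in Python. The engine hands blocks of Unicode text to an analyzer and indexes whatever words come back. Everything here is hypothetical: `Analyzer`, `build_index`, and the greedy longest-match breaker with its three-word dictionary only illustrate the shape of such an integration. They are not the dtSearch Language Analyzer API or Rosette.

```python
from typing import Callable, Iterable

# Hypothetical analyzer signature: a block of Unicode text in, words to index out.
Analyzer = Callable[[str], Iterable[str]]

def whitespace_analyzer(block: str) -> list[str]:
    """Default behavior for space-delimited languages."""
    return block.split()

# Hypothetical three-word dictionary for a dictionary-based word breaker.
DICTIONARY = {"東京", "東京都", "検索"}
MAX_WORD_LEN = max(len(w) for w in DICTIONARY)

def longest_match_analyzer(block: str) -> list[str]:
    """Greedy longest-match segmentation; unknown characters become 1-char words."""
    words: list[str] = []
    i = 0
    while i < len(block):
        if block[i].isspace():
            i += 1
            continue
        # Try the longest candidate first; a single character always matches.
        for length in range(min(MAX_WORD_LEN, len(block) - i), 0, -1):
            candidate = block[i : i + length]
            if length == 1 or candidate in DICTIONARY:
                words.append(candidate)
                i += length
                break
    return words

def build_index(blocks: Iterable[str], analyzer: Analyzer) -> dict[str, list[int]]:
    """Pass each block to the analyzer and record the position of every word."""
    index: dict[str, list[int]] = {}
    position = 0
    for block in blocks:
        for word in analyzer(block):
            index.setdefault(word, []).append(position)
            position += 1
    return index

print(build_index(["東京都で検索"], longest_match_analyzer))
# {'東京都': [0], 'で': [1], '検索': [2]}
```

Because the indexer only sees the words the analyzer returns, swapping `whitespace_analyzer` for a dictionary-based or morphological analyzer changes how text is segmented without changing the indexing code.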
For more details on how the two products work together, including a chart of the steps involved in the dtSearch Engine and Rosette integration, see the dtSearch and Rosette Full-Featured International Search white paper (PDF).