Search Features: International Languages

Related article:  Unicode and Text Retrieval

Unicode Support
Unicode support allows for indexing and searching of non-English text, including every character set supported by the Unicode standard.
In addition to Unicode support, dtSearch offers extensive alphabet customization options.
See Unicode FAQ for more technical information.
Language-Neutral Search Options
The following search options work automatically on text in any language: fuzzy (adjustable from 0 to 10); natural language with automatic relevancy ranking; variable term weighting; phrase; Boolean (and/or/not); proximity and directed proximity; wildcard; macro; numeric range; and fielded data (alone or combined with full-text searching).
See also forensics search options.
Language Extension Packs
The dtSearch product line includes an English noise word list and stemming rules (to find linguistically related words such as learn, learned, learns, and learning).
dtSearch's UK distributor offers pre-packaged sets of noise word lists and stemming rules, called Language Extension Packs, covering a wide variety of European languages.
The Western European group includes (in addition to English):  Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Spanish and Swedish.
The Eastern European group includes: Belarusian, Bulgarian, Czech, Estonian, Greek, Hungarian, Latvian, Lithuanian, Polish, Russian, Slovak, Slovenian, Turkish and Ukrainian. See also the Cyrillic article.
Licensing: dtSearch Corp. can add either the Western European group or the Eastern European group onto a signed dtSearch developer license. Please contact dtSearch for details. Both packages may also be licensed directly from www.dtsearch.co.uk.
More information on the Language Extension Packs
Request a trial version
Visit distributor's site in English, Français, Deutsch
Chinese, Japanese and Korean Text With No Word Breaks
Some Chinese, Japanese, and Korean text does not include word breaks. Instead, the text appears as lines of characters with no spaces between the words.
Because there are no spaces separating the words on each line, dtSearch sees each line of text as a single long word.
To make this type of text searchable, enable automatic insertion of word breaks around Chinese, Japanese, and Korean characters, so that each character is treated as a single word.
dtSearch Desktop/Network: In Options > Preferences > Letters and Words, check the box to “Insert word breaks between Chinese, Japanese, and Korean characters in text.”
dtSearch Developer API: Set the dtsoTfAutoBreakCJK flag in Options.TextFlags.
Language Analyzer API Integration
The dtSearch Engine includes a language analyzer API that can be used to integrate morphological analyzers and custom or dictionary-based word breakers into the dtSearch Engine indexing process.
The dtSearch Engine offers integration with Basis Technology's Rosette Linguistics Platform for enhanced Chinese, Japanese and Korean text retrieval.
The dtSearch Engine also includes an API for substituting a non-English language thesaurus for the existing English-language one.
Bitext Information / Información
Request information about dtSearch products in Spanish.
Request information about DataSuite (versions for Spanish, English, and Catalan).
DataLexica information, covering morphological support for Spanish, English, and Catalan.
 
 
The dtSearch product line can instantly search terabytes of text across a desktop, network, Internet or Intranet site.
dtSearch products also serve as tools for publishing large document collections, with instant text searching, to Web sites or CD/DVDs.
over two dozen indexed, unindexed, fielded and full-text search options
highlights hits in HTML, XML and PDF, while displaying embedded links, formatting and images
converts other file types — word processor, database, spreadsheet, email and full-text of email attachments, ZIP, Unicode, etc. — to HTML for display with highlighted hits
built-in Spider adds a third-party or other Web site (public, secure content, password accessible, etc.) to your searchable database
Spider supports Web-based content (HTML, PDF, XML, etc.) as well as dynamically-generated content (ASP.NET, MS CMS, SharePoint, etc.)
General supported file types
SQL and similar data sources