dtSearch Release Notes
dtSearch 7.53 (Build 7629) Released May 14, 2008
Fixes and minor enhancements
- Fixed error formatting combined date and time cells in Lotus 1-2-3 spreadsheet
- Reduced memory use indexing XML file with very large CDATA field
- Reduced memory use indexing very large HTML file
- Added IndexCache object to the Java API
- Fixed incorrect page number shown in search report generated from PDF file
- Spider: Fixed timeout error crawling very slow web site
- dtWebSetup: Fixed error causing blank list under "Select search form type" in Search Controls tab of Build Search
Form dialog box
- Fixed error indexing PDF file with more than 32,000 pages
- Fixed out of memory error caching text from very large text file
- Added support for indexing attachments in PDF files
- Fixed incorrect hit highlighting on PDF page containing Unicode hyphen characters when hyphens indexed as letters
dtSearch 7.52 (Build 7600) Released April 8, 2008
dtSearch Engine
- 64-bit Java API
- 64-bit version of the dtSearch Engine for Linux
- Option to index .eml files as containers (so the message body and attachments are each indexed as a separate file).
- Sample ASP.NET application implementing an OpenSearch interface to the dtSearch Engine
Fixes and minor enhancements
- dten600.dll: To improve consistency in the handling of punctuation in field names, unsearchable characters
are now removed from field names in input data, with a few exceptions (:&_+=.) to minimize the effect on backward
compatibility. In previous versions this was done in individual file parsers so the effect of punctuation in
field names depended on the format of the input data.
This change will not generally affect searching because only searchable letters are used when
matching field names.
This change may affect the field names associated with stored fields,
in cases where the field name contains punctuation characters.
- dten600.dll: Fixed two bugs affecting field parsing in Office 2007 documents.
- dten600.dll: Fixed incorrect document count displayed in history.ix when startingDocId set to value other than 1
- dten600.dll: Fixed XML hit highlighting bug resulting in illegal entity appearing in XML output
- dten600.dll: Fixed bug causing extra "F" character to appear in stored fields
- dten600.dll: Improved handling of low-memory conditions for searches that generate very large search results sets
- dten600.dll: Fixed error caching text in index when an external language analyzer returns overlapping blocks of text
- Added: FileConvert.exe and ListIndex.exe command-line utilities
- dtSearch.Spider2.dll: Added LinkTraceFilename option to create a log of the links followed and not followed during a crawl
- Added dtsLaJob.searchRequestPunct string advising external language analyzers of the search request punctuation
characters to preserve when analyzing a search request
- dtSearchNetApi2.dll: If data source throws an exception, IndexJob will catch it and report the exception through
the Errors object
- dtSearchNetApi2.dll: Added FileInfoFlags.fiOpenFailed to indicate when a document returned from a DataSource
with the DocIsFile flag set to true cannot be opened because it is either not present or locked
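The field-name handling described in the first dten600.dll item above can be illustrated with a small sketch. This is not the engine's code: the function name normalize_field_name is hypothetical, and "unsearchable" is approximated here as "not a letter or digit", whereas the engine decides this from its own alphabet settings. Only the list of preserved characters (:&_+=.) comes from the note itself.

```python
# Punctuation the release note lists as preserved for backward compatibility.
KEPT_PUNCTUATION = ":&_+=."


def normalize_field_name(name: str) -> str:
    """Drop characters that are neither alphanumeric nor in the kept set.

    Assumption: "unsearchable" is modeled as "not alphanumeric"; the real
    engine uses its own definition of searchable characters.
    """
    return "".join(ch for ch in name if ch.isalnum() or ch in KEPT_PUNCTUATION)


print(normalize_field_name("Author's-Name"))  # AuthorsName
print(normalize_field_name("dc:title"))       # dc:title
```

Under these assumptions, a stored field that arrives as "Author's-Name" is stored under the same name whichever file parser supplied it.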
dtSearch 7.51 (Build 7556) Released January 18, 2008
dtSearch Desktop
- Search results saved as XML include the selection state of the items in the search results list (i.e., checked or unchecked). Search results
saved in other formats such as CSV can either include all items or just selected items.
dtSearch Web
- Added 64-bit version of dtSearch Web and dtSearch Web Setup
Fixes and minor enhancements
- dtSearch.Spider2.dll: Added 64-bit version of the .NET Spider API
- dtSearchw.exe: Added status bar indicator for total hits and total number of files retrieved
- lbvprot.dll: Fixed error in standard CD type causing it to get stuck at an "Opening page..." message
- dten600.dll: Fixed file parser error causing formatting errors and extraneous text in MS Works documents
- dten600.dll: File parsers added for Lotus 123 and Quattro Pro
- dten600.dll: Fixed incorrect highlighting in search report generated from cached text when automatic
CJK word breaking enabled
- dten600.dll: Fixed bug in HTML file parser that could cause duplication of field values in stored fields extracted from HTML meta tags
- dten600.dll: Fixed HTML hit highlighting bug that caused comment tags to appear inside the <TITLE>
dtSearch 7.50b (Build 7518) Released November 25, 2007
Fixes and minor enhancements
- dten600.dll: Fixed SearchReportJob error causing more blocks of context than specified by MaxContextBlocks to be included
in the generated report.
- lbvprot.dll: Fixed error starting CGI applications
dtSearch 7.50 (Build 7517) Released November 9, 2007
Enhancements (All products)
- Improved integration with external language analyzers: (1) Language analyzers will be given much larger chunks
of text to analyze, which enables some language analyzers to operate more effectively.
(2) Language analyzers will be given consistently-sized
chunks of text whether indexing or highlighting hits, which ensures that hit highlighting will not be affected by
changes in the behavior of a language analyzer depending on the size of the data it receives.
dtSearch Desktop
- In the Edit > Copy File dialog box, added option to preserve original modification,
creation, and last access times of the original files
- In the Edit > Copy File dialog box, added option to copy the entire container file
when a matching document is inside a container (such as a ZIP file or email archive)
- Option in Options > Preferences > Indexing Options > Letters and Words to automatically
insert a word break around Chinese, Japanese, and Korean characters in text. This makes it possible
for documents that do not contain word breaks to be searched.
dtSearch Engine
- 64-bit version of the dtSearch Engine with C++ and .NET APIs.
- New dtsoTfAutoBreakCJK flag in Options.TextFlags to automatically
insert a word break around Chinese, Japanese, and Korean characters in text. This makes it possible
for documents that do not contain word breaks to be searched.
- ListIndexJob added to the Java API
- New dtsListIndexIncludeDocCount flag added to ListIndexFlags, to provide the document count
for each word listed
Fixes and minor enhancements
- dten600.dll: Added dtsLaJob.pFileInfo to provide language analyzer with a dtsFileInfo describing the document
being processed
- dten600.dll: Added dtsLaJobInputIsFirstBlockInDocument value for dtsLaJob.flags to tell language analyzer when
a new document is starting
- dten600.dll: Fixed error formatting generated table of contents in .docx file
- dtIndexerw.exe: Fixed MAPI_E_CALL_FAILED error indexing Outlook data with Outlook 2007
- dten600.dll: Fixed crash indexing Word document with corrupt styles
- dten600.dll: Fixed PDF parsing error causing incorrect word break
- dten600.dll: Added support for non-Microsoft variant of Microsoft Searchable TIFF format
- dten600.dll: Fixed error generating search report when SearchReportJob specified neither an OutputFile nor
OutputToString (fixed in build 7517)
dtSearch 7.43 (Build 7476) Released September 16, 2007
Fixes and minor enhancements
- dten600.dll: Fixed error processing list formatting in some MS Word files that caused bullets to be rendered as numbers.
- dten600.dll: Fixed errors formatting text in .docx files.
- dten600.dll: Fixed bug in PDF file parser affecting decoding of CID fonts in PDF files
- dten600.dll: Fixed error extracting item from TAR file to hit-highlight after search
- dten600.dll: Added detection of the following file types with missing or incorrect filename extensions:
Microsoft Word 2003 XML files, Microsoft Excel 2003 XML files.
- dtsjava.dll: Fixed error indexing using data source API under WebSphere
- dten600.dll: Fixed extra spacing in output when HTML converted to UTF-8 text
dtSearch 7.42 (Build 7467) Released July 31, 2007
Enhancements (All products)
- Added support for Microsoft Searchable TIFF (created by Microsoft Office Imaging), Microsoft Document Imaging (*.mdi), Windows Metafile
(*.wmf) and Enhanced Metafile (*.emf) formats
Enhancements (dtSearch Engine)
- Added flags to control recognition of ambiguous dates (new TextFlags values dtsoTfRecognizeDatesPresumeDMY, dtsoTfRecognizeDatesPresumeYMD)
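To show why a date such as 01/02/03 is ambiguous and what a "presume DMY" or "presume YMD" setting changes, here is a small sketch. It does not call the dtSearch API; parse_ambiguous is a hypothetical helper, the default order (MDY) used here is an assumption, and two-digit years are mapped to 20xx purely for the example.

```python
from datetime import date


def parse_ambiguous(text: str, order: str = "MDY") -> date:
    """Interpret an a/b/c date using the given field order.

    order is a permutation of the letters M, D, Y. Two-digit years are
    mapped to 20xx here for illustration only.
    """
    parts = [int(p) for p in text.split("/")]
    fields = dict(zip(order, parts))   # e.g. {"D": 1, "M": 2, "Y": 3}
    year = fields["Y"] + 2000 if fields["Y"] < 100 else fields["Y"]
    return date(year, fields["M"], fields["D"])


print(parse_ambiguous("01/02/03", "MDY"))  # 2003-01-02
print(parse_ambiguous("01/02/03", "DMY"))  # 2003-02-01
print(parse_ambiguous("01/02/03", "YMD"))  # 2001-02-03
```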
Fixes and minor enhancements
- dtSearchNetApi2.dll: JobErrorInfo no longer requires or supports the IDisposable interface.
- dten600.dll: Filenames longer than 1024 characters could cause "duplicate filename" errors when verifying an index.
- dtSearchw.exe: Fixed problem with mouse wheel scrolling in search results window with high-resolution mouse wheels
- dten600.dll: Improved recognition of CJK encodings in HTML and PDF
- dten600.dll: Fixed error in title attribute for documents indexed using the COM implementation of
the data source API
- dten600.dll: Changed the behavior when a container document such as a ZIP file is removed from an index using its doc id
(using IndexJob.ActionRemoveListed). Instead of just removing the container, the container and all contained items will be removed.
- dten600.dll: MS Word file parser displayed some internal field data (TC, TA, SEQ)
- dten600.dll: Added detection of .docx, .xlsx, .xps, and OpenOffice documents with
missing or incorrect filename extensions
dtSearch 7.41 (Build 7420) Released April 21, 2007
Enhancements (dtSearch Engine)
- Added support for automatically varying hit weights according to the field they occur in, through the new
SearchJob.FieldWeights setting. For more information, see the "Relevance" topic in the dtSearch Engine API Reference.
- Added improved progress reporting during unindexed searches to the C++ API (see dtsSearchProgressInfo in the dtSearch Engine API Reference)
and the .NET 2.0 API (see ISearchStatusHandler2 in the .NET 2.0 API Reference).
- Compatibility note for developers working with the .NET 2.0 API only: The
DLL dependencies for dtSearchNetApi2.dll have changed due to Visual Studio .NET
2005 Service Pack 1. Because dtSearchNetApi2.dll is built with
Service Pack 1, it requires the updated MFC and CRT DLLs that are included with
that version. Executing the vcredist_x86.exe included with Visual Studio
.NET 2005 Service Pack 1 (dated December 2, 2006 or later) will install these
components.
This issue does not affect any other dtSearch Engine API.
Fixes and minor enhancements
- dten600.dll: New fields added as properties of .eml files - CC, BCC, and Attachments (a list of the filenames of all
attachments).
- dtSearch .exe and .msi files digitally signed for better operation in Windows Vista
- dtSearchNetApi2.dll: Error in ConvertPath caused unnecessary refresh of virtual path mappings from the metabase.
- dten600.dll: Minor improvements in the binary file detection and Unicode filtering algorithm for binary files
- dten600.dll: Fixed bug in MHT file parser that caused hit highlighter to generate blank HTML page for some MHT files
- dten600.dll: IndexCache object added to the COM interface.
- dtSearchw.exe: Fixed error starting indexer and dtSearch Web Setup under Windows Vista
dtSearch 7.40 (Build 7360) Released February 22, 2007
Enhancements (All products)
- Added automatic recognition of dates, email addresses, and credit card numbers in text. For more information,
see http://www.dtsearch.com/dateRecog.html
- Added support for Vista (XMP) metadata in .jpg and .tif images.
- Added support for PowerPoint 2007 (*.pptx).
- Added support for Vista XML Paper Specification (*.xps) documents.
Enhancements (dtSearch Engine)
- Added IndexCache object in the .NET 2.0 API, and dtsIndexCache object in the C++ API, to enable much faster searching
when a series of searches must be done against a small number of indexes. The IndexCache maintains a thread-safe pool of open
indexes that are available for searching during the lifetime of the cache. Using the cache eliminates the need to open and close
the index for each search
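The idea behind the cache can be sketched generically: keep opened index handles in a lock-protected pool so that repeated searches reuse them instead of reopening the index. The class below is not the dtSearch IndexCache and does not reflect its actual interface; OpenIndexPool, fake_open, and the index path are hypothetical names used only to illustrate the technique.

```python
import threading
from collections import defaultdict


class OpenIndexPool:
    """Illustrative pool of open index handles keyed by index path."""

    def __init__(self, opener, max_idle_per_index=4):
        self._opener = opener              # callable that opens an index (hypothetical)
        self._max_idle = max_idle_per_index
        self._idle = defaultdict(list)     # path -> handles not currently in use
        self._lock = threading.Lock()      # guards _idle across threads

    def acquire(self, path):
        """Return an idle handle for path, opening a new one only if none is idle."""
        with self._lock:
            if self._idle[path]:
                return self._idle[path].pop()
        return self._opener(path)

    def release(self, path, handle):
        """Hand a handle back so a later search can reuse it."""
        with self._lock:
            if len(self._idle[path]) < self._max_idle:
                self._idle[path].append(handle)
            # A real cache would close the handle here when the pool is full.


open_calls = []


def fake_open(path):
    """Stand-in for the expensive open step; records each call."""
    open_calls.append(path)
    return object()


pool = OpenIndexPool(fake_open)
for _ in range(3):                         # three searches against the same index
    handle = pool.acquire("c:/indexes/docs")
    pool.release("c:/indexes/docs", handle)

print(len(open_calls))                     # 1: opened once, then reused
```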
Enhancements (dtSearch Desktop)
- Added option in Options > Preferences > Spider Options to log the links found in each page the Spider follows.
- Added option in Options > Preferences > Search Options to change the maximum number of words a search request
can match.
Fixes and minor enhancements
- dtSearch.exe: Fixed "Invalid Character" error displaying documents in report view after installing Internet Explorer 7.
- dten600.dll: When serializing stored fields to XML, add a _ in front of any stored field names that begin with a digit
so the resulting XML remains syntactically correct.
- dten600.dll: In the C++ API, the pOnIndexWordFn callback was called with encoded field information in addition to the text of the word,
and if the called function did not preserve this field information intact, field attributes could become invalid. To prevent this, in
version 7.40 the field information is removed before the callback so pOnIndexWordFn will not see or be able to affect field attributes.
- dten600.dll: Increased the maximum value for MaxWordsToRetrieve to 512k (from 256k)
- dten600.dll: When checking for available disk space in a folder, indexer did not check whether the folder
was mounted from a different physical drive.
- dtSearch Web: Added option to log BooleanConditions and FileConditions to search log.
- dtindexerw.exe: Fixed MAPI_E_UNKNOWN_FLAGS error indexing Outlook messages in Outlook 2002 (fixed in build 7360)
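The stored-field serialization item above follows from an XML rule: an element name cannot begin with a digit. The sketch below shows the described workaround of prefixing such names with an underscore. The function names and the <Fields> wrapper element are hypothetical and are not taken from the dtSearch XML format.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def xml_field_name(name: str) -> str:
    """Prefix names that start with a digit so they are legal XML element names."""
    return "_" + name if name[:1].isdigit() else name


def stored_fields_to_xml(fields: dict) -> str:
    """Serialize field/value pairs as child elements of a hypothetical <Fields> root."""
    parts = []
    for name, value in fields.items():
        tag = xml_field_name(name)
        parts.append(f"<{tag}>{escape(value)}</{tag}>")
    return "<Fields>" + "".join(parts) + "</Fields>"


xml_text = stored_fields_to_xml({"2007Sales": "1 < 2", "Author": "Smith"})
root = ET.fromstring(xml_text)             # parses, so the output is well-formed
print([child.tag for child in root])       # ['_2007Sales', 'Author']
```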
dtSearch 7.30 (Build 7320) Released September 30, 2006
Enhancements (All products)
- Added preliminary support for Word 2007 (*.docx) and Excel 2007 (*.xlsx) based on the current
Office 2007 beta and available documentation.
- Added support for JPG and TIFF metadata, including EXIF and IPTC fields.
- Unicode filtering file parser can handle individual documents larger than 2 GB, and support for files
larger than 2 GB added to the extext.exe utility
- Improved handling of partially inaccessible email files. In previous versions, if an email
had encrypted or corrupt data (for example, an encrypted attachment), the whole email was
reported as encrypted or corrupt. In this version, the readable portion of the message
is indexed and the unreadable portion is separately reported as a partially encrypted or
partially unreadable file. This change applies to Outlook messages, TNEF files, .eml files,
MBOX archives, and .msg files.
Enhancements (dtSearch Engine)
- Beta x64 (64-bit) versions of the dtSearch Indexer and dtSearch Engine (dtIndexer64.exe, dtengine64.dll, and
dtSearchNetApi2.dll). The index format and APIs (C++, COM, and .NET) are identical to the 32-bit version.
The 64-bit components are in a separate download file (dtSearch64_730.exe) with the same installation password
as the dtSearch Engine SDK.
- Added alternative PDF highlighting mechanism for client-based applications (see "Highlighting Hits in PDF files" in the API
Overviews section for details)
- Added ListIndexJob object to the .NET 2.0 API to list files, words, or fields in an index (see
dtSearchNetApi2.chm for API reference)
- Added dtsListIndexIncludeDocId flag for dtsListIndexJob and ListIndexJob to provide a quick way to
list all documents in an index and the doc id for each document
- C++ API Changes to support 64-bit file sizes in dtsInputStream (added size64 and seek64), dtsInputStreamReader,
dtsFileInfo (added size64), dtsSearchResultsItem (added size64). These changes preserve binary compatibility for
the dtSearch Engine DLL, but some C++ code may trigger new warnings when compiled because of 64-bit values returned.
- Added dtsIndexKeepExistingDocIds flag to specify that, when compressing an index, the indexer should not remap document ids,
so document ids will be unmodified in the index once compression is done.
Fixes and minor enhancements
- dtWebSetup.exe: Fixed bug causing "Build Search Form" tool to create extra button bar when overwriting an existing form
- dten600.dll: Fixed 'out of memory' error verifying very large index
- lbviewer.exe: Fixed bug causing XML to appear incorrectly when displayed using a stylesheet
- dten600.dll: PowerPoint file parser - added support for embedded OLE objects
- dten600.dll: PDF file parser detects and handles case where text in right-to-left languages (Hebrew or Arabic) is
stored backwards (left-to-right) in a PDF file, and automatically inverts the characters in the word so it will be correctly searchable
- dten600.dll: PDF file parser handles invalid PDF files created by OCR product that leaves out required /Pages and /Page tags in PDF structure
- dtSearchNetApi2.dll: JobErrorInfo object did not implement IDisposable interface, preventing deterministic release of allocated resources.
dtSearch 7.25 (Build 7285) Released June 25, 2006
Fixes and minor enhancements
- dtWebSetup.exe: Fixed bug causing dtSearch Web Setup to fail to run on some Windows 2003 Server systems
- dten600.dll: Added Fragmentation, ObsoleteCount, and IndexFlags to the COM IndexJob.GetIndexInfo() and Java
IndexJob.getIndexInfo() methods. Also, indexing dates are now reported as a date and time.
dtSearch 7.24 (Build 7245) Released June 7, 2006
Enhancements (dtSearch Engine)
- Added support for indexing and searching TNEF (Transport Neutral Encapsulation Format) files
- dtSearch.Spider .NET API has new Authentication, FormAuthentication, and ProxyInfo properties
Fixes and minor enhancements
- dten600.dll: EML parser - fixed bug indexing messages with no message body
- dten600.dll: Excel file parsing bug caused indexer to hang on corrupt Excel file
- mapitool.exe: Added workaround for MAPI error MAPI_E_NOT_ENOUGH_MEMORY when saving a
message that has a very large number of recipients (see
http://support.microsoft.com/kb/171907).
- dten600.dll: RTF parser - fixed bug parsing headers in RTF files
- dten600.dll: Improved parsing of Ole10Native streams in OLE Storage files
- C++ Source code samples reorganized into vc6, vc7, and vc8 folders
- dten600.dll: Fixed PDF hit-highlighting bug that caused highlighting to fail to appear in some documents
dtSearch 7.23 (Build 7241) Released May 8, 2006
Enhancements (dtSearch Desktop)
- Added new Indexing Resources preferences page
- Added pause button to Update Index dialog box
Enhancements (dtSearch Engine)
- Added ASP.NET 2.0 sample applications in VB.NET and C# in C:\Program Files\dtSearch Developer\examples\asp.net2
- Java API: empty() method added to SearchResults, and clear() method added to SearchReportJob and SearchJob to force deterministic release of
memory allocated for SearchResults
Fixes
- dten600.dll: PowerPoint file parsing bug caused incorrect character formatting
- dten600.dll: Word file parsing bug caused right-to-left text to be left-aligned instead of right-aligned by default.
- dten600.dll: EML parser - improved detection and parsing of malformed .eml files.
- mapitool.exe: Fixed error converting UTC date when saving .msg files to disk
- dten600.dll: Fixed crash indexing corrupt .mp3 file
- libdten600.so (Linux): Fixed error parsing zipped files with accented characters in the filename
- dtsearch.exe: Fixed error detecting Ctrl+C keyboard shortcut
- dten600.dll: Excel file parser bug caused some valid XLS files to be incorrectly reported as corrupt
dtSearch 7.22 (Build 7217) Released March 14, 2006
Enhancements (dtSearch Engine)
- .NET 2.0 API for Visual Studio .NET 2005. The .NET 2.0 API wrapper is dtSearchNetApi2.dll, and the .NET 2.0 version of the Spider API is dtSearch.Spider2.dll.
The API is identical to the .NET 1.1 API. For sample code, see the examples\cs2 and examples\vb.net2 folders.
- Added dtsSortByFullName sort flag
- Added docByteRead64, bytesRead64, bytesToIndex64 to dtsIndexProgressInfo
- Added Options.StoredFieldDelimiterChar, which provides a way to specify a delimiter between multiple instances of
a stored field in a single document
Enhancements (dtSearch Desktop)
- Added option to dtSearch to view PDF files as plain text instead of using Adobe Reader (Options > Preferences > External Viewers).
Fixes
- dten600.dll: Fixed indexer bug that caused index to be reported as having "Illegal ref ptr" error when verified
- dten600.dll: Fixed Excel file parser bugs affecting formatting of date and time values
- dten600.dll: Fixed indexing crash when indexing very large .gz files
- dten600.dll: Minor improvements to word break detection in PDF files
- dten600.dll: Fixed RTF file parser bug that could cause indexing crash on corrupt RTF file
- dten600.dll: Excel file parser defaults to 10 digits of precision for numbers without a
specified format (consistent with Excel).
- dten600.dll: Minor improvements to Unicode filtering algorithm.
dtSearch 7.21 (Build 7164) Released January 23, 2006
Enhancements (All products)
- IFilter support to enable dtSearch to parse document types such as Microsoft OneNote and AutoCAD that
include IFilters.
IFilters are components that enable various Microsoft search products, such as Microsoft
Index Server, to extract text from documents. For example, when you install Microsoft
OneNote, an IFilter is installed to enable searching of *.one files. To tell dtSearch
to use installed IFilters to process some of your files, set up a rule in Options > Preferences > File Types and
under File type, select "IFilter". In dtSearch Engine applications, use the FileTypeTableFile to specify the filename
patterns to use with IFilters. The IFilter adapter only works on systems with the Microsoft component query.dll installed.
For information on products that include query.dll, see
http://support.microsoft.com/dllhelp
For more information on IFilters, see
http://www.ifilter.org/ or
http://channel9.msdn.com/wiki/default.aspx/Channel9.DesktopSearchIFilters
Fixes
- dten600.dll: Fixed bug that prevented some items in ZIP files from being displayed after a search (an "unable to access
input file" message would appear instead).
- dten600.dll: Fixed bug in file parsers for Microsoft Office documents (PowerPoint, Excel,
and Word) that could cause dtSearch to crash attempting to index corrupt documents
- dten600.dll: Fixed bug in PDF file parser that caused "Bad xref" error on some PDF files created with
PDF 1.5-only compatibility
- dtsearch.h: unnamed unions removed from the dtsMessage structure. This will not affect binary compatibility
but may require source code changes in C++ code that accessed undocumented union members. Because the removed
union members were undocumented, this change should affect very few programs.
dtSearch 7.20 (Build 7136) Released December 6, 2005
Enhancements (All products)
- New file parsers for OpenOffice documents, spreadsheets, and presentations (*.sxw, *.sxc, *.odt, *.ods, etc.),
covering OpenOffice version 1 and OpenOffice version 2 (the "Open Document Format for Office Applications")
- New file parsers for the Microsoft Office XML formats (Microsoft Word 2003 XML and Microsoft Excel 2003 XML)
Enhancements (dtSearch Desktop)
- Added "Opening containing folder" in right-click menu for retrieved items
- Improved reporting of errors that occur when copying files in Edit > Copy File(s)
- dtindexer.exe: added /caf and /cat command-line options to cache text (/cat) or cache original files (/cad), when creating indexes using
the command line, and /recog to recognize an index.
- Added Help > Check For Updates feature to automatically download new
versions
Enhancements (dtSearch Engine)
- dtSearch.Spider.dll component provides a .NET API for the dtSearch Spider. For API documentation, see dtSearchNetApi.chm.
For sample code, see C:\Program Files\dtSearch Developer\examples\cs\SpiderDemo.
- New xfilter search type, "ext", to search only on the filename extension (dot required). Examples:
xfilter(ext ".doc") matches file with a .doc extension; xfilter(ext "~.doc") matches file without a .doc extension;
xfilter(ext ".") matches file with no extension. This search feature will only work with documents that were indexed with
dtSearch 7.2 or later.
- SearchReport supports %%FirstHit%% macro in ContextHeader to indicate the word offset of the first hit in the context block
- dtsIndexCacheTextWithoutFields flag added to IndexingFlags. This flag makes it possible to cache text (for generation of
a synopsis to include in search results) without including any of the fields added using the data source API.
- dtsErAccCachedDoc flag added to ErrorCodes. This error code indicates that a document could not be extracted from
the document cache in an index (this usually means that the index was created without caching enabled)
- dtsConvertJustDetectType flag added to ConvertFlags, to have FileConverter or DFileConvertJob just detect the file format of
a document. The format is returned in FileConverter.DetectedTypeId.
- dtSearch Engine for Linux updated to the dtSearch 7.2 code base; multithreading support added
- dtsReportIncludeFileStart flag added to ReportFlags. This flag causes a block of text from the beginning of the
document to be included in the generated search report.
- A new search feature makes it possible to restrict a search to the text of documents (excluding any metadata). To search for text that is not
in any field, search for //text contains (search request). Example:
(//text contains apple) and (author contains smith)
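The two search-request additions in this list (the "ext" xfilter type and the //text pseudo-field) are plain request strings, so an application can assemble them with simple helpers. The sketch below only builds strings in the forms quoted above; ext_filter and text_only are hypothetical helper names, and nothing here calls the dtSearch Engine.

```python
def ext_filter(extension: str, negate: bool = False) -> str:
    """Build an xfilter(ext ...) clause; the leading dot is required."""
    if not extension.startswith("."):
        raise ValueError("extension must start with a dot, e.g. '.doc'")
    prefix = "~" if negate else ""         # "~" selects files WITHOUT the extension
    return f'xfilter(ext "{prefix}{extension}")'


def text_only(request: str) -> str:
    """Restrict a request to document text, excluding fields."""
    return f"(//text contains {request})"


print(ext_filter(".doc"))                  # xfilter(ext ".doc")
print(ext_filter(".doc", negate=True))     # xfilter(ext "~.doc")
print(ext_filter("."))                     # xfilter(ext ".")
print(text_only("apple") + " and (author contains smith)")
```

The last line prints the same request as the example in the note: (//text contains apple) and (author contains smith).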
Fixes
- DynaZip unzip component (dunzip32.dll) updated to new version that eliminates buffer overrun vulnerability in earlier versions.
- dtSearch.exe: Installing Adobe Acrobat 7.05 update caused hit highlighting to stop working in PDF files
- dten600.dll: Reduced amount of memory needed to parse very large Word, Excel, and Outlook items
- dten600.dll: Fixed file parsing error in Word documents that caused bullets to be rendered as auto-numbered lists
- dten600.dll: Changed handling of CSV files that do not have a header that lists field names; these files are now handled
as plain text, since no field information is available for them.
- Spider: Fixed: Links with // in the name (http://www.example.com//default.html) caused index to be reported as corrupt by Verify Index
- dtSearch.exe: MAPI profile id, entry id, and store id were displayed in search results list
- dtSearch Web: WebSearchForm.js incorrectly handled blank filename filter
- Spider: Did not match port number against filename filters for
port numbers other than 80
- dtSearch.exe: Displayed MAPI entry id and store id of Outlook
messages in search results.
dtSearch 7.10 (Build 7045) Released August 8, 2005
Enhancements (dtSearch Engine)
- Added two new ASP.NET samples, one in VB.NET and one in C#, that demonstrate a search interface
using a grid control for search results. The new samples are installed to C:\Program Files\dtSearch Developer\examples\asp.net.
Please see the readme file in the project folders before trying to open them in Visual Studio -- a virtual directory mapping
for C:\Program Files\dtSearch Developer\examples\asp.net has to be created first or Visual Studio will not be able to open
the project.
- GetNthWordDocCount added to WordListBuilder to get the number of documents a word occurs in
- SearchReportJob enhancements: Added ContextSeparator; itUnformattedHTML output format, for easier generation of
a synopsis; faster generation of search report when search results cover multiple indexes; dtsReportLimitContiguousContext flag
to prevent very large synopsis when there are many hits close together.
- In the OnFound callback notification in the C++ and .NET interfaces, an application can veto individual items
to prevent them from being included in search results. See SearchResultsItem.VetoThisItem (.NET) and
DSearchJob::VetoThisItem (C++ Support Classes).
- dtSearchNetApi.dll uses registry type library information and delay loading
to eliminate the need for dten600.dll to reside on the system PATH in ASP.NET
applications.
- New TextFlags option to suppress automatic generation of
xfirstword and xlastword (dtsoTfSkipXFirstAndLast)
- Options.MaxFieldNesting setting to limit the permissible depth of field nesting
- .NET API objects implement Dispose() for more deterministic release of allocated resources.
- .NET IndexJob.ExecuteInThread and SearchJob.ExecuteInThread use .NET thread
pool instead of creating a thread.
- dtSearch Engine for Linux updated to the dtSearch 7.1 code base
Enhancements (dtSearch Publish)
- "Standard" CDs (which use lbview.exe) can launch non-CGI programs from the CGI-BIN folder using a new URL syntax. See the "Standard CDs"
help topic in dtSearch_Web.chm for details.
- "Standard" CDs can highlight hits in PDF files with Adobe Reader 6 or later (formerly Adobe Reader 7 was required).
Fixes
- dten600.dll: Reduced amount of stack required to process very long xfilter expressions
- dten600.dll: Fixed index merge bug that could cause a corrupt index merging into a large index without
the "clear target" flag set
- dten600.dll: Bug caused MakePdfWebHighlightFile to return a blank string after unindexed search
- dten600.dll: The default value of Options.MatchDigitChar has changed from blank (disabled) to '=', to be consistent with the
behavior of dtSearch Desktop.
- dten600.dll: Bug caused unindexed search of HTML field defined using comment tags to find the same search term outside of the field
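For readers unfamiliar with the MatchDigitChar setting mentioned above, the sketch below illustrates the behavior it is assumed to control: a character in a search term that stands for any single digit. That description is an assumption (the note states only the new default, '='), and the regex translation is an illustration rather than how the engine matches words.

```python
import re


def digit_pattern_to_regex(term: str, match_digit_char: str = "=") -> re.Pattern:
    """Translate a term such as "N===" into a regex where '=' matches one digit.

    Assumption: the match-digit character stands for exactly one digit.
    """
    parts = [r"\d" if ch == match_digit_char else re.escape(ch) for ch in term]
    return re.compile("".join(parts))


pattern = digit_pattern_to_regex("N===")
print(bool(pattern.fullmatch("N123")))     # True
print(bool(pattern.fullmatch("N12a")))     # False
```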
dtSearch 7.01 (Build 7025) Released June 14, 2005
Enhancements (dtSearch Web)
- Generated search form has more flexible stylesheet references
- File parsers generate HTML output using "em" units for font sizes instead of points, which
allows font sizes to scale up or down in Internet Explorer
Enhancements (dtSearch Publish)
- Added "Recognize CD" function to use the CD Wizard to modify a CD that was created on a different computer
Fixes
- dten600.dll: MS Word file parser caused a word break when MS Word inserted redundant font changes within a word
- lbview.exe: Error opening PDF file with URL-encoded apostrophe in filename or path
- dten600.dll: PowerPoint file parser error parsing slide without outline entry text
- dtSearchNetApi.dll: SearchResultsItem did not include modified date or type id
- dten600.dll: SearchResults did not read HitsByWord when serializing from XML
- dtSearch Publish: PDF files did not highlight hits in Adobe Reader 7 in some
systems with unpatched versions of IE components.
- dtIndexer.exe: Default setting for IndexAutoCommitIntervalMB forced
large index updates to commit too frequently, making indexing slower.
dtSearch 7.00 (Build 7008) Released May 18, 2005
Enhancements (All products)
- High-capacity index format released, with support for over 1 terabyte of data per index.
dtSearch 7 can update and search indexes created with dtSearch 6.
To upgrade an index to the version 7 format using dtSearch Desktop:
(1) Click Index > Update Index...
(2) Check the box to "Upgrade index to version 7 format".
(3) Click "Start Indexing".
- New variable field weighting search option. Example: "(Description:5 contains (apple and pear)) or (author:2 contains smith)"
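The field weighting syntax in the example above (field name, colon, weight) can be generated from application data with a one-line helper. This is a sketch that only formats request strings in the form quoted in the note; weighted_field is a hypothetical name and the code does not call the dtSearch Engine.

```python
def weighted_field(field: str, weight: int, request: str) -> str:
    """Build a '(field:weight contains request)' clause."""
    return f"({field}:{weight} contains {request})"


query = " or ".join([
    weighted_field("Description", 5, "(apple and pear)"),
    weighted_field("author", 2, "smith"),
])
print(query)  # (Description:5 contains (apple and pear)) or (author:2 contains smith)
```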
Enhancements (dtSearch Desktop)
- Added Spider option to pause between page downloads, to reduce the impact
of a crawl on the server
- PDF files open faster in Adobe Reader if Adobe Reader 7 is installed
Enhancements (dtSearch Engine)
- New API documentation. See dtSearchApiRef.chm (overviews, C++, and COM interface), dtSearchJavaApi.chm (Java interface),
and dtSearchNetApi.chm (.NET interface)
Enhancements (dtSearch Publish)
- A new file-based CD interface has been added that does not rely on HTTP.
- The CD Wizard has been simplified
Fixes
- dtSearchNetApi.dll: repeated instances of first element returned in HitsByWord array
- dten600.dll: Hidden stored fields (fields with names prefixed by **) were stored with the ** mark in front of the
field name, causing serialized XML search results to have incorrect XML syntax
- dten600.dll: Check for available disk space did not handle volume sizes larger than 2 Terabytes
- mapitool.exe: several minor bug fixes
- dten600.dll: MS Word file parser did not handle FORMTEXT fields
- dtSearch.exe: fixed 800a025e error from Internet Explorer when dtSearch
tries to select hidden text to highlight from a Word document
- dten600.dll: Fixed PDF file parser error counting words in pages with
annotations, causing incorrect highlighting
- dten600.dll: Fixed hit highlighting error in MIME files
- dtv_odbc.dll: Fixed bug handling Unicode data in Access fields
- dten600.dll: Added current option settings in effect to history.ix entry for each update
- dtisapi6.dll: URLs were not truncated using MaxUrlSize
dtSearch 6.5 (Build 6608) Released January 18, 2005
Enhancements (All products)
- Improved file parsers for Microsoft Office file formats (Word, Excel, PowerPoint, Outlook MSG).
- New filtering option to apply text filtering to automatically recover text from corrupt documents
- Beta support for new high-capacity index format.
Enhancements (dtSearch Desktop)
- Faster indexing of Outlook message stores. Indexing speed is substantially faster, especially for incremental updates.
For compatibility, existing Outlook indexes will continue to use the previous indexer, so the improvements
will only apply to new indexes.
- Edit > Copy File can copy retrieved Outlook messages as .msg files (formerly they were converted to HTML).
This only works for Outlook messages indexed using the new Outlook indexer (see above).
- mapitool.exe command-line utility to convert PST files or other Outlook-accessible message stores
to MSG files. See mapitool.html for documentation.
Enhancements (dtSearch Engine)
- Support for hidden stored fields. Hidden stored fields are returned in search results like stored fields, but
are not displayed as part of the document and are not searchable. Only fields returned through the data source API
in DocFields (.NET/COM) or dtsInputStream.fields (C++) can be hidden stored fields. To designate a field as hidden,
insert ** in front of the field name.
- Note: dtv_ms.dll, an external file parser included in prior versions, is now compiled into dten600.dll.
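The hidden stored field naming convention described above (a ** prefix on the field name) can be sketched in Python as follows. This is only an illustration of the convention: the helper names mark_hidden and split_hidden are hypothetical, and an application would still pass the resulting names through DocFields or dtsInputStream.fields as described in the notes.

```python
from typing import Tuple

HIDDEN_PREFIX = "**"  # prefix that designates a hidden stored field


def mark_hidden(name: str) -> str:
    """Return the field name with the hidden prefix, adding it only once."""
    return name if name.startswith(HIDDEN_PREFIX) else HIDDEN_PREFIX + name


def split_hidden(name: str) -> Tuple[str, bool]:
    """Return (bare field name, is_hidden) for a possibly prefixed name."""
    if name.startswith(HIDDEN_PREFIX):
        return name[len(HIDDEN_PREFIX):], True
    return name, False


print(mark_hidden("CustomerId"))
print(split_hidden("**CustomerId"))
```

Separating the bare name from the prefix is also a reasonable step before using a field name as an XML element name, since * is not valid there.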
dtSearch 6.4 (Build 6482) Released September 4, 2004
Enhancements (All products)
- Improved indexing speed
- Support for indexing .tar, .gz, and .tgz archives
- Support for indexing metadata (Author, Title, etc.) in MP3, ASF, and WMV files
- Improved file parser for RTF files.
- XML parser improvements in handling incorrect XML input, such as mismatched tags and > and < characters
in field text
Enhancements (dtSearch Web)
- Web search form remembers users' search criteria from prior searches
- Many improvements to search form generation: more controls including date range control,
simpler search forms, automatic generation of field searching controls;
stylesheet-based search form and search results formatting; Form Builder remembers search form option
settings from previously-built form
- dtSearch Web Setup has new options to upgrade all dtSearch Web installations at once, and to
remove a dtSearch Web installation
- dtSearch Web option to encrypt index paths in search forms
Enhancements (dtSearch Desktop)
- Added support for forms-based authentication in dtSearch Spider
- Improved traversal of JavaScript-generated links in dtSearch Spider
- Option to suppress indexing of HTML tags in XML data
Enhancements (dtSearch Engine)
- Native .NET Interface. For API documentation, see help\dtSearchNetApi.chm.
- Added to COM interface: Verify and Merge indexes, and IndexJob.IsThreadDoneWait.
- Added to Java interface: Verify and Merge indexes.
- Added dtsIndexProgress.fullname
- Added new ASP sample application demonstrating display of synopsis in search results
- LZW decompression of PDF files is now always enabled, since the Unisys patent has expired
Fixes
- Error in decryption of encrypted PDF files with SecurityModel = 2
- dtsearchw.exe: Error opening PDF file on systems with network configuration errors
- dtisapi6.dll: Error sorting search results by user-defined field when combined with numerical flags
- dtisapi6.dll: Fixed bug causing incorrect XML to be returned from query
- dtSearchw.exe: PDF highlighting URL used localhost instead of 127.0.0.1, causing unnecessary DNS access
- dten600.dll: Fixed hyphenation bug that could cause an index to be reported as corrupt when documents are
indexed with the dtsoHyphenAll hyphenation option
- dten600.dll: Fixed two bugs in .msg file parser
- dtindexer.exe: Fixed blank index name and path in "index created" message when using "Create Index - Advanced"
- dten600.dll: Fixed error indexing MBOX archive containing corrupt message data
dtSearch 6.33 (Build 6430) Released April 21, 2004
Fixes
- Outlook indexing stops working after installing Microsoft Office XP Service Pack 3
- dtSearch Web bug fixes
dtSearch 6.32 (Build 6429) Released March 5, 2004
Enhancements (All products)
- Improved relevancy ranking using positional scoring, which ranks documents higher
when hits occur near the top of the file or are clustered within a document. In dtSearch Desktop
and dtSearch Web, positional scoring is applied automatically when automatic term weighting
is selected. In dtSearch Engine applications, use the new dtsSearchPositionalScoring search flag
to enable positional scoring.
Enhancements (dtSearch Spider)
- Added option to allow Spider to crawl across multiple servers from a single starting URL
- Added option to limit maximum size of items Spider can download from a site
- Added option to limit number of files Spider can index on a web site
- Added option to limit number of minutes Spider can spend indexing a single web site
- Spider supports the "robots" meta tag
Enhancements (dtSearch Engine)
- Added SearchResults.setSortKey to Java API
- Large search results sets stored more efficiently, substantially reducing memory use
- Added TextFlags option dtsoTfSkipNumericValues. By default, dtSearch indexes numbers both as
text and as numeric values, which is necessary for numeric range searching. Use this flag
to suppress indexing of numeric values in applications that do not require
numeric range searching. This setting can reduce the size of the index by about 20%.
- Added dtsoFfSkipFilenameFieldPath FieldFlag, to allow indexing of the filename as a field
without the whole path
Enhancements (dtSearch Publish)
- Added option to specify the browser to launch
- Added options to specify format of folder listings
Fixes
- dten600.dll: Error processing user-defined synonym containing only punctuation
- dtindexer.exe: In some situations, Spider would not immediately stop downloading a long file when the cancel button
was pressed during indexing
- dten600.dll: Enhanced error detection in index merge, and merge events logged to history.ix file
- dten600.dll: Merge Indexes bug could cause merge job to terminate with incorrect "index corrupt" message
- dtsearchw.exe: Unnecessary dependency on secur32.dll on some Windows NT 4.0 systems
dtSearch 6.31 (Build 6393) Released November 24, 2003
Enhancements (All products)
- Added automatic detection of MBOX-format email archives
Fixes
- dtIndexer: The Windows TEMP folder was used instead of the Spider temporary folders setting to
determine the location of temporary files during indexing of a web site
- dten600.dll: Fixed bug in index merge function that could cause the target index to be reported as corrupt after a merge
- dten600.dll: Fixed word breaking error in PDF file parser
- dten600.dll: Fixed Spider error handling URL containing the &amp; HTML entity
- dten600.dll: Fixed PDF parsing error that caused trademark symbol ™ to appear incorrectly in document properties fields
dtSearch 6.30 (Build 6386) Released November 11, 2003
Enhancements (All products)
- Added pre/N connector, which is like W/N but requires that the first expression occur before the second
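The difference between the two connectors can be illustrated with a small Python sketch over word positions. This is not the engine's implementation: the function names are hypothetical, single words stand in for full expressions, and the exact way dtSearch counts the distance N is an assumption here.

```python
from typing import List


def positions(words: List[str], term: str) -> List[int]:
    """Word positions at which `term` occurs (case-insensitive)."""
    return [i for i, w in enumerate(words) if w.lower() == term.lower()]


def within(words: List[str], first: str, second: str, n: int) -> bool:
    """W/N-style test: the two terms occur within n words, in either order."""
    return any(
        abs(a - b) <= n
        for a in positions(words, first)
        for b in positions(words, second)
    )


def pre(words: List[str], first: str, second: str, n: int) -> bool:
    """pre/N-style test: as within(), but `first` must come before `second`."""
    return any(
        0 < b - a <= n
        for a in positions(words, first)
        for b in positions(words, second)
    )


text = "the pear fell before the apple ripened".split()
print(within(text, "apple", "pear", 5))
print(pre(text, "apple", "pear", 5))
print(pre(text, "pear", "apple", 5))
```

In this sample text "pear" precedes "apple", so only the last pre-style test succeeds, while the W/N-style test succeeds regardless of order.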
Enhancements (dtSearch Desktop)
- Added option in Options > Preferences > Filtering Options to control the minimum size of a text segment
- Added option in Options > Preferences > Search Results to remember the sort order from the previous search.
For example, if you click the Filename column to sort search results by filename, the results of the next search
will also be displayed sorted by filename. The remembered sort order overrides whatever was selected in the
Search dialog box.
Enhancements (dtSearch Web / dtSearch Publish)
- dtSearch Web search speed has been increased substantially.
- dtSearch Publish CD access program automatically detects when the user's browser window closes
Enhancements (dtSearch Engine)
- Language Analyzer API for integration of third-party language tools (such as a Japanese or Chinese dictionary-based
word breaker, or an Arabic morphological analyzer). For more information, see the "Language Analyzer API" topic
in the dtSearch Engine help file, dtengine.chm
Fixes
- dten600.dll: Text Fields limited to first N lines were applied through entire file for WordPerfect files
- dtSearch: Fixed bug affecting internal links in HTML files opened after a
search
- dtIndexer: Bug in index update scheduler prevented a scheduled task from being created if
another task had previously been created, and deleted, with the same name
- dten600.dll: Search report generated using exact words of context contained an incorrect character at end of a context block
if the last word in the context block was followed by a Unicode punctuation mark.
- dtSearch: After installing Adobe Reader or Acrobat 6, PDF files opened very slowly in dtSearch Desktop.
- dten600.dll: In MIME-encoded emails with multipart/alternative encoding (message included in both HTML and text),
message text appeared twice.
- dten600.dll: Error indexing files with names longer than 260 characters
dtSearch 6.21 (Build 6345) Released August 25, 2003
Enhancements (All products)
- Support for indexing Treepad HJT files.
- Support for indexing NTFS document summary information fields
- Improved indexing of Outlook .MSG files
Enhancements (dtSearch Desktop)
- Added option to automatically open Adobe Reader in the background before
opening PDF files. (This makes PDF files open more quickly in dtSearch,
especially with Adobe Reader 6.)
Enhancements (dtSearch Engine)
- A SearchFilter can be constructed from the results of a SearchJob. See SearchJob.WantResultsAsFilter in the dtSearch Engine help file
for more information.
- Documents can be removed from an index by DocId rather than by filename. See "Building and Maintaining Indexes"
in the dtSearch Engine help file for more information.
- dtsoFfShowNtfsProperties flag (in Options.FieldFlags) to enable indexing of NTFS document summary information fields.
- dtsConvertXmlToXml flag enables FileConverter to highlight hits in XML documents with XML output. For more information, see
"Highlighting hits in XML" in the dtSearch Engine help file, tech support
article dts0183, and
this demo page.
Fixes
- dtSearch: TAB key to switch between search results and document window did not work
- dtIndexer: The "Update Multiple Indexes" dialog box could add web site content from an index to subsequently-updated indexes in the same session
- dtSearch: Fixed: Edit > Copy File with "Preserve folder names" did not correctly handle Outlook message with / in the message subject
- dten600.dll: Fixed bug in index merge function that could cause the target index to be reported as corrupt after a merge
- dten600.dll: ContextHeader omitted in first block of context in SearchReport, if report generated using paragraphs
of context
- dten600.dll: Duplicate items in index when a list of files to index is passed to the indexer with duplicate
items in the list of files
- dten600.dll: Fixed file parsing errors affecting DBF (FPT memos), PPT (headers and footers), and XLS ("Invalid
packed Unicode sequence" error indexing an XLS file).
dtSearch 6.20 (Build 6320) Released May 6, 2003
Enhancements (dtSearch Desktop)
- Support for indexing Outlook 2003 messages. (Note: This is based on
the Office 2003 Beta 2. It is possible that Microsoft may make additional
changes to Office 2003 when it is released that will require more changes
to the dtSearch Outlook indexer.)
- New search results list
- The search results list can contain a brief synopsis for each item showing the first couple of hits
and a few words of context around each hit. To enable this feature, click Options > Preferences > Search results,
and check the box labelled "First hits in context".
- Click the <-> mark in the upper left corner of search results to automatically
size columns to fit their contents. Click it a second time to automatically size columns to fit
in the search results window. (Click Options > Preferences > Search results to have
search results automatically resized in either way.)
- Right-click the <-> mark for quick access to search results format settings
- Drag and drop column headers to change the order in which search results items appear
- Items in search results can be dragged to Explorer (to move the files) or to
email programs (to send the files as an attachment).
- (To make dtSearch use the search results list from prior versions, run dtsearch.exe or dtsearchw.exe with the /lv command-line switch.)
- New text filtering option for indexing recovered forensic data. See the Options > Preferences > Filtering options dialog box
and the "Filtering options" help topic in the dtSearch help file for more information
- New option in the Options > Preferences > File Types dialog box to
require that a set of files be indexed as HTML or plain text, even if they
appear to have a different format.
- Enhanced filename filters for use inside ZIP archives. See the "Filename Filters" topic in the dtSearch help file for more information.
- Press Ctrl+Shift+UP to enlarge the text font, or Ctrl+Shift+DOWN to reduce the text font.
- Edit > Copy File handles items in container files (such as ZIP archives or databases) better,
copying only the items retrieved from the search rather than the entire container. For example, if a search
retrieves sample.doc inside c:\archive.zip, then "Copy File" will extract and copy sample.doc rather than copying
the whole archive.zip file.
Enhancements (dtSearch Web / dtSearch Publish)
- New ResultsTableItem macros: %%PhraseCount%% (number of hits in a document,
counting each phrase as a single hit) and %%HitsByWord%% (list of words or phrases
matched in a document, with the number of hits on each).
- New settings for dtsearch_options.html: HttpProxy and SERVER_NAME. See dtSearch_Web.chm for more information.
Enhancements (dtSearch Engine)
- Added ZIP file parser to Linux version
- Added Java JNI API to Linux version
- ExText text extracting algorithm, for indexing recovered forensic data,
integrated into the dtSearch Engine as the "Filtered Binary" file parser.
See "Filtering options" in the dtSearch Desktop help file for information on how
this filtering algorithm works. A new value for the Options.binaryFiles
flag, dtsoFilterBinaryUnicode (4), enables this parser.
- FileConverter and dtsFileConvertJob2 have new typeId property that can be used to
specify the file parser to be used with the input.
- Serialized search results include search flags and fuzziness
- Added dtsoFfXmlHideFieldNames field flag to suppress indexing of field names
in XML files.
- Data source API (C++): Added pFileInfo member to dtsDataSource to provide
information on the last file indexed. See "dtsDataSource" in dtengine.chm
for more information.
- Data source API (C++): Added typeId to dtsInputStream to provide a way to specify the
file parser that should be used for an input file. See "dtsInputStream" in dtengine.chm
for more information.
- Data source API (VB/ASP): Added DocId, DocWordCount, and DocTypeId
properties of DataSourceToIndex to provide information on the last file indexed.
See "Indexing ActiveX Data Sources" in dtengine.chm for more information.
- Data source API (Java): Added getDocId, getDocWordCount, and getDocTypeId
methods in new DataSource2 class to provide information on the last file indexed.
See the DataSource2 topic in the JavaDoc documentation for more information, and
see the Java dsource sample application for sample code.
- Data source API (Java): Added getDocBytes() to provide a way to return
documents in a memory buffer
- Java FileConverter and SearchReportJob objects: Added setDocBytes() for a calling program to
provide an input document in a memory buffer
- Java API: Added SearchResults.getDocDetailItems(), returning the document properties as a java.util.Map
- Java API: Added SearchFilter object, SearchResults.serializeItemAsXml
- phraseCount, reporting the number of hits matched in a request with each phrase
counted as a single hit, is computed if dtsSearchWantHitsByWord search flag is set.
This value can be accessed in the C++ API as dtsSearchResultsItem.phraseCount, and
in the COM and Java interfaces as DocDetailItem("_phraseCount").
- Added dtsSearchJob.maxFilesToRetrieve2 to provide a 32-bit version of this limit,
and made the Java and COM maxFilesToRetrieve properties 32-bit.
- dtsSearchFilter (C++): Added getIndexCount, getIndexPath, and getItems
methods to extract information from a search filter.
- SearchFilter (Java): Added getIndexCount, getIndexPath, and getItems
methods to extract information from a search filter.
Fixes
- dten600.dll: Improved PDF file parser handling of XObjects and corrupt
but partially-readable data streams
- Spider did not cache HTTP sessions, causing problems with forms-based
authentication
- dtSearch Web: "Next 25" paging did not work correctly when search results
were sorted by a custom field
- Spider reported modified date for downloaded web pages based on
the "Date" header rather than the "Last-Modified" header
- xfilter filename searches not including a \ could still match
against the full path of a document
- dten600.dll: Check for "index full" condition uses improved method for
estimating amount of space left to complete current update
- dten600.dll: A script bracketed by <% and %> was incorrectly parsed if the
script contained the <> comparison operator.
- Forix.exe: Forensic indexer did not apply exclude filters inside ZIP files, so the entire contents
of a ZIP would be indexed without regard to any exclusions in the filters.
- dten600.dll: In an accent-insensitive index, combining diacriticals (U+0300-U+0362) were not removed
- dten600.dll: If a PDF file has a title attribute that is a path and filename, the path from the title
attribute was reported in search results as the location
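The accent-insensitive fix above concerns combining diacriticals in the range U+0300-U+0362. A minimal Python sketch of that kind of normalization follows. It is not the engine's code: the NFD decomposition step and the function name strip_accents are assumptions added for illustration.

```python
import unicodedata

# Range of combining diacritical marks named in the fix above.
COMBINING_START, COMBINING_END = 0x0300, 0x0362


def strip_accents(text: str) -> str:
    """Decompose characters, then drop combining marks in U+0300-U+0362."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in decomposed
        if not (COMBINING_START <= ord(ch) <= COMBINING_END)
    )


# Precomposed and decomposed spellings reduce to the same unaccented form.
print(strip_accents("caf\u00e9"))
print(strip_accents("cafe\u0301"))
```

Decomposing first means that a precomposed letter and a base letter followed by a combining mark are treated alike.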
dtSearch 6.11 (Build 6276) Released December 24, 2002
Enhancements (dtSearch Desktop)
- Edit > Copy File can download web pages indexed by the spider and extract email messages,
attachments, and other Outlook items, and copy them to the target folder
- Edit > Copy File provides detailed logging of file copy errors
- Added option to unconditionally suppress password prompts in the Spider
- New = wildcard character matches any single digit. Example: X=== would match X123 but not Xabc
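The = wildcard described above can be modeled with a regular expression. The Python sketch below handles only the = character and treats everything else literally; other wildcard characters and case-insensitive matching are not modeled, and the function names are hypothetical.

```python
import re


def digit_wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate each '=' into a single-digit class; escape everything else."""
    parts = ["[0-9]" if ch == "=" else re.escape(ch) for ch in pattern]
    return re.compile("".join(parts))


def matches(pattern: str, word: str) -> bool:
    """True if the whole word matches the pattern."""
    return digit_wildcard_to_regex(pattern).fullmatch(word) is not None


print(matches("X===", "X123"))
print(matches("X===", "Xabc"))
```

Because each = stands for exactly one digit, X=== also fails to match shorter or longer words such as X12 or X1234.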
Enhancements (dtSearch Web / dtSearch Publish)
- New %%Synopsis%% search results macro can be used to add a brief hits-in-context
display to search results. See "The Options File" in dtSearch_Web.chm for details.
- Search results can be sorted by user-defined fields, and search forms can specify any
combination of search flags to use when sorting.
Enhancements (dtSearch Engine)
- Added SearchReportJob.MaxContextBlocks, which provides a way to limit a search
report to the first N blocks of context in a document
- Added SearchReportJob.MaxWordsToRead, which provides a way to limit a search report
to the first N words in each document
- Added SearchFilter.GetItemArray and SearchFilter.GetItemArrayVBS, which provide a way to get
an array containing doc ids selected in a SearchFilter
- Added SearchResults.AddDoc, which can add a specific document to a search results list by doc id
- Added C# sample web searching application
- Added dtsSortCleanText sort flag to remove leading punctuation, white space, "Re:", "Fwd:", and "Fw:"
- Added two new sample applications in C#: ADO.NET, demonstrating database indexing
using ADO.NET, and asp_search, demonstrating a search function for a web site.
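The effect of the dtsSortCleanText sort flag listed above can be approximated in a few lines of Python. This is a sketch of the idea rather than the engine's actual rules: the function name clean_sort_key is hypothetical, and matching the prefixes case-insensitively is an assumption.

```python
import re

# Leading run of white space, punctuation, or the reply/forward prefixes.
_LEADING_NOISE = re.compile(r"^(?:\s|[^\w\s]|(?:re|fwd|fw):)+", re.IGNORECASE)


def clean_sort_key(text: str) -> str:
    """Remove leading punctuation, white space, 'Re:', 'Fwd:', and 'Fw:'."""
    return _LEADING_NOISE.sub("", text)


subjects = ["Re: Fwd: Quarterly report", '  "Budget" notes', "Agenda"]
print(sorted(subjects, key=lambda s: clean_sort_key(s).lower()))
```

Sorting on the cleaned key keeps replies and forwards next to the original message instead of grouping them all under "Re:" or "Fwd:".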
Fixes
- dten600.dll: External file parsers would not load from a dtSearch "Home" directory whose name
contained accented letters
- mfcdemo sample application: two files were missing from the setup program
- dtindexer.exe: Spider did not identify user agent as "dtSearchSpider"
- dtindexer.exe: Intermittent crash indexing very large files stored in ZIP
archives
- dtindexer.exe: Fixed several minor problems with RTF file parser (extra ;
characters in text, incorrect handling of \emdash and \endash directives, \info
properties)
- dten600.dll: MBOX file parser did not recognize Mozilla email archives
- dten600.dll: Fixed hit highlighting error in boolean field search with nested or/and condition (a or (b and c))
- dtindexer.exe: Fixed bug in index library manager that could cause indexer to crash at startup
- dten600.dll: Word wrapping in file conversion incorrectly handled words longer than 256 characters
- dten600.dll: Auto-detection of Unicode text files did not work with text files shorter than 20 characters
- dten600.dll: Items contained in ZIP archives indexed using the data source API did not inherit field attributes
of the containing ZIP.
dtSearch 6.1 (Build 6260) Released November 3, 2002
Enhancements (All products)
- PDF indexing improvements: (1) Added support for indexing PDF files with security passwords (40-bit only);
(2) better detection of word breaks; (3) indexing of annotations and form fields
- Improved formatting of Eudora messages
- Faster and more robust indexing of dBASE and FoxPro databases (ODBC is no longer used or needed for indexing these files)
- MBox message archive support. (Because MBox archives cannot be automatically detected, use the
File Types dialog box to specify which files should be indexed as MBox archives.)
Enhancements (dtSearch Web)
- Added highlighting of hits on documents indexed using the spider (for example, third-party web sites)
- Added "Next 25" and "Prev 25" links to search results for paging through long search results lists
- Search form included in search results is initialized with previous search request
Enhancements (dtSearch Desktop)
- "All words" and "Any words" search options
- Option to retrieve most recent (vs. most relevant) documents
- Spider can index https (SSL) web sites.
- Spider can index password-protected web sites. Click Options > Preferences > Spider Options to set up passwords
and security settings.
- Spider uses WinHttp 5.1 library under Windows XP and Windows 2000 SP3. This library provides improved handling of
authentication and generally better performance (in our testing). WinHttp 5.1 is included with
Windows XP and Windows 2000 SP3 and is not available for other platforms.
- "Create Group Policy" dialog box added for automatic deployment of dtSearch and shared indexes across a network. See
the "Automatic deployment of dtSearch on a network" topic in the dtSearch Desktop help file for more information.
- Outlook indexer can index appointments and journal items, in addition to messages, tasks, contacts, and notes
- Option to change the size of the dialog box font (Options > Preferences > Dialog box font size)
- Preferences dialog box reorganized
- Edit Noise Words dialog box
Enhancements (dtSearch Engine)
- Added dtsSortFloatNumeric sort flag
- New HttpSearch sample application demonstrates client-server searching with dtSearch Web on the server and
the CHttpSearchJob C++ class used on the client to perform a search.
- SearchFilter object -- added SelectItemsBySearch method to allow a search
filter to be set up as the results of one or more searches.
- pOnIndexWordFn added to the dtsIndexJob in the C++ API, to allow an
indexing program to modify text as it is being indexed (to customize character
handling or add alternative forms of a word to the index). See the
dtsOnIndexWordInfo topic in dtengine.chm for more information.
Fixes
- dten600.dll: Default name for alphabet file changed from "ENGLISH.ABC" to "DEFAULT.ABC", since it no longer
has language-specific data
- dtindexer.exe: Merge indexes did not combine the list of folder selections for the merged indexes
- dtSearch.exe: Cursor keys sometimes did not scroll document in HTML viewer
- dten600.dll: Search report included an extra line of context before the hit
- dten600.dll: HTML hit highlighter highlighted an extra word if the search word ended in
- dten600.dll: PDF file parser incorrectly handled case where same font resource had different names in PDF file
- Setup program changes: (1) dtSearch Desktop and dtSearch Developer (Web/Engine/Publish) made into two separately
installable products; (2) uses Windows Installer Service 2.0; (3) improved reboot handling under Windows 9x;
(4) new "Repair" setup option to fix a damaged installation.
- dtSearch.exe: "fields" button in Search dialog box did not position the cursor correctly after fields
were inserted into a search request
- dtindexer.exe: Spider reports when a web page cannot be accessed due to an authentication error
or server error.
- dten600.dll: If an index path contains an accented letter, the noise word list was not used and all words
were indexed.
dtSearch 6.07 (Build 6205) Released June 17, 2002
Enhancements (All products)
- Faster indexing for large indexes (typically about 20% faster)
- Added support for UCS-16 encoded HTML and XML (little-endian and big-endian)
- Blocks of HTML can be excluded from indexing using <!--BeginNoIndex--> and <!--EndNoIndex--> tags. (The comment
tags must appear exactly as they appear here, with no spaces or other variations.)
- Improved formatting in display of Excel spreadsheets
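The behavior of the BeginNoIndex and EndNoIndex tags described above can be sketched as a simple pre-processing step in Python. This is an illustration only: the function name strip_no_index is hypothetical, and treating an unterminated block as excluded through the end of the file is an assumption.

```python
BEGIN = "<!--BeginNoIndex-->"
END = "<!--EndNoIndex-->"


def strip_no_index(html: str) -> str:
    """Return html with every BEGIN...END block removed (exact tag match only)."""
    kept = []
    pos = 0
    while True:
        start = html.find(BEGIN, pos)
        if start == -1:
            kept.append(html[pos:])  # no more excluded blocks
            break
        kept.append(html[pos:start])
        end = html.find(END, start + len(BEGIN))
        if end == -1:
            break  # assumption: an unterminated block excludes the rest
        pos = end + len(END)
    return "".join(kept)


page = "<p>keep</p><!--BeginNoIndex--><p>skip</p><!--EndNoIndex--><p>also keep</p>"
print(strip_no_index(page))
```

Because the comparison is an exact string match, a variant such as "<!-- BeginNoIndex -->" with extra spaces is not recognized, which mirrors the restriction stated in the note.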
Enhancements (dtSearch Desktop)
- Option to automatically use "Report" view for very long text files
- Search reports can include documents indexed with the spider (http: references) and Outlook messages
- Edit|Copy file list to copy a list of filenames from search results to the clipboard
- Indexing Options: new option to "Index HTML scripts, styles, links, and comments" (causes these items to become visible and searchable in dtSearch)
- Improved status reporting in the Forensic Indexer (forix.exe).
- "View File" can be used to open a saved search results list
Enhancements (dtSearch Engine)
- dtsfclib.lib: updated to work with Visual Studio .NET. See "C++ Support Classes" in dtengine.chm
- C++ Support Classes can optionally be placed in a "dtSearch" namespace by declaring the USE_DTSEARCH_NAMESPACE macro
- SearchFilter object - added Read and Write functions to save to/read from disk files, ReadMultiple to
read and combine multiple filters, and AND and OR operations to logically combine two filters
- Added percentFull property in dtsIndexInfo
- An alternative search syntax, the "All Words"/"Any Words" syntax, can be used in searches using the dtsSearchTypeAllWords
and dtsSearchTypeAnyWords search flags. This new search syntax supports use of quotation marks to indicate phrases and
+ and - to indicate required and excluded words. See "dtsSearchTypeAllWords and dtsSearchTypeAnyWords" topic in dtengine.chm.
- andany search connector added to allow optional words to be added to a search request.
(see "Search Requests" help topic in dtengine.chm)
- IndexJob object -- added GetIndexInfo method to provide access to index properties in the COM interface
- IndexJob object -- MaxMemToUseMB property provides a way to limit the amount of memory used during indexing
- For forensic applications, new FieldFlags options to allow indexing of hidden HTML content: dtsoFfHtmlShowLinks, dtsoFfHtmlShowImgSrc,
dtsoFfHtmlShowComments, dtsoFfHtmlShowScripts, dtsoFfHtmlShowStylesheets, dtsoFfHtmlShowMetatags.
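The "All Words"/"Any Words" syntax described above (quotation marks for phrases, + for required words, - for excluded words) can be illustrated with a small Python tokenizer. This sketch only separates a request into categories. It does not perform a search, the function name parse_request is hypothetical, and how unmarked terms are combined depends on which of the two search flags is in effect.

```python
import re
from typing import Dict, List

# Optional +/- sign followed by either a quoted phrase or a bare word.
_TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')


def parse_request(request: str) -> Dict[str, List[str]]:
    """Split a request into required (+), excluded (-), and unmarked terms."""
    parsed: Dict[str, List[str]] = {"required": [], "excluded": [], "unmarked": []}
    for sign, phrase, word in _TOKEN.findall(request):
        term = phrase if phrase else word
        if not term:
            continue  # skip empty quoted strings
        bucket = {"+": "required", "-": "excluded"}.get(sign, "unmarked")
        parsed[bucket].append(term)
    return parsed


print(parse_request('+apple -"fruit salad" pear "apple pie"'))
```

Keeping phrases as single terms preserves the meaning of the quotation marks when the request is later evaluated.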
Enhancements (dtSearch Web)
- MaxWordsToRetrieve setting added to dtsearch_options.html, providing a way to limit server resources consumed by
general searches like "a* or b* c*"
- BooleanConnectors setting added to dtsearch_options.html, so that the boolean connectors can be customized on web forms
- FieldFlags setting added to dtsearch_options.html
- OriginalSearchForm CGI variable added to search forms as backup to prevent broken link to options file
- dtSearch Web Setup can add a "Search Type" option to search forms, allowing the user to select between
"All words", "Any words", "Exact phrase", and "Boolean" search types
Enhancements (dtSearch Publish)
- CD Wizard can automatically generate a CD using either Apache or Microweb
Enhancements (dtSearch Engine for Linux)
- dtSearch Engine for Linux (libdtsearch.so.6.0) updated to dtSearch 6.07 code base (build 6200) and built with GCC 3.1.
Fixes
- dten600.dll: VB data source API made the creation date the same as the modified date
- dten600.dll: Memory leak in DSearchReportJob class (affected COM and Java APIs)
- dtSearch.exe: if document opened in Quick View Plus embedded in browser, could not then open another document as HTML
- dtSearch Web: Top and First Hit buttons did not work with Netscape 6.2
- dten600.dll: Search reports output as Ansi text could contain character 0x1e, causing truncation of reports in applications that treat this character as EOF
- dtSearch Publish: Fixed errors displaying non-English characters in startup CD dialog box
- dten600.dll: Fixed error parsing fields identified by HTML comments
- dten600.dll: Fixed error in IndexGetDocInfo C++ API function that caused it to fail to recognize a valid open index
- dten600.dll: xfilter Filename searches for names containing the stemming char (~) did not work
- dten600.dll: Improved logic for handling noise words in phrase searches, when the noise word in the search request matches both noise words and words that are not noise words
- dten600.dll: Fixed bug in ActiveX Data Source API in build 6204
dtSearch 6.06 (Build 6173) Released March 15, 2002
File Format Enhancements (all products)
- Automatic detection of encoding type for plain text and HTML files without Content-Type META tags, based on language of contents.
Supported languages/encodings: Arabic, French, German, Greek, Hebrew, Italian, Russian (KOI_8R and CP1251), Spanish,
Central European (CP1250), and UTF-8.
- .CSV support enhancements: Automatic detection of delimiter (tab, semicolon, or comma) and encoding (Unicode, UTF-8, or Ansi)
- Filename-only indexing of non-text formats: asf, ani, avi, mpg, mov, jpg, tif, gif, png, cab, ix, chm, ttf, pst, wav, mp3, bmp, wmf, emf.
(These formats are detected by checking the file contents, not just the extension, so a Word document named "something.jpg" will still
be indexed as a Word document.)
- Added support for: .MHT archives (single-file web pages saved by Internet Explorer)
- Added support for: .EML files (emails saved by Outlook Express)
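Automatic detection of the .CSV delimiter, mentioned above, can be approximated by counting candidate delimiters per line. The Python sketch below is one simple heuristic and is not dtSearch's detection method: the function name and the scoring rule are assumptions, and quoted fields are not handled.

```python
from typing import Sequence


def detect_delimiter(lines: Sequence[str],
                     candidates: Sequence[str] = ("\t", ";", ",")) -> str:
    """Pick the candidate that appears on every line, preferring equal counts."""
    best, best_score = ",", 0  # fall back to comma when nothing qualifies
    for delim in candidates:
        counts = [line.count(delim) for line in lines if line.strip()]
        if not counts or min(counts) == 0:
            continue  # delimiter missing from at least one line
        # The same count on every line is a stronger signal, so weight it double.
        score = min(counts) * (2 if len(set(counts)) == 1 else 1)
        if score > best_score:
            best, best_score = delim, score
    return best


sample = ["name;city;age", "Ann;Oslo;31", "Bo;Rome, Italy;40"]
print(repr(detect_delimiter(sample)))
```

In the sample, the comma appears in only one line, so the semicolon is chosen even though a comma is present in the data.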
Enhancements (dtSearch Desktop)
- Default character encoding option setting in the Options > Preferences File Types dialog box, provides a way to override
automatic encoding detection for file types that do not contain encoding information
- extext.exe text extractor utility for extracting text from large binary files (click Start|Programs|dtSearch|Tools|Extext)
- forix.exe forensic indexer utility, for creating indexes from very large volumes of data (click Start|Programs|dtSearch|Tools|Forix)
- Optional limit on the maximum amount of text to display from a file (in Options, Preferences, Other Options)
Enhancements (dtSearch Engine)
- WordsOfContextExact search report property (dtsReportByWordExact flag in C++ API) allows search report to contain exactly
the requested number of lines of context around each hit
- Fields returned through the data source API can be indexed without the field name being searchable. To suppress indexing
the field name, add * in front of the name
- C++ Support Classes are included in static libraries (in addition to the source code previously included) for easier use in Visual C++ projects
- C++ data source API supports error reporting by the caller's data source
- SetSortKey() method allows sorting search results on custom values
- dtsSearchAutoTermWeight search flag allows automatic term weighting (as in natural language searches) for boolean query
- Search filters provide a way to pre-select the documents that are eligible to be returned in a search. This feature
can be useful when a database search and a dtSearch text search must be combined. See the SearchFilter (VB) or dtsSearchFilter (C++)
topic in the dtSearch Engine help file for details.
- dtSearch Web (dtisapi6.dll) can log search requests and document access. For more information, see the "Generated Files" topic in dtSearch_Web.chm.
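The search filter idea described above, combining a database search with a text search, can be pictured as restricting text hits to a pre-selected set of document ids. The sketch below is conceptual only and does not use the dtSearch API; apply_filter, doc_id, and the sample data are hypothetical names chosen for illustration.

```python
def apply_filter(text_hits, eligible_ids):
    """Keep only hits whose document id was pre-selected (e.g. by a database query)."""
    eligible = set(eligible_ids)  # set membership tests are O(1)
    return [hit for hit in text_hits if hit["doc_id"] in eligible]


# Hypothetical data: ids returned by a database query, and full-text hits.
db_ids = [3, 7, 9]
hits = [
    {"doc_id": 1, "score": 90},
    {"doc_id": 7, "score": 75},
    {"doc_id": 9, "score": 40},
]
filtered = apply_filter(hits, db_ids)
```

Here only documents 7 and 9 remain, because document 1 was not selected by the database query. The SearchFilter (VB) and dtsSearchFilter (C++) help topics describe the actual interface.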
Enhancements (dtSearch Engine for Linux)
- dtSearch Engine for Linux (libdtsearch.so.6.0) updated to dtSearch 6.05
code base, and PDF indexing support added.
Fixes
- dten600.dll: "Stored" fields limited to 512 characters
- dtindexer.exe: Failed to convert Outlook message dates to UTC when indexing, so message time shown in search results was off by a few hours
- dtsjava.dll: Java JNI wrapper reported file dates through search results using 1-12 for the month, instead of 0-11, so the reported month was off by one.
- dtimage.exe: View as Image did not check for external viewer settings
- dtimage.exe: Draw errors updating image when page changes with multiple .TIF files
- dtindexer.exe: Failed to initialize Outlook indexing if user has more than one profile, and Outlook is set to prompt
for the profile to use at startup
- dtindexer.exe: Incorrectly handled indexing Outlook message folders that
contain / or \ in the folder name
- dtsearch.exe: Displayed internal Outlook name (a long number) for some
messages, rather than the subject.
- dten600.dll: "<![endif]-->" displayed incorrectly in some MS Word
documents saved as HTML
- dten600.dll: In plain-text documents that contain exactly one line of text, the title attribute was doubled
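To illustrate why a missing UTC conversion leaves message times off by a few hours, as in the Outlook date fix above, here is a minimal sketch using Python's standard library. It does not reflect how dtindexer.exe works internally; the helper name to_utc and the fixed UTC-5 offset are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_dt: datetime) -> datetime:
    """Convert a timezone-aware local time to UTC."""
    if local_dt.tzinfo is None:
        raise ValueError("a timezone-aware datetime is required")
    return local_dt.astimezone(timezone.utc)


# Hypothetical message received at 09:30 in a UTC-5 time zone.
utc_minus_5 = timezone(timedelta(hours=-5))
received = datetime(2002, 1, 15, 9, 30, tzinfo=utc_minus_5)
stored = to_utc(received)  # same instant, expressed as 14:30 UTC
```

If the local value were stored as though it were already UTC, every later conversion for display would be shifted by the zone offset, which here is five hours.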
dtSearch 6.05 (Build 6146) Released November 29, 2001
Enhancements (dtSearch Desktop)
- Option setting, in Options, Preferences, Other Options (formerly "Index Defaults"), to change the fonts in the Search
dialog box
- Improved and simpler Index Library Manager. See the "Index Library Manager" topic in the dtSearch help file for
more information.
- More responsive switching between files when a long file is being opened
- Outlook Express 5/6 (*.dbx) message store indexing (including attachments)
- Unindexed Search for List of Words
Enhancements (dtSearch Engine)
- Outlook Express 5/6 (*.dbx) message store indexing (including attachments)
- dtsSearchWantHitsArray flag, with dtsSearchWantHitsByWord, provides word offsets of phrases
Enhancements (dtSearch Publish)
- cdrun.exe can check for required software and install dependencies as needed or display an error page.
See the "Software Dependencies" topic in dtSearch_Web.chm for details.
Fixes
- dten600.dll: dtsSearchWantHitDetails report field names incorrect in multi-index search
- dtimage.exe: Print problems (incorrect scaling to fit page, only printed one page at a time) when printing image files
- dtSearch.exe: Search for List of Words, "Boolean" option changed to "One boolean expression per line"; improved error reporting
dtSearch 6.04 (Build 6126) Released October 15, 2001
Enhancements (dtSearch Desktop)
- New "Whole file" search report option
- MaxWordsToRetrieve default value increased to 64k, and registry entry added to allow for customization
- Added Index Library Manager option to suppress check for index libraries in UserData and dtSearch program folders
- Optional checkboxes in search results list (use Options|Preferences|Search Results Format to enable them) for easier selection of multiple items
- Edit|Select all and Edit|Clear selections commands for easier selection of items in search results
- "Search for List of Words" search type in the Search menu, for searching using the contents of a text file as the search request.
Results of the search can be displayed in dtSearch Desktop or can be written to a text file.
- Help|About dialog box shows configuration details
- Outlook indexing problems created by the Microsoft Outlook Security Update resolved.
- Outlook contact details (phone, fax, company, title, etc.) indexed
- Outlook attachments indexed
- Select Outlook Folder dialog box allows for automatic selection of subfolders
- File Segmentation Rules enhanced to support headers on segments (so HTML and XML files can be segmented)
- Scheduled Index Updates can update more than one index in a single task
- Hits are centered in the dtSearch window instead of appearing at the end
Enhancements (dtSearch Publish)
- dtSearch Publish, included in the dtSearch Engine download, provides tools
and a setup wizard for running dtSearch Web from a CD.
Enhancements (dtSearch Engine)
- dtsearch_paging.asp sample added to demonstrate display of search results in pages
- dtsearch.asp sample uses improved JavaScript for hit navigation
- New properties in Options object in the COM and Java JNI interface: macroChar, stemmingChar, fuzzyChar, synonymChar, phonicChar, weightChar
- New properties in FileConverter object in the Java JNI interface: baseHref, alphabetLocation
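Displaying search results in pages, as the dtsearch_paging.asp sample demonstrates, comes down to slicing the result list. The following sketch is independent of the sample's actual code; page_of and its parameters are hypothetical.

```python
def page_of(results, page, page_size=10):
    """Return the items on the 1-based `page` and the total number of pages."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    total_pages = (len(results) + page_size - 1) // page_size  # ceiling division
    start = (page - 1) * page_size
    return results[start:start + page_size], total_pages
```

For 25 results and a page size of 10, page 3 holds the last five items and the total page count is 3.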
Enhancements (dtSearch Web)
- Added AutoStopLimit setting in Form Builder
- Added MaxUrlSize setting in _options.html file to limit the maximum size of a URL created by dtSearch Web in search results.
- Relevance score automatically reported in dtSearch Web searches (this change only affects search forms created with the new dtSearch Web Setup)
Fixes
- dtSearch.exe: Error message "Failed to update the system registry. Please try using REGEDIT." when dtSearch is launched, due to MFC bug described in Q254957
- dtSearch Web: Incorrect hit highlighting with hyphens set to "searchable"
- dtsearch.asp: Fixed encoding problems in sample code for display of Unicode text
- setup: installer modified to work around bug in Windows XP's implementation of msiexec.exe
- dten600.dll: Character encoding error in MS Word 95 document
- dtv_odbc.dll: Extra blank lines displaying database records
- dtsviewr.h: dtsInputSource and dtsInputSourceReader renamed to dtsInputStream and dtsInputStreamReader (with a #define to preserve backward compatibility)
- dten600.dll: PDF file parser error reading xref table from PDF 1.0 document
- dten600.dll: Unindexed search did not delete file list temp file if searching data source or nonexistent files
- dtSearch.exe, dtSearch Web: Sorting search results by filename did not check, for PDF/HTML files, whether the filename or the Title was currently displayed as the filename.
As a result, the list was always sorted by the title for these file types.
- dten600.dll: On a sort by Location, items within a particular location were sorted in ascending order by score, rather than descending order
- dten600.dll: Fixed error handling <style> tags in HTML
- dten600.dll: Fixed error handling <src> tags with blank quoted ALT attribute in HTML
- dtsearch.exe: Index Create dialog box did not browse for UNC folders correctly
- dten600.dll: Error searching xfilter expression for string containing ":"
- dten600.dll: Incorrect handling of &quot; expression in HTML and XML tag attributes
- dten600.dll: IndexJob object in the COM interface -- StatusBytesToIndexKB value was not set
- dtWebSetup.exe: Recovers partial virtual path information when metabase access functions throw an exception
- dtWebSetup.exe: Wrote closing title tag of search form incorrectly
- dtSearch.exe: Better trapping of JavaScript errors when navigating hits
- dtSearch.exe: Fixed problems with space, backspace, enter, and tab hotkeys when reviewing search results
- dten600.dll: Fixed error in WordPerfect file parser character tables for processing some Arabic and Greek characters
- dtSearch Web: JavaScript for "Next Hit" button did not find last hit in a document
- dten600.dll: Report generated by dtsSearchWantHitDetails flag had hits in descending instead of ascending order
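The intended ordering for a sort by Location (locations ascending, and within each location the highest score first) can be expressed as a two-part sort key. This is only an illustration of the ordering described in the fix above, with hypothetical field names and data, not the engine's code.

```python
# Hypothetical search results.
hits = [
    {"location": "/docs/b", "name": "r.txt", "score": 10},
    {"location": "/docs/a", "name": "x.doc", "score": 40},
    {"location": "/docs/a", "name": "y.doc", "score": 90},
    {"location": "/docs/b", "name": "s.txt", "score": 70},
]

# Location ascending; negating the score gives descending order within a location.
ordered = sorted(hits, key=lambda h: (h["location"], -h["score"]))
names = [h["name"] for h in ordered]
```

Dropping the minus sign reproduces the ascending-score order within each location that the fix corrected.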
dtSearch 6.03 (Build 6079) Released June 1, 2001
Enhancements (dtSearch Desktop)
- dtSearch.exe can run from a shared network folder (see the help file for details)
- Much faster loading of long documents
- Loading of long documents can be cancelled
- Option setting not to automatically open the first document
- Hit highlighting shows which hit is currently selected
- Edit|Copy File added
- New UTF-8 file type in File Types dialog box
- User-defined fields can be included in search reports and search listings
- New dtsrun.exe launcher to select dtsearch.exe or dtsearchw.exe automatically
Enhancements (dtSearch Engine)
- New CD Wizard for setting up dtSearch Web to run from a CD. [This new option for CD publishing
will not be released with version 6.03 due to the need for more beta testing. It is included in
pre-release form for testing and evaluation purposes only.]
- Added outputStringMaxLen value and dtsListTabDelimit flag for dtsListIndexJob
- Added to Java API: urlEncodeItem, urlDecodeItem, serializeToXml, serializeFromXml,
getHitByteOffsets, getHitBytePageParaOffsets
- Added dtsUSearchAccentSensitive flag to make unindexed searches accent sensitive (default changed to accent insensitive)
Enhancements (dtSearch Web)
- Improved hit navigation JavaScript
- Next/Prev Hit buttons provide better information messages when PDF files or files without hits marked are displayed
Fixes
- dtwebsetup.exe: Did not display default search feature selections correctly in Search Form Builder
- scriptrun.exe: Did not check for dtSearch Desktop registry settings
- dtsearch.exe: pressing Ctrl-Y with no search results open would cause a crash
- dten600.dll: WordPerfect character tables incorrectly encoded Unicode values for some Greek letters
- dten600.dll: Bug in dtssMapHitsInFile caused word counting to be off due to error reading alphabet
- dtindexer.exe: Clicking "Delete Task" in Schedule Updates deleted the task even if the user answered No to the confirmation prompt
- dtsearch.exe: Hit navigation did not work with the Internet Explorer 6 beta
- dtsearch.exe: Detect and handle Adobe Reader 5's "Certified Plug-ins Only" option setting
- dten600.dll: More efficient memory use when indexing very long files
- dten600.dll: When truncating words longer than the maximum word length, count Unicode characters rather than UTF-8
bytes
- dten600.dll: Some Hebrew vowel marks not classified as letters in Unicode tables
- dtisapi6.dll: Error handling blank fields on search form
- dten600.dll: Crash indexing damaged Excel spreadsheet embedded in Word document
- dten600.dll: Crash compressing an index
- dtwebsetup.exe: JavaScript called by NextDoc crashed Adobe Reader
- dten600.dll: Null character in damaged PDF file's xref table crashed PDF file parser
- dten600.dll: Excel file parser incorrectly handled "extended strings" stored by Far East version of Microsoft Word
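The word-truncation fix above concerns the difference between counting Unicode characters and counting UTF-8 code units. Assuming the latter means encoded bytes, the sketch below shows why the distinction matters. The function names are hypothetical and the code is not taken from dten600.dll.

```python
def truncate_chars(word: str, max_chars: int) -> str:
    """Truncate by Unicode code points, so no character is split."""
    return word[:max_chars]


def truncate_utf8_bytes(word: str, max_bytes: int) -> bytes:
    """Naive byte-based truncation, shown for contrast; it can cut a character in half."""
    return word.encode("utf-8")[:max_bytes]


word = "na\u00efvet\u00e9"  # 7 code points, 9 bytes in UTF-8
by_chars = truncate_chars(word, 3)       # three whole characters
by_bytes = truncate_utf8_bytes(word, 3)  # ends in the middle of a 2-byte character
```

The byte-based result ends with a dangling lead byte and is no longer valid UTF-8, and the two methods disagree about how long a non-ASCII word is.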
dtSearch 6.02b (Build 6055) Released April 9, 2001
- dtsetup.exe: fixed "Wrong OS Version" message
dtSearch 6.02 (Build 6055) Released April 2, 2001
- dtsearch.exe: Unindexed search crashes attempting to display the "Index" column in search results.
- dtsearch.exe: SPACE and BACKSPACE do not work in HtmlHelp when launched from dtSearch, because dtSearch hotkeys
remain active when help is launched (in dtSearch, SPACE = Next Hit and BACKSPACE = Previous Hit).
- dtsearch.exe: Search dialog box, under Windows 98, does not move the cursor to the end of the search request when the user clicks AND/OR/etc. buttons
- dtsearch.exe: Sort by file type does not do anything
- dtsetup.exe: Installing under Whistler (Windows XP) beta causes an MSI error.
- dtsearch.exe: After launching a file to view it in search results, the dtSearch window moves in front of the launched program
- dtsearch.exe: Pressing Ctrl-M ("View as Image") with no search results displayed crashes dtSearch
- dtsearch.exe: occasionally displayed plain text documents right-justified
- dtsearch.exe: intermittent crash switching back to dtSearch window if Search dialog box not initialized yet
- dtsearch.exe: very intermittent right-justification of plain-text documents in viewer window
- dtsearch.exe: Select All button in Search dialog box did not work
- dtsearch.exe: Problems displaying field names in Browse Words dialog box
- dtsearch.exe: "Include word counts" checkbox in List Index dialog box did not work
- dtsearch.exe: search dialog box warns when unindexed searching is left on, and when file filters are left over from a previous search under More Search Options. The warnings can be suppressed with a checkbox, and there is a new Options|Enable warnings... command to enable all optional warnings.
- dtsearch.exe: HTML error caused extra underlining in some documents when highlighting hits
- dtsearch.exe: Change to booleanConnectors setting did not work if boolean connectors string was too long
- dtsearch.exe: "Search in a new window" did not do anything under Windows 9x
- dtindexer.exe: warns when the "Clear index before adding documents" box is checked and the index already has data (as version 5.25 did; users had been asking why dtSearch fully rebuilt their indexes each time)
- dten600.dll: Word documents created by a Japanese version of Word 95 were not recognized
- dten600.dll: HtmlTitle field sometimes not generated for contents of the HTML <TITLE>
- dten600.dll: crash indexing a corrupt WordPerfect document
- dten600.dll: SearchReportJob sometimes put a line break between the beforeHit mark and the first letter of the hit word
- dten600.dll: ComFileConverter failed if InputFile left blank.
dtSearch 6.01 (Build 6048) Released March 14, 2001