Introduction
Using dtSearch and the techniques in this article will make your data
searches lightning fast, allowing you to search terabytes of data with
sub-second response times.
But first, two preliminary notes about this blog post. (1) The blog
post works with source code, but the same approach applies to other
data stored in the Microsoft Azure cloud: HTML, XML, MS Office
documents -- even email data. (2) While the data in this blog post
resides in the Microsoft Azure cloud, the indexes are on a local PC. A
subsequent article will address keeping both data and indexes in the cloud.
Here is a workplan of our overall project:

Overall Workplan
In part one of this article we will go to the Azure portal and
provision a storage account. Naturally, the assumption is that you
have signed up for an Azure account. If you have not, it's relatively
easy to sign up for a free trial, so you can see whether the service meets
your needs before committing any money.
Once you provision your storage account, access keys will be
automatically generated. These access keys will be copied into our
Visual Studio project, because they are the secret keys that give
privileged access to your storage account, the place where we’re going
to copy the source code to be indexed and later searched.
Part two of this article will show you where we can get the Visual
Studio solution with the starter code. This solution will dramatically
reduce the amount of work we actually have to do to implement this
useful source code searching application. If you install the full
edition of the dtSearch Engine, the starter project actually gets
installed in your program files folder.
We will be using Visual Studio 2013 with the latest updates. We will also install the latest Azure Storage SDK binaries.
It's in part three where the real work starts. What we want to do
here is build the capability to upload your source code into your
storage account. There are various utilities you can download to
upload source code to a storage account, but it is far more convenient
to build this into our main searching application. Once we finish this
retrofit, we can run the application to upload the source code, index
it, and move on to part four of our work plan.
Part four will be fast and easy because we will be pretty much done
with the difficult work. Part four is about testing and packaging our
application. The index files that get generated could be copied to other
client computers. That means we can copy the application along with the
generated index files to any computer to perform lightning fast source
code searches.
Part 1 - Provisioning at the Azure portal
Provisioning the storage account is actually quite simple. At the
time of this writing the traditional Azure portal is the place to go.
But after the first week of May 2015, Microsoft will release the new
portal.
Once you log into the Azure portal, it's a simple matter of navigating to the STORAGE menu item and clicking NEW.

Provisioning a storage account at the Azure portal
A QUICK CREATE menu item will become visible. Click on that to continue.
At this point you are ready to provide the URL, location, and the
replication mode. The URL you come up with must be globally unique,
because it becomes part of your storage endpoints (for example, the
account name mysourcecode yields the blob endpoint
https://mysourcecode.blob.core.windows.net). As you can see,
"mysourcecode" was not taken. I chose "East US" for my location, but
you can choose from among the world's data centers; a closer data
center means lower latency. You can read about replication options in
the Azure documentation.
When you are done, click CREATE STORAGE ACCOUNT in the lower right
corner. Provisioning your storage account should take less than five
minutes; it took less than a minute for me.

Creating your storage account
When the portal indicates that your storage account is ONLINE, you
are ready to move forward. Click on the small arrow that's pointing
right to drill into the details of this newly provisioned storage
account.

The provisioned storage account
You are now ready to copy access keys to the clipboard. Click on MANAGE ACCESS KEYS.

Copying the Access Keys
Click the copy icon (outlined in red in the screenshot) to copy the
PRIMARY ACCESS KEY to your clipboard, and store it in a safe place
along with the STORAGE ACCOUNT NAME. Both your STORAGE ACCOUNT NAME and
your PRIMARY ACCESS KEY will be different from what you see here.

Copying the Storage Account Name and the Primary Access Key
Storage Account Name | mysourcecode
Primary Access Key | CnQ6dUXdOQ81qSCFJhscuB3PCNM92o4bIuDoKG7mO7tJ1imxa5sMkzKtnghsG11EwKgxRaTW5g6fFKRcXZ8z6g==
Part 2 - Locating the starter project
The starter project that ships with the dtSearch Engine can be found under the program files folder here:
- C:\Program Files (x86)\dtSearch Developer\examples\cs4\AzureBlobDemo\AzureBlobDemo.sln
The starter project provides an excellent starting point for us to
begin our work. Be sure you are using Visual Studio 2013 with all the
latest updates installed.
The project should open up seamlessly, but we want to be sure we have
the latest Azure Storage binaries installed. We will right-click in
Visual Studio's Solution Explorer and select Manage NuGet Packages.

Adding NuGet Packages
In the upper right search box, type in "Azure Storage." As you would
expect, this brings up the Windows Azure Storage client library, which
we will use to read from and write to the Windows Azure Storage account
we provisioned in part one.
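If you prefer the Package Manager Console, the same library can be installed by its package id (WindowsAzure.Storage at the time of this writing):

PM> Install-Package WindowsAzure.Storage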

Updating the project with latest Azure Storage SDK
In Visual Studio Solution Explorer you can expand the references node
to validate that we have the storage client libraries installed.

Validating the Azure Storage Client Libraries
Part 3 - Adding the storage account connection information to app.config
Now is a good time to copy the storage account information into your
app.config file. The app.config file provides a convenient, globally
accessible location for your application's settings and will be read at
run time. It would not be appropriate to ask users to re-enter the
connection information every time they use the application.
Modifying App.Config
="1.0"
<configuration>
<startup>
<supportedRuntime
version = "v4.0"
sku = ".NETFramework,Version=v4.0"/>
</startup>
<appSettings>
<add
key = "StorageAccountName"
value = "mysourcecode"/>
<add
key = "AccessKey"
value = "CnQ6dUXdOQ81qSCFJhscuB3PCNM92o4bIuDoKG7mO7tJ1imxa5sMkzKtnghsG11EwKgxRaTW5g6fFKRcXZ8z6g=="/>
</appSettings>
</configuration>
Options for encryption
If you would like to encrypt this information rather than store it in
plain text, there are several options available.
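One option is the Windows Data Protection API (DPAPI) via the ProtectedData class in System.Security.Cryptography (add a reference to System.Security). The sketch below is a minimal example and not part of the starter project; you would store the Base64 output in app.config instead of the raw key and call Unprotect at run time:

using System;
using System.Security.Cryptography;
using System.Text;

static class KeyProtector // hypothetical helper, not in the starter project
{
    // Encrypts the access key so only the current Windows user can decrypt it.
    public static string Protect(string plainKey)
    {
        byte[] encrypted = ProtectedData.Protect(
            Encoding.UTF8.GetBytes(plainKey),
            null,                              // no additional entropy
            DataProtectionScope.CurrentUser);
        return Convert.ToBase64String(encrypted);
    }

    // Decrypts the value read back from app.config at run time.
    public static string Unprotect(string protectedKey)
    {
        byte[] decrypted = ProtectedData.Unprotect(
            Convert.FromBase64String(protectedKey),
            null,
            DataProtectionScope.CurrentUser);
        return Encoding.UTF8.GetString(decrypted);
    }
}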
Adding support to upload source code to your Azure Storage Account
Our next task is to enhance the starter project to enable source code
uploads. Adding this capability directly into the application will
dramatically improve usability. In this section, we will add a command
button and then write some code.
Here's what the application looks like before our changes. This is MainForm.cs.

Before Adding a button to MainForm.cs
We will now add a third button, as seen below. The button is named cmdAddCode and its caption (the Text property) reads Add source code to Azure Storage. You will need to move the index and search buttons down a little to make room for this new third button.
From the designer, double-click the Add source code to Azure Storage button to generate its click event handler.

After Adding a button to MainForm.cs
We will now add some code that will provide the ability to upload source code.

Adding Code-Behind
As before, use ADD A REFERENCE, this time to add System.Configuration.
Be sure the check box inside the red box is checked before clicking OK.

Adding a reference to System.configuration
Be sure that the top of MainForm.cs has the following new statements in place.

The necessary using statements
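Judging from the code that follows, the using statements at the top of MainForm.cs should look roughly like this (the exact list may differ slightly in your copy of the starter project):

using System;
using System.Collections.Generic;
using System.Configuration;                // ConfigurationManager
using System.IO;                           // FileInfo, Directory, Path
using System.Threading.Tasks;              // Parallel.ForEach
using System.Windows.Forms;
using Microsoft.WindowsAzure.Storage;      // CloudStorageAccount
using Microsoft.WindowsAzure.Storage.Blob; // CloudBlobClient, CloudBlobContainer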
Modifying MainForm.cs
private void cmdAddCode_Click(object sender, EventArgs e)
{
    string windowTitle = this.Text;
    try
    {
        // Let the user pick the local folder containing the source code.
        string selectedFolder = null;
        FolderBrowserDialog fDialog = new FolderBrowserDialog();
        if (fDialog.ShowDialog() == DialogResult.OK)
        {
            selectedFolder = fDialog.SelectedPath;
        }
        if (string.IsNullOrEmpty(selectedFolder))
            return;

        // Build the connection string from the values in app.config.
        string storageAccountName = ConfigurationManager.AppSettings["StorageAccountName"];
        string accessKey = ConfigurationManager.AppSettings["AccessKey"];
        string connString = string.Format(
            "DefaultEndpointsProtocol=https;AccountName={0};AccountKey={1}",
            storageAccountName, accessKey);
        var storageAccount = CloudStorageAccount.Parse(connString);
        CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();

        // Collect every file under the selected folder and its sub-folders.
        List<FileInfo> filesToUpload = new List<FileInfo>();
        RecursiveFileUpload(selectedFolder, filesToUpload, "*.*");

        // Upload up to four files at a time.
        var fileUploadParallelism = new ParallelOptions { MaxDegreeOfParallelism = 4 };
        string blobContainerName = "code";
        CloudBlobContainer container = blobClient.GetContainerReference(blobContainerName);
        container.CreateIfNotExists();

        Parallel.ForEach(filesToUpload, fileUploadParallelism, currentFileInfo =>
        {
            // Flatten the local path into a blob name, e.g.
            // C:\src\Foo.cs becomes C:_src_Foo.cs.
            string cloudFileNamePath = currentFileInfo.FullName.Replace(@"\", "_");
            try
            {
                var blobFileToUpload = container.GetBlockBlobReference(cloudFileNamePath);
                ShowTitle("Uploading..." + currentFileInfo.Name);
                // Skip files that were already uploaded on a previous run.
                if (!blobFileToUpload.Exists())
                {
                    blobFileToUpload.UploadFromFile(currentFileInfo.FullName, FileMode.Open);
                }
            }
            catch (Exception exception)
            {
                MessageBox.Show("Issue with blob upload = " + exception.Message);
            }
        });
    }
    finally
    {
        // Restore the original window title when the upload finishes.
        this.Text = windowTitle;
    }
}
// Updates the form caption from any thread, marshalling to the UI
// thread when necessary (Parallel.ForEach runs on worker threads).
delegate void StringParameterDelegate(string value);
public void ShowTitle(string value)
{
    if (InvokeRequired)
    {
        BeginInvoke(new StringParameterDelegate(ShowTitle), new object[] { value });
        return;
    }
    this.Text = value;
}
// Recursively collects the files under sourceDir into filesToCopy.
private List<FileInfo> RecursiveFileUpload(string sourceDir, List<FileInfo> filesToCopy, string search_type)
{
    if (!sourceDir.EndsWith(Path.DirectorySeparatorChar.ToString()))
    {
        sourceDir += Path.DirectorySeparatorChar;
    }
    try
    {
        // Descend into each sub-directory first.
        foreach (string sDir in Directory.GetDirectories(sourceDir))
        {
            RecursiveFileUpload(sDir, filesToCopy, search_type);
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show("Issue with RecursiveFileUpload " + ex.Message);
    }
    try
    {
        foreach (string sFile in Directory.GetFiles(sourceDir, search_type))
        {
            // Azure blob names are limited to 1,024 characters, so skip
            // paths whose flattened blob name would exceed that limit.
            if (sFile.Length >= 1024)
                continue;
            filesToCopy.Add(new FileInfo(sFile));
        }
    }
    catch (UnauthorizedAccessException ex)
    {
        MessageBox.Show("Skipping " + sourceDir + " because of " + ex.Message);
    }
    catch (Exception ex)
    {
        MessageBox.Show("Skipping " + sourceDir + " because of " + ex.Message);
    }
    return filesToCopy;
}
Some of the code needs updating in the Rewind() method of
BlobDataSource.cs. The dtSearch indexer calls Rewind() at the start of
an indexing pass to reset the data source to its first document, so this
is where we enumerate the containers and blobs in the storage account.
public override bool Rewind()
{
    if (_isStorageFailed)
        return false;
    try
    {
        var storageAccount = CloudStorageAccount.Parse(_connectionString);
        _blobClient = storageAccount.CreateCloudBlobClient();

        // Build a table of container name -> list of blob URIs.
        _blobTable = new Dictionary<string, List<string>>();
        foreach (CloudBlobContainer container in _blobClient.ListContainers())
        {
            List<string> blobURIs = new List<string>();
            foreach (var blobItem in container.ListBlobs())
            {
                blobURIs.Add(blobItem.Uri.ToString());
            }
            _blobTable.Add(container.Name, blobURIs);
        }
        if (!ResetIterators())
        {
            _isStorageFailed = true;
            return false;
        }
        _isStorageFailed = false;
        return true;
    }
    catch (Exception)
    {
        _isStorageFailed = true;
        return false;
    }
}
We have made some modifications to AskConnectForm.cs so that the
connection string is always retrieved for the user, who no longer has
to type it in each time. Ideally, we could write some code to bypass
the AskConnectForm dialog entirely, but I'm trying to avoid too many
modifications to keep this post straightforward.
public AskConnectForm()
{
    InitializeComponent();
    // Pre-populate the dialog with the connection string built
    // from the values stored in app.config.
    string storageAccountName = ConfigurationManager.AppSettings["StorageAccountName"];
    string accessKey = ConfigurationManager.AppSettings["AccessKey"];
    string connString = string.Format(
        "DefaultEndpointsProtocol=https;AccountName={0};AccountKey={1}",
        storageAccountName, accessKey);
    this.ConnectString.Text = connString;
}
Part 4 - Testing
We are now ready to start testing the application that we just
updated. One thing that might be of interest is to verify that we
correctly updated our storage account with the source code. I ran the
application once and uploaded source code to Azure Storage, as seen in
the picture below.
You can download the Azure Storage Explorer for free at the following URL:
https://archive.codeplex.com/?p=azurestorageexplorer
Once you've installed and configured Azure Storage Explorer, you can
go and browse the containers for whatever source code you may have
previously uploaded. It also allows you to delete the content should you
want to do so.

Tools like Storage Explorer
Although we are adding source code, you can add pretty much any file
type, whether Word documents or PowerPoint presentations; dtSearch will
automatically index many different types of documents.
By the way, the previous code performs the uploads in parallel rather
than one at a time, and the developer can control the degree of
parallelism to suit available network and system resources.
See the code snippet:
var fileUploadParallelism = new ParallelOptions() {MaxDegreeOfParallelism = 4};
Click the highlighted button to add source code up to your Azure Storage account.

Adding code
You can repeat this process of selecting a folder that contains the
source code you wish to upload. All the files in the folder (and
sub-folders) you pick will be used to populate Windows Azure Storage
with source code.

Selecting a folder that contains source code
When the index is created it will need a location to store the index files.
Enter a valid location and then hit the Index button.

Entering information about the index and creating the index
We already entered the necessary code above to populate this dialog
box with the appropriate connection string. You can just hit OK on this
dialog box.

Entering the connection string and clicking OK
You will click on two buttons in this dialog box. The first button is Index an Azure storage account. The second button is Search.

Performing the indexing operation and then clicking Search

Entering the search term and hitting Search
Our work is complete. You can now get lightning-quick results searching
for your keywords against the content of your Azure Storage account.
If you decide to add more source code to the Azure Storage account, you will need to regenerate the indexes.
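With the dtSearch Engine .NET API, this can be an incremental update rather than a full rebuild. The following is a hedged sketch: it assumes the IndexJob class from the dtSearch.Engine namespace and the BlobDataSource class from the starter project, so verify the property names against your installed SDK version:

// Sketch: incrementally add newly uploaded blobs to an existing index.
using (IndexJob job = new IndexJob())    // dtSearch.Engine.IndexJob
{
    job.IndexPath = @"c:\myindex";       // hypothetical index location
    job.ActionCreate = false;            // keep the existing index files
    job.ActionAdd = true;                // add only the new documents
    job.DataSourceToIndex = new BlobDataSource(connString); // starter project class; constructor assumed
    job.Execute();
}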

Viewing the search results
Conclusion
You can now search literally terabytes of source code and get instant
search results. One of the core advantages here is that you don't have
to store all the source code locally on your own laptop or desktop
computer. All the source code can be securely stored in your Azure
Storage account, available only to those who have the access keys.