Using StartsWith to Filter on RowKeys
There are many scenarios where filtering on partial RowKeys makes sense. One of these scenarios is Azure Diagnostics Log analysis where events are partitioned by time based PartitionKeys and by compound RowKeys. This allows us to filter and find information effectively.
Event RowKeys are composed of deployment IDs, role names, instance names, categories and other information:
8637d014bcf94452a1e48f393a11674b___Brisebois.WorkerRole___Brisebois.WorkerRole_IN_0___0000000001652031520___WADLogsLocalQuery
Querying WADLogsTable Effectively
The following example, shows how to target a specific table partition and filter events based on a StartsWith pattern.
var storageAccount = Microsoft.WindowsAzure.Storage.CloudStorageAccount.Parse(connectionString); var client = storageAccount.CreateCloudTableClient(); var table = client.GetTableReference("WADLogsTable"); // Querying Windows Azure Diagnostics by Partition for a partial RowKey var query = new FindWithinPartitionStartsWithByRowKey("0635204061600000000", "8637d014bcf94452a"); var result = query.Execute(table);
The FindWithinPartitionStartsWithByRowKey query returned many events from my pet project ‘This day on Azure‘. As we can observe, the RowKey structure allows us to partially filter from left to right. This example shows data from a single Cloud Service. But there are times when many Cloud Service uses the same Azure Storage Account to persist Azure Diagnostics.
{Microsoft.WindowsAzure.Storage.Table.DynamicTableEntity} ETag: "W/\"datetime'2013-11-18T21%3A17%3A38.0626707Z'\"" PartitionKey: "0635204061600000000" Properties: Count = 9 RowKey: "8637d014bcf94452a1e48f393a11674b___Brisebois.WorkerRole___Brisebois.WorkerRole_IN_0___0000000001652031489___WADLogsLocalQuery" Timestamp: {2013-11-18 9:17:38 PM +00:00} {Microsoft.WindowsAzure.Storage.Table.DynamicTableEntity} ETag: "W/\"datetime'2013-11-18T21%3A17%3A38.0626707Z'\"" PartitionKey: "0635204061600000000" Properties: Count = 9 RowKey: "8637d014bcf94452a1e48f393a11674b___Brisebois.WorkerRole___Brisebois.WorkerRole_IN_0___0000000001652031490___WADLogsLocalQuery" Timestamp: {2013-11-18 9:17:38 PM +00:00}
Filtering Within a Partition
In order to produce a StartsWith search pattern with Azure Table Storage queries, one must query based on RowKey (this also works for PartitionKeys) and set the lower bound greater or equal to the StartsWith pattern. Then we must increment the last character from the lower bound string in order to produce the upper bound of the query. This will produce the desired StartsWith capability that we all have come to cherish.
public class FindWithinPartitionStartsWithByRowKey : IQuery<CloudTable, List<DynamicTableEntity>> { private readonly string partitionKey; private readonly string startsWithPattern; public FindWithinPartitionStartsWithByRowKey(string partitionKey, string startsWithPattern) { if (partitionKey == null) throw new ArgumentNullException("partitionKey"); if (startsWithPattern == null) throw new ArgumentNullException("startsWithPattern"); this.partitionKey = partitionKey; this.startsWithPattern = startsWithPattern; } public List<DynamicTableEntity> Execute(CloudTable model) { var query = new TableQuery(); var length = startsWithPattern.Length - 1; var lastChar = startsWithPattern[length]; var nextLastChar = (char) (lastChar + 1); var startsWithEndPattern = startsWithPattern.Substring(0, length) + nextLastChar; var prefixCondition = TableQuery.CombineFilters( TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.GreaterThanOrEqual, startsWithPattern), TableOperators.And, TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.LessThan, startsWithEndPattern) ); var filterString = TableQuery.CombineFilters( TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partitionKey), TableOperators.And, prefixCondition ); var entities = model.ExecuteQuery(query.Where(filterString)); return entities.ToList(); } }
Filtering Within a Partition Range
This query is essentially the same as the previous query, but it allows us to extend our search to multiple partitions. Remember that smaller data sets produce faster results.
public class FindWithinPartitionRangeStartsWithByRowKey : IQuery<CloudTable, List<DynamicTableEntity>> { private readonly string partitionLowerBound; private readonly string partitionUpperBound; private readonly string startsWithPattern; public FindWithinPartitionRangeStartsWithByRowKey(string partitionLowerBound, string partitionUpperBound, string startsWithPattern) { if (partitionLowerBound == null) throw new ArgumentNullException("partitionLowerBound"); if (partitionUpperBound == null) throw new ArgumentNullException("partitionUpperBound"); if (startsWithPattern == null) throw new ArgumentNullException("startsWithPattern"); this.partitionLowerBound = partitionLowerBound; this.partitionUpperBound = partitionUpperBound; this.startsWithPattern = startsWithPattern; } public List<DynamicTableEntity> Execute(CloudTable model) { var query = new TableQuery(); var length = startsWithPattern.Length - 1; var lastChar = startsWithPattern[length]; var nextLastChar = (char) (lastChar + 1); var startsWithEndPattern = startsWithPattern.Substring(0, length) + nextLastChar; var prefixCondition = TableQuery.CombineFilters( TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.GreaterThanOrEqual, startsWithPattern), TableOperators.And, TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.LessThan, startsWithEndPattern) ); var partitionFilterString = TableQuery.CombineFilters( TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.GreaterThanOrEqual, partitionLowerBound), TableOperators.And, TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.LessThanOrEqual, partitionUpperBound) ); var filterString = TableQuery.CombineFilters( partitionFilterString, TableOperators.And, prefixCondition ); var entities = model.ExecuteQuery(query.Where(filterString)); return entities.ToList(); } }
Reusable Queries
In 2012, I wrote about building reusable and testable queries. This query object interface has been tested on several of my projects and has proven to clean up solutions. It provides a way to name questions so that we can discover them by using tools like R# (ReSharper). It also has the added benefit of promoting code reuse and strongly encourages self documenting code.
New developers can browse the solution and discover the application’s capabilities by reading file names. Each name describes a question that is used throughout the application.
I have used this technique on Cloud Services, Web APIs and even on Clients. And every time I put this into practice, the code got cleaner and easier to work with.
A query isn’t just SQL or LINQ, it’s a question that is asked about or to something. In this specific case, we are asking questions to Azure Table Storage, but this same pattern can be applied to business domains, databases and plain old objects.
public interface IQuery<in TModel, out TResult> { TResult Execute(TModel model); }
Filed under: Microsoft Azure, Table Storage Tagged: Azure Storage, Best Practices, C#, Microsoft Azure, Partition, PartitionKey, Range, RowKey, StartsWith, Table Storage
