Unique Index on URL's in SQL

Recently, while working on Findsi, I created a database design that included a table that basically listed all url's in the system. What seemed like a simple table turned out pretty hard to create. I specifically required my table to hold a unique list of url's, but did not want to use the url itself as the primary key. So i Created something like this: 

  • Id : guid : newid(); < PK
  • Url : varchar(max); < DESIRED UNIQUE INDEX 

Obviously setting the primary key to the Id column was not the problem here. Creating a Unique Index on the Url caused an error stating that only 900 bytes were allowed in each index, what meant either storing only url's shorter than 900 bytes - not an option as i do not control the input of the system - or finding another solution. 

After some serious Googling i concluded two things. First, i was definately not the only one having this issue and secondly, people were not really willing to comment on the subject.  All proposed solutions consisted of in someway hashing the url, and setting the constraint on a column containing the resulting hash. At first i believed this would solve my problem, but typically hash algorythms use the first n bytes for their encryption. Thus, using an 128-bit MD5 hash on two urls with a length longer than 128 (actually 126 i believe, but i'm not sure) chars, having these 1st 128 chars equally filled, would result in the same hash. 

Eventually I got some unexpected help from a University Professor who specializes in Information Retrieval. He assureld me I was not the only one facing this issue and suggested I created two hashes (in code) for each url. One for the first n bytes, the other for the last n bytes. The risk of colissions, using the combined hash result, so he predicted, was extremely low (though not impossible). 

  • StartHash + EndHash: char(256); < PK
  • Url : varchar(max);  

So far, it has worked like a charm for me!

10,9, 8, 7, 6, 5, 4, 3, 2, 1 ... lift-off

In my life I have done a lot of different things. Always willing to give new things a try. So to me, this is just a next (baby)step. As a software-development-a-holic I have read and still read many blog post. Strangely I have never felt the need to start one of my own.

Have I really been so preoccupied with things going on in my own little world? Have I been too ego-centric or arrogant to share my knowledge? Of course not! I guess I never felt up to the task or perhaps never thought I had something to share that was worth reading by anyone else but me.

Don't worry, this post is not going to be a sad story, nor an example of public self-criticism. I write this post as my 'letter to the world' so people know what to expect.

I have just recently taken the leap into a freelance career. After being self-employed and having managed a full-blown software company I decided to go back to my roots … software development, without all the corporate management hassle. As I'm new to this side of the table – I have hired many freelance developers in the past – I thought it might be fun sharing with other (potential) freelancers my experiences and adventures.

After taking the (tough) decision to part from the company I co-created, my huger for experiments and new techniques has arrived at a new all-time-high. I'm eagerly looking for a new challenge to dig my teeth in. Findsi (check it out here) has always given me a warm tingling feeling (within legal limitations) inside and I have several scrapbooks full of plans and innovative ideas that might take it to the next level. I'll write about our (it's not a one man show) ongoing voyage to technological perfection here as well.

Having developed for many years, the last few years solely using the .NET Framework, I think I have some tips and tricks to share that would make other developers life's a lot easier. Not only in terms of good/bad practices and codewise short-cuts, but also in terms of tools and software.

I might also write about my experiences with BlogEngine.NET, as I go along customizing it to my needs and wishes. Without trying to suck up to anyone, this program is solid and well thought through … compliments to the team.

Does an author actually need to explain his reasons for blogging? Does anyone ever read the first post, before reading other posts? I have no clue, but i'm bound to find out. Hopefully this blog will eventually turn out helpful to someone out there. If not, it might die out slowly, but at least I can than say that I gave it a shot and practiced my English a bit … ;)

Powered by BlogEngine.NET 1.5.0.7