A while ago I was looking into some odd errors that were cropping up in an application, when I spotted that all the columns in a table were defined as VARCHAR(MAX).
I’m sure we’ve all seen instances of that being done before.
When I mentioned to the developer that it’s probably not the best idea, he turned around and asked me, ‘why not?’
It was a genuine question. Why shouldn’t we just spam VARCHAR(MAX) over all of our columns? On the upside, it would get rid of all those annoying issues that crop up when we try to insert a value that overflows the datatype.
To investigate this, I’m going to create two tables. They’ll be identical apart from one thing: each will have an integer ID field and a notes field, but one table’s notes field will be VARCHAR(MAX) and the other’s will be VARCHAR(1000).
CREATE TABLE VarcharTest1000
(ID INT IDENTITY(1,1),
Notes VARCHAR(1000))
CREATE TABLE VarcharTestMAX
(ID INT IDENTITY(1,1),
Notes VARCHAR(MAX))
We’ll populate both tables with an identical 1000 rows of randomly generated ‘notes’ values.
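The original post doesn’t show the exact generation script, but something along these lines would do the job (the REPLICATE approach and the 1–1000 character lengths are just my assumptions for the demo):
WITH Numbers AS
(SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
 FROM sys.all_objects a CROSS JOIN sys.all_objects b)
INSERT INTO VarcharTest1000 (Notes)
SELECT REPLICATE('X', 1 + ABS(CHECKSUM(NEWID())) % 1000)  -- random length, 1 to 1000 characters
FROM Numbers

INSERT INTO VarcharTestMAX (Notes)
SELECT Notes FROM VarcharTest1000  -- copy the same values so both tables hold identical data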
Estimated Row Size
I’m going to run the following queries and just have a look at what SQL thinks is happening…
SELECT *
FROM VarcharTest1000
ORDER BY notes
SELECT *
FROM VarcharTestMAX
ORDER BY notes
The execution plans look identical…

…but let’s check out the properties of that table scan…


Do you see our first issue? When querying VarcharTest1000 the estimated row size is 515B; compare that to 4039B when querying VarcharTestMAX.
Why the difference? We know that neither table contains any value over 1000 bytes, so why the massive overestimation for the VARCHAR(MAX) table?
This comes down to how SQL figures out its memory requirement estimations.
If a row is made up purely of fixed length data types (INT, CHAR(n), BIT, DATETIME…), SQL has it easy: it knows exactly how big the rows are going to be. Once we start throwing in variable length datatypes like VARCHAR(n) and VARCHAR(MAX), SQL has to do some guesswork, because it doesn’t know how big those columns are actually going to be.
When looking at variable length columns, SQL assumes an average of around 50% fullness. Essentially, it takes the maximum length of the datatype and halves it. So when estimating the row size for our VARCHAR(1000) table, it takes 4 bytes for the INT and around 500 bytes for the VARCHAR(1000), plus a little extra for various overheads, which is how SQL arrives at 515 bytes.
So what’s going on with VARCHAR(MAX)? SQL does a similar thing, but for a VARCHAR(MAX) the maximum amount of in-row data it can store is 8000 bytes (things get a bit more interesting if we start exceeding 8000 bytes in our values, but that’s not what we’re looking at here). With the same 50% fullness assumption, SQL comes up with an estimate of around 4000 bytes.
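Roughly, the arithmetic works out like this (the handful of extra overhead bytes is approximate):
VarcharTest1000: 4 bytes (INT) + 1000 / 2 = 500 bytes (VARCHAR(1000)) + overhead ≈ 515 bytes
VarcharTestMAX:  4 bytes (INT) + 8000 / 2 = 4000 bytes (VARCHAR(MAX)) + overhead ≈ 4039 bytes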
Take a second to think about that. In our example we know that we’re never going to exceed 1000 characters, but because we’ve declared the column as VARCHAR(MAX), SQL doesn’t know that, and it massively overestimates the row size.
So why does that matter?
Memory Grant
When SQL executes a query, it needs to allocate a certain amount of memory. Complicated things like sorts and joins need space to store their intermediate results.
This memory is generally allocated before the query starts running. The question then is, how does SQL know how much to allocate? Yup, you’ve guessed it: one very important piece of information that it uses is the estimated row size. So if we’re overestimating the row size, we’re going to be over-allocating memory.
Let’s take a look at the MemoryGrantInfo part of the actual execution plan for the VARCHAR(1000) query.

So we can see that SQL asked for 1600KB of memory, of which it actually used 472KB. In this example we’re over granted by 1128KB. To be honest, that’s not an ideal scenario, but let’s see what happened in our VARCHAR(MAX) query.

This time SQL is asking for a whopping 6656KB while only using 496KB, an over grant of 6160KB.
So while that might not be a huge amount of memory in the scheme of things, remember that we’ve only got one VARCHAR(MAX) column in a relatively small table with a simple query. Add in more columns, a larger dataset and a complex query and it should be obvious why this can be an issue.
With too many queries like that running at once, you can easily hit the point where SQL runs out of memory to grant. When that happens, queries start having to wait for memory to free up, and those waits show up as RESOURCE_SEMAPHORE waits in the wait stats.
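If you want to see this happening on a live server, a query along these lines against sys.dm_exec_query_memory_grants will show the grants in flight; a NULL grant_time means the query is still waiting for its memory, i.e. one of those RESOURCE_SEMAPHORE waiters:
SELECT session_id,
       requested_memory_kb,
       granted_memory_kb,
       used_memory_kb,
       max_used_memory_kb,
       grant_time          -- NULL here means the query is still waiting for its grant
FROM sys.dm_exec_query_memory_grants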
Another nasty side effect of excessive grants is that they can put pressure on the plan and buffer caches, forcing evictions from both. That can lead to increased compilations as well as increased IO, further impacting performance for the whole server.
Let’s take this a bit further…
So, we’ve already seen the sort of over grant that can happen with only one VARCHAR(MAX) in a table. But what happens if we have a table similar to the one I saw in the database I mentioned earlier?
Actually, this isn’t half as bad as the filth that I witnessed, but it’ll do to illustrate the point.

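The definition was a screenshot in the original, but picture something like the sketch below, with every column a VARCHAR(MAX); apart from FirstName and Notes, which are used in the queries that follow, the column names here are purely illustrative:
CREATE TABLE VarcharTestMAXSilly
(ID INT IDENTITY(1,1),
FirstName VARCHAR(MAX),
LastName VARCHAR(MAX),
Address1 VARCHAR(MAX),
Town VARCHAR(MAX),
PostCode VARCHAR(MAX),
Notes VARCHAR(MAX))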
We’ll run a similar query to before…
SELECT *
FROM VarcharTestMAXSilly
ORDER BY notes

Take a look at that: a whopping 34,976KB granted, but only 288KB actually used. That’s a massive waste. Think about it for a second; we’re only playing around with a relatively small dataset of 1000 rows and a pretty simple query. Imagine how that could blow right out of control with a decent number of rows or some complicated aggregates or joins.
Indexing
We’ve seen that VARCHAR(MAX) fields are pretty bad news when it comes to memory grants, but what about indexes? What happens if we try to create an index on one?
Let’s give it a go…
CREATE NONCLUSTERED INDEX ix_VarcharTestMAXSilly_FirstName
ON VarcharTestMAXSilly (FirstName)

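The screenshot isn’t reproduced here, but the attempt fails with something along the lines of error 1919:
Msg 1919, Level 16, State 1
Column 'FirstName' in table 'VarcharTestMAXSilly' is of a type that is invalid for use as a key column in an index.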
Oooo, what’s this? We can’t create an index on a VARCHAR(MAX) column? Well, that does make sense. Quite apart from the fact that indexing a column that could potentially hold up to 2GB of data would be a disgusting thing to do, SQL has a limit of 1,700 bytes for a nonclustered index key column (900 bytes for a clustered index), and our VARCHAR(MAX) columns way exceed that.
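For contrast, the equivalent index on the bounded column goes through without complaint; a VARCHAR(1000) key tops out at 1,000 bytes, comfortably inside the 1,700 byte limit (on SQL Server 2016 and later). A quick sketch to try for yourself:
CREATE NONCLUSTERED INDEX ix_VarcharTest1000_Notes
ON VarcharTest1000 (Notes)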
So Why Shouldn’t We Just Use VARCHAR(MAX)?
So, if your developer comes and asks you, ‘why not just use VARCHAR(MAX)?’, you should be able to give him some compelling reasons against it.
As always, don’t be lazy with your database design. Data types are one of the most important choices that you’re going to make when building a database, so take the time to think about it and choose them sensibly.