ace
|
 |
| Joined: 24 Jun 2004 |
| Total Posts: 200 |
| |
|
Rowcount revisited Posted: 23 May 2005 03:03 PM |
Hi all,
I was wondering as to what is best way to have an accurate rowcount of a table in SQL Server. I can't really do a SELECT COUNT(*) as we're talking about massive amounts of data in this case.
In fact under circumstances does it goes out of sync on SQL Server 2000?
All INSERTs are logged, and if you look at the log via DBCC LOG it seems to modify the [rows] column in [sysindexes] from memory. Likewise with DELETEs.
So it would be for "non-logged" operations like TRUNCATE TABLE, although strictly speaking there is no such thing in SQL 2000. Except that depending on your database recovery model different things are logged, so maybe that influences this. Although it shouldn't.
So does anyone know under what circumstances it all gets out of sync?
Otherwise the best approach I can think of is to write INSERT and DELETE TRIGGERs on all table which increment and decrement a table with the table names and rows.
If I can guarantee that I only have single row inserts for example on a database in simple recovery is the sysindexes guaranteed to be accurate? Thus the quesion about the circumstances above.
Anyway, has anyone come across and technical literature on why [sysindexes].[rows] get out of sync with SQL Server 2000?
Cheers,
Victor
|
|
|
 |
|
|
Re: Rowcount revisited Posted: 23 May 2005 03:25 PM |
Hmm ... what amount of data do you term "massive amounts"? I've done a COUNT([PK Column]) over a few million rows without much problems. Of course doesn't a COUNT(*) read in all fields in the row? Which would in turn comsume more resources. I also recall reading somewhere that this can also lead to rows being excluded due to nullability.
Generally when I'm looking for an accurate rowcount and I can't do a COUNT, I run DBCC UPDATEUSAGE on the specific table/index and then query sysindexes.
I think the "technical literature" that you're looking for will be in some dark hole somewhere in Microsoft. Not good marketing practice to advertise product failings ;)
|
Thanks
Phill
Colt 45 - the original point and click interface ;)
http://philcart.blogspot.com/ |
|
 |
|
ace
|
 |
| Joined: 24 Jun 2004 |
| Total Posts: 200 |
| |
|
Re: Rowcount revisited Posted: 23 May 2005 04:36 PM |
Hmmm...
We're talking about 100 tables or so, a third in excess of 10 million records, some over 500 million. And growing... About 20GB per quarter.
It's basically to see if 2 databases are in sync. Real-time system. Legacy one has particularly inefficient data types, bad indexing and very wide tables. PK is typically over 50 bytes in size. The legacy database is approaching 300GB. Only get's better huh?
So it all collectively adds up.
DBCCUPDATE USAGE won't help as that is an expensive operation and this real-time system has data fed into it every 5 minutes.
My problem is that I think that all these SQL Gurus are just repeating the mantra of you can't trust [SYSINDEXES].[ROWS] from SQL 6 days, but no-one has tested this. Perhaps it can be relied upon in certain circumstances...
With my testing below it seems that [SYSINDEXES].[ROWS] shold be accurate if you only ever have T-SQL INSERTs, DELETEs (UPDATEs are irrelevant) and even TRUNCATEs:
create database victor
go
use victor
go
create table t1 (c1 int)
checkpoint
insert t1 values (1)
dbcc log (victor,3) -- Note LOP_DELTA_SYSIND record
select count(*) FROM t1 union select rows from sysindexes
where name = 't1'
-- Same!
checkpoint
insert t1 values (2)
dbcc log (victor,3) -- Note LOP_DELTA_SYSIND record
select count(*) FROM t1 union select rows from sysindexes
where name = 't1'
-- Same!
checkpoint
delete t1 where c1 = 2
dbcc log (victor,3) -- Note LOP_DELTA_SYSIND record
select count(*) FROM t1 union select rows from sysindexes
where name = 't1'
-- Same!
checkpoint
update t1 set c1 = 10 where c1 = 1
dbcc log (victor,3) -- Note LOP_DELTA_SYSIND record
select count(*) FROM t1 union select rows from sysindexes
where name = 't1'
-- Same!
-- What about a nested insert?
checkpoint
insert t1 select * from t1
dbcc log (victor,3) -- Note LOP_DELTA_SYSIND record
select count(*) FROM t1 union select rows from sysindexes
where name = 't1'
-- Same!
-- What about a truncate?
checkpoint
truncate table t1
dbcc log (victor,3) -- Note LOP_DELTA_SYSIND record
select count(*) FROM t1 union select rows from sysindexes
where name = 't1'
-- Even truncate seems to works as you'd expect as the
operation is no logged
drop database victor
Since all operations in SQL 2000 are logged, it looks like every T_SQL INSERT and DELETE is logged an modifies [SYSINDEXES] on the way...
The only thing I can think of is your BCP / BULK INSERTs where (I imagine as I have not tested this) the T/L might just record the new extents/pages allocated, or something like that...
So that's why I'd like to know that if I can guarantee that my database only ever gets INSERTs (not even DELETEs, unless it's to correct a mistake which shouldn't happen) and minimal UPDATEs (which should have no effect), no BCP / BULK INSERT / DTS operations, is
[SYSINDEXES].[ROWS] accurate?
All I am after is at least one (or two, but not all) scenario where in the above circumstance where I can not rely upon it. As is will than straight away negate my hypothesis that it [SYSINDEXES].[ROWS] is accurate.
Perhaps under certain circumstance it can be...
Cheers,
Victor
As an aside a COUNT(*) does not read all the fields in a row, that is the worst possible thing it can do ;o) You're right about nullability with aggregate functions but COUNT(*) is THE exception.
|
|
|
 |
|
|
Re: Rowcount revisited Posted: 24 May 2005 12:49 PM |
Hi Victor,
I have done some massive inserts and deletes and find that rowcounts from sysindexes sometimes do not match with select count(*) from X.
This is what I " think" that happens...
MSsql server do update statistics automatically as we know. However it only do update stats when server is not busy (if busy it will take low priority).
The way I've got around it:
1. come back the next day (if server not busy), you may find sysindexes get updated.
2. I force it to run update stats on table and this will update on sysindexes.
Update stats can take a very long to run on big tables
I would like hear your opinion on this as I'm not too sure if this is an ideal thing to do.
Cheers,
Nam
|
|
|
 |
|
ace
|
 |
| Joined: 24 Jun 2004 |
| Total Posts: 200 |
| |
|
Re: Rowcount revisited Posted: 24 May 2005 02:50 PM |
Hi Nam,
Just quickly, how do you the inserts? Single T-SQL INSERTs or soemthing else? Are there also DELETEs, etc?
UPDATE STATISTICS is triggered by the number of rows / percentage of rows changed depending on the cardinality of the table. So it's quite simple with when it's triggered by the Optimiser. Don't think it has anything to do with how busy your system is. I think.
Problem is that with this system the data is updated every 5 minutes, so even if I UPDATE STATISTICS at 3 AM, it might beout of sync by lunchtime, which is unaccpetable as this is a real-time market analysis system in a trading environment.
I am investigating this further by myself and through other factors, will be curious to see what eventuates.
Will post anything useful of interesting...
Thanxs for your comments,
Victor |
|
|
 |
|
|
Re: Rowcount revisited Posted: 01 Jun 2005 10:02 AM |
On a previous large scale db, I ran across the same issue. After a lot of digging around MS and various SQL sites, I discovered someone figured out the the system tables could be used to retrieve the recordcount rather than querying the table in question.
Try this stored procedure...
Create Procedure dbo.pr_RecordCount
(
@pTableName nvarchar(128)
)
As
Set nocount on
select rows as Count
from sysobjects o inner join sysindexes i on o.id = i.id
where i.indid < 2
and o.name = @pTableName
return
GO
|
|
|
 |
|
|
Re: Rowcount revisited Posted: 03 Jun 2005 01:23 PM |
This stored proc will work ONLY if stats is up to date.
If statistics is not being updated then the results from the proc will be out of synch with the actual select count(*) from table_name.
|
|
|
 |
|
prfbrown
|
 |
| Joined: 09 Aug 2005 |
| Total Posts: 5 |
| |
|
Re: Rowcount revisited Posted: 10 Aug 2005 06:03 PM |
You may also want to routinely write the rowcounts for your tables into a rowcount history table seeing as though you have them being gathered. Retrospective analysis of rowcount change is sometimes useful for planning. The TSQL code that gathers the rowcount can also gather a number of other parameters (eg: space allocations for table data and indexes) and therefore you are able to extrapolate trends towards capacity planning.
I have some further information about rowcounts on the web here:
http://www.mountainman.com.au/software/Winluck/row_counter.html
|
|
|
 |
|