Tech Point Fundamentals

Sunday, March 27, 2022

Different Ways for Getting Record Count in SQL

Different Ways of Getting Record Count in SQL

getting-record-count-sql

Getting record count is the most common thing which we do in any project. So this is also very common in the interview as well. The interviewer may asks this question in a different manner. In this article, we will walk through the different ways to get the record count in SQL.


Please visit our YouTube Channel for more SQL Interview Questions and Answers videos by the below link:




Today's Interview Question


Q01. How can you get the record count in a table without using the COUNT() function?

Q02. What is the difference between COUNT(*), COUNT(1), and COUNT(column_name)?

Q03. What is the difference between COUNT(*) and COUNT_BIG(*) in SQL?

Q04. What is the difference between DISTINCT COUNT() and COUNT(DISTINCT)?

Q05. How can you get the Distinct Records Count in SQL?

Q06. What are the different ways to get the records COUNT in SQL?

Q07. How can you get the record count having NULL values? How can you count NULL values in SQL?

Q08. What is the fastest way to get the record COUNT in SQL? 

Q09. Can you use MAX(COUNT()) in SQL?

Q10. What is the difference between COUNT(*) and COUNT(1)?

Q11. Does the COUNT function hold the lock on the table?







Introduction


There are different ways to get the record count in SQL. We can get the record count using the inbuilt COUNT function and without that as well. 


We will use below "EmployeeDetails" table for all the demo in this article. Please note that in this table Name and Email fields contains some NULL values as well.


employee-table-data




COUNT(*) vs COUNT(1) vs COUNT(column)


Technically there is no difference among  COUNT(*), COUNT(1), and COUNT(col). All are the same SQL COUNT function except the different passing parameters. 


One of the cores and special gotcha about the COUNT function is the NULL value. So if you pass any column name in the COUNT function it will ignore all the NULL values.


The COUNT function always returns an int data type as result. The SQL COUNT() function always returns the accurate records count.


COUNT(*) or COUNT(1):


  • It returns the total number of rows after satisfying conditions specified in the WHERE clause (if any). 
  • It does not eliminate Duplicate values. 
  • It does not eliminate the NULL values.
  • For the COUNT(*) function,  SQL Server has to run a blocking scan against the entire table in order to derive this count. That is expensive. 




COUNT(column):


  • It also returns the total number of rows after satisfying conditions specified in the WHERE clause (if any).
  • It does not eliminate Duplicate values.
  • It eliminates the NULL values in the output.


count_vs_count
In the above example if you see COUNT(Name) and COUNT(Email) is returning 8 and 9 respectively not 10 as returned by COUNT(Id) because both Name and Email column contain NULL values. So they are always equal if:


COUNT(1) = COUNT(*) = COUNT(PrimaryKeyColumn)


Please watch the SQL NULL Values video here and SQL 3 Valued Logic video here for more details.




COUNT(*) vs COUNT(1)


Generally, there are two common ways to count the record count, either COUNT(*) or COUNT(1). But there are some logical differences there.


Here the "*" specifies that COUNT() should count all rows to determine the total table row count to return. So it counts each row separately, including rows that contain NULL values.


countone_vs_countstar


If you see the above example for COUNT(*) physical read is 3, read-ahead read is 45195, CPU Time is  983 ms, and Elapsed Time is 16363 ms.  


On the other hand for COUNT(1) physical read is 0, read-ahead read is 0, CPU Time is  421 ms, and Elapsed Time is 451 ms. So you can say that COUNT(1) is a little bit faster compared to COUNT(*).




COUNT(DISTINCT) vs DISTINCT COUNT()


Actually, COUNT(DISTINCT) is not the same as DISTINCT COUNT(). If you see the syntax for the COUNT function in SQL:


COUNT ([ALL | DISTINCT] expression)


So by default, it returns all the records count. If you Pass the DISTINCT keyword along with the column name, it will return a distinct record count. So the DISTINCT keyword actually:


COUNT(DISTINCT column)


  • It returns the distinct number of rows after satisfying conditions specified in the WHERE clause (if any).
  • It eliminates Duplicate values.
  • It eliminates the NULL values in the output.
  • You can not use DISTINCT with COUNT(*)

But you have to remember that DISTINCT COUNT() is not the same as COUNT(DISTINCT). If you see in the below example you are getting two different outputs for both.

distinct-record-count



COUNT() vs COUNT_BIG()


In SQL the COUNT_BIG() functions work the same as the COUNT() function. At first glance, they are looking the same but there are two differences between them.

COUNT_BIG is a deterministic function when used without the OVER and ORDER BY clauses. COUNT_BIG is non-deterministic when used with the OVER and ORDER BY clauses. 

If you see the below example we are getting the same result as COUNT().


count_vs_countbig

There are only below two differences between COUNT() vs COUNT_BIG():

  1. It returns the long int instead of int, unlike the COUNT() function.
  2. You can not create an index in the view if you use the COUNT() function. So in view, it is required to use the COUNT_BIG() function for getting the record count.

COUNT-function-View

If you use the COUNT() function in view and try to create an Index you will get the below error:

"Cannot create index on view "TechPoint.dbo.UserDetailsView" because it uses the aggregate COUNT. Use COUNT_BIG instead."




Counting NULL Values in SQL


This question is a little bit tricky here. How can you count the NULL values in any column?

If you use the COUNT(column), it will not return the count, because the COUNT() function discards the NULL values. In the below example you can see it is returning 0 as count.

Please watch the NULL vs COALESCE video here.


counting-null-values



Getting MAX COUNT in SQL


In SQL you cannot use multiple aggregate functions at the same time. So you cannot use MAX(COUNT()) directly. You have to either use CTE (Common Table Expression) or DT (Derived Table). 

Please watch the CTE video here and the DT video here for more details.


max-count

Please watch the Getting 3rd Highest Salary in SQL video here.



Getting Record Count Without COUNT() or COUNT_BIG() Function



This question is asked very commonly in the interview. There are multiple ways to get record count without using the COUNT() function.

The very first way is ROW_NUMBER() Function with CTE or DTPlease watch the CTE video here and the DT video here for more details.


count_without_count


Please watch the ROW NUMBER vs RANK vs DENSE RANK video here for more details.



Getting Record Count Using Dynamic Management View



SQL Server has to run a blocking scan against the entire table in order to derive the record count using the COUNT function and that is very expensive.  This information is stored in the catalog views and DMVs, and you can obtain it without all of that I/O or blocking.

The benefit of using the SQL COUNT function is that it is always an accurate indicator of exactly how many rows exist in the table at the time query processing begins. To do that it holds the lock on the table.

There are some dynamic views that are used for getting the record count without using the SQL COUNT function.




Using sys.sysindexes View:


For getting the record count you can use the "sisindexes" view which contains the record count for every table and index.

  • It contains one row for each index and table in the current database.
  • XML indexes are not supported in this view. 
  • Partitioned tables and indexes are not fully supported in this view; use the sys.indexes catalog view instead.
This view contains so many columns but below are some common columns:

  • ID: Id of the table to which the index belongs.
  • INDID: ID of the index (0 - Heap, 1 - Clustered Index, >1 - Non-Clustered Index)
  • ROWS: Data-level row count based on indid = 0 and indid = 1, and the value is repeated for indid >1.
This view will not return accurate results for some reserved columns if the table or index contains data in the ROW_OVERFLOW allocation unit. To view accurate page counts, use the sys.allocation_units or sys.partitions catalog views, or the sys.dm_db_partition_stats dynamic management view.

Please read more here.



Using sys.dm_db_partition_stats View:


It returns page and row-count information for every partition in the current database. The sys.dm_db_partition_stats displays information about the space used to store and manage in-row data LOB data, and row-overflow data for all partitions in a database. One row is displayed per partition.

The counts on which the output is based are cached in memory or stored on disk in various system tables. In-row data, LOB data, and row-overflow data represent the three allocation units that make up a partition. The sys.allocation_units catalog view can be queried for metadata about each allocation unit in the database.

If a heap or index is not partitioned, it is made up of one partition (with partition number = 1); therefore, only one row is returned for that heap or index. The total count for an individual table or an index can be obtained by adding the counts for all relevant partitions.

This view is also used by the sp_spaceused uses to derive the count. It is also true that these views are not 100%, to-the-microsecond accurate. 




Some common columns are as follows:

  • partition_id: ID of the partition. This is unique within a database. This is the same value as the partition_id in the sys.partitions catalog view.
  • object_id: Object ID of the table or indexed view that the partition is part of.
  • index_id: ID of the heap or index the partition is part of (0 - Heap, 1 - Clustered Index, >1 - Nonclustered Index)
  • partition_number: 1-based partition number within the index or heap.
  • row_count: The approximate number of rows in the partition.

Please read more here.


record-count-using-dmv

In the above example, you can see that you are getting the same record count by using both either sysindexes or sys.dm_db_partition_stats view.



Using sys.partitions:


It contains a row for each partition of all the tables and most types of indexes in the database. Special index types such as Full-Text, Spatial, and XML are not included in this view. 

All tables and indexes in SQL Server contain at least one partition, whether or not they are explicitly partitioned.

Some common columns are as follows:

  • partition_id: Indicates the partition ID. Is unique within a database.
  • object_id: Indicates the ID of the object to which this partition belongs. Every table or view is composed of at least one partition.
  • index_id: Indicates the ID of the index within the object to which this partition belongs(0 - Heap, 1- CI, >1 - NCI)
  • partition_number: Is a 1-based partition number within the owning index or heap. For non-partitioned tables and indexes, the value of this column is 1.
  • rows: Indicates the approximate number of rows in this partition.

One can join sys.partitions with sys.tables and sys.indexes to get the record counts for all the tables of a database or any particular table as well.

sys-partition-view




Live Demo






Conclusion


There are so many ways to get the record count in SQL, but you have to choose the best one that suite's your requirement. COUNT() function has always a blocking scan against the table but always gives the super-accurate record count in a table. 

On the other hand, the dynamic management catalog view may not give accurate results all the time, but sometimes it is better than COUNT(*) function.




SQL Scripts


   
-- Employee Table Data
SELECT * FROM EmployeeDetails;


-- COUNT(*) vs COUNT(1) vs COUNT(Column_Name)
SELECT 
		 COUNT(*)	AS [COUNT(*)]
		,COUNT(0)	AS [COUNT(0)]
		,COUNT(1)	AS [COUNT(1)]
		,COUNT(UserId)	AS [COUNT(UserId)]
		,COUNT(Name)	AS [COUNT(Name)]
		,COUNT(Email)	AS [COUNT(Email)]

FROM EmployeeDetails;


-- Getting Record Count Containing NULL Values
SELECT 				
	    COUNT(Name)	AS [COUNT(Name)]
	   ,COUNT(*)	AS [COUNT(*)]
	   ,COUNT(1)	AS [COUNT(1)]		

FROM EmployeeDetails
WHERE Name IS NULL;


-- Getting Distinct Record Count
SELECT 					
		DISTINCT COUNT(Name) AS DistinctCount,
		COUNT(DISTINCT Name) AS CountDistinct,
		COUNT(Name)			 AS NormalCount	

FROM EmployeeDetails;


-- COUNT() vs COUNT_BIG()
SELECT 		
	 COUNT_BIG(*)		AS [COUNT_BIG(*)]
	,COUNT_BIG(0)		AS [COUNT_BIG(0)]
	,COUNT_BIG(1)		AS [COUNT_BIG(1)]
	,COUNT_BIG(UserId)	AS [COUNT_BIG(UserId)]
	,COUNT_BIG(Name)	AS [COUNT_BIG(Name)]
	,COUNT_BIG(Email)	AS [COUNT_BIG(Email)]

FROM EmployeeDetails;


-- COUNT() vs COUNT_BIG() in View
CREATE VIEW UserDetailsView
WITH SCHEMABINDING
AS 
	SELECT		
		Name,		
		COUNT(*) AS Total
	FROM [dbo].[EmployeeDetails]
	GROUP BY Name;
	
-- Creating Index on View Containing COUNT Function	
CREATE UNIQUE CLUSTERED INDEX IX_VWUser ON UserDetailsView(Name);

-- ERROR : Cannot create index on view "TechPoint.dbo.UserDetailsView" because it uses the aggregate COUNT. Use COUNT_BIG instead.


-- Getting Record Count Without COUNT() Function
-- Using CTE (Common Table Expression)
 WITH CTE AS 
( 
  SELECT  ROW_NUMBER() OVER (ORDER BY Name) AS [RowNumber]  FROM [dbo].[EmployeeDetails] 
 )  
SELECT MAX(RowNumber) AS RecordCountUsingCTE FROM CTE;

-- Using DT (Derived Table)
SELECT MAX(RowNumber) AS RecordCountUsingDT FROM
( 
SELECT  ROW_NUMBER() OVER (ORDER BY Name) AS [RowNumber]  FROM [dbo].[EmployeeDetails] 
)DT;

-- Getting Record Count Without COUNT() Function

-- Using sysindexes
SELECT 
	rows AS RecordCount  
FROM sysindexes 
WHERE id = OBJECT_ID('EmployeeDetails') AND indid < 2;

-- Using sys.dm_db_partition_stats dynamic management view
SELECT 
	OBJECT_NAME(object_id) AS TableName, 
	SUM(row_count) AS RecordCount
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('EmployeeDetails') AND index_id < 2 
GROUP BY OBJECT_NAME(object_id);

-- Getting Record Count Without COUNT() Function
SELECT 
	TBL.name as TableName, 
	SUM(PART.rows) AS TotalRecords
FROM sys.tables TBL
INNER JOIN sys.partitions PART ON TBL.object_id = PART.object_id
INNER JOIN sys.indexes IDX ON PART.object_id = IDX.object_id AND PART.index_id = IDX.index_id
WHERE TBL.name = 'EmployeeDetails' AND  IDX.index_id < 2
GROUP BY TBL.object_id, TBL.name;


-- COUNT(*) vs COUNT(1)
-- Total Records in SalesDetails Table = 3639510 
SET STATISTICS IO ON
SET STATISTICS TIME ON

SELECT 
	COUNT(*)	
FROM [dbo].[SalesDetails];

SELECT 
	COUNT(1)	
FROM [dbo].[SalesDetails];


-- Getting Max Count
SELECT MAX(RecordCount) AS MaxRecordCount FROM
(
	SELECT 
			Name, 
			COUNT(*) AS RecordCount
	FROM EmployeeDetails
	GROUP BY Name
)DT;

-- COUNT vs Non-COUNT Approach Performance Comparision
	SET STATISTICS IO ON
	SET STATISTICS TIME ON
	SELECT COUNT(*) AS [COUNT(*)] FROM SalesDetails;

	SELECT rows AS TotalRecordsCount  FROM sysindexes 
	WHERE id = OBJECT_ID('SalesDetails') AND indid < 2;






Recommended Articles


Most Frequently Asked SQL Interview Questions
Most Frequently Asked C# Interview Questions
Foreign Key Interview Questions
Unique Key Interview Questions
Primary Key Interview Questions
Derived Table (DT) Interview Questions
Unicode vs Non-Unicode Data Types Interview Questions
Table Variable (TV) Interview Questions
Table-Valued Type (TVT) Interview Questions
Common Table Expression(CTE) Interview Questions
Temp Table Interview Questions
Static Class Interview Questions and Answers
Sealed Class Interview Questions and Answers
Sealed Method Interview Questions and Answers
Abstract Class vs Interface
Interface Interview Questions and Answers







Thanks for visiting this page. Please follow us on Twitter, Facebook, LinkedIn, Telegram, WhatsApp, Youtube, and Quora for regular updates.


External Reference Links



No comments:

Post a Comment

Please do not enter any HTML. JavaScript or spam link in the comment box.