From my student life, I used to do programming a lot. I still can remember those sleepless thrilling nights working with my favorite language C. I used to sit in front of my PC hour after hour and sometime just forgot to eat and sleep. I hated to be interrupted by anybody while I used to do programming.
I know, everyone feels the same way. Whatever we do, we just love to continue doing our work with fullest concentrations and hate to be interrupted by anything, anybody. But, somehow God is not that much kind to us and we have to work within a team. While doing work, we have to answer queries to the teammates, managers and others. We have to talk to others often. Most of the times, we can’t stick to a single task, and, have to try to concentrate on multiple types of tasks and to satisfy everybody’s expectations around us. This is not easy, but, this is life and we have no way but to work in this way.
While doing a work alone, I am the developer, tester, QA, and Manager of my work. That means, it’s me only, who has to play all different roles in the same work. That sounds hard. But, we all know, working alone is 100 times easier than working in a team where different people play different roles. Why?
While I work alone, my human brain is the single information repository and processing engine. There it’s me talking to me only. While I work, I play different roles in different phase, and, I use the single information repository and processing engine (My brain). That means, my “QA” role continuously communicates and exchanges all information with my “developer” role and vice versa. The same goes for my other roles (Manager, tester etc). So, there a great “teamwork” happens inside my brain, and, that “teaming” is tuned and each role inside this team is 100% transparent to other roles while communicating. So, as I can see, each teammate in this team gives its best and I combine their output to produce the best possible outcome by doing excellent communication.
Now, while we work in a team in the “real world”, how can we get the same tuned best possible output? That’s easy. We have to learn from the “team” inside our brain. We have to implement the same “teaming” strategy.
The key factors are the “communications, integrity of information and transparency”. Communicate as much as you can, talk a lot whenever you need. Don’t hesitate to interrupt and be interrupted. Be transparent. Be “open”, ”flexible” and “positive” while in a discussion (That’s what we are when we talk to ourselves). Respect other’s opinion in the same way you respect yours. Only then all members in your team will be able to know the same thing and work for the same goal to achieve. Only then, all members can give their bests in the team and the best possible output gets produced.
One of my favorite mentors used to explain “TEAM” as follows:
T=Together
E=We
A=Achieve
M=More
Thursday, March 12, 2009
Home » Archives for March 2009
Monday, March 9, 2009
Home » Archives for March 2009
Understanding “Set based” and “Procedural” approaches in SQL
DARE TO SHARE?
Perhaps we all have heard that “Set based approaches” are better than the “Procedural approaches” while writing data access routines in SQL. In this article, we will try to investigate the two mentioned approaches to understand why "Set based approaches" are better.
What is a “procedural approach”?
Procedural approach is actually the “programmatic approach” that we are used to work with. In this approach, we tell the system “what to do” and “how to do” it. We write the data operational and manipulation logics using looping, conditional and processing statements. The runtime does whatever we want it to do, however we want it to do.
In TSQL, any User Defined Function (UDF), or Cursor that executes on a result set row by row is a procedural approach.
What is a “Set based approach”?
Set based approach is actually an approach which lets you specify “what to do”, but, does not let you specify “how to do”. That is, you just specify your requirement for set of data that has to be obtained from a “set of data” (Be it a simple table/view, or, joins of tables/views), filtered by optional condition(s). Sometimes, the specification of the “sets” you like to retrieve data from may be obtained by using complex joins/SubQuery/conditional case statements, but, at the end, there is a set of data from which the resultant set has to be obtained. You never have to specify “how” the data retrieval operation has to be implemented internally. You never have to specify how to implement the “joining operation” internally either. Also, you don’t have to specify how to apply filter condition against the rows. The database engine determines the best possible algorithms or processing logics to do these.
Why “Set based approach” is better than the “Procedural approach”?
The internal execution engine of databases are designed and optimized for taking “set based instructions” as input (SQLs) and executing these instructions in the best possible way (that varies based on lots of criteria) to produce output. For example, three types of joins are there in SQL server database. These are,
--Merge join : Most optimized joining algorithm. Takes place when both tables are joined on indexed columns which are sorted.
--Nested loop join : Moderate cost joining algorithm. Takes place when one of the table in the join clause contains small number of records comparing to the number of records in the other participating table.
--Hash join : Most costly joining algorithm. Takes place when large, unsorted, non-indexed columns are used in the join condition.
Now, whenever we specify any join predicate in any SQL, based upon the participating column nature and set (and, nature) of values in the participating columns, the SQL server determines and use the best possible joining algorithm and implement the logic to perform the actual joining operation in memory. Note that, we don’t have to specify the type of joining algorithm in the SQL. SQL server does that, and, does its best to provide the result as fast as possible.
Another important fact is, whatever SQL written in “set based approach” is issued in the database, the query optimizer generates an execution plan first, and then, the execution engine executes the plan to retrieve data from the physical storage and to process output in an efficient manner. That is, there is a single execution plan tree for each single SQL statement be it simple or complex. Executing that single execution plan tree is generally a faster operation.
But, when we specify our own way of processing a result set (That is obtained by an SQL) using another SQL in a row-by-row manner, the database engine has to execute an execution plan for each row. Imagine a row-by-row operation that is getting executed for a result set containing 1 million rows. In this case, the initial data retrieval operation would require one execution plan to be executed, and, later, 1 million times another execution plan has to be executed for processing each row. That’s what happens when a User Defined Function (UDF) is executed for each row in a result set. An additional overhead of using UDF is the amount of stack I/O that takes place for invoking the UDF.
Also, if you use a Cursor to process a result set row-by-row, while executing, the Cursor locks the rows in the corresponding table and unlocks rows when processing done. This involves lots of resource usage on the server and in case of large result sets, this severely slows down performance.
Experiment shows that, using a UDF for row-by-row operation involving up to 1000 rows may provide a performance that is within acceptable range. But, as the number of rows are increased, using UDF would result in dramatically slow performance. The same goes for cursors.
That is why, “set based SQLs” always should outperform the “procedural SQLs”, specially, if the result set to process becomes large.
OK, now, how to get rid of “Procedural SQLs”?
Simple tricks:
--Use inline sub queries to replace User Defined Functions.
--Instead of a cursor, use a table variable to host result set and navigate table variable’s rows using a loop to process rows.
What is a “procedural approach”?
Procedural approach is actually the “programmatic approach” that we are used to work with. In this approach, we tell the system “what to do” and “how to do” it. We write the data operational and manipulation logics using looping, conditional and processing statements. The runtime does whatever we want it to do, however we want it to do.
In TSQL, any User Defined Function (UDF), or Cursor that executes on a result set row by row is a procedural approach.
What is a “Set based approach”?
Set based approach is actually an approach which lets you specify “what to do”, but, does not let you specify “how to do”. That is, you just specify your requirement for set of data that has to be obtained from a “set of data” (Be it a simple table/view, or, joins of tables/views), filtered by optional condition(s). Sometimes, the specification of the “sets” you like to retrieve data from may be obtained by using complex joins/SubQuery/conditional case statements, but, at the end, there is a set of data from which the resultant set has to be obtained. You never have to specify “how” the data retrieval operation has to be implemented internally. You never have to specify how to implement the “joining operation” internally either. Also, you don’t have to specify how to apply filter condition against the rows. The database engine determines the best possible algorithms or processing logics to do these.
Why “Set based approach” is better than the “Procedural approach”?
The internal execution engine of databases are designed and optimized for taking “set based instructions” as input (SQLs) and executing these instructions in the best possible way (that varies based on lots of criteria) to produce output. For example, three types of joins are there in SQL server database. These are,
--Merge join : Most optimized joining algorithm. Takes place when both tables are joined on indexed columns which are sorted.
--Nested loop join : Moderate cost joining algorithm. Takes place when one of the table in the join clause contains small number of records comparing to the number of records in the other participating table.
--Hash join : Most costly joining algorithm. Takes place when large, unsorted, non-indexed columns are used in the join condition.
Now, whenever we specify any join predicate in any SQL, based upon the participating column nature and set (and, nature) of values in the participating columns, the SQL server determines and use the best possible joining algorithm and implement the logic to perform the actual joining operation in memory. Note that, we don’t have to specify the type of joining algorithm in the SQL. SQL server does that, and, does its best to provide the result as fast as possible.
Another important fact is, whatever SQL written in “set based approach” is issued in the database, the query optimizer generates an execution plan first, and then, the execution engine executes the plan to retrieve data from the physical storage and to process output in an efficient manner. That is, there is a single execution plan tree for each single SQL statement be it simple or complex. Executing that single execution plan tree is generally a faster operation.
But, when we specify our own way of processing a result set (That is obtained by an SQL) using another SQL in a row-by-row manner, the database engine has to execute an execution plan for each row. Imagine a row-by-row operation that is getting executed for a result set containing 1 million rows. In this case, the initial data retrieval operation would require one execution plan to be executed, and, later, 1 million times another execution plan has to be executed for processing each row. That’s what happens when a User Defined Function (UDF) is executed for each row in a result set. An additional overhead of using UDF is the amount of stack I/O that takes place for invoking the UDF.
Also, if you use a Cursor to process a result set row-by-row, while executing, the Cursor locks the rows in the corresponding table and unlocks rows when processing done. This involves lots of resource usage on the server and in case of large result sets, this severely slows down performance.
Experiment shows that, using a UDF for row-by-row operation involving up to 1000 rows may provide a performance that is within acceptable range. But, as the number of rows are increased, using UDF would result in dramatically slow performance. The same goes for cursors.
That is why, “set based SQLs” always should outperform the “procedural SQLs”, specially, if the result set to process becomes large.
OK, now, how to get rid of “Procedural SQLs”?
Simple tricks:
--Use inline sub queries to replace User Defined Functions.
--Instead of a cursor, use a table variable to host result set and navigate table variable’s rows using a loop to process rows.
Saturday, March 7, 2009
Home » Archives for March 2009
What do you need to learn?
DARE TO SHARE?
I have heard many people saying that a particular topic/subject/technology seems hard/boring to him/her. Interestingly, I’ve often found that, the same topic/subject/technology that seems boring/hard to person “A” is actually an easy/interesting topic/subject/technology to person “B”. Why?
My observation says, (Most of the times) we tend to rate a particular topic/subject/technology hard/boring if we don’t have good knowledge on it, and as a result, when we feel discomfort with it. That means, only those who have good knowledge on a particular topic/subject/technology, likely to rate it as easy/interesting.
To be very specific, if you ask 100 developers about the subject “Database Management System”, I can bet more than 80 developers will tell you that this is a boring subject. Even, I was among those 80 a few days ago. When I discovered that I was feeling discomfort with this particular subject, I asked myself why I had such a feeling. Obviously, the answer I found was, I actually had lack of knowledge on this subject!
I decided to get rid of this “Discomfort feeling” with DBMS (Becuase, I am a software developer, and, I need to feel good with DBMS), and, started to learn the “Database Management System” more and more and started to explore the DBMS world. Soon I found that my “Discomfort feeling” was going away. If you ask me today, I’ll tell you that the “Database Management System” is an extremely interesting subject to me.
The takeaway of the above discussion is, if you feel discomfort with any topic/subject/technology, or, if you think a particular subject (Related to your software development career) seems hard/boring to you, that means, you don’t know the topic/subject/technology well.So, as you want to be among the “bests”, you have to challenge yourself to know it better and gain experience with it. I can ensure you, within a very few days, the topic/subject/technology will no longer be hard/boring to you.
Home » Archives for March 2009
Performance problem analysis methodology in SQL server
DARE TO SHARE?
SQL server dynamically generates different query plans based on:
--Volume of Data
--Statistics
--Index variation
--Parameter value in TSQL
--Load on server
So, when diagnosing any SQL performance related problem , a methodology should be followed to simulate the production environment in Test server, and, to solve the problem. Following is a methodology you can follow:
A. Capture trace at production server and include ShowPlan in the trace. Do the same in the Test server and compare the plans.
B. Create Replay Trace in production and replay the trace on test server to create a similar load.
C. In the Database Engine Tuning Advisor tool, use the production trace as the workload to view tuning suggestions.
D. Take performance counter log in the production server and correlate it with the SQL profiler trace to diagnose the bottleneck.
E. Analyze the Query plans in the production SQL profiler trace to diagnose whether indexes are properly utilized in the SQL’s
F. Execute the TSQL (Provider earlier) to view the External/Internal fragmentation values for indexes in the target database.
--Volume of Data
--Statistics
--Index variation
--Parameter value in TSQL
--Load on server
So, when diagnosing any SQL performance related problem , a methodology should be followed to simulate the production environment in Test server, and, to solve the problem. Following is a methodology you can follow:
A. Capture trace at production server and include ShowPlan in the trace. Do the same in the Test server and compare the plans.
B. Create Replay Trace in production and replay the trace on test server to create a similar load.
C. In the Database Engine Tuning Advisor tool, use the production trace as the workload to view tuning suggestions.
D. Take performance counter log in the production server and correlate it with the SQL profiler trace to diagnose the bottleneck.
E. Analyze the Query plans in the production SQL profiler trace to diagnose whether indexes are properly utilized in the SQL’s
F. Execute the TSQL (Provider earlier) to view the External/Internal fragmentation values for indexes in the target database.
Home » Archives for March 2009
10 Steps for tuning data access in SQL server
DARE TO SHARE?
Recently, I have been exploring a lot on the data access optimizations in SQL server. Followings are my findings so far that I would like to share with you:
1. Re-factor the SQL’s used in Stored procedures/Views/Triggers and apply the best practices to write optimized queries
2. Make sure that, indexing is properly done in the database (All tables have primary keys, and, tables has appropriate non-clustered indexes on columns which
A. Are used as search criteria fields
B. Are used to join other tables
C. Are used as foreign key fields
D. Are used in the ORDER BY clause
–While creating indexes, create appropriate “Covering indexes” involving frequently accessed fields
–Use “SQL server management studio” to view the query execution plan and identify the scopes for improvement in the SQLs
–Use “SQL server tuning advisor” to get help from SQL server in creating indexes.
3. Consider moving your SQL’s from application to Stored Procedures/Views if there is any
4. Consider using the “Full text search” feature to perform search on textual columns.
5. Re-build indexes if fragmentation occurs.
6. Create “Indexed Views” with expensive select queries that internally saves result sets on queries.
7. Consider refactoring your table design and apply De-normalizations to improve select operations.
8. Consider creating different “User defined file groups” and put your database objects into these file groups effectively based upon
-Frequency of table I/O
-Read/write nature of tables
-Types of objects (Tables/Index)
9. Consider creating “History tables” that contains archived read only data where select operations will be applied. Use SQL server Maintenance feature to populate those history tables.
10.Consider partitioning big fat tables into different file groups so that the table spans across different files and SQL server engine can concurrently access the table data.
Please note that, there are whole lot of ways for database tuning in SQL server, that are probably out of scope of the developers. The above 10 stuffs should be exercised by developers to write optimized data access routines.
Optimization is fun!
1. Re-factor the SQL’s used in Stored procedures/Views/Triggers and apply the best practices to write optimized queries
2. Make sure that, indexing is properly done in the database (All tables have primary keys, and, tables has appropriate non-clustered indexes on columns which
A. Are used as search criteria fields
B. Are used to join other tables
C. Are used as foreign key fields
D. Are used in the ORDER BY clause
–While creating indexes, create appropriate “Covering indexes” involving frequently accessed fields
–Use “SQL server management studio” to view the query execution plan and identify the scopes for improvement in the SQLs
–Use “SQL server tuning advisor” to get help from SQL server in creating indexes.
3. Consider moving your SQL’s from application to Stored Procedures/Views if there is any
4. Consider using the “Full text search” feature to perform search on textual columns.
5. Re-build indexes if fragmentation occurs.
6. Create “Indexed Views” with expensive select queries that internally saves result sets on queries.
7. Consider refactoring your table design and apply De-normalizations to improve select operations.
8. Consider creating different “User defined file groups” and put your database objects into these file groups effectively based upon
-Frequency of table I/O
-Read/write nature of tables
-Types of objects (Tables/Index)
9. Consider creating “History tables” that contains archived read only data where select operations will be applied. Use SQL server Maintenance feature to populate those history tables.
10.Consider partitioning big fat tables into different file groups so that the table spans across different files and SQL server engine can concurrently access the table data.
Please note that, there are whole lot of ways for database tuning in SQL server, that are probably out of scope of the developers. The above 10 stuffs should be exercised by developers to write optimized data access routines.
Optimization is fun!
Subscribe to:
Posts (Atom)