db2-table-row-tuple-examination

The row size for a table in DB2 is important in several ways. It determines which page size can be used for a tablespace and it can help us estimate table size – either optimal or future.

Selecting a Page Size

DB2 page sizes are adjustable only for tablespace and buffer pool data – DB2 uses 4K pages for all other memory configuration parameters, no matter what other page sizes are used for the actual data. Defining a page size on database creation will change the page size for the three default tablespaces – SYSCATSPACE, TEMPSPACE1, and USERSPACE1.

Page sizes are chosen at object creation time, and cannot be changed after creation. For this reason, thoughtful selection of page sizes is important. After table creation, a table can be moved from one tablespace to another to change page size, but even with the wonderful ADMIN_MOVE_TABLE tool, this is not without challenges and risks.

Choosing the page sizes for DB2 to use in a database is a question based in performance and geometry. For data warehousing and analytics environments, the right page size is nearly always 32 KB. For OLTP or transaction-focused databases, it is worth testing performance on different page sizes to see what page size works best in a specific environment. Depending on the storage subsystem, the minimum size of data that is handled by the I/O subsystem may be larger than your page size. The page size also affects how data is handled once it is in memory. Handling data in memory is still an operation that consumes CPU time and, of course, memory. For a database that is not I/O bound, this is an important consideration.

The maximum possible row size should fit on the page size chosen. This used to be required, but is now an option. The technical implementation behind extended row size almost guarantees that you will see better performance if a row fits on a page than if extended row size is actually being used. There are more details to consider when choosing a page size – see Calculating Row Width and Choosing a Page Size for more details.

Estimating Table Size

One of the most important reasons for looking at row size is to estimate future data size for capacity planning purposes. We can estimate table size based on either the maximum possible row size, the average current row size, or a combination of both. Table size estimation is important when making data pruning decisions and also when making decisions about things like data partitioning and what other kinds of partitioning may be appropriate to manage maintenance and performance over time.

Calculating Table Row Size

Types of Row Size

Many tables in any RDBMS include varying-length data types. These data types consume only the amount of space needed on disk, and include a defined maximum possible size. This leads to different ways of understanding row size, depending on whether you’re looking at the definition or the reality of the data. Row sizes are generally expressed in bytes.

Maximum Possible Row Size

Calculate the maximum possible row size by understanding how much space the data type in each column can use, and adding values for the columns together. It is rare that every single variable length column in a table is populated to the maximum possible size. Usually lengths for each column mostly hover around a common number, with some outliers. Often, developers tend to choose values that allow for the maximum imaginable size, and then add some space, not realizing how easy it is in DB2 to increase the size of a column. If there are no varying length columns in a table, then the values calculated here are the actual sizes, not the maximums.

Average Row Size

Calculate the average row size by looking at the current data in all the columns and describing what the row looks like in reality. Unless there are no varying-length data types, the average row size is always less than the maximum possible row size.

Before Table Exists

Before the table exists, calculating the maximum possible row size can be done through a manual process detailed in Calculating Row Width and Choosing a Page Size. This can be labor intensive, but is important in capacity planning. Don’t forget to do a similar process to calculate space for indexes. Indexes are physical objects that consume space and are often counted as a part of the table size.

For an Existing Table

For an existing DB2 table, the system catalog views hold information to calculate both the maximum and the average row size.

Maximum Possible Row Size

The best option is to create this stored procedure, developed by Serge Rileau.

In some databases, our clients don’t want us to create objects, so this SQL does the same work:

with compr as (
  select 
    TABSCHEMA
    , TABNAME
    , CASE WHEN compression in ('B', 'V') THEN 2 ELSE 0 END as compression_mod
    FROM SYSCAT.TABLES AS T1
),
col as (
  SELECT TABSCHEMA
         , TABNAME
         , COLNAME
	 ,COALESCE(D.SOURCENAME, C.TYPENAME) AS TYPENAME 
         ,COALESCE(D.LENGTH, C.LENGTH) AS LENGTH
         ,C.SCALE, C.NULLS, C.INLINE_LENGTH, D.METATYPE
         ,D.INLINE_LENGTH AS STRUCT_INLINE_LENGTH
	 , CASE WHEN C.inline_length = 0 
                    THEN C.inline_length
                WHEN metatype = 'R' THEN D.inline_length 
                WHEN COALESCE(D.SOURCENAME, C.TYPENAME) IN ('CLOB', 'BLOB', 'DBCLOB') 
                    THEN CASE WHEN COALESCE(D.LENGTH, C.LENGTH) <=       1024 THEN 68
                              WHEN COALESCE(D.LENGTH, C.LENGTH) <=       8192 THEN 92
                              WHEN COALESCE(D.LENGTH, C.LENGTH) <=      65536 THEN 116										 
                              WHEN COALESCE(D.LENGTH, C.LENGTH) <=     524000 THEN 140
                              WHEN COALESCE(D.LENGTH, C.LENGTH) <=    4190000 THEN 164
                              WHEN COALESCE(D.LENGTH, C.LENGTH) <=  134000000 THEN 196
                              WHEN COALESCE(D.LENGTH, C.LENGTH) <=  536000000 THEN 220
                              WHEN COALESCE(D.LENGTH, C.LENGTH) <= 1070000000 THEN 252
                              WHEN COALESCE(D.LENGTH, C.LENGTH) <= 1470000000 THEN 276
                              WHEN COALESCE(D.LENGTH, C.LENGTH) <= 2147483647 THEN 312
                         ELSE -2 END
                WHEN COALESCE(D.SOURCENAME, C.TYPENAME) IN ('LONG VARCHAR', 'LONG VARGRAPHIC')
                           THEN 20
                WHEN COALESCE(D.SOURCENAME, C.TYPENAME) = 'XML' THEN 80
           ELSE 0 END as LOBLENGTH
    FROM SYSCAT.COLUMNS AS C
         LEFT OUTER JOIN SYSCAT.DATATYPES AS D
              ON D.typeschema = C.typeschema 
              AND D.typename = C.typename
              AND D.typemodulename IS NULL
              AND C.typeschema  'SYSIBM  '
),
tot as (select  
	col.TABSCHEMA as tabschema
	, col.TABNAME as tabname
        ,SUM(CASE TYPENAME 
            WHEN 'SMALLINT'        THEN length + compression_mod
            WHEN 'INTEGER'         THEN length + compression_mod
            WHEN 'BIGINT'          THEN length + compression_mod
            WHEN 'REAL'            THEN length + compression_mod
            WHEN 'DOUBLE'          THEN length + compression_mod
            WHEN 'DECFLOAT'        THEN length + compression_mod
            WHEN 'DECIMAL'         THEN TRUNC(length / 2) + 1 + compression_mod
            WHEN 'CHARACTER'       THEN length + compression_mod
            WHEN 'VARCHAR'         THEN length + 4 - compression_mod
            WHEN 'GRAPHIC'         THEN length * 2 + compression_mod
            WHEN 'VARGRAPHIC'      THEN length * 2 + 4 - compression_mod
            WHEN 'LONG VARCHAR'    THEN 24 - compression_mod
            WHEN 'LONG VARGRAPHIC' THEN 24 - compression_mod
            WHEN 'CLOB'            THEN loblength + 4 - compression_mod
            WHEN 'BLOB'            THEN loblength + 4 - compression_mod
            WHEN 'DBCLOB'          THEN loblength + 4 - compression_mod
            WHEN 'XML'             THEN loblength + 3 - compression_mod
            WHEN 'DATE'            THEN length + compression_mod
            WHEN 'TIME'            THEN length + compression_mod
            WHEN 'TIMESTAMP'       THEN length + compression_mod
            ELSE CASE WHEN metatype = 'R' THEN loblength + 4 - compression_mod
                     ELSE -3 END
            END 
          + CASE WHEN compression_mod = 0 AND NULLS = 'Y' THEN 1 ELSE 0 END) as row_size

  FROM compr join col on compr.tabname=col.tabname
		and compr.tabschema=col.tabschema
  GROUP BY col.tabschema, col.tabname)
select rtrim(tot.tabschema) as tabschema
       , rtrim(tot.tabname) as tabname
       , row_size + CASE WHEN compr.compression_mod = 0 THEN 2 ELSE 0 END 
  FROM tot join compr on tot.tabschema=compr.tabschema and tot.tabname=compr.tabname;

Note that the logic here properly handles lobs – reporting them as a part of the rowsize only if they are inlined, otherwise only reporting the lob locator as a part of the row size.

Average Row Size

The average row size is stored as part of the statistics in syscat.tables. The AVGROWSIZE column holds this information. It is updated only when statistics are collected, so before relying on it, make sure that RUNSTATS has been done on the table recently.

Actual Table Size

If calculating the total size of an existing table as a whole, it is a better choice to use the number of pages allocated for the table than taking the row size and multiplying by the number of rows. Calculating using pages will account for overhead and other issues within the table. The total number of pages for a table is specified in the FPAGES column of SYSCAT.TABLES. You would still need to add the size of indexes to this value

Summary

Need more detailed help with table sizes, page size selection, or capacity planning? Fill in the contact form on the right, and an XTIVIA DB2 specialist can be engaged to help.

For further information, be sure to read the exciting new things about DB2!

Share This