Introduction
The Oracle GoldenGate (OGG) Extract option SQLPREDICATE may be used to speed-up the initial load of large tables when the source database is Oracle, DB2, SQL Server, or Teradata. In this article we shall discuss the various OGG options for splitting data across multiple initial load Extract groups and demonstrate the effectiveness of each.
The scripts and information provided in this article are for educational purposes only. They are not supported by Oracle Development or Support, and come with no guarantee or warranty for functionality in any environment other than the test system used to prepare this article.
Main Article
When planning an initial load and the source database is either Oracle, DB2, SQL Server, or Teradata, the OGG Extract option SQLPREDICATE is a better choice for splitting large tables than WHERE or FILTER. As documented in the Oracle GoldenGate Reference Guide, SQLPREDICATE:
- Forces the records returned by the Extract’s SELECT statement to be ordered by the key values.
- Keeps the Extract from fetching all of the records before filtering them.
- For Oracle tables, reduces the amount of data that is stored in the undo segment.
It is important to note that the columns specified as part of the SQLPREDICATE WHERE clause must be part of a key or index to prevent full table scans which reduce efficiency and performance.
Source Data Size
In my source Oracle database I have one table that is 3 GB in size::
TABLE_NAME NUM_ROWS AVG_ROW_LEN TOTAL_BYTES ------------------------------ ------------ ------------ -------------------- MILLTBLEA 2,000,205 1,188 3,019,898,880
We want split the load of MILLTBLEA across multiple OGG processes; so lets take a look at the performance for the various options; FILTER, WHERE, and SQLPREDICATE. To keep things simple, we will configure two initial load Extracts; where in a production scenario we would probably use more. So our test configurations will be:
Load Option | First Extract | Second Extract |
---|---|---|
FILTER (@RANGE()) | extract tmysqlc userid ggadmin, password Oracle1 rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750 rmtfile ./dirdat/tq, maxfiles 999999, megabytes 1500, purge table east.milltblea, filter (@range (1,2)); |
extract tmysqld userid ggadmin, password Oracle1 rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750 rmtfile ./dirdat/tr, maxfiles 999999, megabytes 1500, purge table east.milltblea, filter (@range (2,2)); |
FILTER ([col value]) | extract tmysqlc userid ggadmin, password Oracle1 rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750 rmtfile ./dirdat/tq, maxfiles 999999, megabytes 1500, purge table east.milltblea, filter (cola < 499294558289218750); |
extract tmysqld userid ggadmin, password Oracle1 rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750 rmtfile ./dirdat/tr, maxfiles 999999, megabytes 1500, purge table east.milltblea, filter (cola >= 499294558289218750); |
WHERE | extract tmysqlc userid ggadmin, password Oracle1 rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750 rmtfile ./dirdat/tq, maxfiles 999999, megabytes 1500, purge table east.milltblea, where (cola < 499294558289218750); |
extract tmysqld userid ggadmin, password Oracle1 rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750 rmtfile ./dirdat/tr, maxfiles 999999, megabytes 1500, purge table east.milltblea, where (cola >= 499294558289218750); |
SQLPREDICATE | extract tmysqlc userid ggadmin, password Oracle1 rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750 rmtfile ./dirdat/tq, maxfiles 999999, megabytes 1500, purge table east.milltblea, SQLPREDICATE “WHERE COLA < 499294558289218750″; |
extract tmysqld userid ggadmin, password Oracle1 rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750 rmtfile ./dirdat/tr, maxfiles 999999, megabytes 1500, purge table east.milltblea, SQLPREDICATE “WHERE COLA >= 499294558289218750″; |
FILTER(@RANGE()) may be used to divide the rows of any table across two or more OGG processes. A hash value is computed based upon the specified input column; which in this case will be the primary key.
To determine the value of the primary key column, COLA, for the filter and where clauses; we query the database for the median data value:
SQL> col COLA format 999,999,999,999,999,999 SQL> select median (cola) as COLA from milltblea; COLA ------------------------ 499,294,558,289,218,750
In my test environment, the data read time for each test execution was:
Load Option | TMYSQLC Records Read | TMYSQLC Read Time (secs) | TMYSQLD Records Read | TMYSQLD Read Time (secs) | Total Read Time (secs) |
---|---|---|---|---|---|
FILTER (@RANGE()) | 1,000,571 | 465 | 999,634 | 345 | 810 |
FILTER ([col value]) | 1,000,102 | 311 | 1,000,103 | 188 | 499 |
WHERE | 1,000,102 | 138 | 1,000,103 | 171 | 309 |
SQLPREDICATE | 1,000,102 | 124 | 1,000,103 | 141 | 265 |
As can be seen in the above table, WHERE and SQLPREDICATE provide better read rates than FILTER. However, SQLPREDICATE does have slightly better performance than WHERE because the Extract SELECT statement has better optimization.
When the source database is Oracle, the SQLPREDICATE clause “AS OF SCN [scn value]” may be specified to have Extract perform an Oracle Flashback Query. Using Flashback is faster that reading directly from the source table. To demonstrate this, we set-up the following test:
Direct Read Extract | Flashback Query Extract |
---|---|
extract tmysqld userid ggadmin, password Oracle1 rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750 rmtfile ./dirdat/tr, maxfiles 999999, megabytes 1500, purge table east.milltblea; |
extract tmysqlc userid ggadmin, password Oracle1 rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750 rmtfile ./dirdat/tq, maxfiles 999999, megabytes 1500, purge table east.milltblea, sqlpredicate “AS OF SCN 27260769″; |
Records Read: 2,000,205 | Records Read: 2,000,205 |
Read Time (secs): 388 | Read Time (secs): 343 |
Summary
Oracle GoldenGate provides the user with multiple options for partitioning large tables across multiple initial load processes. In this articles we demonstrated that, where supported, the SQLPREDICATE option is the most effective choice.
All content listed on this page is the property of Oracle Corp. Redistribution not allowed without written permission