ORA-29702 during Flashback or reverting database upgrade

Anil Nair

Distinguished Product Manager (RAC) at Oracle

Published Nov 17, 2020

Mike Dietrich blogged about this issue, but I received some feedback and questions, so I thought it best to provide additional details on this error.

As mentioned earlier, Mike has covered this topic in depth. This is my attempt to provide additional information, along with snippets from the trace files, to explain the root cause analysis process so that DBAs can reuse it in the future to better understand ORA-29702 errors.

ORA-29702

An ORA-29702 error is typically signaled when the database instance attempts to talk to the Clusterware (ocssd process). The ocssd process is the Oracle Cluster Synchronization Services daemon, responsible for node management services. The most common cause of an ORA-29702 error is a misconfiguration that breaks the authentication required by the database process. A simple orachk run will check the required permissions and provide hints on what is broken (hint: check the oradism configuration). I have also covered more details in the SlideShare deck "Smart Monitoring: How does Oracle RAC manage Resources and State".

The ocssd process has steadily gained features over the years. One such feature is the increase in the maximum number of members that CSSD can support, from 512 to 2048. To benefit from this change, all entities that were connected to CSSD before the upgrade need to register with the new maximum members value.

So any GI and database release at 19.3 or later works without signaling ORA-29702, because the 19.3 database now uses the new value of 2048 for MaxMembers during registration. The maximum number of members that can connect to CSSD is persisted to storage so that future database startups can reuse it. The problem occurs when a customer either reverts the upgrade or performs a flashback operation.

Let's understand this with an example:

GI version 11.2.0.4 runs with RDBMS release 11.2.0.4 and one database called Sales
MaxMembers used by Sales is 512
GI and the database are subsequently upgraded to 19.3
When the Sales database starts, it requests MaxMembers to be set to 2048
This value is persistently stored in the GI
Now either the Sales database is flashed back to 11.2.0.4 or the upgrade is reverted to 11.2.0.4
When the 11.2.0.4 Sales database attempts to connect to the 19.3 GI, it requests a MaxMembers of 512, whereas the persistent value in the GI is 2048
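
The sequence above can be replayed with a small toy model in Python. This is only a sketch: the ToyCssd class, its join method and the acceptance rule (a request at least as large as the stored value is accepted, a smaller one is rejected) are assumptions made for illustration, not Oracle's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCssd:
    """Toy stand-in for CSSD group registration (hypothetical, not Oracle code)."""

    # Persisted MaxMembers per group name, mimicking the value stored by GI.
    persisted: dict = field(default_factory=dict)

    def join(self, group: str, requested: int) -> bool:
        current = self.persisted.get(group)
        # Assumption: a first registration, or a request at least as large as
        # the stored value, is accepted and the requested value is persisted.
        if current is None or requested >= current:
            self.persisted[group] = requested
            return True
        # Assumption: a request smaller than the stored value is rejected
        # as incompatible, and the stored value is left unchanged.
        return False


cssd = ToyCssd()
steps = [
    ("11.2.0.4 Sales joins", 512),
    ("19.3 Sales joins after upgrade", 2048),
    ("11.2.0.4 Sales joins after flashback", 512),
]
for label, requested in steps:
    accepted = cssd.join("Sales", requested)
    stored = cssd.persisted["Sales"]
    print(f"{label}: requested={requested} accepted={accepted} stored={stored}")
```

In this model the last step is the one that fails: the stored value stays at 2048 while the flashed-back database still asks for 512.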

Lmon trace file snippet

Batching factor: cache replay 114 size per lock 72
kjxggin: CGS tickets = 1000
kjxgrdmpcpu: CPU Total 14 Core 7 Socket 2 OCPU 14
kjxgrdmpcpu: High load threshold 17920
2020-06-03 13:52:41.714: [ CSSCLNT]clssgsGroupJoin: bad server response(-24)
kgxgnreg: error: status 1 (0 )
kjxgmjoin: can not join the group (DAALL_DB_XXXXXX) with id 0 (inst 1)
kjxgmjoin: kgxgn error 16
2020-06-03 13:52:41.714585 : IMR recording device closed, terminating IMR
kjfmreg: Joining the cluster failed with err code 16
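
When checking many trace files, the two telling lines in this snippet can be picked out with a short Python scan. This is a rough sketch: the patterns are copied from the lines shown above, the function name find_join_failure is made up for illustration, and other releases may word these messages differently.

```python
import re

# Patterns taken from the LMON snippet above; treat them as assumptions
# about the message format rather than a documented contract.
JOIN_FAILURE = re.compile(r"kjxgmjoin: can not join the group \((?P<group>[^)]+)\)")
BAD_RESPONSE = re.compile(r"clssgsGroupJoin: bad server response\((?P<code>-?\d+)\)")


def find_join_failure(lines):
    """Return the group name and CSS response code found in the lines, if any."""
    result = {"group": None, "css_response": None}
    for line in lines:
        match = JOIN_FAILURE.search(line)
        if match:
            result["group"] = match.group("group")
        match = BAD_RESPONSE.search(line)
        if match:
            result["css_response"] = int(match.group("code"))
    return result


sample = [
    "2020-06-03 13:52:41.714: [ CSSCLNT]clssgsGroupJoin: bad server response(-24)",
    "kgxgnreg: error: status 1 (0 )",
    "kjxgmjoin: can not join the group (DAALL_DB_XXXXXX) with id 0 (inst 1)",
    "kjxgmjoin: kgxgn error 16",
]
print(find_join_failure(sample))
```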

Cssd trace file snippet

2020-06-15 14:52:31.537 : CSSD:487659264: [ INFO] clssgmCheckGrpAttrCompat: Change was rejected due to incompatible MaxMembers value grock DAALL_DB_XXXXX, ID 219071, current value 2048, requested value 512

As seen above, CSSD rejects the request from the RDBMS instance because the current MaxMembers value is 2048 while the older RDBMS version requests 512.
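
To confirm this signature in a CSSD trace, the current and requested values can be extracted from the rejection line. A minimal Python sketch follows, assuming the message format is exactly as shown in the snippet above; the function name parse_maxmembers_reject is hypothetical.

```python
import re

# Pattern built from the CSSD line above; the wording may differ in other releases.
MAXMEMBERS_REJECT = re.compile(
    r"clssgmCheckGrpAttrCompat: Change was rejected due to incompatible "
    r"MaxMembers value grock (?P<grock>[^,\s]+), ID (?P<id>\d+), "
    r"current value (?P<current>\d+), requested value (?P<requested>\d+)"
)


def parse_maxmembers_reject(line):
    """Return the group name and both MaxMembers values, or None if no match."""
    match = MAXMEMBERS_REJECT.search(line)
    if match is None:
        return None
    return {
        "grock": match.group("grock"),
        "current": int(match.group("current")),
        "requested": int(match.group("requested")),
    }


line = (
    "2020-06-15 14:52:31.537 : CSSD:487659264: [ INFO] clssgmCheckGrpAttrCompat: "
    "Change was rejected due to incompatible MaxMembers value grock DAALL_DB_XXXXX, "
    "ID 219071, current value 2048, requested value 512"
)
info = parse_maxmembers_reject(line)
if info and info["requested"] < info["current"]:
    print(f"{info['grock']}: requested {info['requested']} < stored {info['current']}")
```

A requested value lower than the current one matches the downgrade or flashback scenario described in this article.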

The fix, as Mike noted, is already included in the latest RUs. Alternatively, it is possible to bypass this error by shutting down GI on all the nodes.

Somdyuti Paul

Database Management Specialist-Google

Anil Nair- good to know.

Aritra Kundu

Senior Manager at Oracle

This is addressed in the latest RU; the workaround is a cold CRS restart.
