Sunday, April 22, 2012

Oracle Data-guard Issues - 'APPLIED'-Column not updated in v$archived_log table


I am taking a break from my regular route to Beginners Guide because indeed today I faced an issue which typically was confusing. So confusing that at-last when I got over it I thought of blogging it so that if similar issues you face in the near future you can get an instant solution to it and won’t go around pulling your hairs.

So coming to the issue let me brief the scenario in short. 

We have a Production Database where heavy OLTP transactions occur and to support the disaster recovery for this server we have a Data-guard solution set up. The Oracle version we are using here is 10.2.0.4 Enterprise Edition on Sun SPARC OS.

Yesterday, the client reported that DG is not in Synch with the Production so that’s where my job came into scene. Logging into the database I found that indeed there was an archive log gap and due to this the DG had stop synching with the Production.

I ran the following command on DG to verify the archive log gap.

SQL> SELECT THREAD#, LOW_SEQUENCE#, HIGH_SEQUENCE# FROM V$ARCHIVE_GAP; 


 THREAD#          LOW_SEQUENCE  HIGH_SEQUENCE#
----------------------------------------------
1          65524      65533

SQL> SELECT max(sequence#) AS "STANDBY", applied FROM v$archived_log GROUP BY applied;

STANDBY APPLIED
-----------------
65783    NO
65523    YES

From above output it is quite clear that there is an archive gap and due to which the recovery services had stopped. The first thing to do insuch scenario is to check if the missing archives are present at the Production. If they are present.. Pheww!!!! You are saved from a hectic recovery schedule because you just need to take the backup of this archives from production or transfer these archives from Production to DR and apply them there. But in case you are not so lucky and you don’t have archive backups at the production you need to follow the other way around where you need to take the incremental RMAN backup at Production and restore the same at the DR end. But that is the topic for another blog. 

In my case I was too lucky as the retention policy at the Production for archives is 3 days. So the archives were present there. Lucky me… So what I did was just transferred the missing archives from Production to DR and at DR stopped the media recovery process, recovered the DR with RMAN and started the media recovery process. Guess this should have solved my problem. Right!!!!!! Well I was wrong… So what went wrong? These are the steps I followed.

  1. Used Simple SCP to transfer the missing archives from sequence 65524 to 65533 from Production to DR.
  2. Stopped the Media recovery Process at DR.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

  1. Recover the DR database using simple RMAN recovery.
bash$  rman target /
rman> RECOVER DATABASE UNTIL SEQUENCE 65533 THREAD 1;

  1. Start the media recovery process again.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM CURRENT LOGFILE;

At first instance all was well. The recovery process started well and the gap was filled and synching was completed. Then the real problem arose.
Well I was checking the DR end regarding the synching using following commands.

sql>  ARCHIVE LOG LIST
      Database log mode                     Archive Mode
Automatic archival                    Enabled
Archive destination                   C:\app\mudassar\archive
Oldest online log sequence            65783
Next log sequence to archive  
Current log sequence                  65794

      sql> SELECT THREAD#, LOW_SEQUENCE#, HIGH_SEQUENCE# FROM V$ARCHIVE_GAP; 
         no rows selected


sql> SELECT max(sequence#) AS "STANDBY", applied FROM v$archived_log GROUP BY  applied;

STANDBY APPLIED
-----------------
65794    YES

Seems fine… But hell No Client was not convinced he ran the following query at the Production and the output was always NO.

sql> select a.sequence#,a.applied,a.archived, b.sequence#, b.applied,b.archived from  (select * from v$archived_log where sequence# in(select max(sequence#)-1 from v$archived_log) and name='stndby') a, (select * from v$archived_log where sequence# in(select max(sequence#)-1 from v$archived_log) and name like '/archive/archive-log%') b where a.sequence#!=b.sequence# OR a.APPLIED!='YES' OR b.APPLIED!='NO';

SEQUENCE#  APP ARC
------------------------------------
65794       NO  YES
65794       NO  YES

Whats wrong in this then? Why the output different at Production and DR when it should be same? After lots of searching, I found that this is bug in Oracle 10g whereby the MEDIA RECOVERY PROCESS gets hanged due to some reason causing this issue. I won’t go in more detail regarding the bug but if you have metalink account on Oracle you can logged in and go through Note: 1369630.1 if not you can download the documentation here.


Note that this error doesn’t mean that Production and DR are not in synch but as the Media Recovery Process is hanged at Production Site it is not able to update the internal views. The solution to this is upgrade to Oracle 11g or higher (Oracle 12c is already introduced) or you can restart the Production Database if possible to overcome the issue temporarily.

Hope this would have been help to all you folks!!!! Comments are WELCOME!!!!

5 comments:

  1. really awesome...thats why I like your blog Mudassar. Thanks

    ReplyDelete
  2. Thanks Aman .. Your comments mean a lot :)

    ReplyDelete
  3. Hi Mudassar,
    Thanks for sharing this. It's occurred on my environment if there is any network hiccup for a period even my timeout setting is longer than that. Instead of restart a production database, killing all archived background processes is fine to fix the primary v$archived_log.applied.

    Bundit

    ReplyDelete
  4. excellent explanation... helps alot.Thank you

    ReplyDelete
  5. I am very thankful to you for providing such a great information. It is simple but very accurate information.

    ReplyDelete