Introduction
Here I am trying to explain exactly what happens when using ALTER
TABLESPACE ... BEGIN BACKUP and ALTER TABLESPACE ... END BACKUP, and why
it is mandatory to use it when the online backup is done with a tool
that is external to Oracle ( such as OS backups using cp, tar, BCV, etc.
)
It also gives an answer to those frequent questions:
Does Oracle write to data files while in hot backup mode ?
What about ALTER DATABASE BEGIN BACKUP ?
Why it is not used with RMAN backups
What if you do an online backup without setting tablespaces in backup mode ?
What if the instance crashes while the tablespaces is in backup mode ?
How to check which datafiles are in backup mode
What are the minimal archive logs to keep with the hot backup ?
Why use OS backups instead of RMAN ?
Why BEGIN BACKUP takes a long time ?
Description
Offline backup (Cold backup)
A
cold OS backup is simple: the database has been cleanly shut down (not
crashed, not shutdown abort) so that: all datafiles are consistent (same
SCN) and no redo is needed in case of restore the datafiles are closed:
they will not be updated during the copy operation. Thus, it can be
restored entirely and the database can be opened without the need to
recover.
Online backup (Hot backup)
An
hot backup does the copy while the database is running. That means that
the copy is inconsistent and will need redo applied to be usable.
Recovery
is the process of applying redo log information in order to
roll-forward file modifications as they were done in the original files.
When
the copy is done with Oracle (RMAN), Oracle copies the datafile blocks
to backupset so that it will be able to restore them and recover them.
When the copy is done from the OS (i.e with a tool that is not aware of the Oracle file structure), several issues come up:
Header
inconsistency: Nothing guaranties the order in which the files are
copied, thus the header of the file may reflect its state at the
beginning or at the end of the copy.
Fractured
blocks: Nothing guaranties that an Oracle block is read in one single
i/o so that two halves of a block may reflect its state at two different
points in time.
Backup
consistency:As the copy is running while the datafile is updated, it
reads blocks at different point in time. The recovery is able to roll
forward blocks from the past, but cannot deal with blocks from the
future, thus the recovery of the copy must be recovered at least up to
the SCN that was at the end of the copy.
So
it is all about consistency in the copy: consistency between datafiles,
consistency within datafiles and consistency within data blocks, and
keep this consistency in the current files (obviously) as well as in the
copy (as it will be needed for a restore/recovery)
Backup mode
The
goal of ALTER TABLESPACE ... BEGIN BACKUP and ALTER TABLESPACE ... END
BACKUP is to set special actions in the current database files in order
to make their copy usable, without affecting the current operations.
Nothing
needs to be changed in the current datafiles, but, as the copy is done
by an external tool, the only way to have something set in the copy is
to do it in the current datafiles before the copy, and revert it back at
the end.
This
is all about having a copy that can be recovered, with no control on
the program that does the copy, and with the minimal impact on the
current database.
In order to deal with the 3 previous issues, the instance that will do the recovery of the restored datafiles has to know:
that the files need recovery
from which SCN, and up to which SCN it has to be recovered at least
enough information to fix fractured blocks
During backup mode, for each datafile in the tablespace, here is what happens:
1- When BEGIN BACKUP is issued:
The hot backup flag in the datafile headers is set, so that the copy is identified to be a hot backup copy.
This is to manage the backup consistency issue when the copy will be used for a recovery.
A checkpoint is done for the tablespace, so that no dirty buffer remains from modifications done before that point.
Begin backup command completes only when checkpoint is done.
2- During backup mode:
The
datafile header is frozen so that whenever it is copied, it reflects
the checkpoint SCN that was at the beginning of the backup.
Then, when the copy will be restored, Oracle knows that it needs to start recovery at that SCN to apply the archived redo logs.
This is to avoid the header inconsistency issue.
That means that any further checkpoints do not update the datafile header SCN (but they do update a 'backup' SCN)
Each
first modification to a block in buffer cache will write the full block
into the redo thread (in addition to the default behaviour that writes
only the change vector).
This
is to avoid the fractured block issue. There may be a fractured block
in the copy, but it will be overwritten during the recovery with the
full block image.
That means that everything goes as normal except for two operations:
- at checkpoint the datafile header SCN is not updated
-
when updating a block, the first time it is updated since it came in
the buffer cache, the whole before image of the block is recorded in
redo
- direct path writes
do not go through the buffer cache, but they always write full blocks
and then full block is written to redo log (if not in nologging)
3- When END BACKUP is issued:
A
record that marks the end of backup is written to the redo thread so
that if the copy is restored and recovered, it cannot be recovered
earlier than that point. This is to avoid the backup consistency issue.
The hot backup flag in the datafile headers is unset.
The header SCN is written with the current one.
Remarks:
the
fractured block is not frequent as it happens only if the i/o for the
copy is done at the same time on the same block as the i/o for the
update. But the only mean to avoid the problem is to do that full
logging of block for each block that will be written while the copy is
occuring, just in case.
if
the OS i/o size is multiple of the Oracle block size (e.g backup done
with dd bs=1M), that supplemental logging is probably not needed because
fractured blocks cannot happen.
the
begin backup checkpoint is mandatory to manage the fractured block
issue: if a dirty buffer remains, from a modification done before the
begin backup, it would have no full-image redo, and may be subject to
fractured block when written to disk .
The
supplemental logging occurs when accessing the block for the first time
in the buffer cache. If the same block is reloaded again in the buffer
cache, supplemental logging will occur again. I haven't seen that point
documented, but a testcase doing a 'flush buffer_cache' proves that.
Consequence on the copy (the backup)
When
the copy has been done between begin backup and end backup, the copy is
fully available to be restored and recovered using the archive log
files that where generated since the begin backup.
After
the files have been restored, Oracle sees that the SCN is older than
the current one and says that the database needs recovery.
The
recovery must be done up to a point in time ulterior to the end backup
point in time so that we are sure that there is no blocks in the
datafile that comes from the future.
Consequence on the current database
All operations can be done during the backup mode.
However,
as more logging is written, it should be done during a low activity
period. And for the same reason, it is better to do the tablespaces one
after one instead of putting all the database tablespaces in backup
mode.
In addition, it is not possible to shutdown the database while a tablespace is in hot backup.
This is because, as the datafile header is frozen with a non current SCN, the datafile would be seen as if it requires recovery.
However that cannot be avoided if the instance crashes (or shutdown abort), and then the startup of the database will raise:
ORA-1113: file ... needs media recovery
This
is the only case I know where instance recovery is not automatic, you
need to issue 'alter database... end backup;' before opening the
database.
Frequent questions
1.Does Oracle write to data files while in hot backup mode ?
Ans: Yes of course, it would not be called 'online' backup if it were not the case.
2. What about ALTER DATABASE BEGIN BACKUP ?
Ans: That command put all database tablespaces in backup mode at the same time.
As
seen previously, it is a bad idea to put all tablespaces in backup
mode, as it is better to do it one by one in order to minimize the
supplemental redo logging overhead. Oracle introduces this 'shortcut'
for one reason only: when doing backup with a mirror split (BCV,
Flashcopy, etc), the copy gets all the datafiles at the same time, and
the copy lasts only few seconds. In that case, it is easier to use that
command to put all tablespaces in backup mode during the operation.
3. Why it is not used with RMAN backups
Ans: RMAN is an Oracle tool, that is totally aware of the datafile structure, and the way they are written.
Then,
it knows how it can read the datafiles in a way the copy is consistent:
write the good version of datafile header, read the blocks with an i/o
size that is multiple of the Oracle block size so that there is no
fractured blocks, and check head and tail of the block to see if block
is fractured (in that case, it re-reads the block to get a consistent
image). That is one advantage among many others of using RMAN for
backups.
4. What if you do an online backup without setting tablespaces in backup mode ?
Ans:
If you don't put the tablespace in backup mode, we can't be sure that
the copy is recoverable. It may be fine, but it may have
inconsistencies.
We can suppose that the copy is consistent if we make the copy with the following conditions
Header inconsistency: If the file copy is done from beginning to end, then the datafile header should reflect the right SCN
Fractured blocks: If the copy does i/o with a size that is multiple of the Oracle block size, then you should not have fractured blocks
Backup consistency:If you take care to recover later than the point in time of the end of the copy, you should not have inconsistency
But
there may be other internal mechanisms that are not documented so that
we can't be sure that this list of issues is exhaustive.
And, as it is not supported, we cannot rely on a backup done like that. Note that you will have no message
5. What if the instance crashes while the tablespaces is in backup mode ?
Ans"
When you start the database after that, Oracle will say that it
requires recovery. This is because the SCN was frozen and it is the
expected behaviour because if you restore the copied file, it has to be
recovered. (and the only way Oracle has to set its value in the copy is
to set it in the current file while it is copied). In that case you can
can issue:
ALTER DATABASE END BACKUP;
ALTER DATABASE OPEN;
to open the database.
But your backup is not usable, you have to do it again.
6. How to check which datafiles are in backup mode
Ans: The V$BACKUP view shows the datafiles that are currently in backup mode (status ACTIVE).
Some old documentation says to check V$DATAFILE_HEADER column FUZZY.
This
is because in previous versions (<9i) the begin backup unsets the
online fuzzy bit in the datafile header, and set it back at when end
backup is issued.
Since 9i, the online fuzzy bit is unset only when datafile is offline or read-only, not for backup mode.
7. What are the minimal archive logs to keep with the hot backup ?
ans: The backup done online is unusable if there is not at least the possibility to restore archive logs
- from the archive log that was the current redo log when the backup started,
- up to the archive log that was archived just after the backup (of the whole database) ended.
That is sufficient to do an incomplete media recovery up to the point of 'end backup'.
Subsequent archive logs will be needed to recover up to the point of failure.
8. Why use OS backups instead of RMAN
Ans:
The best way to do online backups is using RMAN as it has a tons of
features that you cannot have with OS backups. Yet, the OS backup are
still used when using OS tools that can copy an entire database in
seconds, using mirror split (BCV, Flashcopy, etc), for very large
databases.
9. Why BEGIN BACKUP takes a long time ?
Ans:
BEGIN BACKUP has to checkpoint the dirty buffers related with the
tablespace, so that all future writes are protected by supplemental
logging.
The duration of that checkpoint is proportional to the buffer cache size and the number of datafiles.
The performance of BEGIN BACKUP checkpoint has been improved in 10g.
Good article to learn
ReplyDeleteI have a small doubt regarding backup I want to know whether backup taken is HOTBACKUP OR COLDBACKUP is there any v$ view or any other method to find out the same.
I am using RMAN with out catalog for taking backup
@Sujith: I hope you know what Hot backup and Cold backup is. In Hot backup the DB is up and running and in Cold backup it is in mounted stage and not open. The simplest way to find out whether the backup taken was HOT or COLD is to check the SCN number of all the datafiles.
ReplyDeleteIf you have pfile and controlfile mount and restore the database. Now check the SCN number of all the datafiles using query:
select CHECKPOINT_CHANGE# from v$datafile;
This should be same for all the datafiles in case of Cold backup as the DB is not running while it should be different in case of Hot Backup as the DB is up and running.
Second method is to check the datafile information. Use this query for this:
select distinct fhsta from x$kcvfh;
If output for above query gives 0, then the all the files are in cold backup mode.
If output for above gives 1, then all the files are in hot backup, we need to apply more archives
For your information, x$kcvfh is the internal table which reads the data from all the datafile header.
Hope this answers your query.. Will be happy to help with other queries if you have ... Keep posting...
What if ur database is still in backup mode and you lost archive log before u perform end backup ? how would we resync tablespace now
ReplyDeleteContact to DB Recovery Support to bring out the Oracle Recovery Problem
ReplyDeleteAssume you execute any order and in the meantime you discovered blunder has happen then what conceivable techniques you are taking care of this issue? Well! In the first place you need to check and screen what sort of database you are utilizing and after that you need to likewise check which kind of order you are executing. When you check these things and as yet confronting a similar issue at that point rapidly contact to Cognegic's Exchange Database Recovery or DB Recovery Services. Here we give viable answer for these issues.
For More Info: https://cognegicsystems.com/
Contact Number: 1-800-450-8670
Email Address- info@cognegicsystems.com
Company’s Address- 507 Copper Square Drive Bethel Connecticut (USA) 06801