Wednesday, May 10, 2017

Recover from Oracle 11g XE database file corruption on Windows 7

Recently, I heard an eerie screeching sound coming from my laptop's hard disk - the worst nightmare a professional can have, and a sure sign that the disk was on its last legs.
Apart from the valuable documents, my machine contained a prototype Oracle XE 11g database. It also contained several Apex screens that I had recently developed as part of the prototype.

Thanks to the early warning from the screeching sound, I was able to recover most of the documents. The Oracle database, however, had one corrupted DBF file, and every attempt to copy it at the OS level failed.

This blog post discusses a method for recovering Oracle DBF files when you run into database file corruption and do not have any backups.

I ran Windows chkdsk on the machine and it completed cleanly, reporting no bad sectors.

I tried the "Backup Database" feature of the XE database, but it would fail, pointing at the corruption in the DBF file.

After some searching on the web, I came across the DBV (DB Verify) utility that ships with the Oracle database. This tool validates the physical integrity of database files.
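
If you are not sure which DBF files belong to the instance, and the database can still be mounted, a quick way to list them is to query v$datafile from SQL*Plus (a minimal sketch; each file returned can then be passed to dbv with the file= parameter):

  sqlplus / as sysdba
  SQL> select name from v$datafile;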

I ran this tool against a few files that were known to be good.
C:\oracle\product\oraclexe\app\oracle\oradata\XE>dbv file=SALES_REP.DBF
DBVERIFY: Release 11.2.0.2.0 - Production on Wed May 10 21:44:31 2017
Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.
DBVERIFY - Verification starting : FILE = C:\ORACLE\PRODUCT\ORACLEXE\APP\ORACLE\ORADATA\XE\SALES_REP.DBF

DBVERIFY - Verification complete

Total Pages Examined         : 2824
Total Pages Processed (Data) : 465
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 710
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 803
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 846
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 2086537 (0.2086537)

I ran this tool against the known bad file and voila, it validated that the file was indeed corrupt.
C:\oracle\product\oraclexe\app\oracle\oradata\XE>dbv file=ORDERS_REP.DBF

DBVERIFY: Release 11.2.0.2.0 - Production on Wed May 10 21:46:44 2017

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = C:\ORACLE\PRODUCT\ORACLEXE\APP\ORACLE\ORADATA\XE\ORDERS_REP.DBF

DBV-00600: Fatal Error - [28] [27070] [0] [0]

So, Oracle confirmed that the file was indeed corrupt even though the OS didn't think so 😈

As I mentioned earlier, a normal OS file copy did not work, so I used xcopy instead. This forgotten command from the world of MS-DOS 6.0 has a /C option that lets the copy continue even if errors are encountered.
  xcopy ORDERS_REP.DBF ORDERS_REP_W.DBF /C
and the file got copied successfully.


I then ran DB Verify again on this new copy, and the results were much better.

C:\oracle\product\oraclexe\app\oracle\oradata\XE>dbv file=ORDERS_REP_W.DBF
DBVERIFY: Release 11.2.0.2.0 - Production on Wed May 10 21:48:17 2017
Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.
DBVERIFY - Verification starting : FILE = C:\ORACLE\PRODUCT\ORACLEXE\APP\ORACLE\ORADATA\XE\ORDERS_REP_W.DBF

DBVERIFY - Verification complete
Total Pages Examined         : 150168
Total Pages Processed (Data) : 41226
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 11763
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 85845
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 11334
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 2117572 (0.2117572)

Now it was time to attach this file to the database.
Shut down the database using the following commands:
sqlplus / as sysdba
SQL> shutdown immediate
Database dismounted.
ORACLE instance shut down.

Rename ORDERS_REP.DBF to ORDERS_REP_original.DBF.
Rename ORDERS_REP_W.DBF to ORDERS_REP.DBF.
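
From a command prompt in the data file directory, the renames could look like this (run only while the database is shut down):

  cd C:\oracle\product\oraclexe\app\oracle\oradata\XE
  ren ORDERS_REP.DBF ORDERS_REP_original.DBF
  ren ORDERS_REP_W.DBF ORDERS_REP.DBF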

Bring the database back up:

SQL> startup
ORACLE instance started.

Total System Global Area 1068937216 bytes
Fixed Size                  2260048 bytes
Variable Size             784335792 bytes
Database Buffers          276824064 bytes
Redo Buffers                5517312 bytes
Database mounted.
ORA-01113: file 6 needs media recovery
ORA-01110: data file 6: 'C:\ORACLE\PRODUCT\ORACLEXE\APP\ORACLE\ORADATA\XE\ORDERS_REP.DBF'

The copied file cannot be used as-is; media recovery needs to be performed on it first.

SQL> shutdown immediate
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.
Total System Global Area 1068937216 bytes
Fixed Size                  2260048 bytes
Variable Size             784335792 bytes
Database Buffers          276824064 bytes
Redo Buffers                5517312 bytes
Database mounted.
SQL> recover database
Media recovery complete.
SQL> alter database open;
Database altered.

I was able to bring the database up.

Also, the "Backup Database" feature now worked with the newly restored DB file.



Sunday, October 14, 2012

A tale of two agents

With ODI 11g, Oracle has integrated ODI agents with the J2EE infrastructure to provide High Availability and Load Balancing in a RAC environment. However, many small to mid-size businesses do not want to invest in this infrastructure and hence cannot use the High Availability and Load Balancing features of the J2EE agents.

The good news is that Oracle continues to maintain the Standalone agent in ODI 11g, and its basic load balancing features may be enough in some situations.

We cannot achieve the high availability promised by the WebLogic implementation, but we can certainly configure standalone agents so that we can balance the load, keep downtime to a minimum, and still perform maintenance on the agents.

I have seen many implementations that choose to run only one instance of an agent, and everything seems to work well initially. The problems show up later.

That raises a question - in what cases would a "single agent" implementation not be enough?

Whenever I need to perform maintenance on an agent or bounce it, a second agent, if available, can come to the rescue of my long-running processes and also share or take over the load.

Here are a few cases to think about:

1.) You make extensive use of Jython and external Java libraries inside the ODI JVM. This may exhaust the JVM's memory and make the agent unresponsive.

2.) New code needs to be deployed to the agent which warrants an agent bounce. In case of a single agent, the processes that are running will be impacted. However, if you have a multi-agent setup, then the currently running processes can be shifted over to the other running agent.

3.) The agent JVM parameters need to be changed - adding more memory to the JVM, changing the path where the JVM creates its log files, etc. (see the sketch below).
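
For a standalone agent, these JVM settings typically live in odiparams.bat (or odiparams.sh on Unix). A rough sketch of bumping the agent heap - the exact variable names and defaults can differ between ODI versions - might look like:

  REM odiparams.bat - increase the agent JVM heap
  set ODI_INIT_HEAP=256m
  set ODI_MAX_HEAP=1024m

The agent has to be bounced for such a change to take effect, which is exactly why having a second agent helps.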

ODI supports the notion of a Master agent and several slave agents, and ideally each ODI implementation should have at least 3 agents - 1 Master and 2 slaves. This is what a minimal ODI configuration looks like for a Master - Slave setup:


The master agent accepts the job and passes it to the available slave agent. This configuration can be specified in the ODI topology:


The master agent is the one that is tied to a logical agent and a context. You can choose to make the master agent a "worker agent" - by selecting it as a "linked" agent it will also perform processing and share the load - or it can be configured as merely a delegator. It is advisable to configure the master agent in "delegator" mode only, so that this agent keeps a basic ODI configuration and doesn't do anything fancy. That way there is less chance of this agent going down (whether from memory exhaustion or from maintenance operations).

The other "n" slave agents can be tied to the Master agent as "linked" agents. Now when a job is submitted to the Master agent, it finds which slave agent that has less load and delegates the job to that agent. In case, a slave agent is unavailable, it will re-assign the job to other available agent(s).

So, if you want to bring an agent down, the load can be diverted to the other available agent:


However, it is not as simple as just bringing an agent down and letting the Master divert the load to the other agents. The sessions that are already running need to complete. The available-sessions parameter of the agent being brought down should be set to 0 so that no new sessions are accepted. Once the existing jobs are complete, the "Use new Load Balancing" parameter - if set to Yes - will rebalance the load and start using the other agents for subsequent steps.



Thursday, December 29, 2011

Parallelism of scenarios

Let's take a deep dive into ODI's parallelism capability using StartScen in Asynchronous mode.
Here is a nice article from Christophe (http://blogs.oracle.com/dataintegration/entry/parallel_processing_in_odi) that illustrates how this can be achieved.
However, the important thing to understand here is that the job kick-off process is not truly parallel.

A truly parallel process should look like the following where the Start will spawn 3 parallel processes.

Whereas in ODI, the above behavior is mimicked by connecting jobs sequentially to each other and then executing them in Asynchronous mode.

The question arises - what's the difference, as long as the jobs are executing in parallel? A slight one.
Consider a case where the scenario "StartEmpLoad" cannot start for some reason (the scenario is not available, or the agent selected for it is down). In that case the next (or did we say parallel?) job "StartSalesLoad" will never be kicked off. Since the two are connected with an "OK" link and the first job wasn't OK, the flow never reaches the next step. This is a job kick-off error and is different from the data errors normally encountered during job execution (duplicate keys, data not found, etc.).

Each StartScen here kicks off a job and then moves on to the next StartScen. Being asynchronous, they don't wait for the results to come back, but they do make sure that the job was kicked off successfully (as opposed to completed successfully).
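
For reference, here is a sketch of what each step's call might look like (the scenario names are from the example above; -SYNC_MODE=2 requests asynchronous execution, so the step returns as soon as the child session has been started):

  OdiStartScen "-SCEN_NAME=StartEmpLoad" "-SCEN_VERSION=001" "-SYNC_MODE=2"
  OdiStartScen "-SCEN_NAME=StartSalesLoad" "-SCEN_VERSION=001" "-SYNC_MODE=2"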

Wednesday, December 28, 2011

ODI Tools without Hardcodings

ODI provides us with a set of utilities that assist with day-to-day tasks such as FTP, file copy, sending mail, SQL unload, etc. These tools make ODI powerful.
These tools are available in the Toolbox (once you start creating a package).

A common problem with all these tools is the need to hardcode file paths, user names, passwords, etc. - the least desirable trait of any tool. As a developer you want the flexibility to change parameters whenever the need arises. Unfortunately, the way the tools are laid out, this change is not very intuitive.
It seems that the values need to be changed and then the scenarios need to be recompiled.

Wait... did I say recompile the scenarios to change a parameter? Oh. Such a basic problem.
C'mon. It's easier to write programs in C, where parameters can be passed from the command line using argv[].

How did such an excellent tool miss this one?
OK, OK. Let's explore a few ways in which life can be simplified.

  1. Use them as a Sunopsis API - Each tool can be invoked as a Sunopsis API from an ODI procedure. It is easy to see the syntax of the corresponding API for each tool: once you drag and drop the tool into a package, click on the "Command" tab in the Properties Explorer. As you add parameters on the "General" tab, the "Command" tab keeps getting updated with the values. ODIExperts describe an excellent way of invoking ODI tools inside an ODI procedure - http://odiexperts.com/calling-odisqlunload-using-odi-procedure-with-no-hardcoded-password

  2. Use ODI variables - Define an ODI refresh variable for each parameter that you want to parameterize. Before invoking the tool, refresh the ODI variable from a database configuration table, then use the variable in the parameters section of the tool. The following snapshots illustrate how this can be done.


  3. A caveat of the previous method: it mandates that the package contain several variables, which may make the package look crowded with too many connections.

  4. Use Jython variables - This method is an enhancement of method 2 above, the only difference being the use of Jython variables instead of ODI variables (and a bit more :D ). The real advantage of this approach is a less cluttered ODI package, as all the variables being used are created inside an ODI procedure.
    This will be a 2 step process:
    • Create and Refresh all the Jython variables in an ODI procedure.
    • Use those variables in parameters of ODI Tool inside the package.

    In the example provided here, the OdiSqlUnload tool is used and Jython variables hold the connection information. The procedure that declares and populates the Jython variables will look as follows:
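
    A minimal sketch of such a Jython task is shown below. It assumes the standard odiRef.getInfo substitution parameters are available for the schema selected on the task's command tab; the variable names simply match the ones used in the OdiSqlUnload command further down:

      # ODI procedure step, technology = Jython
      # Connection details are pulled from the Topology via the
      # substitution API, so nothing is hardcoded here.
      jvJDBC = '<%=odiRef.getInfo("SRC_JAVA_DRIVER")%>'
      jvURL  = '<%=odiRef.getInfo("SRC_JAVA_URL")%>'
      jvUser = '<%=odiRef.getInfo("SRC_USER_NAME")%>'
      jvPass = '<%=odiRef.getInfo("SRC_PASS")%>'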


    Then a package is created in which this procedure precedes the ODI tool OdiSqlUnload.

    The OdiSqlUnload tool's command should be configured to look similar to this:

    OdiSqlUnload "-FILE=/MyPath/sqlunloadfile.txt" "-DRIVER=<@=jvJDBC@>" "-URL=<@=jvURL@>" "-USER=<@=jvUser@>" "-PASS=<@=jvPass@>" "-FILE_FORMAT=VARIABLE" "-FIELD_SEP=," "-ROW_SEP=\r\n" "-DATE_FORMAT=yyyy/MM/dd HH:mm:ss" "-CHARSET_ENCODING=ISO8859_1" "-XML_CHARSET_ENCODING=ISO-8859-1" "-FETCH_SIZE=5000"
    select emp_no from emp
The above method illustrates an effective usage of Jython variables.
In a similar fashion other values can be refreshed from a DB configuration table and then used inside the ODI tools.

Hope this helps in laying a path towards cleaner usage of ODI tools without hardcoding.

Wednesday, April 20, 2011

When Temporary is not Temporary

Let's have a little quiz about the temporary interfaces used in Oracle Data Integrator. They are also popularly known as "Yellow Interfaces" because of the yellow icon ODI gives them at creation, as opposed to the blue icon of a regular interface.



Which of the following statements are false about Temporary interfaces:
  1. The tables get dropped once the calling interface is complete.
  2. The data in the temporary interfaces is deleted once the calling interface is complete.
  3. The table referred to as Target in the Temporary interface is created each time the calling interface invokes it.
  4. They can only be executed in Sunopsis Memory Engine.

Answer Choices:
1 is false
2 is false
3 is false
1 and 2 are false
1 and 3 are false
2 and 4 are false
All of the above



The answer is "All of the above" statements are false.

I will elaborate on each of the above points and clear up some common misconceptions:
  1. The table that gets created stays there forever. It is a real, physical table.
  2. The data loaded by the temporary interface also stays there. It has to be removed manually or overwritten. Even though this sounds similar to a global temporary table in Oracle, it is not.
  3. This depends on how the interface is configured. If the option "CREATE_TARG_TABLE" is set to "Yes", ODI will attempt to create the table each time; otherwise it will NOT. If the table already exists, the create statement will raise an error (reported as a warning).
  4. The fact that a temporary interface is often created in the Sunopsis Memory Engine doesn't mean it cannot be created in the RDBMS you are working with. You can create temporary interfaces in the database of your choice; keeping them in memory is incidental to the idea of a temporary datastore. If the Sunopsis Memory Engine is chosen as the "Work Schema", the temporary table will remain there until the engine is shut down or explicit code is issued to drop the table.

A very important statement -
There is nothing temporary about a temporary interface, but its name.

The term "temporary" comes into picture because the temporary area (or the staging area) can be purged after executions. Though, they have to be purged manually or in a process.
The advantage of using a yellow interface is ONLY to avoid preparing the destination datastore in the database or declaring it in the "Model". Since these datastores are deemed temporary, theoretically they are not part of our data model and hence don't need to be in the "Model".

In all cases the destination data is written to a database, either in memory or in the RDBMS, depending on which staging area has been chosen.

Typically you will create a temporary interface that is used as a source in another, permanent (blue) interface. A yellow interface may be used as a source in more than one blue interface.
In that case, each of the blue interfaces shares the same table (populated by the yellow interface), so if you are trying to run those jobs in parallel, you may want to rethink how the yellow interface is used.

The temporary interface has an option to choose its datastore location: Data Schema or Work Schema. If the Work Schema is chosen and the Memory Engine is used as the staging area, the table will remain there until the engine is shut down or explicit code is issued to drop it.

Wednesday, February 23, 2011

ODI startscen.sh forks local agent for a scenario

I will discuss the behavior of the ODI startscen.sh/bat command. We often use this command to execute a scenario from custom shell/batch scripts, or even add it to a workflow engine.
The syntax being:
startscen.sh scn_load_data 001 GLOBAL -NAME=odiSchedAgent001

However, there is an important point to understand about the underlying workings of the startscen command: it forks a new agent process that is local to the machine on which the command was started.
startscen.sh has a parameter -NAME=<agentName>, which naturally makes us assume that this agent will be used to execute the scenario. However, this parameter is only used to record the agent name in the Operator logs; it has nothing to do with using that agent to execute the scenario.
Another obvious side effect is that the OS execution privileges of the user who initiated the startscen command are used to generate log files or any other OS files, which may be different from the execution privileges of the remote agent that you intended to use.

Consider a small test case:

(1) Use Unix account "A" to start a background agent process by agentscheduler.sh.

agentscheduler.sh -PORT=20910 -NAME=odiSchedAgent001

(2) On the same Unix machine, use Unix account "B" to execute the scenario "scn_load_data" via startscen.sh

startscen.sh scn_load_data 001 GLOBAL -NAME=odiSchedAgent001

However, step (2) forks a new agent process to execute the scenario instead of using the existing agent process started in (1). You can verify this by scanning the processes.
On Unix, you can check this with "ps -ef" and see that a new process has been started.
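
For example (the exact filter depends on your installation; this is just a sketch):

  ps -ef | grep java | grep -i odi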

Even if you use the same Unix account "A" to execute startscen.sh, it will still fork a new agent process.

This can be a problem. As mentioned earlier, the permissions of the user who started the agent and the user who ran the startscen command may differ.
There are also cases in which this becomes a bigger problem.
Consider a setup with 2 Unix systems - System-A and System-B. System-A runs the agent process, and System-B is used to kick off the scenario with the startscen command. Any OS-based activities performed by the scenario will happen on System-B's file system, whereas you wanted them done on System-A (after all, that's where the agent is running - or so you thought).

Now the good part... the solution:
Use startcmd.sh instead of startscen.sh.
The syntax is
startcmd.sh OdiStartScen -SCEN_NAME=scn_load_data -SCEN_VERSION=001 -CONTEXT=GLOBAL -AGENT_CODE=odiSchedAgent001
This command actually uses the remote agent specified in the parameter.

Tuesday, February 1, 2011

ODI 11g Upgrade issues - Set operator disabled

In this post, I will consolidate the issues I ran into with an ODI 10g to 11g upgrade. This applies strictly to ODI 11g instances that have been migrated from ODI 10g.

1.)
During the upgrade from 10g to 11g, if you forget to upgrade the Technologies, you will find that the SET-based functionality doesn't work. Thanks to Nicolas for pointing this out.
In an interface mapping, you cannot add multiple datasets. The Add/Delete buttons for the datasets are disabled.



The cause -
The Oracle Technology was not upgraded to support these new features, so you will see the following under Topology Manager -> Physical Architecture -> Technologies -> Oracle:


The options highlighted above are unchecked. Hence, the Partitioning, Native Sequences, Lookups, Derived Tables and Set Operators functionality is not available.

Solution -
You will need to make changes to the Oracle Technology as follows: