Omnis Technical Note TNDM0002 Updated November 2006
Data Corruption Problems
For Omnis Classic & Studio
By Omnis Technical Support
Due to its publication date, this technote contains information which may no longer be accurate or applicable on one or more of the platforms it refers to. Please refer to the index page which may contain updated information.
What could be causing data corruption problems?
How can corruption problems be rectified?
Contents
Introduction
Segment Sizing
Opportunistic Locking
Screen Savers and Energy Savers
Write-Behind Caching
File Formats
Repairing Datafiles
Old Files
Cabling
General Tips and Hints
Back-up and Omnis data corruption case study
Additional Notes
Introduction
Generally one can say that the Omnis Datafile technology is safe, but
it does have one essential weak point: there is no server-side checking
of the data, as there is with SQL back ends. Omnis relies on a functioning
network to write data to a file server, and if the network is defective,
packets can get lost, possibly corrupting the datafile.
The impression that the network is OK is not always correct, especially
in high-traffic situations, where corrupted packets can slip through.
More often than not, the cause of a corrupted datafile is some network
issue. The points below, in no specific order, give an overview of what
can cause network problems and what else might damage a datafile. Turning
off write-behind caching seems to have helped in many cases, though.
Segment Sizing
The first segment of your datafile is automatically sized for you. After
adding a second segment to your datafile, you should immediately increase
the size of the segment. This does not occur automatically but has to
be done with code.
Increasing the number of blocks in a datafile
; Make sure no other user has the datafile open, using either
Test for only one user
; or
Calculate #F as $cdata.$shared.$assign(kFalse)
If flag true
  If $cdata.$freesize<250000
    Yes/No message (Do you want to expand the datafile by ca. 1MB?)
    If flag true
      Calculate %DATAFILESIZE as $cdata.$disksize
      Calculate $cdata.$segments.1.$disksize as %DATAFILESIZE+1000000
    End if
  End if
End if
Calculate #F as $cdata.$shared.$assign(kTrue)
Add a new segment to a datafile
Calculate #F as $cdata.$shared.$assign(kFalse)
If flag true
  Set reference REF to $cdata.$segments.$add(path_to_datafile,512000)
  ; (creates a new 256MB large segment of the datafile)
End if
Calculate #F as $cdata.$shared.$assign(kTrue)
Should the function fail, try setting a new path in your autoexec.bat:
SET OMNIS=C:\OMNIS\Data
or set your working directory to your datafile directory before performing
the above actions.
; This code can be performed in the STARTUP.
Opportunistic Locking
Opportunistic locking on NT should be turned off:
WHAT IS OPPORTUNISTIC LOCKING: Opportunistic locking is used by Windows
NT to perform read-ahead, write-behind, and lock caching. Basically, if
one client is accessing a block range in a file, that range is marked
for opportunistic locking and the client can perform read-ahead, write-behind,
and lock caching. If another user attempts to write to that block range,
the opportunistic locking has to be switched off for the previous client
and the data needs to be synchronized with the server before the second
user can access the range.
SITUATION: Users were seeing regular corruption of their database. All
had the package installed on a Windows NT Server (3.51 or 4) and were
running Windows 95 at the workstations. Corruption would happen several
times a day.
CAUSE: Windows NT Server tries to use a feature called Opportunistic Locking
in order to speed up file access. This does not work well with a database.
RESOLUTION: This fix needs careful attention. We recommend that a responsible
network person make this change. Any time that you edit a machine's registry
information, you risk bigger problems if it is not done correctly.
Steps to disable opportunistic locking on an NT Server
1. Open REGEDT32 on the server machine.
2. Go to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters.
3. From the menu, select Edit/Add Value.
4. Fill in the blanks: (Value Name) EnableOplocks, (Data Type) REG_DWORD.
5. Select OK.
6. A DWORD editor dialogue box will appear. Type in a zero and leave the format set to HEX.
7. Select OK. The new value should appear on the right half of the registry viewer.
8. Exit the registry editor.
9. Reboot the server. The value will only go into effect after a reboot.
See attached file that performs this procedure for more information.
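The registry change in the steps above can also be written down as a registry script. The Python sketch below only builds the text of such a script. It is not the attached file mentioned above, and the names `KEY_PATH` and `build_oplocks_reg` are made up for illustration. It assumes the REGEDIT4 text format and uses the key and value names given in the steps.

```python
# Sketch: build the text of a REGEDIT4 registry script that sets the
# EnableOplocks value described in the manual steps above.
# Assumption: KEY_PATH and build_oplocks_reg are illustrative names only.

KEY_PATH = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet"
    r"\Services\LanmanServer\Parameters"
)


def build_oplocks_reg(enable: bool = False) -> str:
    """Return registry script text that sets EnableOplocks (REG_DWORD)."""
    value = 1 if enable else 0
    lines = [
        "REGEDIT4",
        "",
        f"[{KEY_PATH}]",
        # A dword is written as eight hexadecimal digits.
        f'"EnableOplocks"=dword:{value:08x}',
        "",
    ]
    # Registry scripts are conventionally saved with CRLF line endings.
    return "\r\n".join(lines)
```

Saving the returned text as a .reg file and importing it on the server would stand in for steps 1 to 8. The reboot in step 9 is still needed, and the same caution about editing the registry applies.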
Screen Savers and Energy Savers
Screensavers and energy savers on Windows machines should be turned
off, especially on the server. These are supposed to disable all connections
cleanly if the computer has been idle for some time and reconnect after
the computer recognises some action, but more often than not these do
not function correctly. Omnis is very network sensitive: if the network
is losing packets, Omnis has no control over what data is written to
the datafile. So if the reconnection to the datafile
is not clean, data can be lost and the datafile corrupted.
Write-Behind Caching
Turn off write-behind caching on the Win95/98 machines. This type
of caching stores information that needs to be written to the hard disk
and sends it when the system is idle or after a certain amount of time
has elapsed. This is a built-in feature of Windows 95/98 and is provided
by the SmartDrv Utility under the various versions of Windows 3.
Disabling Write Behind Caching:
Using Windows 95:
1. Right-click on My Computer and select the Properties menu item.
2. Click on the Performance tab in the System Properties window that appears.
3. Click on the File System... button at the bottom left-hand corner of the window.
4. Click on the Troubleshooting tab in the File System Properties window that appears.
5. Place a tick in the "Disable write-behind caching for all drives" check box.
6. Click OK in the File System Properties window.
7. Click OK in the System Properties window.
8. Reboot Windows 95.
This will reduce the performance of your machine slightly, as writes
to disk are changed from write-behind to write-through caching. If you
are doing a reorganisation of a very large file and want to get every
bit of performance out of your system, it is worth clearing this check box
(which re-enables write-behind caching) and rebooting before doing the
reorganisation. The speed hit depends on
the performance of your hard drives and their interfaces.
The main concern is that popular system optimisation software (e.g. First
Aid) suggests to the user that this setting is bad, and tries to turn
it off again, enabling write behind caching. So even if you have done
the right thing, the user (or a technician trying to improve the system
performance) may unwittingly undo all your good work.
The following code allows one to test if the write-behind-cache of Windows95/98
is on or off and report this to the user. It could be helpful to prevent
data corruption caused by write-behind-caching by advising the customer
of this fact before starting the application.
If sys(6)='W'&sys(7)='4.0' ;; Windows 95/98
  Calculate #2 as 0
  Register DLL ("ADVAPI32.DLL","RegOpenKeyA","JJCN")
  ; -2147483646 is HKEY_LOCAL_MACHINE (hex 80000002) as a signed 32-bit integer
  Call DLL ("ADVAPI32.DLL","RegOpenKeyA",-2147483646,"System\CurrentControlSet\control\FileSystem",#2) Returns #1
  Register DLL ("ADVAPI32.DLL","RegQueryValueExA","JJCNNNN")
  Register DLL ("ADVAPI32.DLL","RegCloseKey","JJ")
  Calculate #50 as 4
  Calculate #48 as -1 ;; Default value (just in case key doesn't exist)
  Call DLL ("ADVAPI32.DLL","RegQueryValueExA",#2,"DriveWriteBehind",0,#49,#48,#50) Returns #3
  Call DLL ("ADVAPI32.DLL","RegCloseKey",#2) ;; Close key
  If #48=0
    Do cMsg.$assign('Write Behind Cache is OFF') Returns #F
  Else
    Do cMsg.$assign('Write Behind Cache is ON') Returns #F
  End If
End If
Using Windows for Workgroups 3.11 (WFW 3.11):
N.B. You must upgrade to WFW 3.11 if you are using
an earlier version.
If you are using 32-bit file access on all or some drives:
The [vcache] section of your system.ini file may contain a line (or lines)
beginning with ForceLazyOn= or ForceLazyOff=.
1. If there is a line beginning with ForceLazyOn=, delete the entire line.
2. If there is a line beginning with ForceLazyOff=, ensure that all the active drives in your system are included in the letters following ForceLazyOff=. For example, if your system has two drives, C: & D:, make the line read as follows: ForceLazyOff=CD
3. If there is no line beginning with ForceLazyOff=, add the following line in the [vcache] section: ForceLazyOff=CDEF
Again, in this instance the letters CDEF refer to the four drives
C:, D:, E: & F: and should be changed as required to suit your system.
You should also include network drives in deciding what letters to add
to the line.
The [vcache] section of the system.ini file should look something like
this when you have finished:
[vcache]
MinFileCache=512
ForceLazyOff=CDEF
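The three rules above can be sketched as a small text transformation. This is an illustration in Python only, not a tool supplied with Omnis. The function name `fix_vcache` is hypothetical, and creating a [vcache] section when the file has none is an assumption that goes beyond the steps above.

```python
# Sketch: apply the ForceLazyOn/ForceLazyOff rules to the text of system.ini.
# Assumption: fix_vcache is an illustrative name; adding a missing [vcache]
# section is an extra convenience not described in the steps above.

def fix_vcache(ini_text: str, drives: str) -> str:
    """Return ini_text with write-behind caching forced off for `drives`."""
    wanted = {c for c in drives.upper() if c.isalpha()}
    out = []
    in_vcache = False      # True while reading lines inside [vcache]
    found_section = False  # True once a [vcache] header has been seen
    have_off = False       # True once a ForceLazyOff= line has been written

    def finish_section():
        # Rule 3: add ForceLazyOff= if the section did not contain one,
        # placing it before any blank lines that end the section.
        nonlocal have_off
        if not have_off:
            idx = len(out)
            while idx > 0 and out[idx - 1].strip() == "":
                idx -= 1
            out.insert(idx, "ForceLazyOff=" + "".join(sorted(wanted)))
            have_off = True

    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            if in_vcache:
                finish_section()
            in_vcache = stripped.lower() == "[vcache]"
            found_section = found_section or in_vcache
            out.append(line)
        elif in_vcache:
            key, _, value = stripped.partition("=")
            key = key.strip().lower()
            if key == "forcelazyon":
                continue  # Rule 1: delete the entire line
            if key == "forcelazyoff":
                # Rule 2: make sure every active drive letter is listed
                listed = {c for c in value.upper() if c.isalpha()}
                out.append("ForceLazyOff=" + "".join(sorted(listed | wanted)))
                have_off = True
            else:
                out.append(line)
        else:
            out.append(line)

    if in_vcache:
        finish_section()
    if not found_section:
        if out and out[-1].strip():
            out.append("")
        out.extend(["[vcache]", "ForceLazyOff=" + "".join(sorted(wanted))])
    return "\n".join(out) + "\n"
```

Calling `fix_vcache(text, "CDEF")` on the contents of system.ini would give a [vcache] section like the one shown above. Keep a copy of the original file before writing the result back.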
If you are using 16-bit file access on all or some drives:
There should be a line in your autoexec.bat file that looks something
like this:
c:\dos\smartdrv.exe
Add the switch /x to this line so that it reads:
c:\dos\smartdrv.exe /x
Using Windows 3.1:
There should be a line in your autoexec.bat file that looks something
like this:
c:\dos\smartdrv.exe
Add the switch /x to this line so that it reads:
c:\dos\smartdrv.exe /x
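The smartdrv change described for WFW 3.11 with 16-bit file access and for Windows 3.1 can be sketched the same way. The Python function below is an illustration only. The name `add_smartdrv_x` is hypothetical, and the function assumes that the smartdrv line may also be prefixed by a loader command such as LH.

```python
# Sketch: append the /x switch to the smartdrv line of autoexec.bat text.
# Assumption: add_smartdrv_x is an illustrative name, not an Omnis utility.

def add_smartdrv_x(autoexec_text: str) -> str:
    """Return autoexec.bat text with /x added to any smartdrv line."""
    out = []
    for line in autoexec_text.splitlines():
        words = [w.lower() for w in line.split()]
        is_comment = bool(words) and words[0] == "rem"
        runs_smartdrv = any(
            w.endswith(("smartdrv.exe", "smartdrv")) for w in words
        )
        if runs_smartdrv and not is_comment and "/x" not in words:
            line = line.rstrip() + " /x"
        out.append(line)
    return "\n".join(out) + "\n"
```

Lines that already carry the /x switch, and lines commented out with REM, are left unchanged.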
File Formats
If a file format (or file class in Studio) is corrupt, the corruption is
passed on to the datafile. The file format will probably look OK when you
inspect it, because the corruption is at the tokenised level. You will need
to replace the file format. See http://Omnis.notabene.at/html/demos.html#SlotMaker
for a tool from The Omnis LAB that creates new file formats based on the
slots in a data file.
A corrupted datafile will stay corrupted, so it would be advisable to
implement one or more of the steps above, then export/import the data
and update the library and datafile at the same time as the changes
to the operating systems are made.
Repairing Datafiles
The best way to check or repair a datafile is to
run a Full check. The procedure is:
1. Run the repair utilities with all the 'Check data file structure', 'Check records', 'Check indexes' and 'Repair data' options selected. Completely ignore any messages reported in the log.
2. Repeat step 1 a second time.
3. (Optional) Clear the check data log and repeat step 1, but without the 'Repair data' option. Any messages that now appear in the log will probably denote irreparable damage.
A Full Check should fix the great majority of problems; if it doesn't,
the only solution is to export and re-import the data. Datafiles often
pick up small amounts of damage with regular use and this generally causes
no long-term problems (just like Norton Utilities nearly always seems
to find something wrong with a hard disk). So even if a datafile seems
to be working fine it is sensible to perform the Full Check routine described
above every month or two. This could usefully be carried out after the
network hardware check recommended elsewhere in the document.
Don't use the Quick check facility; instead, rely on performing a Full
Check every couple of months. In practice most damage reported by Quick
check is not permanent but was instead flagged by some momentary network
glitch which Omnis managed to successfully circumvent. If a datafile becomes
damaged on a regular basis always check the network for hardware problems
before attempting to repair the datafile.
Plan ahead and assume that problems will happen from time to time. Make
sure there is a reliable backup system and plans in place to perform periodic
checks and deal with emergencies - it takes a long while to perform a
Full Check on a large datafile and even longer to export and re-import
data. This planning may identify cases where a server-based SQL database
is the only sensible solution for a large amount of business critical
data.
Old Files
Datafiles created prior to Omnis 7 v2 may appear to function correctly
but often contain invisible damage that was not picked up by the repair
tools available at that time. It is therefore safer not to convert
these old datafiles; instead, export the data with the original Omnis and
import it into a new datafile with the current Omnis.
Cabling
Defective network cables or connectors can be a problem, especially in
an Ethernet network. Twisted pair tends to be a lot safer. Even old network
cables can be a cause.
Poor-quality cables and connectors can cause "cross talk". Improper cable
radiuses and cables run too close to electrical lines can cause
"reflection". Incorrectly installed or corrupted drivers can cause missed
and corrupted packets at the software level. A malfunctioning
hard drive can write bad data and/or lose data in selected sectors. The
list goes on.
A 4K cable tester is an investment worth making. Many sites that are inspected
with this tester do not meet Category 5 cable guidelines. Generally, cables
are tested and certified to 100 MHz, and 3COM Ethernet cards are then recommended
for ALL the machines on the network, including those not running the software.
Problems can mysteriously clear up after cabling is upgraded from Cat 3
to Cat 5. Some low-end network cards may not do checksumming very well,
in which case a corrupted packet could
get through. Even a Category 5 cable that had a desk leg placed on
it has been the cause of problems: it caused one computer to run
slowly, which led to corrupted data.
It is necessary to check for bad cables, cards and hardware by doing a
'ping-a-thon' once a month to every piece of network hardware.
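A monthly 'ping-a-thon' of this kind could be scripted. The following Python sketch is one possible approach under stated assumptions: it calls the operating system's ping command, uses Windows or Linux style flags, and treats a non-zero exit code as a lost ping. The function names `ping_once` and `ping_sweep` are made up for illustration.

```python
# Sketch: ping every piece of network hardware several times and count losses.
# Assumptions: the system "ping" command is available; the flags below are the
# Windows and Linux forms; ping_once and ping_sweep are illustrative names.
import platform
import subprocess
from typing import Callable, Dict, Iterable


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Send one echo request; return True if ping reports success."""
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 5,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # The exit code is only a rough indicator; some ping versions return 0
    # even when the reply is an error message from another device.
    return result.returncode == 0


def ping_sweep(
    hosts: Iterable[str],
    count: int = 20,
    ping: Callable[[str], bool] = ping_once,
) -> Dict[str, int]:
    """Return the number of lost pings for each host."""
    losses = {}
    for host in hosts:
        losses[host] = sum(1 for _ in range(count) if not ping(host))
    return losses
```

Any host with a non-zero loss count is a candidate for a closer look at its cable, card and drivers. A clean sweep does not prove the network is sound under heavy traffic.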
General Tips and Hints
Never try to reorganise data if severe data damage is suspected. With
current Omnis versions this will only make things worse.
Make especially sure that there is a reliable backup before repairing
or reorganising data. Otherwise a crash during these operations could
be really bad news. If a workstation crashes for any reason whilst Omnis
is updating the datafile it can cause corruption and locking problems.
Make sure the users are educated not to switch off their workstations
improperly.
We always set the NT Server performance setting to "balanced" rather than
"maximize for file sharing".
Please find below an example structure (simplified)
for how to update or insert records to a datafile:
Load error handler STARTUP/18
Repeat
  Cancel prepare for update
  Prepare for edit
  ; Data update process (one or more commands)
  Update files
Until flag true
Unload error handler STARTUP/18
The actual error handler is simply:
Parameter ErrorCode
Parameter ErrorText
; (We write the error code and text with time/date stamps to a log)
; Format error string and append to file using FileOps commands
Calculate #F as 0
SEA continue execution
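For readers less familiar with Omnis commands, the structure above (try the update, log any error with a timestamp, then try again) can be sketched in Python. This is an analogy rather than Omnis code. The name `update_with_retry` is hypothetical, and the cap on the number of attempts is an added assumption, since the Omnis loop above repeats until the flag is true.

```python
# Sketch: retry an update until it succeeds, logging each failure.
# Assumptions: update_with_retry is an illustrative name; max_attempts is an
# added safety cap that the Omnis structure above does not have.
import time
from datetime import datetime
from typing import Callable, List


def update_with_retry(
    do_update: Callable[[], None],
    log: List[str],
    max_attempts: int = 10,
    delay_s: float = 0.0,
) -> bool:
    """Run do_update until it succeeds; return False if every attempt fails."""
    for attempt in range(1, max_attempts + 1):
        try:
            do_update()  # stands in for Prepare for edit ... Update files
            return True
        except Exception as err:
            # Mirrors the error handler: record the time and error text,
            # then let the loop carry on with another attempt.
            stamp = datetime.now().isoformat(timespec="seconds")
            log.append(f"{stamp} attempt {attempt}: {err}")
            if delay_s:
                time.sleep(delay_s)
    return False
```

The log gives a record of when update errors occurred, which can later be compared with network or backup events.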
Two network cards with different driver versions installed can
be a cause of corruption. They produce bad packets on the network,
eventually resulting in a damaged datafile. Note that any computer on
that network, not just those running Omnis, can create the bad packets,
messing up the network in general.
Another problem that occurs from time to time is corruption on the hard
disk. Have you ever run Scan Disk and found a cross-linked file? If you
have and you have a large data file then the chances are that the cross-link
is in your data.
Packets sent between routers are checked by the routers for integrity.
A router can request that a packet
be sent again from the sending computer if it senses any problems. All
routers have saturation points, however, at which a bad packet can slip
through. Routinely getting 'Damaged Data, Bad Pointer' errors and having
to re-index and export/import on a regular basis is an indication of this.
Remember that an Omnis native datafile is updated AND MANAGED solely by
the client computer (unlike an SQL server). The client even seems to control
the resorting of all the indexes contained in all files of all the records
involved in your update. If something gets in the way the results can
be messy. There are many benefits to using a native datafile over anything
else. This, however, is not one of them.
When checking a datafile, Omnis may report: "Needs repair The record
structure for {File-name} is damaged".
Remedy this problem by erasing these file formats and rewriting them from
scratch.
Use TCP/IP as the network protocol. Problems can be experienced
with NetBEUI, for example, and switching from NetBEUI to TCP/IP can remedy
them.
Customers have been reporting data corruption problems running ASIP 6.1.1
on a Mac when Windows AND Macintosh clients are connected to the server.
The problems do not seem to occur when connecting the Windows machines
using PC MacLan v7.x. Apparently Apple is aware of a problem and this
will be fixed in ASIP v. 6.2.
Back-up and Omnis data corruption case study
This case pertains to an NT site where everything had previously been
fixed but bad things began happening despite no apparent changes on the
application, server or network of clients.
Having spent some considerable time trying to find the problem, and building
registry checking into launching of the application, the customer's own
IT support person found the problem as follows:
They are using Backup Exec.
In the setup, apparently there is an option of whether to back up open
files. If you choose to back up open files, there is another option, "with
locks".
So, what was happening was that every few weeks (or days), the on-site
administration person would forget to swap the backup tapes when they
went home. Realising their mistake the next morning, they would then swap
them, and Backup Exec was set up in such a way as to wait until the correct
tape arrived, so backups commenced during the day. When Backup
Exec got to the Omnis datafile, it was in use, and it therefore proceeded
to lock portions of it whilst backing them up.
I think that Omnis would perhaps not recognise another application locking
a portion of its datafile, but NT may have discarded changes written
by the Omnis clients to locked portions of the datafile, or Omnis just
got confused when it found e.g. part of an index or record locked. Either
way, the data file gets corrupted very quickly, but apparently at random.
The support guy spotted it when he realised that his backup error log
corresponded with the log of when damage appeared in the data file.
So if you use Backup Exec (or any other software with similar settings), don't let it lock portions of your Omnis data file. Since finding the cause (we hope), no data damage has appeared.
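The comparison the support person made by eye, matching the backup error log against the times damage appeared, can be sketched as follows. This Python example is illustrative only. The function name `correlate` and the 24-hour window are assumptions, and the timestamps would have to be extracted from the actual logs first.

```python
# Sketch: pair each damage event with backup errors that shortly preceded it.
# Assumptions: correlate is an illustrative name; the 24-hour default window
# is arbitrary; both inputs are lists of datetime values taken from the logs.
from datetime import datetime, timedelta
from typing import List, Tuple


def correlate(
    backup_errors: List[datetime],
    damage_events: List[datetime],
    window: timedelta = timedelta(hours=24),
) -> List[Tuple[datetime, datetime]]:
    """Return (backup_error, damage) pairs where damage follows within window."""
    pairs = []
    for damage in sorted(damage_events):
        for backup in sorted(backup_errors):
            if timedelta(0) <= damage - backup <= window:
                pairs.append((backup, damage))
    return pairs
```

If most damage events pair up with a backup error, the backup settings are worth checking before the datafile is repaired yet again.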
Additional Notes
For additional information, you can download the following Word
documents and the addenda.
Integrating Databases and Netware (Win95DatProblemsNetware.doc, 31k)
Opportunistic Locking: Understanding the Problem (OpportunisticLocking.doc, 77k)
Addendum 1:
Data corruption and Novell Netware 5
The Client Software must be release 3.1 with Service Pack 2 installed.
The following Client Registry properties MUST be set as follows:
Cache Writes : Off
Close Behind Ticks : Zero (default)
Delay Writes : Off (default)
File Cache Level : Zero
File Write Through : On
Max Cache Size : Zero
Opportunistic Locking : Off (this will only appear after Service Pack 2 has been applied)
True Commit : On
These entries can be found by Right-Clicking on Network Neighbourhood and selecting Properties, then selecting the Novell Netware client. The above properties will be found under the 'Advanced Tab'.
Information supplied by: Nick Harris of Kamino & Alain Stouder of Smartway.
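The Netware 5 client settings listed in Addendum 1 lend themselves to a simple checklist comparison. The Python sketch below is an illustration only. The names `REQUIRED` and `find_mismatches` are hypothetical, the spelling of each value (for example "0" for Zero) is an assumption, and collecting the actual settings from a workstation is left to the reader.

```python
# Sketch: compare a workstation's client settings with the Addendum 1 list.
# Assumptions: REQUIRED and find_mismatches are illustrative names; values
# are compared as text, with Zero written as "0".
from typing import Dict, Optional, Tuple

REQUIRED = {
    "Cache Writes": "Off",
    "Close Behind Ticks": "0",
    "Delay Writes": "Off",
    "File Cache Level": "0",
    "File Write Through": "On",
    "Max Cache Size": "0",
    "Opportunistic Locking": "Off",
    "True Commit": "On",
}


def find_mismatches(
    actual: Dict[str, str],
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {setting: (required, found)} for each wrong or missing setting."""
    problems = {}
    for name, wanted in REQUIRED.items():
        found = actual.get(name)
        if found is None or found.strip().lower() != wanted.lower():
            problems[name] = (wanted, found)
    return problems
```

An empty result means every listed setting matches. A missing Opportunistic Locking entry may simply mean that Service Pack 2 has not been applied.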
Addendum 2:
Windows Netware Client (4.91 SP2)
The following is a list of caching properties for Windows Netware Client (4.91 SP2). The settings can be changed in the Novell Client properties on the 'Advanced Settings Tab', as follows:
'File Caching': default is ON, set to OFF (this is the really important
one)
'File Commit': default is OFF, set to ON (this increases data safety and
might involve a speed penalty, which may not be significant on a fast system)
'Max Read Burst Size': default 36000, set to 65535 (this might not be a
good idea on a slower network)
'Max Write Burst Size': default 15000, set to 65535 (this might not be a
good idea on a slower network)
All other settings are at their defaults.
Information supplied by: Vik Shah, The DLA Group.