Omnis Technical Note TNDM0002 Updated November 2006

Data Corruption Problems

For Omnis Classic & Studio
By Omnis Technical Support

What could be causing data corruption problems?
How can corruption problems be rectified?

Contents
Introduction
Segment Sizing
Opportunistic Locking
Screen Savers and Energy Savers
Write-Behind Caching
File Formats
Repairing Datafiles
Old Files
Cabling
General Tips and Hints
Back-up and Omnis data corruption case study
Additional Notes

Introduction
Generally, the Omnis datafile technology is safe, but it has one essential weak point: unlike SQL back ends, there is no server-side checking of the data. Omnis relies on a functioning network to write data to a file server. If the network is defective, packets can be lost and the datafile may become corrupted. The impression that the network is OK is not always correct, especially under high traffic, when corrupted packets can slip through.
More often than not, a corrupted datafile is caused by a network issue. The points below, in no particular order, give an overview of what can cause network problems and what else might damage a datafile. Disabling write-behind caching seems to have helped in many cases.

Segment Sizing
The first segment of your datafile is automatically sized for you. After adding a second segment to your datafile, you should immediately increase the size of the segment. This does not occur automatically but has to be done with code.

Increasing the number of blocks in a datafile

; Use one of the following two lines.
; Either check that only one user is logged on:
Test for only one user
; or take the datafile out of shared mode:
Calculate #F as $cdata.$shared.$assign(kFalse)
If flag true
  If $cdata.$freesize<250000
    Yes/No message (Do you want to expand the datafile ca 1MB ?)
    If flag true
      Calculate %DATAFILESIZE as $cdata.$disksize
      Calculate $cdata.$segments.1.$disksize as %DATAFILESIZE+1000000
    End if
  End if
End if
; Return the datafile to shared mode
Calculate #F as $cdata.$shared.$assign(kTrue)


Add a new segment to a datafile

Calculate #F as $cdata.$shared.$assign(kFalse)
If flag true
  Set reference REF to $cdata.$segments.$add(path_to_datafile,512000)
  ; (creates a new 256MB segment of the datafile)
End if
Calculate #F as $cdata.$shared.$assign(kTrue)


Should the function fail, try setting a new path in your autoexec.bat:
SET OMNIS=C:\OMNIS\Data
or set your working directory to your datafile directory before performing the above actions.
This code can be run from the STARTUP.

Opportunistic Locking
Opportunistic locking on NT should be turned off:

WHAT IS OPPORTUNISTIC LOCKING: Opportunistic locking is used by Windows NT to perform read-ahead, write-behind, and lock caching. Basically, if one client is accessing a block range in a file, that range is marked for opportunistic locking and the client can perform read-ahead, write-behind, and lock caching. If another user attempts to write to that block range, the opportunistic locking has to be switched off for the previous client and the data needs to be synchronized with the server before the second user can access the range.
SITUATION: Users were seeing regular corruption of their database. All had the package installed on a Windows NT Server (3.51 or 4) and were running Windows 95 at the workstations. Corruption would happen several times a day.
CAUSE: Windows NT Server uses a feature called Opportunistic Locking to speed up file access. This does not work well with a database.
RESOLUTION: This fix needs careful attention. We recommend that a responsible network person make this change. Any time that you edit a machine's registry information, you risk bigger problems if it is not done correctly.

Steps to disable opportunistic locking on an NT Server

1 Open REGEDT32 on the server machine.
2 Go to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters.
3 From the menu, select Edit/Add Value.
4 Fill in the blanks: (Value Name) EnableOplocks, (Data Type) REG_DWORD.
5 Select OK.
6 A DWORD editor dialogue box will appear. Type in a zero and leave the radix set to Hex.
7 Select OK. The new value should appear on the right half of the registry viewer.
8 Exit the registry editor.
9 Reboot the server. The value will only go into effect after a reboot.

See attached file that performs this procedure for more information.
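
The attached file is not reproduced in this note. As a rough sketch only, the same change could be prepared as a registry import file. The Python below builds REGEDIT4-format text for the key and value named in the steps above. The helper name build_reg_file is hypothetical, and whether the registry editor on your NT version accepts this import format is an assumption that should be verified on a test machine first.

```python
# Sketch: build the text of a .reg import file that sets one DWORD value.
# build_reg_file is a hypothetical helper, not part of any Omnis tooling.

OPLOCKS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet"
    r"\Services\LanmanServer\Parameters"
)


def build_reg_file(key_path, value_name, dword_value):
    """Return REGEDIT4-format text that sets value_name to a 32-bit number."""
    if not 0 <= dword_value <= 0xFFFFFFFF:
        raise ValueError("dword_value must fit in 32 bits")
    lines = [
        "REGEDIT4",
        "",
        "[" + key_path + "]",
        '"{}"=dword:{:08x}'.format(value_name, dword_value),
        "",
    ]
    # .reg files conventionally use DOS line endings.
    return "\r\n".join(lines)


# EnableOplocks = 0 switches opportunistic locking off, as in steps 4 to 6.
reg_text = build_reg_file(OPLOCKS_KEY, "EnableOplocks", 0)
print(reg_text)
```

The resulting text would still have to be saved to a file, imported by an administrator, and followed by the reboot described in step 9.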

Screen Savers and Energy Savers
Screen savers and energy savers on Windows machines should be turned off, especially on the server. They are supposed to close all connections cleanly once the computer has been idle for some time and to reconnect when it detects activity, but more often than not they do not work correctly. Omnis is very sensitive to the network: if packets are being lost, Omnis has no control over what is written to the datafile. So if the reconnection to the datafile is not clean, data can be lost and the datafile corrupted.

Write-Behind Caching
Turn off write-behind caching on the Win95/98 machines. This type of caching holds information that needs to be written to the hard disk and writes it when the system is idle or after a certain amount of time has elapsed. It is a built-in feature of Windows 95/98 and is provided by the SmartDrv utility under the various versions of Windows 3.x.
Disabling Write-Behind Caching:

Using Windows 95:

1 Right-click on My Computer and select the Properties menu item.
2 Click on the Performance tab in the System Properties window that appears.
3 Click on the File system... button at the bottom left hand corner of the window.
4 Click on the Troubleshooting tab in the File System Properties window that appears.
5 Place a tick in the "Disable write-behind caching for all drives" check box.
6 Click OK in the File System Properties window.
7 Click OK in the System Properties window.
8 Reboot Windows 95.

This will reduce the performance of your machine slightly, as writes to disk are changed from write-behind to write-through caching. If you are reorganising a very large file and want to get every bit of performance out of your system, it is worth clearing this check box (re-enabling write-behind caching) and rebooting before the reorganisation. The speed hit depends on the performance of your hard drives and their interfaces.
The main concern is that popular system optimisation software (e.g. First Aid) tells the user that this setting is bad and tries to clear it again, re-enabling write-behind caching. So even if you have done the right thing, the user (or a technician trying to improve system performance) may unwittingly undo all your good work.

The following code allows one to test if the write-behind-cache of Windows95/98 is on or off and report this to the user. It could be helpful to prevent data corruption caused by write-behind-caching by advising the customer of this fact before starting the application.

If sys(6)='W'&sys(7)='4.0' ;; Windows 95/98
  Calculate #2 as 0
  ; -2147483646 is HKEY_LOCAL_MACHINE (0x80000002) as a signed 32-bit value
  Register DLL ("ADVAPI32.DLL","RegOpenKeyA","JJCN")
  Call DLL ("ADVAPI32.DLL","RegOpenKeyA",-2147483646,"System\CurrentControlSet\control\FileSystem",#2) Returns #1
  Register DLL ("ADVAPI32.DLL","RegQueryValueExA","JJCNNNN")
  Register DLL ("ADVAPI32.DLL","RegCloseKey","JJ")
  Calculate #50 as 4 ;; Size of the value buffer in bytes
  Calculate #48 as -1 ;; Default value (just in case key doesn't exist)
  Call DLL ("ADVAPI32.DLL","RegQueryValueExA",#2,"DriveWriteBehind",0,#49,#48,#50) Returns #3
  Call DLL ("ADVAPI32.DLL","RegCloseKey",#2) ;; Close key
  If #48=0
    Do cMsg.$assign('Write Behind Cache is OFF') Returns #F
  Else
    Do cMsg.$assign('Write Behind Cache is ON') Returns #F
  End If
End If

Using Windows for Workgroups 3.11 (WFW 3.11):
N.B. You must upgrade to WFW 3.11 if you are using an earlier version.

If you are using 32-bit file access on all or some drives:
There may be one or more lines in the [vcache] section of your system.ini file beginning with ForceLazyOn= or ForceLazyOff=.

1 If there is a line beginning with ForceLazyOn=, delete the entire line.
2 If there is a line beginning with ForceLazyOff=, ensure that all the active drives in your system are included in the letters following ForceLazyOff=, e.g. if your system has two drives, C: & D:, make the line read as follows: ForceLazyOff=CD
3 If there is no line beginning with ForceLazyOff=, add the following line in the [vcache] section: ForceLazyOff=CDEF

Again, in this instance the letters CDEF refer to the four drives C:, D:, E: & F: and should be changed as required to suit your system. You should also include network drives in deciding what letters to add to the line.
The [vcache] section of the system.ini file should look something like this when you have finished:
[vcache]
MinFileCache=512
ForceLazyOff=CDEF
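
The three rules above could be applied to the text of a system.ini file along the following lines. This is a minimal Python sketch, not a tested tool: the function name force_lazy_off is hypothetical, and appending a [vcache] section when the file has none is an assumption that goes beyond the steps above. Work on a backup copy of system.ini.

```python
# Sketch: apply the ForceLazyOn / ForceLazyOff rules to system.ini text.
# force_lazy_off is a hypothetical helper name.

def force_lazy_off(ini_text, drives):
    """Return ini_text with write-behind forced off for the given drive letters."""
    wanted = set(drives.upper())
    out = []
    in_vcache = False
    found_section = False
    wrote_off = False

    def close_vcache():
        # Rule 3: [vcache] had no ForceLazyOff= line, so add one
        # (placed before any blank lines that end the section).
        nonlocal wrote_off
        if not wrote_off:
            idx = len(out)
            while idx > 0 and out[idx - 1].strip() == "":
                idx -= 1
            out.insert(idx, "ForceLazyOff=" + "".join(sorted(wanted)))
            wrote_off = True

    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            if in_vcache:
                close_vcache()
            in_vcache = stripped.lower() == "[vcache]"
            found_section = found_section or in_vcache
            out.append(line)
            continue
        if in_vcache:
            key, sep, value = stripped.partition("=")
            name = key.strip().lower()
            if sep and name == "forcelazyon":
                continue  # Rule 1: delete the entire line
            if sep and name == "forcelazyoff":
                if not wrote_off:
                    # Rule 2: make sure every active drive is listed
                    letters = wanted | set(value.strip().upper())
                    out.append("ForceLazyOff=" + "".join(sorted(letters)))
                    wrote_off = True
                continue
        out.append(line)

    if in_vcache:
        close_vcache()
    if not found_section:
        # Assumption: create the section if the file has none.
        out += ["[vcache]", "ForceLazyOff=" + "".join(sorted(wanted))]
    return "\n".join(out) + "\n"


before = "[386Enh]\nDevice=foo\n\n[vcache]\nMinFileCache=512\nForceLazyOn=C\n"
print(force_lazy_off(before, "CDEF"))
```

Remember to include network drive letters in the drives argument, as noted above.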

If you are using 16-bit file access on all or some drives:
There should be a line in your autoexec.bat file that looks something like this:
c:\dos\smartdrv.exe

Add the switch /x to this line so that it reads:
c:\dos\smartdrv.exe /x

Using Windows 3.1:
There should be a line in your autoexec.bat file that looks something like this:
c:\dos\smartdrv.exe

Add the switch /x to this line so that it reads:
c:\dos\smartdrv.exe /x
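
Where many workstations need the same change, the /x switch could be added to the SmartDrv line by a small script rather than by hand. The Python below is a sketch under assumptions: the function name add_smartdrv_x is hypothetical, it treats any non-comment line that mentions smartdrv as the line to change, and it only works on the text of autoexec.bat, leaving reading and writing the file to the caller.

```python
# Sketch: append /x to SmartDrv lines in autoexec.bat text.
# add_smartdrv_x is a hypothetical helper name.
import re


def add_smartdrv_x(autoexec_text):
    """Return autoexec_text with /x added to each smartdrv line that lacks it."""
    out = []
    for line in autoexec_text.splitlines():
        is_smartdrv = re.search(r"smartdrv(\.exe)?\b", line, re.IGNORECASE)
        is_comment = line.lstrip().lower().startswith(("rem ", "::"))
        has_x = re.search(r"(^|\s)/x(\s|$)", line, re.IGNORECASE)
        if is_smartdrv and not is_comment and not has_x:
            line = line.rstrip() + " /x"
        out.append(line)
    return "\n".join(out) + "\n"


sample = "@echo off\nc:\\dos\\smartdrv.exe\n"
print(add_smartdrv_x(sample))
```

Running the function a second time leaves the text unchanged, so it is safe to apply repeatedly.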

File Formats
If the file formats (or classes in Studio) are corrupt, this will be passed on to the datafile. If you look at the file format it will probably look OK, because the corruption is at the tokenised level. You will need to replace the file format. See http://Omnis.notabene.at/html/demos.html#SlotMaker for a tool from The Omnis LAB that creates new file formats based on the slots in a datafile.
A corrupted datafile will stay corrupted, so it is advisable to implement one or more of the steps above, then export/import the data and update the library and datafile at the same time as the changes are made to the operating systems.

Repairing Datafiles
The best way to check or repair a datafile is to run a Full check. The procedure is:

1 Run the repair utilities with all of the 'Check data file structure', 'Check records', 'Check indexes' and 'Repair data' options selected. Completely ignore any messages reported in the log.
2 Repeat step 1 a second time.
3 (Optional) Clear the check data log and repeat step 1, but without the 'Repair data' option. Any messages that now appear in the log probably denote irreparable damage.

A full check should fix the great majority of problems. If it doesn't, the only solution is to export and re-import the data. Datafiles often pick up small amounts of damage with regular use, and this generally causes no long-term problems (just as Norton Utilities nearly always seems to find something wrong with a hard disk). So even if a datafile seems to be working fine, it is sensible to perform the Full Check routine described above every month or two. This could usefully be carried out after the network hardware check recommended elsewhere in this document.
Don't use the Quick check facility; instead rely on performing a Full Check every couple of months. In practice most damage reported by Quick check is not permanent but was flagged because of some momentary network glitch that Omnis managed to circumvent. If a datafile becomes damaged on a regular basis, always check the network for hardware problems before attempting to repair the datafile.
Plan ahead and assume that problems will happen from time to time. Make sure there is a reliable backup system, with plans in place to perform periodic checks and deal with emergencies: it takes a long while to perform a Full Check on a large datafile and even longer to export and re-import data. This planning may identify cases where a server-based SQL database is the only sensible solution for a large amount of business-critical data.

Old Files
Datafiles created prior to Omnis 7 v2 may appear to function correctly but often contain invisible damage that was not picked up by the repair tools available at that time. This means that it is safer not to convert these old datafiles. Instead, export the data with the original Omnis and import it into a new datafile with the current Omnis.

Cabling
Defective network cables or connectors can be a problem, especially in an Ethernet network. Twisted pair tends to be a lot safer. Even old network cables can be a cause.
Poor quality cables and connectors can cause "cross talk". Improper cable radiuses and cables running too close to electrical lines can cause "reflection". Incorrectly installed or corrupted drivers can cause missed and corrupted packets at the software level. A malfunctioning hard drive can write bad data and/or lose data in selected sectors. The list goes on.
A 4K Cable tester is an investment worth making. Many sites inspected with this tester do not meet Category 5 cable guidelines. Generally cables are tested and certified to 100 MHz, and 3COM Ethernet cards are then recommended for ALL the machines on the network, including those not running the software.
Problems can mysteriously clear up after cabling is upgraded from Cat 3 to Cat 5. Some low-end network cards may not do checksumming very well, in which case a corrupted packet could get through. Even a Category 5 cable that had a desk leg placed on it has been the cause of problems: it caused one computer to run slowly and thus corrupted the data.
It is necessary to check for bad cables, cards and hardware by doing a 'ping-a-thon' once a month to every piece of network hardware.
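
A monthly 'ping-a-thon' could be scripted roughly as follows. This is a minimal Python sketch under stated assumptions: it shells out to the operating system's ping command (the -n count flag on Windows, -c elsewhere), it treats a non-zero exit code as a failure, and the function names and the host addresses in the comment are hypothetical. A clean ping run does not prove that a cable meets Category 5 guidelines; it only flags devices that are unreachable or dropping every packet.

```python
# Sketch: ping every device on a list and report the ones that fail.
# ping_once and ping_a_thon are hypothetical helper names.
import platform
import subprocess


def ping_once(host, count=4):
    """Return True if the system ping command reports success for host."""
    flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", flag, str(count), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def ping_a_thon(hosts, ping=ping_once):
    """Return the hosts that did not answer, in the order given."""
    return [host for host in hosts if not ping(host)]


# Example call (the addresses are placeholders for your own equipment):
# failed = ping_a_thon(["192.168.0.1", "192.168.0.10", "192.168.0.11"])
# print("No reply from:", failed)
```

The ping argument is there so that the check can be swapped out, for example for one that also records round-trip times, without changing the loop.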

General Tips and Hints
Never try to reorganise data if severe data damage is suspected. With current Omnis versions this will only make things worse.
Make especially sure that there is a reliable backup before repairing or reorganising data. Otherwise a crash during these operations could be really bad news. If a workstation crashes for any reason whilst Omnis is updating the datafile it can cause corruption and locking problems.
Make sure the users are educated not to switch off their workstations improperly.
We always set the NT Server performance setting to "balanced" rather than "maximize for file sharing".
Please find below an example structure (simplified) for how to update or insert records to a datafile:

Load error handler STARTUP/18
Repeat
  Cancel prepare for update
  Prepare for edit
  ;Data Update Process
  ;Data Update Process
  ;Data Update Process
  ;Data Update Process
  ;Data Update Process
  Update files
Until Flag true
Unload error handler STARTUP/18

The actual error handler is simply:

Parameter ErrorCode
Parameter ErrorText
;(We write the error code and text with time/date stamps to a log)
; Format error string and append to file using FileOps commands
Calculate #F as 0
SEA continue execution
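
For readers more familiar with general-purpose languages, the same pattern (retry the update until it succeeds, logging each failure with a time stamp) can be sketched in Python. This is an illustration of the idea only, not a translation of the Omnis commands: the function name update_with_retry is hypothetical, the use of OSError to signal a failed update is an assumption, and the max_attempts limit is an addition that the Omnis loop above does not have.

```python
# Sketch: retry an update until it succeeds, logging every failure.
# update_with_retry is a hypothetical helper name.
import datetime


def update_with_retry(do_update, log, max_attempts=5):
    """Call do_update until it succeeds; return True on success."""
    for attempt in range(1, max_attempts + 1):
        try:
            do_update()
            return True
        except OSError as err:
            # Equivalent of the error handler: record the error and carry on.
            stamp = datetime.datetime.now().isoformat(timespec="seconds")
            log.append("{} attempt {}: {}".format(stamp, attempt, err))
    return False


# Demonstration with an update that fails twice before succeeding.
calls = {"n": 0}


def flaky_update():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("record locked")


log = []
succeeded = update_with_retry(flaky_update, log)
print(succeeded, len(log))
```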

Two network cards with different driver versions installed can cause corruption. They put bad packets on the network, eventually resulting in a damaged datafile. Note that any computer on that network, not just those running Omnis, can create the bad packets, messing up the network in general.
Another problem that occurs from time to time is corruption on the hard disk. Have you ever run Scan Disk and found a cross-linked file? If you have, and you have a large datafile, then the chances are that the cross-link is in your data.
Packets sent between routers are checked for integrity, and a router can request that a packet be sent again from the sending computer if it senses any problems. However, all routers have saturation points at which a bad packet can slip through. Routinely getting 'Damaged Data, Bad Pointer' errors and having to re-index and export/import on a regular basis is an indication of this.
Remember that an Omnis native datafile is updated AND MANAGED solely by the client computer (unlike an SQL server). The client even seems to control the resorting of all the indexes contained in all files of all the records involved in your update. If something gets in the way the results can be messy. There are many benefits to using a native datafile over anything else. This, however, is not one of them.

Omnis tells me when I check the Data file: "Needs repair The record structure for {File-name} is damaged".
Remedy this problem by erasing these file formats and rewriting them from scratch.
Use TCP/IP as the network protocol. Problems can be experienced with NetBEUI, for example, and switching from NetBEUI to TCP/IP can remedy them.
Customers have reported data corruption problems running ASIP 6.1.1 on a Mac when Windows AND Macintosh clients are connected to the server. The problems do not seem to occur when the Windows machines connect using PC MacLan v7.x. Apparently Apple is aware of the problem and it will be fixed in ASIP v. 6.2.

Back-up and Omnis data corruption case study
This case pertains to an NT site where everything had previously been fixed but bad things began happening despite no apparent changes on the application, server or network of clients.
Having spent some considerable time trying to find the problem, and building registry checking into launching of the application, the customer's own IT support person found the problem as follows:
They are using Backup Exec.
In the setup, apparently there is an option of whether to back up open files. If you choose to back up open files, there is a further option, "with locks".
So, what was happening was that every few weeks (or days), the on-site administration person would forget to swap the backup tapes when they went home. Realising their mistake the next morning, they would then swap them. Backup Exec was set up to wait until the correct tape arrived, so backups commenced during the day. When Backup Exec got to the Omnis datafile, it was in use, and Backup Exec therefore proceeded to lock portions of it whilst backing them up.
I think that Omnis would perhaps not recognise another application locking a portion of its datafile, but NT may have discarded changes written by the Omnis clients to locked portions of the datafile, or Omnis just got confused when it found e.g. part of an index or record locked. Either way, the datafile gets corrupted very quickly, but apparently at random. The support person spotted it when he realised that the times in his backup error log corresponded with the times at which damage appeared in the datafile.

So if you use Backup Exec (or any other software with similar settings), don't let it lock portions of your Omnis data file. Since finding the cause (we hope), no data damage has appeared.
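
The comparison that exposed the problem, matching the backup error log against the times at which damage was noticed, can be sketched as below. This is a hedged Python illustration: the function name damage_after_backup is hypothetical, the 24-hour window is an arbitrary assumption, and the time stamps are made up purely to show the call. They are not taken from the site in question.

```python
# Sketch: find damage reports that follow a backup error within a time window.
# damage_after_backup is a hypothetical helper name.
from datetime import datetime, timedelta


def damage_after_backup(backup_errors, damage_events, window=timedelta(hours=24)):
    """Return (backup_time, damage_time) pairs where damage followed a backup error."""
    matches = []
    for damaged_at in damage_events:
        for backup_at in backup_errors:
            if timedelta(0) <= damaged_at - backup_at <= window:
                matches.append((backup_at, damaged_at))
    return matches


# Made-up time stamps for illustration only.
backup_errors = [datetime(2006, 3, 6, 10, 15), datetime(2006, 3, 20, 9, 40)]
damage_events = [datetime(2006, 3, 6, 14, 5), datetime(2006, 3, 12, 11, 0)]

for backup_at, damaged_at in damage_after_backup(backup_errors, damage_events):
    print("Damage at", damaged_at, "followed backup error at", backup_at)
```

If most damage reports turn up in the result, daytime backups that lock the datafile are worth investigating.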

Additional Notes
For additional information, you can download the following Word documents and the addenda.
Integrating Databases and Netware (Win95DatProblemsNetware.doc 31k)
Opportunistic Locking: Understanding the Problem (OpportunisticLocking.doc 77k)

Addendum 1:

Data corruption and Novell Netware 5

The Client Software must be release 3.1 with Service Pack 2 installed.

The following Client Registry properties MUST be set as follows:

Cache Writes : Off
Close Behind Ticks : Zero (default)
Delay Writes : Off (default)
File Cache Level : Zero
File Write Through : On
Max Cache Size : Zero
Opportunistic Locking : Off (this will only appear after Service Pack 2 has been applied)
True Commit : On

These entries can be found by Right-Clicking on Network Neighbourhood and selecting Properties, then selecting the Novell Netware client. The above properties will be found under the 'Advanced Tab'.
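
When many workstations have to be checked, the required values can be kept in one table and compared against what is found on each machine. The Python below is a sketch only: how the actual values are collected (by hand from the Advanced tab, or otherwise) is left open, the function name find_mismatches is hypothetical, and writing "Zero" as "0" is a notational choice.

```python
# Sketch: compare a workstation's Netware client settings with the required ones.
# find_mismatches is a hypothetical helper name.

REQUIRED = {
    "Cache Writes": "Off",
    "Close Behind Ticks": "0",
    "Delay Writes": "Off",
    "File Cache Level": "0",
    "File Write Through": "On",
    "Max Cache Size": "0",
    "Opportunistic Locking": "Off",
    "True Commit": "On",
}


def find_mismatches(actual, required=REQUIRED):
    """Return {setting: (wanted, found)} for every missing or wrong setting."""
    problems = {}
    for name, wanted in required.items():
        found = actual.get(name)
        if found is None or str(found).strip().lower() != wanted.lower():
            problems[name] = (wanted, found)
    return problems


# Example: one wrong value and one setting that was not recorded.
workstation = dict(REQUIRED)
workstation["Cache Writes"] = "On"
del workstation["True Commit"]
print(find_mismatches(workstation))
```

A missing "Opportunistic Locking" entry may simply mean that Service Pack 2 has not yet been applied, as noted in the list above.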

Information supplied by: Nick Harris of Kamino & Alain Stouder of Smartway.

Addendum 2:

Windows Netware Client (4.91 SP2)

The following is a list of caching properties for Windows Netware Client (4.91 SP2). The settings can be changed in the Novell Client properties on the 'Advanced Settings Tab', as follows:

'File Caching': default is ON, set to OFF (this is the really important one)
'File Commit': default is OFF, set to ON (this increases data safety and might involve a speed penalty, which may not be significant on a fast system)
'Max Read Burst Size': default 36000, set to 65535 (this might not be a good idea on a slower network)
'Max Write Burst Size': default 15000, set to 65535 (this might not be a good idea on a slower network)

All other settings are at their defaults.

Information supplied by: Vik Shah, The DLA Group.