1. ## Questions about data correction

Hello community,
I would like to have a look into the topic "data correction".

Let's say we have an archive in which some bits are set wrongly, and we want to correct the values of those bits.

Can checksums be used to correct the data?

How do you usually determine which values the bits and bytes should have?

Does someone know/have some good materials to read?

Do you think it makes sense to add extra data for correcting the payload to a general-purpose archive that is being distributed over the internet?

2. I was a bit involved in the development of the new RAR5 error correction scheme. Plank's research papers were almost everything we used in this work. You can start with the tutorial.

3. ## The Following User Says Thank You to Bulat Ziganshin For This Useful Post:

just a worm (28th November 2013)

4. Interesting research. His fast Galois field library would be good for hashing and probably other stuff.

5. Thank you. I wasn't aware of the xor-trick. It seems pretty helpful.

6. Originally Posted by just a worm
Can checksums be used to correct the data?
You probably read that CRC can repair a single bit and Hamming Codes can repair a couple of bits…
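To illustrate the single-bit case, here is a minimal Python sketch, assuming each block is stored together with its CRC32. It simply tries every bit flip until the checksum matches again; `repair_single_bit` is a hypothetical helper name, and this brute force is only practical for small blocks.

```python
import zlib

def repair_single_bit(block, stored_crc):
    """Flip each bit in turn until the CRC32 matches; return None if no flip helps."""
    if zlib.crc32(block) == stored_crc:
        return block  # nothing to repair
    buf = bytearray(block)
    for bit in range(len(buf) * 8):
        buf[bit // 8] ^= 1 << (bit % 8)
        if zlib.crc32(buf) == stored_crc:
            return bytes(buf)
        buf[bit // 8] ^= 1 << (bit % 8)  # undo the flip and try the next bit
    return None
```

The checksum only tells you that a candidate looks right, so the cost grows with the number of bits you are willing to guess.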

Protection of files and archives commonly works in a different way.

1. All data is cut into blocks of the same size
2. Vertical slices across the blocks are multiplied by a generator matrix
3. Each multiplication produces a slice of parity
4. The parity slices are combined into recovery blocks

There are two common ways to build the generator matrix:

1) Simple interleaved XOR-Matrix:

100010001000
010001000100
001000100010
000100010001

Simple XOR can repair one contiguous burst error and some random errors, but some error patterns are unsolvable because simple XOR is weak. In the matrix above, the 1st and 5th data symbols cannot be repaired together, since both are covered only by the first parity row;
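A minimal Python sketch of the interleaved XOR scheme, assuming 12 data blocks and 4 recovery blocks as in the matrix above. The function names are hypothetical, and the caller is assumed to already know which blocks are lost.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equally sized byte blocks together, one byte position (slice) at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_recovery(data_blocks, n_parity=4):
    """Recovery block r is the XOR of data blocks r, r+n_parity, r+2*n_parity, ..."""
    return [xor_blocks(data_blocks[r::n_parity]) for r in range(n_parity)]

def repair(data_blocks, recovery, lost, n_parity=4):
    """Rebuild the blocks listed in `lost`; fail if two of them share a recovery block."""
    groups = [i % n_parity for i in lost]
    if len(set(groups)) != len(groups):
        raise ValueError("two lost blocks share one recovery block")
    fixed = list(data_blocks)
    for i in lost:
        r = i % n_parity
        others = [fixed[k] for k in range(r, len(fixed), n_parity) if k != i]
        fixed[i] = xor_blocks(others + [recovery[r]])
    return fixed
```

A burst of four neighbouring blocks hits four different recovery blocks and can be repaired, while losing blocks 0 and 4 (the 1st and 5th) raises the error.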

2) Reed-Solomon codes use a matrix without zeros over a finite field:
Vandermonde matrix: A_ij = i^j
Cauchy matrix: A_ij = 1/(x_i - y_j), where all x_i and y_j are distinct field elements (written loosely as 1/(i-j))

With Reed-Solomon codes there are no unsolvable error patterns as long as no more blocks are lost than there are recovery blocks: every parity symbol can repair any lost data symbol, and any recovery block can repair any data block, in any combination.
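A small Python sketch of the Cauchy-matrix variant over GF(2^8), to show that any recovery blocks can stand in for any lost data blocks. The field polynomial 0x11D, the choice x_r = r and y_c = m + c, and all function names are assumptions made for illustration; this is not how ICEECC, RSC32 or RAR5 are actually implemented.

```python
# GF(2^8) log/antilog tables for the polynomial 0x11D (an assumed, common choice).
EXP, LOG = [0] * 510, [0] * 256
x = 1
for i in range(255):
    EXP[i] = EXP[i + 255] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def gf_inv(a):
    return EXP[255 - LOG[a]]  # a must be non-zero

def cauchy(m, k):
    """m x k matrix 1/(x_r + y_c) with x_r = r, y_c = m + c; needs m + k <= 256."""
    return [[gf_inv(r ^ (m + c)) for c in range(k)] for r in range(m)]

def mat_vec(rows, blocks, size):
    """Multiply a matrix by a column of blocks, one byte position (slice) at a time."""
    out = []
    for row in rows:
        acc = bytearray(size)
        for coef, block in zip(row, blocks):
            for pos, byte in enumerate(block):
                acc[pos] ^= gf_mul(coef, byte)
        out.append(bytes(acc))
    return out

def invert(mat):
    """Gauss-Jordan inversion over GF(2^8); addition and subtraction are XOR."""
    n = len(mat)
    aug = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(mat)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col])
        aug[col], aug[piv] = aug[piv], aug[col]
        inv = gf_inv(aug[col][col])
        aug[col] = [gf_mul(inv, v) for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [a ^ gf_mul(f, b) for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def recover(data, parity, lost, use):
    """Rebuild data blocks `lost` from the parity blocks `use` (same count)."""
    size, A = len(parity[use[0]]), cauchy(len(parity), len(data))
    known = [c for c in range(len(data)) if c not in lost]
    # Remove the contribution of the surviving data blocks from the parity.
    partial = mat_vec([[A[r][c] for c in known] for r in use],
                      [data[c] for c in known], size)
    synd = [bytes(p ^ q for p, q in zip(parity[r], part))
            for r, part in zip(use, partial)]
    sub_inv = invert([[A[r][c] for c in lost] for r in use])
    fixed = list(data)
    for c, block in zip(lost, mat_vec(sub_inv, synd, size)):
        fixed[c] = block
    return fixed
```

Encoding is `parity = mat_vec(cauchy(m, k), data, block_size)`. Because every square submatrix of a Cauchy matrix is invertible, `recover` works for any choice of lost data blocks and any equally sized set of surviving recovery blocks.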

The integrity of blocks is verified by CRC32 or MD5.

So for data protection you should cut your data into blocks and calculate some recovery blocks. Then, when the CRCs of some data blocks no longer match, you can repair them with an equal number of recovery blocks.
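The detection step can be sketched in Python with `zlib.crc32`, assuming the CRC of every block is stored next to the data; the helper names and the zero-padding of the last block are assumptions.

```python
import zlib

def split_blocks(payload, block_size):
    """Cut the payload into equal blocks, zero-padding the last one."""
    padded = payload + bytes(-len(payload) % block_size)
    return [padded[i:i + block_size] for i in range(0, len(padded), block_size)]

def checksums(blocks):
    """CRC32 of every block, to be stored alongside the archive."""
    return [zlib.crc32(block) for block in blocks]

def find_damaged(blocks, stored_crcs):
    """Return the indices of blocks whose CRC32 no longer matches."""
    return [i for i, (block, crc) in enumerate(zip(blocks, stored_crcs))
            if zlib.crc32(block) != crc]
```

The list returned by `find_damaged` tells you which blocks to treat as lost, and its length is the number of recovery blocks you need.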

You can download ICEECC from ice-graphics or my RSC32 from livebusinesschat to play with. My RSC32 supports matrices up to 2000000x2000000 under Win32 (a 64-bit version does not exist :-) ).

So you can, say, split your data into 1000000 blocks and calculate 100000 recovery blocks, i.e. 10% redundancy. Every recovery block can repair any data block.
