# Thread: Idea

1. ## Idea

Hi,
has anyone ever thought about setting up cellular automata to compress numbers?
It was an idea I had a few months back, but I have been too busy with other projects.

There are a number of questions.

Let's first ask whether or not such an automaton exists:
if it does not exist, there is no point in going on.
If it does exist, several questions arise; I will go through just a few of them here:

• In what base system would it be best to work when setting up such an automaton? There can be different base systems for the rules (the updating function) and the cells, and it is up to the programmer to find out which one works best for finding a potential compression automaton.
• Since automata are good visualization tools, it could be an idea to look at compression as something that reduces in size, and hence use the y-axis (time) of the automaton to see if the pattern shrinks.
• How big is the space of possible automata in a setup like this, and which ones, if any, are reversible? Reversibility is the key to being able to get back to the initial state (the raw, uncompressed data).
• If one increases the number of states per cell, does the number of possible reversible automata increase or decrease?
• Is there a way to find out if it reaches a limit (a compression floor)? Rule 30 creates an "infinite line" of ones with one black cell when it has reached the limit. With more states, is it still possible to see this? Rule 30 can only compress a few specific numbers, and it would be interesting to find out whether more states can improve on this.

Doing this with more than two states (0 and 1) per cell is what I am proposing. I found a scheme to do it, but I have not written it in code yet. It is fairly restricted, but it is possible to increase the number of bits in a cell and get more ways to manipulate those bits than in a 2-state cell. It might be that increasing the number of states also increases the chance of finding an automaton that is reversible and that shrinks in size until it reaches a limit. All of this presupposes that such an automaton exists.

2. Cellular automata are Turing-complete ( see https://www.youtube.com/watch?v=My8AsV7bA94 ), so it is possible in theory.
But first you would have to find working massively-parallel compression algorithms, which we don't really have yet, so there is no sense in looking at cellular automata for this yet either.

3. I have modelled different types of CA in my research, but there is nothing concrete. Nothing found yet.
After going back to basics, we see interesting patterns that decrease in size in the automaton output. Some take more time-steps than others.
This is reversible, but it requires knowing the number of time-steps to get back to the initial state, so that means more bits than the original, unfortunately.

This, however, seems to be the only elementary automaton that exhibits some form of compression in its visual output.
It looks like rule 120 in ECA, but with an L-shape as input it is rule 106.

The logic of rule 106 in boolean form is R xor (P and Q); for rule 120 it is P xor (Q and R), so basically the P and R inputs are swapped. The red arrow shows a smaller configuration than the original (only a few bits compressed, however).
How to implement this functionality across multiple automata running in parallel is yet to be discovered, as is whether two or more other types of automata exhibit the same traits as rule 106 or rule 120.
At a very basic level, it is interesting to see that P xor (Q and R) can compress bits if one knows the time-step length beforehand.
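The boolean forms given above can be checked mechanically against their Wolfram rule numbers. Here is a minimal sketch (not anyone's actual code from the thread) that derives the rule number of a 3-input boolean function, with the neighbourhood written as (P, Q, R) = (left, centre, right):

```python
def rule_number(f):
    """Derive the Wolfram rule number of a 3-input boolean function."""
    n = 0
    for p in (0, 1):
        for q in (0, 1):
            for r in (0, 1):
                # The bit position is the neighbourhood read as a 3-bit number.
                n |= f(p, q, r) << (p * 4 + q * 2 + r)
    return n

rule106 = lambda p, q, r: r ^ (p & q)   # R xor (P and Q)
rule120 = lambda p, q, r: p ^ (q & r)   # P xor (Q and R)

print(rule_number(rule106))  # 106
print(rule_number(rule120))  # 120

def step(cells, f):
    """One time-step of the automaton on a cyclic row of cells."""
    n = len(cells)
    return [f(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])
            for i in range(n)]
```

Swapping the P and R arguments in either lambda gives the other rule number, which confirms the "mirrored inputs" relationship described above.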
If there are any other logic-gate combinations that can compress, I am glad to hear about them. I am making my own ECA rule map at the moment, which I can look up to try to find more patterns like these in my research.

4. When I was learning how to program on the Commodore 64 in elementary school, I compressed phone numbers, storing (following an initial 8-bit byte indicating the number of nybbles to follow) two digits per byte -- each 4-bit nybble contained a value of 0 through 9, and then 10 through 15 represented the asterisk, octothorpe, dash, plus symbol, and both parentheses. Although it's not a compression algorithm, and doesn't relate directly to cellular automata, I wonder if it might at least inspire some alternative thoughts about a compressed format for number storage?

5. Originally Posted by Randolf:
> When I was learning how to program on the Commodore 64 in elementary school, I compressed phone numbers, storing (following an initial 8-bit byte indicating the number of nybbles to follow) two digits per byte -- each 4-bit nybble contained a value of 0 through 9, and then 10 through 15 represented the asterisk, octothorpe, dash, plus symbol, and both parentheses. Although it's not a compression algorithm, and doesn't relate directly to cellular automata, I wonder if it might at least inspire some alternative thoughts about a compressed format for number storage?
Good old Commodore 64. I believe compression in general uses characters or symbols the way your example shows; the width of the symbols, their frequencies, and how these are manipulated, shuffled and so on is part of it. I am no expert in compression either, so I am just saying things from my own perspective. Now, these automata basically work on a bit-by-bit level, or logic-gate by logic-gate level, so it is important to get things right the first time, or else everything falls apart and the outputs are scrambled and make no sense whatsoever.

The output has to contain all the information, that is, no information should be lost, which is basically the same as lossless compression. Rules that are multiples of 15 are reversible: n*15, where n={1..16}. There are 16 of these rules. Of course there are some other reversible rules that are not in this list, but these form one class of reversible automata. Looking at their binary form shows that the lo and hi nybbles are inversions of each other. I don't know if this is a feature of reversibility, but I am making a note of it.

<inserting table here didnt work>

Example:
Rule 15 is 00001111 <- both nybbles are inversions of each other.
Rule 30 is 00011110 <- same here, and so on.
All of these r=3 automata are reversible.
Another note: these rules all have the same Hamming weight, 4.
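Both observations are quick to check by machine. A small sketch for the 16 nonzero multiples of 15 (taking n = 1..16): the low nybble is the bitwise inverse of the high nybble, and the Hamming weight (popcount) is always 4.

```python
# Check the nybble-inversion and Hamming-weight observations for the
# 16 rules 15*n, n = 1..16 (rule 0 excluded).
for n in range(1, 17):
    rule = 15 * n
    hi, lo = rule >> 4, rule & 0x0F
    assert lo == hi ^ 0x0F, f"rule {rule}: nybbles not inverted"
    assert bin(rule).count("1") == 4, f"rule {rule}: weight != 4"
    print(f"rule {rule:3d} = {rule:08b}")
```

Both properties always hold for this family, since 15*n = 16*(n-1) + (15 - (n-1)): the high nybble is n-1 and the low nybble is its 4-bit complement, and the popcounts of a nybble and its complement always sum to 4.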
There are several ways to set up an automaton, so there are many things to explore. The inputs don't need to be on the same time-step, and one can use different geometries on the grid, if one likes to call it that.
Now, reversibility is not the same as compressibility, but if one does bit-shuffling or any other operation using these reversible automata, one can get back to the original form by reversing it. Maybe a simpler or better analogy is that it performs a transform on the bits, and these transforms can be a lot of things.

6. ## The Following User Says Thank You to Rudi For This Useful Post:

Randolf (29th January 2017)

7. I wonder if setting up a pattern for bit rotations by looping through the pattern could make the data more compressible? So, assuming a pattern of "1,2,3,4,5,6,7,0" rotate the first byte by 1 bit, the second byte by 2 bits, etc., then run the modified data through the compression algorithm. For decompression, simply rotate bits in the opposite direction after decompression.

Of course, the pattern indicates the number of bits to rotate, so it doesn't have to be sequential, and it could even be longer than the sample I provided. It could also apply to multiple bytes at once, such as 4 bytes treated as a single 32-bit value, in which case the pattern could include rotational shifts ranging from 0 through 31.

8. ## The Following User Says Thank You to Randolf For This Useful Post:

Rudi (30th January 2017)
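The rotation scheme from post 7 could be sketched as follows (a hypothetical sketch; function names are mine, not anyone's actual code). Each byte is rotated by the pattern entry for its position before compression, and rotated back after decompression:

```python
def rotl8(b, k):
    """Rotate an 8-bit value left by k bits (negative k rotates right)."""
    k %= 8
    return ((b << k) | (b >> (8 - k))) & 0xFF if k else b

def apply_pattern(data, pattern, inverse=False):
    """Rotate each byte by the pattern entry for its position."""
    sign = -1 if inverse else 1
    return bytes(rotl8(b, sign * pattern[i % len(pattern)])
                 for i, b in enumerate(data))

pattern = [1, 2, 3, 4, 5, 6, 7, 0]
data = bytes(range(32))
scrambled = apply_pattern(data, pattern)
assert apply_pattern(scrambled, pattern, inverse=True) == data
```

Whether the scrambled stream actually compresses better than the original would have to be measured against a real compressor; the transform itself is lossless either way.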

9. Here's basically a test I did. My initial thought was to use 4 rules that I've studied for reversibility, but it turned out that in this example I only needed two, and they are inversions of each other. I like to call these rules "Water". Different numbers (initial conditions) don't compress directly, but need to be transformed first.
Randolf: yes, bit rotation might be useful for transforming initial conditions into ones that are more likely to compress. Some of the automata have shift-left/right and rotate behaviour.

The image above shows 25 bits compressed to 14 bits, a compression ratio of about 1.78, losslessly (I only did 2 passes), so maybe it will compress more with more passes?
Though a couple of bits are needed for the rules themselves, and some bits for the number of iterations each rule takes.
But this is just a basic experiment I'm doing.

10. Instead of starting another thread, I'll use this thread to post a few questions and thoughts.
I don't want to spam the forum with lots of threads when many of my questions diverge from one concept to another.
Also, I don't think I am an expert in one narrow field; I know a little about a lot of fields, and maybe more about some than others.
I'm focused on learning more about lossless compression, which I think many people on this forum know a lot about.

Let me start by describing a question about delta coding (which I posted in a chat room but didn't get much of a reply to):
<_rudi> when collecting data into a frequency table of n-bits, i wonder if its better to expand the table to n+1 bits and so on, so that the frequencies have higher differences?
<_rudi> that is for compressing data
<_rudi> if one for example wants to do delta-coding
<_rudi> reading two bits for example makes the data collection more dense, while increasing it to more bits makes the data more dispersed. if one does delta coding on 2 bits i guess it wont compress much, but doing it for more bits, if the rawdata is large will help?
<_rudi> also, another question is when should one do deltacoding? for example here's different frequency tables for the same dataset: http://pastebin.com/raw/WQvaKGb0
<_rudi> dcoding on 2 bits doesnt seem right, not 3 bits either. but on 4,5 or 6? 5-bits have one empty slot. 6-bits have several empty slots. and when should one stop. or is it some other feature that one should look at. ive read that dcoding works well for low-frequency data: that is where the difference between each element is low. so increasing the bit-size for dense data doesnt seem wrong.
<_rudi> "Delta encoding can be used for data compression when the values in the original data are smooth, that is, there is typically only a small change between adjacent values."
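The frequency-table experiment described in the chat can be sketched like this (my own hypothetical version, not the actual program): split a byte string into n-bit symbols and count occurrences, then look at how many table slots stay empty as n grows.

```python
from collections import Counter

def bit_symbols(data, n):
    """Yield the data as a stream of n-bit symbols (any tail bits are dropped)."""
    bits = "".join(f"{b:08b}" for b in data)
    for i in range(0, len(bits) - n + 1, n):
        yield int(bits[i:i + n], 2)

def freq_table(data, n):
    """Frequency table of n-bit symbols."""
    return Counter(bit_symbols(data, n))

data = bytes([219, 224, 221, 217, 218])
for n in (2, 3, 4, 5, 6):
    table = freq_table(data, n)
    empty = (1 << n) - len(table)   # unused slots, as discussed in the chat
    print(f"{n}-bit: {dict(table)}  empty slots: {empty}")
```

Wider symbols disperse the counts across a larger table, which matches the observation in the chat; when and whether the empty-slot count is the right stopping criterion is exactly the open question here.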
So I had this dataset, and I made a program that gathers occurrences of different bit-lengths, for example 2-bit, 3-bit, 4-bit and so on.
The frequency table showed more dispersed occurrences as the bit-length increased; that was easy to predict beforehand, so it came as no big surprise.
I also read online that delta coding is good for signals that vary slowly from sample to sample.
All right, let's exclude the cases where there are big differences between a couple of samples, and assume most of the samples are smooth after delta encoding:
then when should one stop increasing the bit-length? Is there an easy way to see it after building the frequency tables? For example, when one entry in the frequency table is zero?
Or should many entries be zero, or should one study the entries more closely?

11. To store deltas on n-bit data you don't need n+1-bit values. Just do modulo arithmetic and it will wrap around. E.g. for the 8-bit stream 0, 200, 0, 200 the deltas are 0, +200, -200, +200, but -200 is the same as +56 as far as 8-bit arithmetic goes, so you can still code the entire result in 8 bits.

12. That's true. And since delta coding is good for smooth signals (small sample-to-sample differences), while the amplitudes themselves may be high or low, the delta codes may be much smaller than the original values.

original, 8 bits per value:
219, 224, 221, 217, 218, ...

only the first value needs 8 bits in this case:
219, 5, -3, -4, 1, ...

the deltas only need:
abs(5) + abs(-4) + 1 = 10 =>
log2(10) = 3.32~ =>
ceil(3.32~) = 4.
4 bits!

A little more than half the original size, since the first value (219) needs 8 bits.
For this to be true the signal must have no peaks or dips at all - it has to be uber-smooth. I guess people detect where the smooth parts of the signal are and handle the peaks and dips with a more complicated set-up.
And this is where things get complicated: how to deal with rough samples.
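The wrap-around trick from post 11 can be sketched as a tiny encoder/decoder pair (a minimal illustration, not anyone's actual code): deltas are taken modulo 256, so 8-bit data never needs 9-bit deltas, and decoding adds them back modulo 256.

```python
def delta_encode(data):
    """Delta-code 8-bit values; negative deltas wrap modulo 256."""
    out, prev = [], 0
    for v in data:
        out.append((v - prev) % 256)  # -200 wraps to +56, etc.
        prev = v
    return out

def delta_decode(deltas):
    """Invert delta_encode by accumulating modulo 256."""
    out, prev = [], 0
    for d in deltas:
        prev = (prev + d) % 256
        out.append(prev)
    return out

stream = [0, 200, 0, 200]
print(delta_encode(stream))              # [0, 200, 56, 200]
assert delta_decode(delta_encode(stream)) == stream
```

Note that the wrapped deltas are still 8 bits each; the space saving only appears once a later stage (like the 4-bit packing in post 12's example, or an entropy coder) exploits the fact that smooth signals concentrate the deltas near zero.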
