
Thread: Deconstructing Bjork with fv and simple filters

  1. #1
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts

    Deconstructing Bjork with fv and simple filters

    Somebody asked in another thread what fv is, and I thought I'd put my answer in a separate thread rather than derailing that one...

    fv is a spiffy plotting program Matt Mahoney wrote. It goes through a file, keeping dictionaries of 1-, 2-, 4-, and 8-byte strings (hashes, actually) and plotting the match distances. 1-byte matches show up in black, 2-byte matches in red, 4-byte matches in green, and 8-byte matches in blue. Matt's "Data Compression Explained" online book uses a few pictures generated with fv; so does his page about the data in the LTCB test corpus:

    http://cs.fit.edu/~mmahoney/compression/textdata.html

    There's a link in there to the program. It's free and open source.
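
    (If it helps to make that concrete, here's a rough sketch of the matching idea as I understand it---definitely not Matt's actual code, just the gist: hash the 1-, 2-, 4-, and 8-byte window at each position and record the distance back to that window's most recent previous occurrence. fv bins those distances into the plot, color-coded by window length.)

    Code:
/* Rough sketch of the matching idea behind fv (NOT Matt's code; see fv.zip
 * for the real thing).  For each position and each window length n in
 * {1,2,4,8}, report the distance back to the most recent previous
 * occurrence of the n bytes starting there.  Windows are hashed, so
 * occasional false matches from collisions are possible, as in fv. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_BITS 20
#define TABLE_SIZE (1u << TABLE_BITS)

static long last_seen[4][TABLE_SIZE];    /* last position of each hashed window */

static unsigned hash_window(const unsigned char *p, int n)
{
    unsigned h = 2166136261u;            /* FNV-1a, just for the sketch */
    for (int i = 0; i < n; i++)
        h = (h ^ p[i]) * 16777619u;
    return h & (TABLE_SIZE - 1);
}

int main(int argc, char **argv)
{
    static const int lens[4] = {1, 2, 4, 8};
    if (argc < 2) { fprintf(stderr, "usage: fvsketch file\n"); return 1; }

    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror(argv[1]); return 1; }
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    fseek(f, 0, SEEK_SET);
    unsigned char *buf = malloc(size);
    if (!buf || fread(buf, 1, size, f) != (size_t)size) return 1;
    fclose(f);

    memset(last_seen, -1, sizeof last_seen);   /* -1 == "never seen" */
    for (long i = 0; i + 8 <= size; i++) {
        for (int k = 0; k < 4; k++) {
            unsigned h = hash_window(buf + i, lens[k]);
            long prev = last_seen[k][h];
            if (prev >= 0)                     /* distance fv would plot */
                printf("%ld\t%d\t%ld\n", i, lens[k], i - prev);
            last_seen[k][h] = i;
        }
    }
    free(buf);
    return 0;
}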

    I find fv to be most interesting when applied to non-text data, in conjunction with some simple filters to quantize and deinterleave data.

    For example, if you look at an uncompressed audio or image file straight through fv, you can generally see the major strides, including the 4-byte stride of 16-bit stereo WAVs, the pixel and raster strides of RGB images, etc.

    Here's a hunk of a Bjork song ("Hunter") as 16-bit stereo:

    [Attached image: bjork_fv.png]

    The plot shows that as we go left to right through the file, we see a lot of one-byte matches (gray) at distances distributed anywhere from 4 to about a thousand bytes, which looks a lot like noise. We also get a lot of two-byte matches at much longer distances, which also looks a lot like noise. This is clearly not text, and LZ77 isn't going to do well here---we have pretty short match lengths at distances long enough to not be worth it. And we're not seeing any noticeable 4-byte matches (green) or 8-byte matches (blue).

    Worse, if you look at the distribution of one-byte match distances, we're not seeing the mostly short-range matches that would make Huffman coding bytes work well. If we had the kind of striking frequency skew we want, we'd have noticeable match-distance skew too, because frequently-seen data are usually recently-seen too.

    But there's a noticeable stride---the gray line across the plot at 4, and the weaker lines at 8, 12, 16. (Those strides continue up until they get lost in the log plot blur, but they're still there in the gray and pink bands, as a linear plot can show.)

    If a stride is evident like that, you can just deinterleave the data on that stride before running it through fv, and pow, you can see what's happening in the most and least significant bytes of each channel, as separate side-by-side pictures within the fv plot.

    [Attached image: bjork_x4_fv.png]

    That's what you get by skipping through memory looking at every 4th byte, seeing all the four-byte-aligned bytes, then doing it again for the ones that are four-byte-aligned but off by 1, then again by 2 and 3.

    In that picture we can see that the even bytes still look like noise, but the odd ones don't at all. (That's typical of 16-bit data representing an approximation of a more or less continuous function, whether it's specifically audio or not.)
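
    The destride step itself is a trivial filter. Here's a minimal sketch of that kind of deinterleaver (not necessarily exactly my one-file C program, but the same idea), reading stdin and writing stdout with the stride as its argument:

    Code:
/* Deinterleave ("destride") a byte stream: emit every byte at offset 0 mod N,
 * then every byte at offset 1 mod N, and so on.  N defaults to 4. */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int stride = (argc > 1) ? atoi(argv[1]) : 4;
    long cap = 1L << 20, size = 0;
    unsigned char *buf = malloc(cap);
    int c;

    while ((c = getchar()) != EOF) {               /* slurp stdin */
        if (size == cap) {
            cap *= 2;
            buf = realloc(buf, cap);
        }
        buf[size++] = (unsigned char)c;
    }
    for (int phase = 0; phase < stride; phase++)   /* phase 0, then 1, ... */
        for (long i = phase; i < size; i += stride)
            putchar(buf[i]);
    free(buf);
    return 0;
}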

    Or you can quantize the data without deinterleaving---just blindly zero out the lowest few bits of every byte---and various strides become much more obvious in the interleaved data:

    [Attached image: bjork_zl4_fv.png]

    (Here I've zeroed the lower four bits of each byte and left the other four intact.)
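
    The quantizing filter is about as simple as a filter gets---a sketch (zero the low k bits of every byte, stdin to stdout):

    Code:
/* Zero the low k bits of every byte, keeping the high bits intact. */
#include <stdio.h>

int main(void)
{
    int c, k = 4;
    unsigned char mask = (unsigned char)(0xFF << k);   /* 0xF0 for k = 4 */

    while ((c = getchar()) != EOF)
        putchar(c & mask);
    return 0;
}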

    For an audio file, quantizing like that often brings out correlation between channels, which would be lost in a deinterleaved plot. The plain fv plot will have a strong stripe at say, 4 for 16-bit stereo audio, because within each channel, the most significant bytes are usually the same as for the previous sample. But if you quantize, you start to see more matches across the channels, with weaker stripes at 2, 6, etc. too.

    (I'd guess that's partly because (1) except when the music is loudest, the sample values tend to be closer to the center of their range than the extremes, and partly because (2) the overall amplitude is usually dominated by low frequencies, and bassy sources are usually not placed far off to the side of the stereo field.)

    Less obviously, you can also do simple tricks that often make an unknown stride obvious, like blindly deinterleaving the data at a stride of 48 and seeing what happens. If there's a stride on a factor of 48 (e.g., 12 or 16), it will usually show up in the 48-way deinterleaved plot in a way that gives you a big hint what the real stride is. (Conversely, if you deinterleave on a small stride that's a factor of an actual, larger stride, that will generally show up visually in a different way.)

    You can essentially use fv with a couple of filters, varying the parameters, to "tune in" to the regularities in structured data, sort of like messing around with an oscilloscope to find the right fundamental frequency. Once you've got that, you can mess around with things like quantization to see what happens to various fields when you lop off a few low bits, or a few high bits, to get hints as to what kind of data is at each offset from the stride. (It's also useful to have filters that respect alignment, chopping things up into nonoverlapping chunks.)

    Here's Bjork with both quantization (zeroing the least significant halves of bytes) and deinterleaving:

    [Attached image: bjork_x4_zl4_fv.png]

    Here you can see that the most significant halves of the high bytes match a lot better---they very often match the preceding sample's upper 4 bits exactly. The same isn't true of the high bits of the least significant bytes---even if we lop the four lowest bits off, it still looks like noise.

  2. #2
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Interesting post and interesting tool.

    Quote Originally Posted by Paul W. View Post
    Or you can quantize the data without deinterleaving---just blindly zero out the lowest few bits of every byte---and various strides become much more obvious in the interleaved data:

    (Here I've zeroed the lower four bits of each byte and left the other four intact.)

    For an audio file, quantizing like that often brings out correlation between channels, which would be lost in a deinterleaved plot. The plain fv plot will have a strong stripe at say, 4 for 16-bit stereo audio, because within each channel, the most significant bytes are usually the same as for the previous sample. But if you quantize, you start to see more matches across the channels, with weaker stripes at 2, 6, etc. too.

    (I'd guess that's partly because (1) except when the music is loudest, the sample values tend to be closer to the center of their range than the extremes, and partly because (2) the overall amplitude is usually dominated by low frequencies, and bassy sources are usually not placed far off to the side of the stereo field.)


    I'd be suspicious that the new correlations were in the original signal and are not just artifacts from the quantization.

    How does fv decide which matches to display? Does it show only the longest and closest match, or does it show all matches?

    I can't help noticing that you're using this tool for something it wasn't designed for, and so I'm skeptical. Finding exact matches is mostly only useful for things like text. If you would like to look at audio signals, there have got to be much more advanced tools for doing that.

  3. #3
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts

    dissecting the mandrill with fv and simple filters

    nburns:

    I'd be suspicious that the new correlations were in the original signal and are not just artifacts from the quantization.


    That's a very good thing to be suspicious about in this sort of thing. Quantization is good for making patterns pop out, but you want to be careful about sensitivity vs. specificity---are you just turning up the sensitivity and losing specificity, i.e., getting a lot of false positives?

    That's one reason why I want to post here about this---in case I'm just wrong and crazy and somebody else can see that I'm getting the kinds of images I'm getting for different reasons than I think.

    Often you can do some kind of cross-check to see if you're just hallucinating regularities in noise... like seeing if you get matches at places not consistent with the correlation you think you're seeing.

    Here's a different, simpler example.

    This is the standard mandrill (RGB, 8x3=24 bits) test image:

    [Attached image: mandrill.png]

    It's kind of a booger for stride detection and for compression generally because it has a lot of irregular, fine texture, continuously changing curves, etc. Most pixels are different from the ones before and after them in a raster, and from the ones above and below as well, due to the fine texture at various strides, angles, and fine-grained color variation. (That's why it's a standard image---it's a hard case.)

    If you just run the tiff file through fv, it looks like this:

    [Attached image: mandrill_fv.png]

    You can see the pixel and raster strides (3 and 1536, i.e. 3 x 512) pretty clearly in the lower and upper parts of the picture respectively, and in this case in black (one-byte matches) and red (two bytes) respectively. We get byte matches disproportionately in stripes at 3 and multiples of 3, and two-byte matches at 1536 and multiples of that. (Every major tick is a factor of 10 on the vertical scale.)

    As pictures go, though, the stridiness is pretty lame, especially at the beginning of the picture... if you think about an online algorithm noticing the stridiness in a left-to-right scan, that's less than ideal. The strides are there, but not immediately easy to see in a scan.

    Now let's try quantizing it, zeroing the least significant four bits of bytes, and preserving the most significant four (and running that through fv):

    [Attached image: mandrill_zl4_fv.png]

    That enhances the stridiness throughout, but perhaps most usefully at the beginning and in a couple of other places where it was less obvious than usual. (And less obvious than it is throughout most photographs.)

    Now we ask the question... are we imagining this? Of course we're going to get a lot more matches if we have fewer distinct values to match, and reducing 8-bit bytes to 4-bit nybbles lumps things together into categories 16 times as big before we "match" them.

    One thing that argues the other way is that we don't get lots more matches at all distances---we're still mostly seeing strides of 3, but darker. We're not lumping things together so badly that we see mostly matches between random things at all distances, and we're still mostly matching red with red, green with green, and blue with blue.

    If the null hypothesis is that we're just seeing the effect of throwing away bits, and getting more matches because we're being less precise, we can test that. We can throw away different bits, instead of just the least-significant bits.

    We can throw away the high bits, instead, to see if we're right that this is about approximate numerical matching at strides, as opposed to just overgeneralizing to the point where everything seems to match.
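
    That cross-check is just the complementary mask---a sketch:

    Code:
/* Zero the HIGH k bits of every byte, keeping only the low bits. */
#include <stdio.h>

int main(void)
{
    int c, k = 4;
    unsigned char mask = (unsigned char)(0xFF >> k);   /* 0x0F for k = 4 */

    while ((c = getchar()) != EOF)
        putchar(c & mask);
    return 0;
}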

    Here's mandrill through a filter that discards the high 4 bits instead of the low 4, and keeps only the low bits:

    [Attached image: mandrill_zm4_fv.png]

    If you compare this to the original, unfiltered plot, it does exhibit some of what you'd expect from just being a lot less precise---you get a lot of 3-byte matches at distances where previously you only got two-byte matches, and two-byte matches where previously you only had one-byte matches. Exactly what you'd expect, in that respect, from looking at noise through a haze.

    But if you look at the patterns within those broad bands, they're way, way different. Keeping the high bits gives you vivid stripes, and keeping the low bits instead makes the stripes almost entirely go away. (There's a visible vestige of a stride of 3, near the bottom, but multiples of that pixel stride are gone, and so is the raster stride.)

    From that I think it's pretty clear that filtering out the low bits is effectively tuning in to a strong signal in the high bits, which is pretty strongly and directly related to both the pixel and raster strides. Looking at the low bits gives you something almost indistinguishable from noise, with random matches between unrelated channels about as likely as matches between the related ones.

    To bring that back to the Bjork audio example, we can do the same basic thing with her---compare and contrast high-bit and low-bit filters. Lemme go generate some pictures and see if I'm right... back in a bit.
    Last edited by Paul W.; 30th April 2014 at 07:34. Reason: trying to fix image links

  4. #4
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts

    Bjork's "Hunter" (reprise)

    Ok, back to Bjork:

    Here's the basic 16-bit stereo audio thing through fv again:

    [Attached image: bjork_fv.png]

    We can see an evident stride at four, and more weakly at multiples of 4.

    And here it is with just the low 4 bits of each byte zeroed, to see patterns in the high bits:

    [Attached image: bjork_zl4_fv.png]

    The stride of four becomes very striking---whatever else is going on, high nybbles often repeat in the next sample---and we seem to be seeing cross-correlations between channels at strides of 2 and 6, etc...

    But is that right? I don't know. It seems consistent with the fact that we see those even-but-not-multiple-of-four strides more than random matches at odd strides like 3, 5, and 7, but I'm not sure. Maybe there's another explanation.

    (Even assuming it's the right explanation, it doesn't necessarily mean that the two channels are very interestingly correlated. The high bytes of pretty much any two audio channels will be correlated, just given the properties of audio channels. The high byte values will tend to be distributed very modally around zero, because any audio waveform with zero net DC component will wobble across zero regularly and rarely hit near its extremes. The low bytes, in contrast, will frequently go all the way across their ranges and wrap around, so their values will be almost uniformly distributed unless the volume is often very low.)

    Anyway, here's what we get by zeroing the high bits of bytes to see patterns in the low bits, as a cross-check:

    [Attached image: bjork_zl4_fv.png]

    This is just the opposite of what we got by looking at the high bits. The stride at 4 is weakened rather than strengthened, though it's still visible, and the strides at multiples of four (or anything else) pretty much disappear. This looks pretty much like you'd expect if the data were random---just removing half the bits makes a lot more things match, but only at random, including odd strides as much as even ones.

    Here's the same data (just low bits) deinterleaved on the 4-byte stride:

    [Attached image: bjork_x4_zm4_fv.png]

    Remember that when we deinterleave, we can see the high- and low-byte pictures for each of the channels side by side, but here it's much harder to tell the high bytes from the low bytes. All four quarters of the picture look a lot more similar than before---two of them look just like noise, and the other two look like noise most of the time.

    Contrast that to what we got before, deinterleaving with the high bits of bytes:

    [Attached image: bjork_x4_zl4_fv.png]

    The high bits of bytes, once deinterleaved, show an even more striking difference between the high bytes and the low bytes of the channels, and show that we're really hitting a stride of 1 pretty often in the high bytes---the high four bits of successive samples in the same channel are usually the same, and we get a lot of fairly long string matches, four and eight (half-) bytes in a row, at distances that are orders of magnitude closer than we see in the (high bits of) low bytes of the same channels. The string matches in the high bits of the high bytes are clearly due to strong regularities---regularities so strong that even an inappropriate Markovish model can benefit from them---while the ones in the high bits of the low bytes seem like occasional random luck.
    Last edited by Paul W.; 1st May 2014 at 02:54. Reason: fix picture includes, I hope

  5. #5
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Ok. The lowest bits are expected to look random. The highest bits are probably correlated almost perfectly. CD audio is 16-bit signed 2's complement, just like a short in C. So for small negative numbers, all of the high bits are 1, and for small positive numbers, all of the high bits are 0. In other words, above the actual data they all reflect the sign. I think I heard/read somewhere, maybe thorfdbg posted it, that an audio sample is like 1-2 bits of signal plus many bits of noise and many wasted bits at the top.
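
    For instance, a quick illustration of how small-magnitude 16-bit samples carry little besides the sign in their high byte:

    Code:
/* Small-magnitude 16-bit samples: the high byte is mostly sign extension,
 * e.g. +3 is 0x0003 (high byte 0x00) and -3 is 0xFFFD (high byte 0xFF). */
#include <stdio.h>

int main(void)
{
    short samples[] = {3, -3, 100, -100, 1000, -1000};

    for (int i = 0; i < 6; i++)
        printf("%6d -> 0x%04X  (high byte 0x%02X)\n",
               samples[i],
               (unsigned short)samples[i],
               (unsigned short)samples[i] >> 8);
    return 0;
}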

    Some of your data may be meaningful, in the sense that the matches are between things that are truly correlated. I was concerned before that you may be obliterating the actual data in the low bits and getting entirely spurious matches. If you're basically matching 1 bit-worth of information, though, that's not much better.

  6. #6
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    fv shows the closest match if there is more than one. And of course, this is exactly the kind of analysis that fv is useful for. For example, it shows that the low pixel bits of the mandrill image are essentially random.

  7. #7
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    I just uploaded a fix for a bug in fv that caused some programs like gimp to fail to read the output file (fv.bmp). The update is at the same location, http://mattmahoney.net/dc/fv.zip and also includes the old (2006) version for historical purposes. The fix was to set the normally unused "colors" and "important colors" fields to 0 instead of 0x1000000 in the BMP header.

  8. #8
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts
    Thanks, Matt!

  9. #9
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts
    BTW, if you want my filters to zero n bits (high or low) or to destride on n bytes, you're welcome to them. They're handy adjuncts to fv and other things. The first two are little one-file C programs, the other a little one-file Python script. (I have a C version of that one too, but it needs a bug fix.)

  10. #10
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts
    nburns:

    Some of your data may be meaningful, in the sense that the matches are between things that are truly correlated. I was concerned before that you may be obliterating the actual data in the low bits and getting entirely spurious matches. If you're basically matching 1 bit-worth of information, though, that's not much better.
    I think the compare/contrast (e.g., between high and low bits of bytes) can usually help sanity-check how you're interpreting the fv pictures, but you can certainly use different filters to see different things, less lossily and with less risk of interesting-looking but spurious crap showing up.

    Just looking for patterns in high vs low bits of bytes is often very useful, but once you make a guess what general kind of thing you're looking at, you may want a different filter to see something more subtle.

    In the case of bjork, or the "geo" file (geologic sensor data) from the Calgary corpus, once you've guessed you're looking at numerical data and not text, found the evident stride and de-strided it, what should you look for next?

    One thing you should look for is the kinds of basic regularities that often occur in strided numerical data and that simple coders---run-length encoders, delta coders, and linear predictors---can exploit.

    One handy filter is a simple numerical delta coder, which just replaces each byte with its numeric difference from the previous byte. I can visualize that sort of thing with some fancier software I developed for memory hierarchy research, but it's not obvious how to do it with fv, which only accepts bytewise inputs and doesn't understand escape codes.

    The problem is that the output of a bytewise delta coder doesn't fit into bytes---the range of deltas is 2x the range of the bytes themselves.

    So I came up with a really ugly hack and a much less ugly hack for fv. The uglier one is to expand the input to hex first, delta code that two-byte-per-value stream, and run the two-byte output of the delta coder through fv. That's nasty because it introduces weird artifacts where things match at odd strides where they shouldn't, because it's really aligned double-byte data and fv doesn't understand that.

    The nicer and usually good-enough hack is to use a bytewise lossy delta-coder, which gets the deltas right if they're small, but messes them up if they're out of the range of -128 to 127. (Large deltas just wrap back into the range of short deltas.) This is lossy, in that you can get spurious matches between small deltas and very large ones, but it's way less lossy than just throwing away bits, and plenty good enough for most visualization purposes.
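
    A minimal sketch of that kind of bytewise delta filter (each output byte is the difference from the previous input byte; anything outside -128..127 just wraps):

    Code:
/* Bytewise delta filter: out[i] = in[i] - in[i-1], taken mod 256.
 * Small deltas come out "right"; larger ones wrap back into range. */
#include <stdio.h>

int main(void)
{
    int c, prev = 0;

    while ((c = getchar()) != EOF) {
        putchar((unsigned char)(c - prev));
        prev = c;
    }
    return 0;
}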

    Here's Bjork as we've seen her before, just deinterleaved on the obvious four-byte stride and run through fv:

    [Attached image: bjork_x4_fv.png]

    Now here's what we get when we run the deinterleaved file through the slightly-lossy delta coder before fv:

    [Attached image: bjork_x4_dlt_fv.png]

    The first and third hunks, which show the low bytes of the first and second audio channels, are unchanged, visually. They still look like random noise, and delta coding them essentially gives you a different random stream with the same statistical properties.

    That tells us that the stream of low bytes for a channel does NOT have much in the way of runs, frequently repeating strings, or repeating patterns of numerical difference as you'd see with recurring fixed slopes. (e.g., a flat line would give you a run of zeroes that fv could match. A recurring fixed slope, irrespective of actual amplitude, would give you runs of some other delta, and recurring curves at different absolute amplitudes would give you strings of deltas that recur.)

    The second and fourth hunks, which correspond to the streams of high bytes for the stereo channels, are very different. Delta coding them gives you more and closer matches of strings at all lengths, showing that there ARE recurring patterns of deltas even where there are not recurring patterns of literal bytes. That tells me that we're probably looking at a fairly direct integer representation of two channels of a more-or-less continuous function, and I'd guess that the "noise" bytes are the low bytes of the channels.

    At that point, we could do more with fv and combinations of simple filters to try to narrow down the possibilities, but I'd just try some simple compressors on the guessed-at 16-bit channels. A delta coder would work better than a run-length encoder, and a linear predictor would work better than a delta coder. And FFTs would show you all the typical traits of audio, and a lot of the properties of the music.

    A more interesting example would be recognizing "weird" data formats like 12-bit audio packed two samples per three bytes, or 30-bit color images with the pixels packed 10 / 10 / 10 with 2 left over, etc.

    I have ways to do that, but haven't automated the most useful pictures yet.

    The basic idea is that you can generally see the basic stride in the straight fv plot, or at worst a plot where you've zeroed the lowest bits of each byte. For example, for 10/10/10/2 color, the top byte of every four will look very much like an 8-bit color channel, because it's the high 8 bits of a 10-bit color channel.

    Once you have the pixel stride, all you need to do is try various masks & shifts to figure out which small ranges of bits in each 32-bit word are the top bits of a color channel, and that will tell you where all of your color channels start.
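
    For example, assuming (purely for illustration) that a pixel is a 32-bit word with the 2 spare bits at the top and the three 10-bit channels packed below them, pulling out the top 8 bits of each channel is just masks and shifts:

    Code:
/* Hypothetical 10/10/10/2 packing: 2 spare bits, then three 10-bit channels.
 * The top 8 bits of each channel look like an ordinary 8-bit channel to fv. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t pixel = (2u << 30) | (600u << 20) | (340u << 10) | 90u;
    uint32_t r10 = (pixel >> 20) & 0x3FF;
    uint32_t g10 = (pixel >> 10) & 0x3FF;
    uint32_t b10 =  pixel        & 0x3FF;

    printf("R=%u G=%u B=%u\n",
           (unsigned)(r10 >> 2), (unsigned)(g10 >> 2), (unsigned)(b10 >> 2));
    return 0;
}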

  11. #11
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Isn't delta coding bytes mod 256 lossless? That is, if the difference overflows, you only keep the low 8 bits. To decode, you add mod 256.

  12. #12
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts
    Doh! I originally thought that would work, but managed to have a brainfart and reason that it wouldn't, by conflating a couple of things in my head. (Different senses of "lossless"ness.)

    I will probably implement that and an alternative that reflects out-of-range values back toward zero, i.e., reversing direction rather than wrapping around and continuing in the same direction. Those two seem like pretty reasonable cross-checks on each other for most purposes.
    (So that if you're suspicious one is introducing significant artifacts into fv pictures, you can check it against the other, which would not cause the same spurious matches.)
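
    For reference, a quick sketch of the wrap-around (mod 256) round trip Matt describes, just to convince myself it really is lossless:

    Code:
/* Mod-256 delta coding round trip: decode adds the deltas back mod 256. */
#include <stdio.h>
#include <string.h>
#include <assert.h>

int main(void)
{
    unsigned char in[] = {10, 250, 7, 7, 130, 0};
    unsigned char enc[6], dec[6];
    int n = 6, i, prev;

    for (prev = 0, i = 0; i < n; i++) {   /* encode: difference mod 256 */
        enc[i] = (unsigned char)(in[i] - prev);
        prev = in[i];
    }
    for (prev = 0, i = 0; i < n; i++) {   /* decode: sum mod 256 */
        dec[i] = (unsigned char)(prev + enc[i]);
        prev = dec[i];
    }
    assert(memcmp(in, dec, n) == 0);      /* lossless */
    printf("round trip OK\n");
    return 0;
}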

  13. #13
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Isn't delta coding bytes mod 256 lossless? That is, if the difference overflows, you only keep the low 8 bits. To decode, you add mod 256.
    Yes. And if you store the values in unsigned chars, you get the mod 256 entirely free: unsigned arithmetic in C just wraps around on overflow (signed overflow is technically undefined, though it wraps the same way on typical hardware). Signed ints interpret the upper half of their range as negative, which mostly only matters for greater-than and less-than comparisons.

    Ps. One of my few bones to pick with C is the different behavior of >> on signed and unsigned integers. I wish >> always meant logical shift-right. If necessary, a new operator could have been created for shift-with-sign-extension. You don't expect a bitwise operation to respect signedness -- at least I don't -- and the resulting bugs are hard to catch.
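
    A two-line illustration of the gotcha (strictly, right-shifting a negative signed value is implementation-defined, but this is what you get just about everywhere):

    Code:
#include <stdio.h>

int main(void)
{
    int      s = -16;
    unsigned u = (unsigned)-16;

    printf("%d\n", s >> 2);   /* usually -4: arithmetic shift, sign-extended */
    printf("%u\n", u >> 2);   /* 1073741820 on 32-bit unsigned: logical shift */
    return 0;
}
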
    Last edited by nburns; 3rd May 2014 at 22:50.

  14. #14
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts
    EDIT: Sorry, I posted too soon, so I've deleted the post. There's clearly something going on with fv that I don't understand, and I'll sort that out and post again. Sorry for the false alarm.
    Attached thumbnails: bjork_x4_zl4_dlt_fv.png, bjork_x4_zl4_fv.png, bjork_x4_zm4_dlt_fv.png
    Last edited by Paul W.; 4th May 2014 at 17:31.

  15. #15
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts
    Matt:

    And of course, this is exactly the kind of analysis that fv is useful for. For example, it shows that the low pixel bits of the mandrill image are essentially random.
    Yes, it shows that they are a lot more random-looking under Markovish assumptions than the high bits.

    More interestingly, with filters you can tell some ways that they're not so random. For example, if you look at the basic plot and see the two strides (3 for pixels and 3 x 512 = 1536 for rasters) you can deinterleave the data in rows or in columns.

    Then if you run either of those through the delta filter, you can see that it works visibly better than you'd expect if the low nybbles were actually random. (And if it's visible at all, it's significant, because of the log scaling.)

    That tells you the whole 2D pixel layout, and that it's more-than-randomly delta-able or linearly predictable (because you get matches to recent deltas at distance 1) in both dimensions.

    That lets you know it's either an image or very image-like, and a 2D model that combines delta or linear prediction in both dimensions is likely to work pretty well, getting at least some of the low bits right.

    Alternatively, you can change the parameter to the zerolsbs filter, and only zero 2 low bits instead of 4. That visibly gives you more improvement than you'd expect if all 4 low bits were truly random, telling you that most of the entropy is in the lowest couple of bits. 4 is a good default (it's what Bulat uses in one of his stride detectors), but varying it a little can tell you subtler things.

  16. #16
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Paul W. View Post
    Now we ask the question... are we imagining this? Of course we're going to get a lot more matches if we have fewer distinct values to match, and reducing 8-bit bytes to 4-bit nybbles lumps things together into categories 16 times as big before we "match" them.
    With audio, IIRC, perceived volume is proportional to the log of amplitude, so disproportionately-many samples will be very small in magnitude. Chopping off 4 bits at the bottom would probably obliterate the entire signal in a lot of places, leaving behind silence (actually, probably a sub-1-bit signal due to how 2's complement works). So fv would be matching samples that were basically zeroes. I guess it's a matter of perspective as to whether those matches are spurious or not. But you would not treat them as matches for the purpose of compressing them. You'd use an entropy compressor instead.

    Delta coding has a different sort of effect on audio than it has on most kinds of data. Normally, you get something that's basically meaningless, but delta-coding a sine wave results in another sine wave; so it's still sound, and probably still sounds fairly normal (I haven't tried it), but the frequency spectrum will be altered.
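
    (To be precise about the sine-wave case: by the sum-to-product identity,

    sin(w*t) - sin(w*(t - d)) = 2*sin(w*d/2) * cos(w*(t - d/2)),

    so the delta of a sampled sine is another sinusoid at the same frequency, with its amplitude scaled by 2*sin(w*d/2). That factor grows with frequency, so delta coding acts roughly like a differentiator/high-pass: it attenuates the bass and boosts the treble, which is the sense in which the spectrum gets altered.)
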
    Last edited by nburns; 5th May 2014 at 13:39.

  17. #17
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts
    Quote Originally Posted by nburns View Post
    With audio, IIRC, perceived volume is proportional to the log of amplitude, so disproportionately-many samples will be very small in magnitude. Chopping off 4 bits at the bottom would probably obliterate the entire signal in a lot of places, leaving behind silence (actually, probably a sub-1-bit signal due to how 2's complement works). So fv would be matching samples that were basically zeroes.
    I'm not sure what I should expect, exactly, and with atxd (my new colorizing hex dumper) I'm not seeing quite what I did expect for the few audio samples I've looked at---not as many runs in the high-byte channels of 00 (white) or FF (black) as I expected. That could be because the samples in the test corpus were chosen to be not-very-easy to compress, and/or because most of them are pop music. (They all have substantial bass and relatively constant volume in the bass. There's an orchestral piece, but it's Philip Glass and it too is very repetitive, relatively constant in volume, and bassy.)

    Or it could be that I need to think harder about things like the implications of logarithmic volume perception, sample rates vs. bass frequencies, and Nyquist's theorem.

    (At least atxd turns out to be useful---it immediately showed me I didn't actually know what to expect these audio samples to "look like.")

    Delta coding has a similarly interesting effect on text. The delta of an alphabetic string matches any other delta of the same string, uppercase or lowercase, except for the first letter, plus some other strings with the same pattern of skipping up and down through the character codes.

    For example, \0 a c a c a gives you 97 +2 -2 +2 -2 for the part after the zero---it's just the ASCII code for 'a' followed by wobbling up and down 2 places. \0 A C A C A gives you 65 +2 -2 +2 -2 instead. (The ASCII code for 'A' followed by the same wobble pattern.) But other strings match, too (after the first character); \0 m o m o m o is the same thing, except that it starts with the ASCII code for 'm'. It's like transposing a melody up or down to a different musical key; the first character tells you the key, and the rest the melody. The same thing happens if they're all preceded by spaces rather than nulls (zeroes), except that the "key" (first letter) is offset by the ASCII code for a space.
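
    A quick way to see the "transposed melody" effect with a mod-256 bytewise delta (starting from 0, like the \0-prefixed examples):

    Code:
/* Delta streams of "acaca", "ACACA", and "momom" agree after the first byte. */
#include <stdio.h>
#include <string.h>

static void deltas(const char *s)
{
    int prev = 0;
    for (size_t i = 0; i < strlen(s); i++) {
        printf("%4d", (int)(signed char)(s[i] - prev));
        prev = s[i];
    }
    printf("   (%s)\n", s);
}

int main(void)
{
    deltas("acaca");   /*  97   2  -2   2  -2 */
    deltas("ACACA");   /*  65   2  -2   2  -2 */
    deltas("momom");   /* 109   2  -2   2  -2 */
    return 0;
}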

    You can see this strikingly in fv plots and atxd dumps if you run text through a delta filter before visualizing it. (The fv plot hardly changes visibly at all, except for a color shift. It's still clearly text, and clearly the very same text, with all the major features in all the same places.)

    Given how cheap delta coding is (because it's very local and you can SIMDize it well), I think that may actually be useful for a couple of purposes, in combination with a couple of other bit-twiddly and SIMD tricks.
