
Thread: MPEG-G

  1. #1
    Member | Join Date: Dec 2011 | Location: Cambridge, UK | Posts: 437 | Thanks: 137 | Thanked 152 Times in 100 Posts

    MPEG-G

    For those who don't know, MPEG-G is MPEG's foray into genomic data compression. I engaged with it in the early days because I wanted to see a new, better-engineered file format.

    Things went sour, to my personal taste, and I withdrew. I would have left it at that if the PR campaign hadn't become so biased, always comparing their format against 10-year-old legacy ones and ignoring everyone else's good work (including my own). So, with a sad sinking feeling, I started my first blog.

    https://datageekdom.blogspot.com/

    This makes a mockery of the great work the Pistoia Alliance did with their Sequence Squeeze contest (OK, I'm biased, a little), as they demonstrated a better model.

    To be balanced, this is their side of the argument:

    https://www.biorxiv.org/content/early/2018/09/27/426353

  2. The Following 5 Users Say Thank You to JamesB For This Useful Post:

    JamesWasil (14th October 2018), Jarek (1st October 2018), Mike (28th September 2018), schnaader (29th September 2018), SolidComp (29th September 2018)

  3. #2
    Member | Join Date: Apr 2012 | Location: Stuttgart | Posts: 437 | Thanks: 1 | Thanked 96 Times in 57 Posts
    Having read through your blog, is it right to say that your main criticism is of the process and not the standard itself? I cannot really comment on MPEG-G: I'm not an expert there, and I neither participated nor contributed.
    I'm only next door, in WG1 rather than WG11.

    Concerning patents: very much a mixed blessing. The problem is that people run to the patent office with the most obvious and trivial ideas, and the patent offices no longer have the capacity to check claims and hunt for prior art. Most compression patents would probably not stand up in court. On the other hand, patents also help to fund research. If we didn't have patents, I would not have a job: my job is (or was) pretty much paid for by mp3 patents (or rather, MPEG-2 Audio Layer 3). Then, if you talk about the Alliance for Open Media, and Google making research available for free, that is not quite true either. You have already paid for it, with your privacy and your personal data. If something is free, you are not the customer; you are the product. In the end, there is no free lunch. You pay either directly or indirectly.

    To clarify the process a little: within ISO itself, you do not and cannot discuss patents, and patents shall not influence technical decisions. The idea is that the process should be driven by technical expertise, not by commercial interests. Of course, as always, you cannot fully prevent that in reality. You declare patents only at the very last step of the process, just before publication, and not to the working group but to the ISO office, just so you understand how things work. You neither contest nor defend patents within any ISO working group; that is the job of, for example, a patent pool.

    Concerning reproducibility: every ISO process should have a "verification model" that is also available to WG members. I firmly hope MPEG-G has one, but I do not know. If you are an accredited member, you should have access to it; ask the ad hoc group chair for it. If there isn't one, complain to the WG11 test chair that the process isn't right. Ideas do not enter the standard because "someone says so", but because there is evidence on the table from core experiments that guides the decisions. You also need to make sure your ideas are heard and enter core experiments, so that they can drive such decisions. It is therefore important that you attend meetings. Test results should also be available to you, allowing you to reproduce the members' findings. If you want to communicate your ideas, it is important to register input documents describing your proposals. These documents are archived, and they can help a great deal in fighting a patent later on, as they document your prior art. It would be the job of the patent office to check such documents, but patent offices are so snowed under with applications that the patent system itself has become close to worthless, and I believe there should be a higher entry barrier to avoid trivial patents (but see above).

    Concerning the specs: I only skimmed it very quickly, and I do not agree that it is overly long; quite the reverse: 57 pages is pretty slim. About half of it is the usual ISO boilerplate, which leaves about 30 pages of actual content. That is pretty concise for a spec. Look at the HEVC spec or the JPEG 2000 spec and you'll see what I mean.

    The problematic part here is that an academic or freelance developer has a hard time contributing to the process. Patents cost money (and you have to have them), going to meetings costs money, and processing documents costs time, and hence money. But don't think this is any different on the other side of the fence, in the Alliance for Open Media. They also have to pay developers (money doesn't fall from the sky), and I have no idea whether they are willing to accept outside ideas. I do not know by which protocol they operate, but since they have commercial interests, I doubt they accept just anyone walking in either. Not that I have tried; it's not my field anyhow.

  4. #3
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    437
    Thanks
    137
    Thanked 152 Times in 100 Posts
    My problems are manifold, but not actually (I think) with the spec itself, although it's hard to tell without any implementations, data or results! Note that the 57-page one doesn't seem to contain enough to actually implement it, from what I can tell. There's a 250+ page one with the actual description of the models being used. Or is the spec so meta that it permits any model, and the "study" one isn't the spec but just an exemplar?

    [Edit: cut the crap - it was waffle and this is the bit that matters:]

    Most damning of all, frankly, is the blatant PR campaign. That's most definitely what hurts the most.

    They KNOW that BAM isn't state of the art. They know this because the developers of the real state-of-the-art tools were invited to an MPEG conference to present their work, and indeed bits of those tools ended up in the specification. Yet in all the public announcements, conference talks and even the preprint paper, it's "look how good we are vs a 10-year-old legacy format", with no references to modern work or even an admission that it exists. The defense I saw was "we're not ready yet to compare against other compressed formats", which is pretty flimsy given that they did compare against one compressed format: the one with the lowest compression ratio. It's just plain deception. Even worse, some of the "puff pieces" contain false statements that none of the newer formats has gained any acceptance. Again, this is pretty obviously not true. Meanwhile, there is zero hard data on how well the new format actually works, and no demonstration files you can download and play with other than by prior agreement (which I assume comes with some NDA? To be fair, I haven't tried, but why aren't they public?). Throughout this PR campaign, IP and patents have never once been mentioned. In short, people see the PR as horribly biased, underhanded advertising, and it has annoyed the community.

    I'm sure there are some good ideas and benefits in the format, but the way it's been handled probably means it'll be a real uphill struggle to get any major centre to accept it, which is a huge lost opportunity. It's not as if we don't use a lot of commercial software too, but it doesn't tend to be so misrepresented.

  5. #4
    Member | Join Date: Apr 2012 | Location: Stuttgart | Posts: 437 | Thanks: 1 | Thanked 96 Times in 57 Posts
    Quote Originally Posted by JamesB
    My problems are manifold, but not actually (I think) with the spec itself, although it's hard to tell without any implementations, data or results! Note that the 57-page one doesn't seem to contain enough to actually implement it, from what I can tell. There's a 250+ page one with the actual description of the models being used. Or is the spec so meta that it permits any model, and the "study" one isn't the spec but just an exemplar?
    I haven't studied the spec, so it is hard to say. There are two ways I can read your statement, so let me just give you my thoughts on both. The first is that WG11 (and also WG1) only specifies the format, not the actual implementation. Hence, what the spec typically does is provide you with sufficient information to decode a codestream, along with a "hypothetical reference decoder". If the model is signalled in the codestream, then how the encoder arrives at that model is up to the encoder, and a "somebody else's problem" as far as ISO is concerned. The reason is what an ISO standard is about: it is not about "opening up a technology" (that is what a patent does, or originally should do; see above), but about ensuring interoperability. If you can decode a competitor's codestream, interoperability is achieved. For that, you do not need full access to your competitor's probability models. In fact, a standard *should* allow different implementations that differentiate themselves through compression quality, speed, and so on.
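    To make the "decoder is normative, encoder is free" point concrete, here is a minimal toy sketch. It is not MPEG-G or any real codestream: the stream layout and the names `decode`, `encode`, `fixed_table` and `huffman_table` are all hypothetical. The toy "codestream" carries the prefix-code table chosen by the encoder, so one decoder can read the output of two quite different encoders.

```python
import heapq
import itertools
from collections import Counter

# Hypothetical toy codestream: (code table signalled by the encoder, payload bits).
# Only decode() plays the role of the "normative" part; how the table is chosen
# is left entirely to the encoder.


def decode(stream):
    """Read the signalled prefix-code table, then decode the payload bits."""
    table, bits = stream
    inverse = {code: sym for sym, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free, so the first match is the symbol
            out.append(inverse[current])
            current = ""
    return "".join(out)


def encode(seq, table):
    """Any encoder may pick any prefix-free table; it just has to signal it."""
    return (table, "".join(table[s] for s in seq))


def fixed_table(seq):
    """Encoder A: a fixed 2-bit code per base (seq is ignored on purpose)."""
    return {"A": "00", "C": "01", "G": "10", "T": "11"}


def huffman_table(seq):
    """Encoder B: a Huffman code fitted to the observed base frequencies."""
    counts = Counter(seq)
    if len(counts) == 1:  # degenerate case: a single distinct symbol
        return {s: "0" for s in counts}
    tiebreak = itertools.count()
    heap = [(n, next(tiebreak), {s: ""}) for s, n in counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


if __name__ == "__main__":
    seq = "AAAAAAAACCGT"
    for name, make_table in (("fixed", fixed_table), ("huffman", huffman_table)):
        stream = encode(seq, make_table(seq))
        assert decode(stream) == seq
        print(name, len(stream[1]), "payload bits")  # fixed: 24, huffman: 18
```

    Both streams go through the same `decode`, yet the frequency-fitted encoder spends fewer payload bits on this toy input. A real format would also have to count the bits spent signalling the table, which is exactly the kind of trade-off left to the encoder.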

    Option #2 is that this is a multi-part standard, and other parts may contain the missing pieces of information. This is not at all uncommon. For example, WG1 specs typically come in around five parts: codestream, extensions, file format, conformance testing and reference software. This standard may be split up differently.

    One way or another, you probably misunderstand what the specs are for: they are not there to let you copy a competitor's technology, only to ensure interoperability.

    Quote Originally Posted by JamesB
    Most damning of all, frankly, is the blatant PR campaign. That's most definitely what hurts the most.
    Oh well, scientific dishonesty will hurt anyone in the long run, no worries; it will backfire. It's probably just the usual PR nonsense, and sooner or later someone will come along and compare it against the state of the art. It is not exactly my field, so I cannot say much, but if someone walked into a conference with a paper that didn't compare against the state of the art, I would know how to respond.

    However, are you saying that there is no internal verification model within WG11? That would be a serious violation of the protocol, and I would wonder on what basis decisions had been taken. We, at least, don't stick a finger in the air and claim A is better than B without having measured it. I would be very astonished if WG11 acted so unprofessionally.

  6. #5
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    437
    Thanks
    137
    Thanked 152 Times in 100 Posts
    No, I'm not saying that. I'm saying nothing about the internal processes, as they are all *internal* (and I'm not). I cannot say what they did or didn't do, but I'd be very surprised if they didn't follow the official procedures.

    Let me put it this way: we look at the format and see "standard thing A" + "standard thing B" + "something odd" + "standard thing C". "Something odd" is novel and gets patented, so the entire format is effectively patented. It's quite a natural question to ask: is the "something odd" part there because it actually helps, or because without it there would be no way to monetise the format? That's the skeptic in me, but I want to see the actual evidence, and keeping it all secret just fuels the rumour mill rather than making the world sit up and say "You know what? That's actually quite clever!"

    In this case, "something odd" is the separation of DNA sequences into classes according to how many differences they have from the reference. I can see that finer classification can lead to better modelling of the data, but A) I'd expect the size benefit to be tiny, and B) over-classification risks making it harder for the model to learn and increases the impact of any format overheads.
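    The trade-off in A) and B) can be seen with a toy adaptive model. This sketch assumes a simple order-0 adaptive model with add-one (Laplace) counts, a stand-in rather than whatever models MPEG-G actually uses; `adaptive_cost_bits`, `classified_cost_bits` and `per_class_overhead_bits` are hypothetical names. Each class gets its own model, so splitting only helps when the classes really behave differently, while every extra class pays its own learning cost plus any per-class format overhead.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def adaptive_cost_bits(symbols, alphabet=ALPHABET):
    """Ideal code length (bits) of an adaptive order-0 model with add-one counts."""
    counts = {s: 1 for s in alphabet}
    total = len(alphabet)
    bits = 0.0
    for s in symbols:
        bits -= math.log2(counts[s] / total)
        counts[s] += 1  # the model learns as it goes
        total += 1
    return bits


def classified_cost_bits(items, per_class_overhead_bits=0.0):
    """Total cost when every class label gets its own independent model.

    items: iterable of (class_label, symbol) pairs, e.g. a hypothetical
    "number of differences to the reference" label per base.
    per_class_overhead_bits: stand-in for headers or other format overhead.
    """
    groups = defaultdict(list)
    for label, symbol in items:
        groups[label].append(symbol)
    return sum(adaptive_cost_bits(syms) + per_class_overhead_bits
               for syms in groups.values())


if __name__ == "__main__":
    # Case 1: classes with identical statistics, so splitting only dilutes learning.
    same = [(i // 8, "ACGT"[i % 4]) for i in range(32)]
    # Case 2: classes that really differ, so splitting pays off.
    different = [(i // 8, "ACGT"[i // 8]) for i in range(32)]
    for name, items in (("same", same), ("different", different)):
        one_model = adaptive_cost_bits(s for _, s in items)
        split = classified_cost_bits(items)
        print(f"{name}: one model {one_model:.1f} bits, per class {split:.1f} bits")
```

    With these toy inputs, one shared model beats four per-class models in the "same" case, and the reverse holds in the "different" case; a non-zero `per_class_overhead_bits` tilts the balance further towards fewer classes.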

  7. #6
    Member
    Join Date
    Apr 2012
    Location
    Stuttgart
    Posts
    437
    Thanks
    1
    Thanked 96 Times in 57 Posts
    Quote Originally Posted by JamesB
    No, I'm not saying that. I'm saying nothing about the internal processes, as they are all *internal* (and I'm not). I cannot say what they did or didn't do, but I'd be very surprised if they didn't follow the official procedures.
    That's the problematic part: if you cannot attend the meetings for whatever reason, you lose half of the story, and all of the decision-making process. WG11 tries to be transparent in the sense that its mailing lists are open to everyone, and everyone is invited to participate. That's certainly appreciated, but despite all this openness, the general public is excluded from the final decision-making and balloting within WG11 (or, to be 100% correct, SC29) simply because that is how the ISO process works. Votes are cast by members, members are countries, not individuals, and countries have to form their opinion within their national bodies. If you only follow the mailing list, you lose an important part of the story.
    Quote Originally Posted by JamesB
    Let me put it this way: we look at the format and see "standard thing A" + "standard thing B" + "something odd" + "standard thing C". "Something odd" is novel and gets patented, so the entire format is effectively patented. It's quite a natural question to ask: is the "something odd" part there because it actually helps, or because without it there would be no way to monetise the format? That's the skeptic in me, but I want to see the actual evidence, and keeping it all secret just fuels the rumour mill rather than making the world sit up and say "You know what? That's actually quite clever!"
    It may well be that "something odd" came in because some member needed it to monetize the specs. Unfortunately, this can hardly be avoided in an industry consortium. However, decision-making should not be driven by such considerations; at least, that is what the regulations say. Decisions shall be taken on purely technical grounds, i.e. is A better than B? Again, in reality things might be less rosy, but I wouldn't know how to prevent that in an industry forum. It boils down to whether there was sufficient evidence for including or excluding "something odd", and I cannot tell. There should have been a debate about it, and there should have been a core experiment on it. This is how AhGs (ad hoc groups) are supposed to operate.
    Quote Originally Posted by JamesB
    In this case, "something odd" is the separation of DNA sequences into classes according to how many differences they have from the reference. I can see that finer classification can lead to better modelling of the data, but A) I'd expect the size benefit to be tiny, and B) over-classification risks making it harder for the model to learn and increases the impact of any format overheads.
    From an information-theoretic point of view, you are right, of course. The interesting question is how you classify and how many classes you build. Most likely, the specs do not tell you that, simply because they do not have to. It is a decision an encoder implementation has to make, and one where proprietary knowledge of the field is required. This looks like a very typical ISO construction: you specify how to signal the model, but not how to arrive at it, so that competitors can provide different models and different classifications as closed, competing technologies. As I say, "there is a good way, there is a bad way, and there is the ISO way".

  8. #7
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    437
    Thanks
    137
    Thanked 152 Times in 100 Posts
    I understand the point about defining the format + decoder and not the encoder. This is in fact exactly how CRAM already works (despite claims to the contrary). There have been several CRAM encoder implementations, and these sometimes produce substantially different (equally valid) file sizes due to encoder choices. It's a complex decision, as we all know, because it's analogous to the way things like LZ work. I don't believe CRAM has optimal encoders yet, but sometimes the choice also comes down to speed and random-access granularity as well as size.
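    The LZ analogy and the size vs random-access trade-off can be sketched with Python's standard `zlib` module, used here purely as a stand-in codec; this is not CRAM's actual container or block structure. The hypothetical `encode_blocks` picks a block size and a zlib level, both purely encoder decisions, while `decode_blocks` and `decode_one` accept any such output. Smaller blocks give finer random access at the cost of a larger total size.

```python
import zlib


def encode_blocks(data, block_size, level=6):
    """Encoder choices: block size (random-access granularity) and zlib level.

    Each block is compressed independently, so it can be decoded on its own.
    """
    return [zlib.compress(data[i:i + block_size], level)
            for i in range(0, len(data), block_size)]


def decode_blocks(blocks):
    """One decoder handles the output of every encoder configuration."""
    return b"".join(zlib.decompress(b) for b in blocks)


def decode_one(blocks, index):
    """Random access: decompress only the block that is needed."""
    return zlib.decompress(blocks[index])


if __name__ == "__main__":
    data = b"ACGT" * 10000  # toy input, not real sequencing data
    for block_size in (100, 1000, len(data)):
        for level in (1, 9):
            blocks = encode_blocks(data, block_size, level)
            assert decode_blocks(blocks) == data
            total = sum(len(b) for b in blocks)
            print(f"block_size={block_size:>5} level={level}: "
                  f"{len(blocks):>3} blocks, {total} bytes")
```

    Exact byte counts depend on the zlib build, so none are quoted here; the point is only that a single decoder accepts all of these equally valid outputs, while the encoder decides where to sit between size and granularity.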

    I found a bit more information on the spec and dates; it is indeed split up, and some of it is still to come: https://genomsoft.com/2018/04/20/mpe...122-san-diego/. The one you looked at covers transport only; 23092-2 is the main compression specification. It looks like conformance is still another year or so away. If it's still at such a draft level, why are they already lobbying governmental bodies to adopt their format? That stinks. We need a public bake-off before anyone decides whether the format has the potential to be a good choice, but that's hard to do when there's not a single public file in MPEG-G format out there yet, let alone any figures on its characteristics (speed, granularity, etc).

    PS. I understand how much of the ISO system works, as I looked into becoming a BSI member in order to attend the meetings, etc., but the burden (not just MPEG, but BSI too) was too large: it would turn into a full-time job and need a huge travel budget. This level of bureaucracy serves to keep it in the commercial domain only, which isn't a good thing when you're moving into a traditionally academic sector and want to get the best ideas into your format.

  9. #8
    Member
    Join Date
    Apr 2012
    Location
    Stuttgart
    Posts
    437
    Thanks
    1
    Thanked 96 Times in 57 Posts
    Quote Originally Posted by JamesB
    I found a bit more information on the spec and dates; it is indeed split up, and some of it is still to come: https://genomsoft.com/2018/04/20/mpe...122-san-diego/. The one you looked at covers transport only; 23092-2 is the main compression specification. It looks like conformance is still another year or so away. If it's still at such a draft level, why are they already lobbying governmental bodies to adopt their format? That stinks. We need a public bake-off before anyone decides whether the format has the potential to be a good choice, but that's hard to do when there's not a single public file in MPEG-G format out there yet, let alone any figures on its characteristics (speed, granularity, etc).
    That depends on the draft level. ISO has CD (Committee Draft), DIS (Draft International Standard), optionally FDIS (Final Draft International Standard), and IS (International Standard). From DIS on, the technology is frozen. That means you can only make editorial changes, i.e. fix typos, reword paragraphs that are hard to read, reformat. You cannot add new technology at this stage, or the specs go back to CD level. In other words, it is fairly safe to assume that the technology does not change from DIS on, and making comparisons at this stage is quite fair. We did the same with JPEG XS: we published results, even at CD level, to give people an idea of what we were working on.
    Quote Originally Posted by JamesB
    PS. I understand how much of the ISO system works, as I looked into becoming a BSI member in order to attend the meetings, etc., but the burden (not just MPEG, but BSI too) was too large: it would turn into a full-time job and need a huge travel budget. This level of bureaucracy serves to keep it in the commercial domain only, which isn't a good thing when you're moving into a traditionally academic sector and want to get the best ideas into your format.
    Yes, traveling is a problem. But it is the same problem for everyone, so the burden is shared fairly. Experience shows that you need face-to-face meetings to be efficient, and for that reason the venue rotates around the globe, so to speak. ISO states that meetings shall be free of charge, but MPEG has an exception, because its meetings are so large that it is hard to find venues. We (JPEG) are a smaller team, and national body organizations can typically host us free of charge. But if you have to rent meeting rooms, the money has to come from somewhere, and ISO doesn't have it either.

  10. #9
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    645
    Thanks
    205
    Thanked 196 Times in 119 Posts
    Last edited by Jarek; 14th October 2018 at 16:40. Reason: hacker news

  11. #10
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    437
    Thanks
    137
    Thanked 152 Times in 100 Posts
    Thanks, Jarek. At this point my goals are:

    1. To ensure that all MPEG-G patents covered by prior art get squashed. This is just dickish behaviour, frankly. It's ongoing work, but an uphill struggle: you're in a maze of twisty patents, all alike.

    2. To make sure everyone who reads the propaganda is aware of the presence of patents (oddly always absent from the talks, posters, and articles), as well as of the existing alternatives. The coverage is getting there, but it needs more high-profile statements than those of a mere CRAM developer with obvious biases.

    3. To have open and transparent comparisons between the technologies. This isn't something I can do, and so far it's falling on deaf ears, to the extent that they're defending the lack of comparisons with CRAM (while somehow managing to compare against BAM). I'll keep trying, though.

    Points 2 and 3 together are necessary for people to make an informed choice. I'm not against MPEG-G (although I am obviously against their specific patents), but people need to be able to assess all the technologies with all of the facts laid bare. It's been too one-sided to date. I wasn't happy with that, but was letting it slide nonetheless, until the pressure of it, plus the patents, just pushed me too far!

  12. The Following User Says Thank You to JamesB For This Useful Post:

    Jarek (15th October 2018)
