
Thread: Why did Rust become very popular?

  1. #1
    Member kassane's Avatar
    Join Date
    Jan 2016
    Location
    Brazil
    Posts
    4
    Thanks
    35
    Thanked 1 Time in 1 Post

    Why did Rust become very popular?

    Hi people!

    According to the communities around this programming language, they intend to rewrite everything from C/C++ in Rust.
    In the case of this community, which is focused on data compression programming, does anyone intend to redo their software in Rust?

    Thank you!
    "Knowledge is infinite, but the experience is parallel".

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    Rust uses LLVM, which is currently not the best code generator (especially with SIMD),
    and data compression really needs all the possible speed optimization from the compiler.
    So it doesn't seem likely.

  3. The Following 2 Users Say Thank You to Shelwien For This Useful Post:

    kassane (23rd July 2018), khavish (3rd August 2018)

  4. #3
    Member kassane's Avatar
    Join Date
    Jan 2016
    Location
    Brazil
    Posts
    4
    Thanks
    35
    Thanked 1 Time in 1 Post
    In your opinion, what is the real difference between this new language and C/C++?
    So is what they say about high performance and security fake?
    "Knowledge is infinite, but the experience is parallel".

  5. #4
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    I'm redoing my context mixing compressor in Rust: https://github.com/tarsa/demixer The old version was written in plain C: https://encode.ru/threads/1671-Demix...in-development It's not focused on maximum speed. OTOH cmix from Byron Knoll is written in C++, but it isn't a speed monster either.

    I haven't had a single segfault or any other form of instability when running my program written in Rust, so at least that's a big plus. The performance difference between Rust and C++ shouldn't be big anyway: https://benchmarksgame-team.pages.de.../rust-gpp.html Both languages are close enough in mechanics that such benchmarks have some value. There's one notable difference though: the Rust implementations for those benchmarks don't use any SSE/AVX/whatever compiler intrinsics. That's why Rust loses badly on nbody.

    I like the safety of Rust. The Rust compiler enforces many invariants that a C++ compiler doesn't know about - e.g. that a smart pointer has become invalid and shouldn't be used. OTOH writing code is often painful compared to e.g. Java.
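
    To illustrate the kind of invariant I mean, here's a minimal sketch (a hypothetical example, not code from my compressor): the C++ version below compiles cleanly and hits undefined behaviour at runtime, while the equivalent Rust code is rejected at compile time with a "use of moved value" error.

    #include <memory>
    #include <stdio.h>

    int main() {
        auto p = std::make_unique<int>( 42 );
        auto q = std::move( p ); // ownership moved to q; p is now null
        // This compiles fine, but dereferences a moved-from (null) smart
        // pointer - undefined behaviour. Rust rejects the equivalent code
        // at compile time, which is exactly the invariant being enforced.
        printf( "%d\n", *p );
        return 0;
    }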

  6. #5
    Member kassane's Avatar
    Join Date
    Jan 2016
    Location
    Brazil
    Posts
    4
    Thanks
    35
    Thanked 1 Time in 1 Post
    Quote Originally Posted by Piotr Tarsa View Post
    I like the safety of Rust. The Rust compiler enforces many invariants that a C++ compiler doesn't know about - e.g. that a smart pointer has become invalid and shouldn't be used. OTOH writing code is often painful compared to e.g. Java.
    In this case you are referring to borrowing and ownership, where you can move an object implicitly and have control over mutability?
    "Knowledge is infinite, but the experience is parallel".

  7. #6
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    C++ with smart pointers also has borrowing, ownership, control over mutability, moving, copying, etc. In fact, the borrowing and ownership concepts come from C++ smart pointer implementations: https://doc.rust-lang.org/stable/ref...nfluences.html

    From my perspective Rust is superior to C++ because correctness does not depend as much on discipline as it does in C++. The compiler checks many things by default, which gives me comfort - no compilation errors means no wrong pointers (unless I use unsafe blocks, of course). If the compiler checks (and the concise compilation error messages) that Rust offers are of no value to you, then Rust is much less appealing, I think. But it still has some pretty nice syntax compared to C++.

    It all depends on what you want. Personally, I got so fed up with the constant memory corruption bugs ( https://en.wikipedia.org/wiki/Memory_corruption ) that are super easy to make in C or C++ that I decided to give Rust a try. Rust is not nearly as convenient as Scala to me (there's a huge difference in convenience), but it feels better than C and C++.

  8. The Following User Says Thank You to Piotr Tarsa For This Useful Post:

    kassane (23rd July 2018)

  9. #7
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    To me it feels like the focus on pointers and memory management is a bit outdated.
    Sure, when you have 1M of total RAM and a programming language without a template equivalent,
    there may be no easy workaround.
    But now, at least in codec development, I don't feel the need to use dynamic memory at all.
    Even reflate, with its nested stream processing, doesn't have any dynamic allocation.
    In fact, when I have to use dynamic allocation at a local level, it usually means lazy design -
    I didn't want to implement an algorithm that works within a given (reasonably large, but known in advance)
    amount of memory, and instead I just tell the user that he doesn't have enough if something happens.

    Getting rid of dynamic allocation (and thus smart pointers etc.) is also a good source
    of performance improvements. An uncached memory read can be as slow as 100 divisions,
    and any memory management requires working with memory, so yeah...

    But if Rust developers can beat C++ compilers at code generation quality - sure, I'd switch.
    As it is, I don't like gcc or VS, but I still port code to gcc, because it currently has the best scalar
    code generation, and to VS when I need small executables.

  10. The Following 2 Users Say Thank You to Shelwien For This Useful Post:

    Bulat Ziganshin (25th July 2018), kassane (25th September 2018)

  11. #8
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    What about game engines? Games have prescribed memory requirements (e.g. 8 GiB RAM required to run the game). Does that mean we can allocate everything statically in a game engine and have fewer bugs then?

  12. #9
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    I don't see why not, but it's heavily dependent on design patterns.
    In fact, I only started getting all-static implementations after switching to coroutines - before that
    there were always these lazy design issues like "let's compress a block, but the output size is unknown, so let's allocate something".
    Of course, it's more a matter of thinking - templates can be emulated well enough with macros,
    and there can be all kinds of creative workarounds for buffer overflows - for example, lzma has a special "dummy"
    decoder which it starts to use once the buffer boundary gets close enough.
    (So first the "dummy" checks the amounts of input/output, then the real decoder is executed if it fits;
    only the "dummy" has detailed boundary checks, so the "real" one is much faster; the lzma source has separate implementations,
    but it's obvious that they could be compiled from the same function template.)
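
    A rough sketch of how that single template could look (hypothetical code, not the actual lzma source):

    #include <stddef.h>
    #include <stdint.h>

    // One decoder body, two instantiations: Checked=true is the "dummy"
    // variant with detailed boundary checks, Checked=false is the fast
    // "real" variant used while the buffer boundary is far enough away.
    template< bool Checked >
    bool decode_block( const uint8_t* in, size_t in_len,
                       uint8_t* out, size_t out_len, size_t n ) {
        size_t ip = 0, op = 0;
        while( op < n ) {
            // this whole branch disappears from the Checked=false instance
            if( Checked && (ip >= in_len || op >= out_len) ) return false;
            out[op++] = uint8_t( in[ip++] + 1 ); // stand-in for real decoding work
        }
        return true;
    }

    // near the boundary: decode_block<true>(...); otherwise: decode_block<false>(...)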

    In particular, I agree that it'd be hard to find a universal solution for the usual string manipulations -
    something that would allow compiling a program with all kinds of 'func(filename+".txt");' expressions
    without relying on dynamic allocation.
    But this only matters when you have to write a compiler.
    When it's your own code, it's usually possible to have a single temp buffer for all string manipulations,
    although having to write 'strcpy(buf,filename); strcat(buf,".txt"); func(buf);' may be less convenient.

    Also, x64 + virtual memory means that if you want 1Tb for some kind of pool, you can just reserve it,
    and only have the required number of pages actually mapped.
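
    For example, on a POSIX system the reserve-then-commit pattern could look roughly like this (a sketch, assuming Linux-style lazy page mapping):

    #include <sys/mman.h>
    #include <stdio.h>
    #include <stddef.h>

    int main() {
        size_t pool_size = size_t(1) << 40; // reserve 1Tb of address space
        // PROT_NONE + MAP_NORESERVE: nothing is committed yet
        void* pool = mmap( 0, pool_size, PROT_NONE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0 );
        if( pool == MAP_FAILED ) return 1;
        // make the first 16Mb usable; physical pages appear only on first touch
        size_t used = size_t(16) << 20;
        mprotect( pool, used, PROT_READ | PROT_WRITE );
        ((char*)pool)[0] = 1; // now exactly one page is actually mapped
        printf( "reserved %zu bytes at %p\n", pool_size, pool );
        munmap( pool, pool_size );
        return 0;
    }
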
    As to x86, it also has an interesting workaround called "selectors" - on x86 there's not enough address space to fit everything,
    but with virtual memory it's possible to reallocate a pool without copying (just assign the same pages to a different address range),
    and software using selector:offset pointers allows the system to transparently change the selector's base address.
    Unfortunately, the selector idea was mostly discarded in favour of the "flat model", but hardware still supports it, so it can be used.

    Btw, there was this "competition": http://web.archive.org/web/201402271...om/s_scan.html
    The original "archiver template" runs 5x slower specifically because of string manipulation and dynamic allocation.

  13. #10
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    I would say your use case is pretty simple - a compressor does one job throughout its life. Its data structures are used from the start to the very end, so there's no point in delaying their allocation or deallocating them early. I wouldn't say it's a common pattern though. It's often beneficial to release memory used by one task to make room for another (but I'm mostly working on business web apps for a living, so maybe that's what gives me this impression). It's surely the case with Mozilla Firefox - a browser can't allocate a fixed amount of all the memory it will ever need at the beginning and try to get away with that.

    String concatenation seems to be a task where it's hard to come up with a generic String class that has top performance, matching mundane manual optimizations. The Java VM, for example, detects code sequences that do string concatenation and replaces them with JITed specialized code (which reduces the number of allocations and probably the bounds checks). That's a fragile mechanism, but it allows for dramatic speedups. Details are here: http://openjdk.java.net/jeps/280

    Lazy allocation of memory pages is OK if you don't initialize the memory, don't use it as a big hashmap (because that scatters elements throughout the entire space, materializing all the pages) and don't need to release the pages. So again, it has a pretty limited use case. However, in the case of my demixer compressor, where I have a big suffix tree and allocate nodes linearly, it works pretty well. If someone allocates too much space for the tree, the excess memory gets reserved but never touched, so no excess physical pages are actually used.

  14. #11
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    > I would say your use case is pretty simple - a compressor does one job
    > throughout its life.

    It's not the case with reflate - it has a detector, deflate encoding and decoding,
    diff, and two CM models for headers and diff data.
    There's also nested stream processing, which applies the same algorithms to
    detected and unpacked streams, recursively.
    Meanwhile in zlib, even the decoder already has plenty of malloc usage.
    So it's clearly mostly a matter of programming style.

    > Its data structures are used from the start to the very end so there's no
    > point in delaying allocating them or deallocating them early.

    That's true. But most dynamic allocation is done in cases where
    the amount of saved memory doesn't matter atm - for example, with an unknown
    table size N=1..256, they would allocate a dynamic table of size N
    instead of simply keeping a static table of size 256.
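
    i.e. the difference is just (a sketch):

    #include <stddef.h>

    void dynamic_style( size_t N ) {  // N known to be in 1..256
        int* tab = new int[N];        // per-call allocation...
        // ... use tab ...
        delete[] tab;                 // ...and cleanup
    }

    void static_style() {
        static int tab[256];          // just keep the known maximum;
        // ... use tab ...            // no allocation, no cleanup, no fragmentation
        (void)tab;
    }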

    > I wouldn't say it's a common pattern though.

    I think it's very common, just a legacy of old times.
    With GBs of RAM and virtual address space, though, it has simply become inefficient.

    > It's often beneficial to release memory by one task to make room for
    > another task.

    Obviously, not using any dynamic allocation at all is not the goal.
    On the other hand, it's a usual source of very obscure errors
    (for example, in systems without GC, it's possible to cause a denial of service
    by carefully crafting inputs to fragment memory) and of performance degradation.
    Modern platforms also allow avoiding it entirely -
    it's just not the style supported by modern programming languages (somehow).

    > Surely it's the case with Mozilla Firefox - a browser can't try to allocate
    > a fixed amount of all needed memory at the beginning and try to get away
    > with that.

    They kinda can - browsers tend to use multiple OS processes these days,
    which have their own virtual address spaces.
    So if you want to safely free up resources, you can just terminate the process.

    > String concatenation seems to be a task where it's hard to come up with
    > generic String class that would have a top performance, matching mundane
    > manual optimizations.

    The main problem is the implied "infinite" (usually 4G anyway) length of strings.
    If we could use length-limited (at compile time) strings - e.g. up to 32k wchars for windows filenames -
    the compiler would be able to share a few static buffers for all intermediate values,
    same as what I do manually.

    Some cases where strings are usually accumulated in memory can also
    be handled by implementing them as stream processing with static buffers.

    And the remaining cases, where we can expect potentially GBs of data,
    may be important enough to rely on dynamic allocation - though even then,
    there's no point in using it per string.
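
    A sketch of what I mean by sharing a static buffer (hypothetical helper, not from any actual codec):

    #include <stdio.h>

    enum { MAXNAME = 32768 }; // compile-time length limit, like windows filenames

    // all intermediate string values share this one static buffer
    static char strbuf[MAXNAME];

    const char* concat( const char* a, const char* b ) {
        snprintf( strbuf, sizeof(strbuf), "%s%s", a, b ); // bounded, no malloc
        return strbuf;
    }

    // usage: func( concat(filename,".txt") ); - no dynamic allocation involved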

    > Java VM for example detects code sequences that do string concatenation and
    > replace them using JITed specialized code

    Well, I guess we can expect more of that with time.
    There's also a trend to push popular algorithms into hardware,
    where dynamic allocation is obviously problematic.

    > Lazy allocation of memory pages is OK if you don't initialize the memory,
    > don't use it as a big hashmap (because it scatters elements throughout
    > entire space, materializing all the pages)

    It's actually possible to design a variable-size hashtable.
    One way is to put the current hashtable size into the cells, so when
    resolving collisions you'd know to look further.

    > However, in case of my demixer compressor, where I have a big suffix tree
    > and allocate nodes linearly it works pretty well.

    For specific data structures it's also usually possible to implement
    an efficient custom memory pool, without OS calls and memory fragmentation.

    It's the same issue as with strings, really.
    There's no syntax to tell the compiler what kinds of sizes a variable-size structure
    can have and how much memory you need to store its instances, so the compiler
    can't do it for you.

  15. #12
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Quote Originally Posted by Shelwien View Post
    It's the same issue as with strings, really.
    There's no syntax to tell the compiler what kinds of sizes a variable-size structure
    can have and how much memory you need to store its instances, so the compiler
    can't do it for you.
    The invokedynamic bytecode instruction from Java kinda solves that problem, at least at a very local level, as I've already pointed out: http://openjdk.java.net/jeps/280
    indy (invokedynamic) works by code generation at runtime. In the case of string concatenation (indy is a versatile mechanism with many use cases), the types and handles of the arguments to be concatenated are passed using varargs to a code-generation function (in Java it's called a bootstrap method IIRC), which in turn uses them to generate optimized code that doesn't use varargs at all but operates on the concatenated things directly. The code-generation function can be upgraded in future versions of Java, so your old code can become faster without bytecode recompilation (the indy instruction stays as it is).

    Quote Originally Posted by Shelwien View Post
    The main problem is the implied "infinite" (usually 4G anyway) length of strings.
    If we could use length-limited (at compile time) strings - e.g. up to 32k wchars for windows filenames -
    the compiler would be able to share a few static buffers for all intermediate values,
    same as what I do manually.
    The problem with buffer sharing is that you can have a multithreaded application. In Java, multithreading is the norm. So you would need synchronisation when acquiring buffers from the buffer pool, and synchronisation in itself can kill performance.

    OTOH you can use Thread Local Storage to keep the buffers, but you would need to be careful about that. Acquire and release methods are still needed, as well as a fallback to dynamic allocation (although it would be used infrequently in typical cases, I think). With TLS, acquire and release don't need to be synchronised, so at least that part would be cheap.
    Last edited by Piotr Tarsa; 25th July 2018 at 14:07.

  16. #13
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    > operates on concatenated things directly

    I can do this with coroutines already.
    My coroutine class is designed mainly for stream processing -
    it's this one actually:
    https://github.com/Shelwien/stegdict...Lib/coro2b.inc (coroutine class)
    https://github.com/Shelwien/stegdict...b/coro_fp2.inc (usage demo)
    so it won't have a problem whether you feed it multiple strings or
    a concatenated string in a single buffer.

    As to dynamic code generation, it's surely interesting, but
    isn't it too heavy as a replacement for dynamic alloc? :)
    Also, JIT is supposed to work in realtime, so it can't optimize that well.

    In theory, PGO should be able to do this... but in C++ strings are
    not part of the language, so it's hard to target them in the compiler.

    > OTOH you can use Thread Local Storage to keep buffers,

    As I said, getting rid of dynamic allocation is not a goal on its own.
    Thread creation uses dynamic allocation anyway - for the stack if nothing else.

    The idea is not to use it for every small thing, especially not for
    the fields of a big object which is dynamically allocated anyway.

  17. #14
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Quote Originally Posted by Shelwien View Post
    As to dynamic code generation, it's surely interesting, but
    isn't it too heavy as a replacement for dynamic alloc?
    Also, JIT is supposed to work in realtime, so it can't optimize that well.

    In theory, PGO should be able to do this... but in C++ strings are
    not part of the language, so it's hard to target them in the compiler.
    The Java VM profiles code all the time and can do pretty powerful optimizations, but those optimizations are limited to frequently used parts of the code called "hot spots" (hence the name: https://en.wikipedia.org/wiki/HotSpot ), to limit the time spent on compiling. PGO in JIT compilation is both much more needed and much more advanced than in AOT compilation.

    Quote Originally Posted by Shelwien View Post
    As I said, getting rid of dynamic allocation is not a goal on its own.
    Thread creation uses dynamic allocation anyway - for the stack if nothing else.

    The idea is not to use it for every small thing, especially not for
    the fields of a big object which is dynamically allocated anyway.
    Hmm, I'm not sure how TLS works in C/C++, but in Java I can use thread-local variables from any thread, including the main thread on which the application entry point runs. Therefore you don't need to create another thread to use thread local storage. TLS, IIUC, is independent of the thread stack.

  18. #15
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    In C++/Windows, TLS is based on this: https://docs.microsoft.com/en-us/win...-tls-directory
    I don't like it (it's bloated and also not portable), so I try to avoid it.


    #include <stdio.h>

    __declspec(thread) int x = 0x12345678;

    int main( ) {
        printf( "%X\n", &x );
    }

    .text:00401000 _main proc near ; CODE XREF: start+DE↓p
    .text:00401000 mov eax, TlsIndex
    .text:00401005 mov ecx, large fs:2Ch
    .text:0040100C mov edx, [ecx+eax*4]
    .text:0040100F add edx, 4
    .text:00401015 push edx
    .text:00401016 push offset Format ; "%X\n"
    .text:0040101B call ds:printf
    .text:00401021 add esp, 8
    .text:00401024 xor eax, eax
    .text:00401026 retn
    .text:00401026 _main endp

  19. #16
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Yup, so no new thread creation is needed - you have access to TLS from any thread, including the starting one.

    __declspec(thread) may be non-portable, but there's the Boost library and also the thread_local keyword in C++11, which should be supported by every major compiler by now.

    The whole logic could be something like:

    #include <stdlib.h>
    #include <stdint.h>
    #include <stddef.h>

    thread_local uint8_t tl_buffer[123456];
    thread_local bool tl_buffer_taken = false;

    void* get_buffer(size_t size) {
        if (size > sizeof(tl_buffer) || tl_buffer_taken) {
            return malloc(size); // fallback to dynamic allocation
        } else {
            tl_buffer_taken = true;
            return tl_buffer;
        }
    }

    void release_buffer(void* buffer) {
        if (buffer == tl_buffer) {
            tl_buffer_taken = false; // thread-local buffer becomes free again
        } else {
            free(buffer);
        }
    }
    Last edited by Piotr Tarsa; 29th July 2018 at 11:48.

  20. #17
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Rewriting the Brotli decoder from C to Rust reduced performance by 28%:

    https://blogs.dropbox.com/tech/2016/...n-with-brotli/

  21. The Following 3 Users Say Thank You to Bulat Ziganshin For This Useful Post:

    anormal (29th August 2018), kassane (19th September 2018), Shelwien (29th August 2018)

  22. #18
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    464
    Thanks
    202
    Thanked 81 Times in 61 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    Rewriting the Brotli decoder from C to Rust reduced performance by 28%:

    https://blogs.dropbox.com/tech/2016/...n-with-brotli/
    Still, they preferred Rust because of the security it provides....
    Brotli Decompression in Rust
    Once the files have been uploaded, they need to be durably persisted as long as the user wishes, and at a moment’s notice they may need to be restored to their original bits exactly in a repeatable, secure way.

    For Dropbox, any decompressor must exhibit three properties:

    1. it must be safe and secure, even against bytes crafted by modified or hostile clients,
    2. it must be deterministic—the same bytes must result in the same output,
    3. it must be fast.

    With these properties we can accept any arbitrary bytes from a client and have full knowledge that those bytes factually represent the file data.

    Unfortunately, the compressor supplied by the Brotli project only has the third property: it is very fast. Since the Brotli decompressor consists of a substantial amount of C code written by human beings, it is possibly neither deterministic nor safe and secure against carefully crafted hostile data. It could be both secure and deterministic, but there is simply too much code to reason through a mathematical proof of this hypothesis.

    Operating at Dropbox scale, we need to guarantee the security of our data, so our approach was to break down the problem into components. By writing a new Brotli decompressor in a language that is safe and deterministic, we only needed to analyze the language, not all the code written in it. This is because such a language would prevent us from executing unsafe code (eg. array out of bounds access) or nondeterministic code (eg reading uninitialized memory), so therefore we can trust the code to repeatably produce the same output without any security risks.

    The Rust programing language fits the bill perfectly: it’s a language that promises memory safety without garbage collection, concurrency without data races, and abstractions without overhead. It also has sufficient performance for our needs. That means that code written in Rust has the same memory requirements as the equivalent code written in C. At Dropbox, many of our services are actually memory bound, so this is a key advantage over a garbage collected language.

  23. #19
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,134
    Thanks
    179
    Thanked 921 Times in 469 Posts
    I'd say Rust didn't matter in the end.
    They just had to rewrite the decoder to prove its safety, because the original one was too messy.
    But then they replaced the language standard library and the memory manager, and:
    After the virtual memory is allocated, we enable a timer using the alarm syscall, to avoid a runaway process that never returns control.
    Finally, we enter the process into the secure computing (SECCOMP) mode, disabling any system calls except for read, write, sigreturn and exit.
    I'm pretty sure that at this point they could use C/C++ just as well, without any loss of security.

  24. #20
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Quote Originally Posted by Shelwien View Post
    I'm pretty sure that at this point they could use C/C++ just as well, without any loss of security.
    Almost. In the days of Meltdown, you can't be sure of anything. So I like their double-check approach.

  25. #21
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    222
    Thanks
    89
    Thanked 46 Times in 30 Posts
    One concern I've had about Rust's safety guarantees is that they depend on the compiler functioning as intended, without any bugs that would undermine those safety guarantees. This implies that Rust's compiler should be formally verified (like the CompCert C compiler), but I've heard nothing about it. I also don't understand enough about the boundary and interaction between the Rust compiler and LLVM, and whether LLVM bugs could also undermine Rust's safety guarantees.

  26. #22
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    I don't really understand the concern. The same thing can be said about e.g. Java VMs - if they're broken then you won't get reliable null checking, bounds checking, garbage collection at the right time, etc. Azul made a JIT compiler for Java based on LLVM: https://www.azul.com/press_release/falcon-jit-compiler/

  27. #23
    Member
    Join Date
    Nov 2015
    Location
    boot ROM
    Posts
    83
    Thanks
    25
    Thanked 15 Times in 13 Posts
    I think Rust got "popular" because its community is just way too noisy and (being centered around Mozilla) is good at marketing bullshit - that's what Mozilla is all about these days, unfortunately, preferring marketing "solutions" to technical ones. So I would take their loud reasoning with a grain of salt. Replacing everything with Rust? Okay, show it. What about writing some secure yet usable OS, for example? Let's see how it performs and what all these loud claims are worth - especially once a few million users use it, thoroughly study it and try to break it in all the ways they can. Attackers won't limit themselves to pointer math or memory allocation. That's what the Rust people seem to miss wildly.

    Some background on this thinking...
    1) What are security problems, to begin with? I would agree with the definition https://en.wikipedia.org/wiki/Daniel_J._Bernstein posted somewhere: he claimed these are just bugs in programs. Not every bug is a security issue, but every security issue is a bug - so vulnerabilities are a subset of bugs.
    2) Generally, an attacker misuses a bug to provoke the program into doing something "useful" to the attacker. That's the whole point of an attack. And I guess tricking a program into sending $5000 from your credit card to some wrong person would be viewed as far more "useful" by most attackers.
    3) Note that DJB's definition is very generic and does not mention any particular technology, be it "pointers", "memory allocation" or whatever. My experience suggests it's a very accurate definition. That's why I get very suspicious about the security claims of Rust fans, to say the least: they seem to wildly miss this point, fixating on one particular class of problems - hardly the worst or most abused one to date. This looks like tunnel vision, or even intentionally dirty marketing.

    Compared to all that, C is relatively simple, most of its problems are widely known, there are plenty of analyzers and instrumentation tools for the typical problems (e.g. ASAN and UBSAN in gcc and clang), and overall the problems are more or less understood. When one needs ultimate program reliability, e.g. for critical realtime control or safety (yes, most of that is C now, and I'm yet to see Rust perform equally well in areas where software failure can cause loss of life or injury - would Rust fans dare to hand control of their car over to their programs, the way C firmware does it all the time?), there are well-known design rules like the MISRA C specification that let one write code that is as free of bugs as one can manage. By the earlier definition, such a program is supposed to be secure as well. Though writing compression software under these specifications could be tricky, since they tend to prohibit pointer math and many neat tricks :P.

    And now, what if someone puts a far more complicated language, a larger runtime, plenty of libs, etc. in its place? With far more features, far less understanding of its own problems, far fewer analysis tools - while aggressively peddling it as "secure"? Would that really improve the state of things? Somehow I doubt it. For a reason.

    As a concrete example of how similar logic from the same community has already failed: at some point Mozilla (yes, Mozilla again) went with JavaScript to implement their PDF viewer. They threw around plenty of loud claims that it was going to be secure, unlike everything else. This made it slow and a resource hog, and they even went with the very strange idea of transforming incoming PDF data into JavaScript and then evaluating it as one enormous script. So they got rather shitty performance - some PDFs can easily take up to 5 minutes to render, or even kill the browser with an OOM condition, since trying to represent a huge PDF as JS isn't a terribly efficient thing to do. That wouldn't be a huge security problem on its own, except maybe as a DoS attack on a browser trying to show a PDF that exhausts all system resources. But Mozilla went one step further, playing with fire: the interface between the browser and the PDF viewer is also JavaScript. Unlike external JS, it has to be "privileged" to allow filesystem access and the other things a browser is supposed to be able to do. Does it ring a bell already?

    At some point things went really bad. First, attackers learned that the JS PDF viewer could be fed pre-made JS - basically skipping the silly PDF transformation and just executing the attacker's pre-made JS instead. So at this point the attacker had arbitrary JS execution. Not a big problem on its own - it runs in a sandboxed context. That's where the real fun happened: attackers found another, far more interesting bug. The UI of the PDF viewer is privileged to deal with the filesystem and so on, and they found a way to cross the privilege separation - so the JS they submitted as an alleged "PDF document" would not just execute, but also elevate into the privileged context, able to do virtually everything the browser can do at all.

    That said, Mozilla clearly suffered from tunnel vision and hadn't expected this at all, expecting JS to solve all their security woes. Chrome, for example, goes much further in entrenching security, using OS-level facilities to sandbox at the process level, as far as the OS can do that. On Linux it uses rather efficient "containers" - so an attacker basically finds himself in an empty, boring "fake" system with no processes except the attacked one and no user data, which makes the attack outcome rather unrewarding. But Mozilla hadn't bothered to implement any of that, despite it being a very small amount of code - just a matter of a few syscalls, an order of magnitude easier than a rewrite of the PDF viewer or a new programming language. Their overconfidence in JS security made them grossly disregard considerations like that.

    The result? Really terrible 0-day exploits were running in the wild - Firefox users got their HDDs literally "shared on the internet". The exploit proved to be 100% reliable: unlike overruns, it wouldn't fail on stack smashing protection or anything like that. What's more, being JS, it proved to be utterly cross-platform as well, running equally well on any OS where Firefox could start at all. Suddenly Linux, Windows, Mac and everything else were all equal - they were all pwned. Millions of keys, passwords and other pieces of sensitive data were stolen by scripts scanning HDDs for valuable assets. The looting continued for weeks before Mozilla spotted signs of trouble, when someone finally stumbled on the thing actively munching through their whole HDD and got suspicious about where all the disk activity was coming from. Since it also used the browser for all its networking, no firewall or antivirus would save you from this attack - for fear of breaking legitimate browser activity in the process.

    Another nice example is BitMessage, an allegedly "secure" chat written in Python. Sure, it can't have buffer overruns. But it called eval() in a way that let an attacker run it on... incoming message data! So a specially crafted message would simply execute arbitrary Python code, with no need to overrun any buffers at all. Btw, a short glance at the code of that thing is enough to see it can't be secure no matter what language it's written in - really chaotic prototype-quality code, written in a hurry. Such code is doomed to have a huge number of bugs, and some of them will prove to be vulns. One just did.

    TL;DR: systems security is a very complicated and multidimensional problem. Trying to reduce it to a few memory management errors is freaking lame. Furthermore, peddling some language as "secure" makes programmers relax - at which point it gets convenient for attackers, since tricking an unwary program into doing something unexpected is far easier.

    p.s. DJB showed that C programs can be secure - by writing some. They proved to be just that - he was nearly a pioneer in paying a bounty to security researchers who could find a vuln. Now this practice is far more common, to encourage researchers to disclose vulns rather than sell them on black markets. There is a price though: to stay bug-free and secure, a program should stay small and simple. Just that. Huge runtimes and complicated languages are a step away from this direction, bringing more bugs - and therefore more vulns.

  28. #24
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    222
    Thanks
    89
    Thanked 46 Times in 30 Posts
    Quote Originally Posted by Piotr Tarsa View Post
    I don't really understand the concern. The same thing can be said about e.g. Java VMs - if they're broken then you won't get reliable null checking, bounds checking, garbage collection at the right time, etc. Azul made a JIT compiler for Java based on LLVM: https://www.azul.com/press_release/falcon-jit-compiler/
    The concern is based on the desire to have truly, verifiably secure software. Formal verification exists, it's just very labor-intensive and rare for currently popular programming languages. I think the current state of mainstream computing security is wildly unacceptable and unnecessarily so. It's fundamentally because our programming languages suck.

    For the record, I don't like Rust. We need something with similar features but with much cleaner syntax and teachability (and a clean-sheet, highly optimizing compiler toolchain like DJB proposed, not LLVM). I think programming in general is in an awful state from a usability and teaching standpoint, which deters large numbers of smart people from the profession. We've been lazy in this timeline, just totally complacent with ancient, terrible languages like C and C++, and their imitators. Just look at Dart and Go. Brand new 21st-century programming languages from a multibillion dollar company, and they're like C! Jesus man. (Though to be fair, Go's compiler and garbage collector are extremely impressive, solid work. There would still be a lot of room for optimization if they didn't demand that all programs compile in a couple of seconds. They've barely scratched the surface of modern CPU instructions...)

  29. #25
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    222
    Thanks
    89
    Thanked 46 Times in 30 Posts
    @xcrh, when did the Firefox vulnerability happen?

  30. #26
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Quote Originally Posted by xcrh View Post
    TL;DR: systems security is a very complicated and multidimensional problem. Trying to reduce it to a few memory management errors is freaking lame. Furthermore, peddling some language as "secure" makes programmers relax - at which point it gets convenient for attackers, since tricking an unwary program into doing something unexpected is far easier.
    Mozilla talks about memory safety on rust-lang.org. It's not the same as safety in general. Checking more things at compile time is generally a good thing (unless you overengineer it).

    Quote Originally Posted by SolidComp View Post
    The concern is based on the desire to have truly, verifiably secure software. Formal verification exists, it's just very labor-intensive and rare for currently popular programming languages.
    Bear in mind that Rust's standard library is full of code marked as unsafe, i.e. code that does ordinary unchecked pointer arithmetic and dereferencing. And there are no formal proofs of that code's correctness. So even if the compiler were 100% correct, you would still have to ensure the correctness of the standard library's unsafe code.

    Rust embeds directly into the language what is a library in C++. I'm talking about smart pointers and their ownership, borrowing and lifecycle. The Rust compiler checks that they are used correctly, while C++ compilers produce UB (undefined behaviour) when smart pointers are used incorrectly. In my opinion, knowing immediately that you used a pointer in a wrong way makes Rust easier than C++ in that regard. Of course, the easiest way to deal with pointers is to avoid them (i.e. pointer arithmetic and management) by using a language with proper garbage collection. That's what most programmers do - they rely on GC to free themselves from worrying about pointer correctness. GC also often works faster than malloc/free. Look here: https://benchmarksgame-team.pages.de...narytrees.html - the Java version takes 8.28s, while the fastest C version using malloc/free (as opposed to memory pools, which are a special-case optimization) takes 18.05s.

  31. #27
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    222
    Thanks
    89
    Thanked 46 Times in 30 Posts
    Quote Originally Posted by Piotr Tarsa View Post
    Mozilla talks about memory safety on rust-lang.org. It's not the same as safety in general. Checking more things at compile time is generally a good thing (unless you overengineer it).

    Bear in mind that Rust's standard library is full of code marked as unsafe, i.e. code that does ordinary unchecked pointer arithmetic and dereferencing. And there are no formal proofs of that code's correctness. So even if the compiler were 100% correct, you would still have to ensure the correctness of the standard library's unsafe code.

    Rust embeds directly into the language what is a library in C++. I'm talking about smart pointers and their ownership, borrowing and lifecycle. The Rust compiler checks that they are used correctly, while C++ compilers produce UB (undefined behaviour) when smart pointers are used incorrectly. In my opinion, knowing immediately that you used a pointer in a wrong way makes Rust easier than C++ in that regard. Of course, the easiest way to deal with pointers is to avoid them (i.e. pointer arithmetic and management) by using a language with proper garbage collection. That's what most programmers do - they rely on GC to free themselves from worrying about pointer correctness. GC also often works faster than malloc/free. Look here: https://benchmarksgame-team.pages.de...narytrees.html - the Java version takes 8.28s, while the fastest C version using malloc/free (as opposed to memory pools, which are a special-case optimization) takes 18.05s.
    The rules on the benchmark say memory pools are not allowed. Are you saying that the five C/C++ entries and two Rust entries that beat Java's best entry (8.28 sec) are using memory pools?

  32. #28
    Member
    Join Date
    Nov 2014
    Location
    California
    Posts
    122
    Thanks
    36
    Thanked 33 Times in 24 Posts
    Quote Originally Posted by SolidComp View Post
    The rules on the benchmark say memory pools are not allowed. Are you saying that the five C/C++ entries and two Rust entries that beat Java's best entry (8.28 sec) are using memory pools?
    Frankly these tests are pretty pointless, especially those running for only a few seconds (e.g. the JVM needs warm-up).

  33. #29
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
    Quote Originally Posted by SolidComp View Post
    The rules on the benchmark say memory pools are not allowed.
    They allow library pools, but disallow custom pools.
    Variance
    Use default GC, use per node allocation or use a library memory pool.

    As a practical matter, the myriad ways to tune GC will not be accepted.

    As a practical matter, the myriad ways to custom allocate memory will not be accepted.

    Please don't implement your own custom "arena" or "memory pool" or "free list" - they will not be accepted.
    Quote Originally Posted by SolidComp View Post
    Are you saying that the five C/C++ entries and two Rust entries that beat Java's best entry (8.28 sec) are using memory pools?
    Yes. Just look into them.
    The C/C++ versions use:
    - apr_pools.h (the memory pool from the Apache Portable Runtime)
    - boost/pool/object_pool.hpp
    Swift also uses the memory pool from APR.
    Rust uses a typed arena, which is a memory pool. Quoting its description:
    Arenas are a type of allocator that destroy the objects within, all at once, once the arena itself is destroyed. They do not support deallocation of individual objects while the arena itself is still alive. The benefit of an arena is very fast allocation; just a vector push.
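
    The concept itself is tiny; here's a hedged C++ sketch of the same idea (not the actual Rust typed-arena crate):

    #include <stddef.h>
    #include <vector>

    // Bump allocator: allocation is just a pointer increment within big
    // chunks; nothing is freed individually - everything dies with the arena.
    class Arena {
        std::vector<char*> chunks;
        size_t pos = 0;
        size_t chunk_size;
    public:
        explicit Arena( size_t chunk = 1 << 20 ) : chunk_size( chunk ) {}
        void* alloc( size_t n ) {
            n = (n + 15) & ~size_t(15); // keep 16-byte alignment
            if ( chunks.empty() || pos + n > chunk_size ) {
                chunks.push_back( new char[ n > chunk_size ? n : chunk_size ] );
                pos = 0;
            }
            void* p = chunks.back() + pos;
            pos += n;
            return p;
        }
        ~Arena() { for ( char* c : chunks ) delete[] c; } // free all at once
    };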

  34. #30
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    222
    Thanks
    89
    Thanked 46 Times in 30 Posts
    Quote Originally Posted by Piotr Tarsa View Post
    They allow library pools, but disallow custom pools.


    Yes. Just look into them.
    The C/C++ versions use:
    - apr_pools.h (the memory pool from the Apache Portable Runtime)
    - boost/pool/object_pool.hpp
    Swift also uses the memory pool from APR.
    Rust uses a typed arena, which is a memory pool. Quoting its description:
    Ah, I see. I'm not much of a programmer so I didn't take a close look at the code. I'm very surprised by two things:


    1. Go's incredibly poor performance on this benchmark (>28 sec for its fastest entry)
    2. F-sharp's great performance: two F# entries (and C#) were better than Java's best, and F# consistently beats OCaml in most of the other benchmarks on that site. (The default comparison on the site for F# is OCaml, presumably because F# was based on or inspired by OCaml when it was created at Microsoft Research.)


    It may be possible to precompile F# at this point, but I don't think these were. C# and F# used to get poor treatment on the Benchmarks Game site because he only tests on Linux, and so they always had to run in Mono instead of their more optimal native VMs on Windows. I think this recent higher performance showing may be due to the release and evolution of the open-source .NET Core, since that's what the benchmark is using now instead of Mono.

    In fact both C# and F# beat Java now on most of the benchmarks on the site (though not by wide margins), which I think is a reversal from a couple of years ago on Mono. I'm still surprised, because I thought the JVM was legendarily optimized over many years, and I expected it would have to be more advanced and optimized than Microsoft's VM, especially on Linux, but that doesn't seem to be the case.
