Complementary to my earlier post on the Apache Ignite in-memory file system and caching capabilities, I would like to cover the main points of differentiation between Ignite and Spark. I see questions like this coming up repeatedly. It is easier to have them answered in one place, so you don't need to fish around the Net for the answers.
- The main difference is, of course, that Ignite is an in-memory computing system, i.e. one that treats RAM as the primary storage facility, whereas others - Spark included - only use RAM for processing. The former, memory-first approach is faster because the system can do better indexing, reduce the fetch time, avoid (de)serialization, etc.
- Ignite's MapReduce is fully compatible with the Hadoop MR APIs, which lets everyone simply reuse existing legacy MR code, yet run it with a >30x performance improvement. Check this short video demoing an Apache Bigtop in-memory stack speeding up legacy MapReduce code.
- Also, unlike Spark's, the streaming in Ignite isn't quantized by the size of an RDD. In other words, you don't need to form an RDD first before processing it; you can actually do real streaming, which means there are no delays in processing the stream content in the case of Ignite.
- Spill-overs are a common issue for in-memory computing systems: after all, memory is limited. In Spark, where RDDs are immutable, if an RDD is created with a size greater than half of a node's RAM, then a transformation and the generation of the subsequent RDD' will likely fill all of the node's memory, which will cause a spill-over, unless the new RDD is created on a different node. Tachyon was essentially an attempt to address this, using old RAM-drive tech with all its limitations.
Ignite doesn't have this issue with data spill-overs, as its caches can be updated in an atomic or transactional manner. However, spill-overs are still possible: the strategies to deal with them are explained here.
- As one of its components, Ignite provides a first-class-citizen file-system caching layer. Note, I have already addressed the differences between that and Ignite, but for some reason my post got deleted from their user list. I wonder why? ;)
- Ignite uses off-heap memory to avoid GC pauses, etc. and does it highly
- Ignite guarantees strong consistency
- Ignite supports full SQL99 as one of the ways to process the data, with full support for ACID transactions
- Ignite supports in-memory SQL indexes, which let you avoid full scans of data sets, directly leading to very significant performance improvements (also see the first paragraph)
- With Ignite, a Java programmer doesn't need to learn the ropes of Scala. The programming model also encourages the use of Groovy. And I will withhold my professional opinion about the latter in order to keep this post focused and civilized ;)
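To make the point about in-memory indexes a bit more tangible, here is a minimal sketch in plain Java. It deliberately does not use the Ignite API: the `IndexVsScan` class, the `Person` type and the `scan`, `buildIndex` and `lookup` helpers are all hypothetical names made up for this illustration. It only shows the general idea of why a sorted index kept in RAM lets a range query touch just the matching entries instead of walking the whole data set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class IndexVsScan {
    // Hypothetical record type, used only for this illustration.
    public static final class Person {
        public final int id;
        public final int age;

        public Person(int id, int age) {
            this.id = id;
            this.age = age;
        }
    }

    // Full scan: every record is inspected, whether it matches or not.
    public static List<Integer> scan(List<Person> people, int minAge, int maxAge) {
        List<Integer> ids = new ArrayList<>();
        for (Person p : people) {
            if (p.age >= minAge && p.age <= maxAge) {
                ids.add(p.id);
            }
        }
        return ids;
    }

    // Build a sorted in-memory index: age -> ids of people with that age.
    public static TreeMap<Integer, List<Integer>> buildIndex(List<Person> people) {
        TreeMap<Integer, List<Integer>> index = new TreeMap<>();
        for (Person p : people) {
            index.computeIfAbsent(p.age, k -> new ArrayList<>()).add(p.id);
        }
        return index;
    }

    // Indexed lookup: only the keys inside [minAge, maxAge] are visited.
    public static List<Integer> lookup(TreeMap<Integer, List<Integer>> index,
                                       int minAge, int maxAge) {
        List<Integer> ids = new ArrayList<>();
        for (List<Integer> bucket : index.subMap(minAge, true, maxAge, true).values()) {
            ids.addAll(bucket);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Person> people = List.of(
                new Person(1, 25), new Person(2, 40), new Person(3, 31),
                new Person(4, 25), new Person(5, 67));

        System.out.println("full scan: " + scan(people, 25, 35));
        System.out.println("via index: " + lookup(buildIndex(people), 25, 35));
    }
}
```

Both calls return the same set of ids; the indexed version simply gets there without looking at the records that fall outside the range, which is where the savings come from once the data set is large.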
I can keep on rambling for a long time, but you might consider reading this and that, where Nikita Ivanov - one of the founders of this project - has a good reflection on other key differences. Also, if you like what you read, consider joining the Apache Ignite (incubating) community and start contributing!