Advantages and Disadvantages of Spark

Apache Spark is one of the most popular tools for big data processing, including near-real-time stream processing. Many teams are moving from the older MapReduce model to Apache Spark because of its speed: Spark can be up to 100 times faster than MapReduce for some in-memory workloads, and this is the main reason many users switch. But Spark has some disadvantages too. Let’s look at its advantages first:

Advantages of Spark:

  • Spark is fast because it uses in-memory computation, whereas MapReduce writes intermediate results to disk between steps. Spark keeps working data in memory where possible (spilling to disk only when it does not fit), so it performs far fewer input/output operations, which greatly increases processing speed.
  • Spark supports several languages for writing Spark applications, including Java, Python and Scala. Each of these languages has a large community, so Spark is easy for many users to adopt.
  • Spark integrates with most big data technologies, such as HDFS, YARN and Hive, which makes it a strong fit for the Hadoop ecosystem.
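  • It is fault tolerant: Spark records the lineage of each dataset (the transformations that produced it), so partitions lost when a node fails can be recomputed.
  • It scales easily: you can add worker nodes to the cluster to handle more data.
  • It can encrypt network (RPC) traffic with AES, using 128-bit keys by default, and it supports SSL/TLS for its network endpoints.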

Along with its advantages, every tool has disadvantages too. Here are some disadvantages of Spark:

Disadvantages of Spark:

  • As mentioned above, in-memory computation makes Spark fast, but it also makes it costly. Keeping working data in memory requires a large amount of RAM, which makes the hardware expensive. This is the main drawback of Apache Spark.
  • Spark is not a true real-time tool but a near-real-time one. Spark Streaming processes data in micro-batches, and the batch interval can be made small (for example, 1 second). Such small batches make it near real time, but it still does not process each record the moment it arrives.
  • Spark doesn’t have its own file system. It relies on external storage such as HDFS or Amazon S3, or reads tables managed by other tools such as Hive.
  • Spark Streaming supports time-based window operations but not record-based ones. This means you can aggregate data over a window defined by time (for example, the last 30 seconds) but not over a window defined by a number of records (for example, the last 100 records).
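Micro-batching and time-based windowing can be illustrated with another plain-Python toy model. It is not Spark Streaming's API: `micro_batches` and `windowed_counts` are hypothetical helpers written for this sketch. It follows the Spark Streaming rule that the window length and slide interval must be multiples of the batch interval, and, like Spark Streaming, it defines windows only by time.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Toy model only (not Spark Streaming's API). Times are in seconds.
Event = Tuple[float, str]  # (arrival time, payload)


def micro_batches(events: List[Event], batch_interval: int) -> Dict[int, List[str]]:
    """Group events into consecutive batches of `batch_interval` seconds.
    A batch can only be processed after its interval closes, so latency is
    at least on the order of the batch interval (near real time)."""
    batches: Dict[int, List[str]] = defaultdict(list)
    for t, payload in sorted(events):
        batches[int(t // batch_interval)].append(payload)
    return dict(batches)


def windowed_counts(batches: Dict[int, List[str]], batch_interval: int,
                    window_length: int, slide_interval: int) -> List[Tuple[int, int, int]]:
    """Count events in sliding windows defined purely by time.
    Window length and slide must be multiples of the batch interval."""
    if window_length % batch_interval or slide_interval % batch_interval:
        raise ValueError("window length and slide must be multiples of the batch interval")
    if not batches:
        return []
    horizon = (max(batches) + 1) * batch_interval  # end time of the last batch
    results = []
    end = slide_interval
    while end - slide_interval < horizon:
        start = end - window_length
        # Sum the batches whose time range falls inside [start, end).
        count = sum(len(batches.get(b, []))
                    for b in range(start // batch_interval, end // batch_interval))
        results.append((max(start, 0), end, count))
        end += slide_interval
    return results


events = [(0.2, "a"), (0.7, "b"), (1.5, "c"), (2.1, "d"), (2.9, "e"), (3.3, "f"), (3.8, "g")]
batches = micro_batches(events, batch_interval=1)
print(batches)  # {0: ['a', 'b'], 1: ['c'], 2: ['d', 'e'], 3: ['f', 'g']}
print(windowed_counts(batches, 1, window_length=2, slide_interval=1))
# [(0, 1, 2), (0, 2, 3), (1, 3, 3), (2, 4, 4)]
```

Neither helper takes a record count, so a window such as "the last 100 records" has no counterpart in this sketch, which matches the limitation described above.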
