Big Data

https://www.infoworld.com/article/2606657/open-source-software/119424-Bossie-Awards-2013-The-best-open-source-big-data-tools.html

Apache Bigtop (packaging, testing, and configuration for the Hadoop ecosystem; bundles Hadoop and related projects into tested, interoperable packages so that lots of software can work together with Hadoop)

Apache Hadoop (cluster distributed-data framework; distributes data among the nodes in a cluster; useful for data-intensive computing)
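To make "distributes data among the nodes in a cluster" concrete, here is a minimal sketch of the idea behind HDFS storage: a file is split into fixed-size blocks and each block is copied to several distinct nodes. The function names are hypothetical, and the round-robin placement is a deliberate simplification; real HDFS uses a rack-aware placement policy, with a default block size of 128 MB and a default replication factor of 3.

```python
from collections import defaultdict


def place_blocks(file_size, block_size, nodes, replication=3):
    """Hypothetical sketch: split a file into blocks and assign each block
    to `replication` distinct nodes using simple round-robin placement
    (NOT HDFS's real rack-aware policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    num_blocks = -(-file_size // block_size)  # ceiling division
    placement = {}
    for b in range(num_blocks):
        # Replicas of block b go to consecutive nodes, starting at offset b,
        # so the load rotates around the cluster.
        placement[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return placement


def blocks_per_node(placement):
    """Count how many block replicas each node ends up storing."""
    counts = defaultdict(int)
    for replicas in placement.values():
        for node in replicas:
            counts[node] += 1
    return dict(counts)


# A 1 GB file with 128 MB blocks on a 4-node cluster.
layout = place_blocks(1024**3, 128 * 1024**2, ["node1", "node2", "node3", "node4"])
print(len(layout), blocks_per_node(layout))
```

With 8 blocks and 3 replicas each, the 24 replicas spread evenly (6 per node), and losing any single node still leaves at least two copies of every block.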

Apache Spark (cluster computing framework that performs computations on data; a separate layer from Hadoop that can sit on top of Hadoop or use some other distributed-data framework; fast because it keeps intermediate results in memory, so a job can read data from the cluster, perform a chain of operations, and write the results back to the cluster in one pass)

Hadoop MapReduce (part of Apache Hadoop; similar in purpose to Spark, but operates differently: it reads data from the cluster, performs an operation, and writes the results to the cluster; the next step reads that updated data back from the cluster, performs its operation, writes the next results, and so on)
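The difference between the two execution models above can be sketched with a toy two-step job (word count, then keep only frequent words). This is only an illustration of the data flow, not the Hadoop or Spark API: the MapReduce-style version writes every step's output to a file (standing in for the cluster's distributed file system) and the next step reads it back, while the Spark-style version chains the same steps in memory. All function names here are hypothetical, and the threshold `MIN_COUNT` is an arbitrary assumption.

```python
import json
import os
import tempfile
from collections import defaultdict

MIN_COUNT = 2  # assumed threshold for "frequent" words


def run_mapreduce_job(in_path, out_path, mapper, reducer):
    """One MapReduce-style pass: read from disk, map, shuffle (group by key),
    reduce, then write the results back to disk."""
    with open(in_path) as f:
        records = json.load(f)
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    results = [out for key in sorted(groups) for out in reducer(key, groups[key])]
    with open(out_path, "w") as f:
        json.dump(results, f)
    return out_path


def wc_map(line):                  # job 1 map: emit (word, 1)
    for word in line.lower().split():
        yield word, 1


def wc_reduce(word, counts):       # job 1 reduce: sum the counts
    yield [word, sum(counts)]


def filter_map(pair):              # job 2 map: drop infrequent words
    word, count = pair
    if count >= MIN_COUNT:
        yield word, count


def filter_reduce(word, counts):   # job 2 reduce: pass through
    yield [word, counts[0]]


def mapreduce_pipeline(lines):
    """Two chained jobs; the intermediate counts round-trip through a file."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "input.json")
        with open(src, "w") as f:
            json.dump(lines, f)
        counts = run_mapreduce_job(src, os.path.join(d, "counts.json"), wc_map, wc_reduce)
        final = run_mapreduce_job(counts, os.path.join(d, "frequent.json"),
                                  filter_map, filter_reduce)
        with open(final) as f:
            return json.load(f)


def in_memory_pipeline(lines):
    """Same two steps chained in memory, with no intermediate file."""
    counts = defaultdict(int)
    for line in lines:
        for word in line.lower().split():
            counts[word] += 1
    return sorted([w, c] for w, c in counts.items() if c >= MIN_COUNT)


text = ["the quick fox", "the lazy dog", "quick thinking"]
print(mapreduce_pipeline(text))
print(in_memory_pipeline(text))
```

Both pipelines return the same answer; what differs is how many times the data crosses the storage layer. Real Spark jobs still shuffle data between nodes, but chained steps do not each have to write their results to the distributed file system.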

Apache Pig

Apache Hive

Apache HBase

Apache Mahout (general machine learning engine, like R but for big data sets; does not implement a comprehensive set of ML algorithms, so check Apache Spark MLlib for algorithms that Mahout does not implement)

Cassandra (distributed NoSQL database)

Apache TinkerPop/Gremlin (Apache TinkerPop and Gremlin are to graph databases what JDBC and SQL are to relational databases)
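To show what a Gremlin traversal does, a query such as `g.V().has('name','marko').out('knows').values('name')` starts from all vertices, filters by a property, follows outgoing "knows" edges, and returns a property of the vertices it reaches. The sketch below is a hypothetical toy in-memory graph whose method names only mimic the shape of those Gremlin steps; it is NOT the TinkerPop or gremlinpython API, and the sample vertices are made up for illustration.

```python
class ToyGraph:
    """Hypothetical in-memory property graph (NOT the TinkerPop API)."""

    def __init__(self):
        self.props = {}   # vertex id -> property dict
        self.edges = []   # (source vertex, edge label, target vertex)

    def add_vertex(self, vid, **props):
        self.props[vid] = props

    def add_edge(self, src, label, dst):
        self.edges.append((src, label, dst))

    def V(self):
        # Start a traversal at every vertex in the graph.
        return Traversal(self, list(self.props))


class Traversal:
    """Each step maps the current list of vertex ids to a new list."""

    def __init__(self, graph, current):
        self.graph = graph
        self.current = current

    def has(self, key, value):
        # Keep only vertices whose property `key` equals `value`.
        keep = [v for v in self.current if self.graph.props[v].get(key) == value]
        return Traversal(self.graph, keep)

    def out(self, label):
        # Follow outgoing edges with the given label.
        nxt = [dst for v in self.current
               for (src, lbl, dst) in self.graph.edges if src == v and lbl == label]
        return Traversal(self.graph, nxt)

    def values(self, key):
        # Terminal step: return the property values (skipping missing ones).
        return [self.graph.props[v][key] for v in self.current
                if key in self.graph.props[v]]


g = ToyGraph()
for vid, name in [(1, "marko"), (2, "vadas"), (3, "lop"), (4, "josh")]:
    g.add_vertex(vid, name=name)
g.add_edge(1, "knows", 2)
g.add_edge(1, "knows", 4)
g.add_edge(1, "created", 3)
print(g.V().has("name", "marko").out("knows").values("name"))
```

Here the toy traversal prints the names of the vertices "marko" knows. The point of TinkerPop is that the same Gremlin query runs unchanged against any compliant graph database, much as the same SQL runs over JDBC against different relational databases.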

MongoDB (NoSQL document-based database)

Apache CloudStack (software for deploying and managing large numbers of nodes or virtual machines; essentially, the kind of back-end software used to run a cloud service provider)

Apache Sqoop

Talend

Apache Hama

Cloudera Impala

Apache Drill

Gephi

Neo4j

Couchbase

Paradigm4 SciDB
