Big Data
From charlesreid1
Apache Software
Apache Bigtop (bundled ecosystem of software, packaged and tested together so that the many components built around Hadoop work with one another)
Apache Hadoop (cluster distributed-data framework; distributes data among the nodes in a cluster; useful for data-intensive computing)
Apache Spark (cluster computing framework that performs computations on data; a separate layer from Hadoop that can sit on top of Hadoop or use some other cluster distributed-data framework; designed to be fast by reading data from the cluster, performing all operations, and writing results back to the cluster in one pass; useful for real-time analytics that need to be fast, and for workloads that perform multiple operations on the same data, as most machine learning algorithms do)
Apache Hadoop MapReduce (similar to Spark, but operates differently: reads data from the cluster, performs one operation, writes results to the cluster, then reads the updated data back from the cluster, performs the next operation, writes the next results to the cluster, and so on)
Apache Pig
Apache Hive
Apache HBase
Apache Mahout (general machine learning engine, like R but for big data sets; its set of ML algorithms is not comprehensive; check Apache Spark MLlib for algorithms not implemented by Mahout)
Apache Cassandra (distributed NoSQL database)
Apache TinkerPop/Gremlin (Apache TinkerPop and Gremlin are to graph databases what JDBC and SQL are to relational databases)
MongoDB (NoSQL document-based database; not an Apache project, listed here alongside the other NoSQL stores)
Apache CloudStack (software that enables management/deployment of large numbers of nodes or virtual machines; basically, this is the back-end software used to run a cloud service provider)
Apache Sqoop (tool for transferring bulk data between SQL databases and Hadoop HDFS)
Apache Drill (open-source query engine comparable to Google BigQuery; supports non-relational datastores, and can stitch together different datastores such as Amazon S3, HBase, MongoDB, Azure Blob Storage, etc.; implements a JSON-based document model)
Apache Hama (short for Hadoop-Matrix; distributed computing framework based on bulk synchronous parallel (BSP) computing for scientific computations, mainly matrix, graph, and network algorithms; with Hama, unlike Hadoop, bandwidth is treated as a major bottleneck, and communication overhead is accounted for in the algorithms and their costs)
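The difference between the Spark and MapReduce entries above comes down to how often each one touches cluster storage. The sketch below is a toy illustration in plain Python and does not use the Hadoop or Spark APIs. ClusterStore, run_mapreduce_style, and run_spark_style are hypothetical names made up for this example, and the "cluster" is just an in-memory list that counts reads and writes.

```python
class ClusterStore:
    """Toy stand-in for cluster storage; counts how often it is read and written."""

    def __init__(self, data):
        self.data = list(data)
        self.reads = 0
        self.writes = 0

    def read(self):
        self.reads += 1
        return list(self.data)

    def write(self, data):
        self.writes += 1
        self.data = list(data)


def run_mapreduce_style(store, operations):
    """Each operation is its own job: read from the store, compute, write back."""
    for op in operations:
        records = store.read()
        store.write([op(r) for r in records])
    return store.data


def run_spark_style(store, operations):
    """Read once, chain every operation in memory, write once."""
    records = store.read()
    for op in operations:
        records = [op(r) for r in records]
    store.write(records)
    return store.data


# Three operations applied in sequence, as an iterative algorithm might do.
operations = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]

mr_store = ClusterStore([1, 2, 3])
mr_result = run_mapreduce_style(mr_store, operations)

spark_store = ClusterStore([1, 2, 3])
spark_result = run_spark_style(spark_store, operations)

print("Same result:", mr_result == spark_result)
print("MapReduce-style reads/writes:", mr_store.reads, mr_store.writes)
print("Spark-style reads/writes:", spark_store.reads, spark_store.writes)
```

Both runs produce the same values, but the MapReduce-style run reads and writes the store once per operation, while the Spark-style run does so once in total. With many operations, as in most machine learning algorithms, that storage traffic is where the speed difference would be expected to come from.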
Other Software
Talend
Cloudera Impala
Gephi
Neo4j
Couchbase
Paradigm4 SciDB