From charlesreid1

Revision as of 20:30, 15 September 2017

https://www.infoworld.com/article/2606657/open-source-software/119424-Bossie-Awards-2013-The-best-open-source-big-data-tools.html

Apache Bigtop (bundled ecosystem of software: packaging, testing, and configuration that lets the many related projects work together with Hadoop)

Apache Hadoop (cluster distributed-data framework; distributes data among the nodes in a cluster; useful for data-intensive computing)

Apache Spark (cluster computing framework that performs computations on data; a separate layer from Hadoop that can sit on top of Hadoop or use some other cluster distributed-data framework; fast, because it is designed to read data from the cluster, perform operations, and write results back to the cluster in one pass)

Apache Hadoop MapReduce (similar in purpose to Spark, but operates differently: reads data from the cluster, performs an operation, writes results to the cluster, reads the updated data back, performs the next operation, writes the next results, and so on)

Apache Pig

Apache Hive

Apache HBase

Apache Mahout (general machine learning engine, like R but for big data sets; does not implement a comprehensive set of ML algorithms; see Apache Spark MLlib for algorithms that Mahout does not implement)

Cassandra (distributed NoSQL database)

Apache TinkerPop/Gremlin (Apache TinkerPop and Gremlin are to graph databases what JDBC and SQL are to relational databases)

MongoDB (NoSQL document-based database)

Apache CloudStack (software for deploying and managing large numbers of nodes or virtual machines; essentially the back-end software used to run a cloud service provider)

Apache Sqoop

Talend

Apache Hama

Cloudera Impala

Apache Drill

Gephi

Neo4j

Couchbase

Paradigm4 SciDB
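
The difference between the Spark and MapReduce entries above can be illustrated with a toy model. This is only a sketch: it uses neither library, and `ToyClusterStore`, `run_step_by_step`, and `run_single_pass` are hypothetical names invented here. A dict stands in for cluster storage so that reads and writes can be counted.

```python
# Toy model only: a dict stands in for cluster storage, and plain Python
# functions stand in for distributed operations. No Spark or Hadoop API is used.


class ToyClusterStore:
    """Hypothetical stand-in for cluster storage that counts reads and writes."""

    def __init__(self, data):
        self.blocks = {"input": list(data)}
        self.reads = 0
        self.writes = 0

    def read(self, name):
        self.reads += 1
        return list(self.blocks[name])

    def write(self, name, records):
        self.writes += 1
        self.blocks[name] = list(records)


def run_step_by_step(store, operations):
    """MapReduce-style: every operation reads from storage and writes back."""
    name = "input"
    for i, op in enumerate(operations):
        records = [op(r) for r in store.read(name)]
        name = f"step_{i}"
        store.write(name, records)
    return store.blocks[name]


def run_single_pass(store, operations):
    """Spark-style: read once, chain the operations in memory, write once."""
    records = store.read("input")
    for op in operations:
        records = [op(r) for r in records]
    store.write("output", records)
    return records


operations = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]

step_store = ToyClusterStore([1, 2, 3])
step_result = run_step_by_step(step_store, operations)

pass_store = ToyClusterStore([1, 2, 3])
pass_result = run_single_pass(pass_store, operations)

print("step by step:", step_result, step_store.reads, step_store.writes)
print("single pass: ", pass_result, pass_store.reads, pass_store.writes)
```

Both functions produce the same result; only the number of trips to storage differs (one read and one write per operation versus one read and one write in total).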