1. Hadoop

It would be impossible to talk about open source data analytics without mentioning Hadoop. This Apache Foundation project has become nearly synonymous with big data, and it enables large-scale distributed processing of extremely large data sets. A survey conducted by TDWI and SAS found that nearly 60 percent of enterprises expected to have Hadoop clusters in production by the end of 2016.

However, it should be noted that Hadoop on its own doesn't enable data analytics. It's usually part of a larger solution for gathering insights from big data.
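
To make the idea of distributed batch processing more concrete, here is a minimal single-machine sketch of the map, shuffle and reduce phases that Hadoop MapReduce is built around. It is plain Python rather than Hadoop code; the function names and sample sentences are hypothetical, and a real cluster would spread each phase across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group values by key, which a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input; real jobs would read from distributed storage.
    sample = ["big data needs big tools", "open source tools for data"]
    print(word_count(sample))
```

The sketch only counts words, which is why Hadoop is usually paired with other tools when the goal is actual analysis.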

2. Spark

Also an Apache project, Spark promises fast big data processing. In fact, it claims to "run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk." As a result of this fast performance, it is often used to analyze streaming data or in applications that require interactive analysis capabilities. Companies frequently use it alongside Hadoop or Mesos, although it can also run on its own. It has recently experienced a dramatic rise in popularity, and a 2016 survey conducted by Syncsort found that nearly 70 percent of the enterprise big data staffers surveyed were interested in Spark.

3. Talend

Unlike the first two projects in this slideshow, Talend is managed by a for-profit company rather than a foundation. As a result, paid support is available. Talend offers a mix of free and paid products. Its free, open source solution is called Talend Open Studio, and it has been downloaded more than 2 million times.

Market research firm Gartner recently named Talend a "Leader" in data integration. The company boasts that it can help enterprises analyze their big data five times faster and at one-fifth the cost compared to competing solutions.

4. Jaspersoft

Like Talend, Jaspersoft comes in multiple editions, both free and paid. Its Community edition is free and open source, while the Reporting, AWS, Professional and Enterprise editions require a fee but include support.

Jaspersoft is an open source business intelligence tool that aims to allow business users to self-serve their own needs. The company claims that its technology powers more than 130,000 apps with embedded BI capabilities.

5. Pentaho

Pentaho describes itself as a "comprehensive data integration and business analytics platform." The company primarily promotes the commercial versions of its software, which are based on the open source Community version. Companies can use it alongside tools like Hadoop and Spark to enable reporting and visualizations for their big data. This software boasts a long list of well-known customers that includes BT, Caterpillar, Nasdaq, The U.S. Dept. of Homeland Security, NOAA, The New York Times, EMC and many others.

6. RapidMiner

RapidMiner claims to be the "#1 open source data science platform," and Gartner named it a leader in its Magic Quadrant report for advanced analytics. It enables self-service predictive analytics and promises lightning-fast performance. Its users include BMW, Lufthansa, Domino's Pizza, Sony, Ford, Salesforce, Amnesty International and GE.

The complete RapidMiner platform includes three separate pieces: RapidMiner Studio, RapidMiner Server and RapidMiner Radoop. All three are available under open source or commercial licenses, and commercial prices depend on the number of users.

7. Storm

Used by companies like Yahoo, Twitter, Spotify, Yelp, Flipboard and Groupon, Apache Storm is a real-time big data processing engine. Its website explains, "Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing." Customers can use it with any database and any programming language. It's scalable, fault-tolerant and easy to deploy. Users should note, however, that Storm has not yet reached the 1.0 release level.
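
The phrase "unbounded streams of data" can be illustrated with a small sketch. The code below is plain Python, not the Storm API: a hypothetical generator stands in for an endless event source, and a second generator updates a running tally as each event arrives instead of waiting for a complete batch. The event names are made up for the example.

```python
import itertools
from collections import Counter


def event_source():
    """Hypothetical unbounded source: cycles forever over a few sample events."""
    for event in itertools.cycle(["click", "view", "click", "purchase"]):
        yield event


def running_counts(stream):
    """Consume events one at a time and yield the updated tally after each."""
    counts = Counter()
    for event in stream:
        counts[event] += 1
        yield dict(counts)


if __name__ == "__main__":
    # The stream never ends, so the demo stops after five events.
    for snapshot in itertools.islice(running_counts(event_source()), 5):
        print(snapshot)
```

Because results are available after every event, this style suits monitoring and alerting workloads where a batch job would answer too late.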

8. H2O

Used by more than 60,000 data scientists at more than 7,000 organizations, H2O claims to be "the world's leading open source machine learning platform." Thanks to its in-memory technology, it offers extremely fast performance. It also integrates with many other open source data analytics tools like Hadoop and Spark, and it supports all of the most popular databases. Paid support is available.

In addition to the standard version of H2O, the company also offers Sparkling Water, a version that incorporates Spark, and Steam, an end-to-end artificial intelligence application engine.

9. Lumify

Created by a company called Altamira Technologies, Lumify describes itself as an "open source big data analysis and visualization platform." It makes it easy to create 2D or 3D graphs that show the relationship between entities or to overlay data on maps. For those who are interested in learning more about how it works, the website offers several videos that show Lumify in action, and it also has a demo site that allows users to upload their own data and try out the software.

10. Drill

Apache Drill allows users to use SQL queries for non-relational data storage systems. It supports a range of NoSQL and cloud-based data storage systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage and Swift. It also allows users to search through multiple datasets stored with different technologies using a single query. In addition, it supports many popular BI tools.
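
As a rough illustration of answering one SQL question across data kept in different formats, the sketch below joins a JSON document list with a CSV file in a single query. It uses Python's built-in sqlite3 module as a stand-in and does not use Drill itself; the data, table names and function name are hypothetical. Unlike this sketch, which first copies both sources into an in-memory database, a tool like Drill is designed to query the underlying stores directly.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample data: orders kept as JSON documents, customers as CSV.
ORDERS_JSON = """[
  {"order_id": 1, "customer_id": 10, "total": 25.0},
  {"order_id": 2, "customer_id": 11, "total": 40.0},
  {"order_id": 3, "customer_id": 10, "total": 15.0}
]"""
CUSTOMERS_CSV = "customer_id,name\n10,Ada\n11,Grace\n"


def spend_by_customer():
    """Load both sources into SQLite and join them with one SQL query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total REAL)"
        )
        conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
        conn.executemany(
            "INSERT INTO orders VALUES (:order_id, :customer_id, :total)",
            json.loads(ORDERS_JSON),
        )
        conn.executemany(
            "INSERT INTO customers VALUES (:customer_id, :name)",
            csv.DictReader(io.StringIO(CUSTOMERS_CSV)),
        )
        query = """
            SELECT c.name, SUM(o.total) AS spent
            FROM orders AS o
            JOIN customers AS c ON o.customer_id = c.customer_id
            GROUP BY c.name
            ORDER BY c.name
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for name, spent in spend_by_customer():
        print(name, spent)
```

The point of the example is the single JOIN: the analyst writes ordinary SQL and does not need to know that the two datasets started out in different formats.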

11. MongoDB

One of the best-known NoSQL databases, MongoDB is an open-source non-relational data storage solution. Its customers include MetLife, the city of Chicago, Expedia, Google, The Weather Channel, BuzzFeed and Facebook. In addition to the free open source version, the company also offers a paid Enterprise version and MongoDB Atlas, a cloud-hosted version. Forrester has named MongoDB a "Leader" for big data NoSQL.

12. SpagoBI

SpagoBI is an open source business intelligence and big data analytics platform. The software is completely free, but paid user support, maintenance, consulting and training are available for purchase. It includes tools for reporting, multidimensional analysis (OLAP), charts, location intelligence, data mining, ETL and more. It also integrates with popular in-memory processing engines and enables real-time processing.
