Frequently Asked Questions

ODP (Open Data Platform) is an open-source Big Data solution that offers a set of tools to store, analyze, and visualize big data. It aims to simplify data management using big data technologies based on the Apache Hadoop ecosystem.

CLEMLAB is a company created to distribute and promote ODP. It allows users to easily access scalable and 100% open-source big data technologies.

  1. Download ODP from the official repositories.
  2. Configure the required settings (system, storage, network).
  3. Install Apache Ambari on the nodes where you want to install ODP.
  4. Install the components you need (for example Hadoop, Spark, and Ranger) through Ambari's UI or REST API.
  5. Start the services and check the status of the components.
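The final step, checking component status, can also be scripted against Ambari's REST API instead of the UI. The following is a minimal sketch, not an official ODP tool: the host `ambari.example.com`, the cluster name `odp`, and the credentials are placeholders, and the response layout (`items[].ServiceInfo.service_name` / `state`) is assumed to match what Ambari returns for the `services` endpoint on your version.

```python
import base64
import json
from urllib.request import Request, urlopen


def services_url(base_url, cluster):
    """Build the URL that lists every service in a cluster with its state."""
    return (
        f"{base_url.rstrip('/')}/api/v1/clusters/{cluster}"
        "/services?fields=ServiceInfo/state"
    )


def parse_service_states(payload):
    """Map service name -> state from a decoded Ambari JSON response."""
    return {
        item["ServiceInfo"]["service_name"]: item["ServiceInfo"]["state"]
        for item in payload.get("items", [])
    }


def not_started(states):
    """Return the sorted names of services whose state is not STARTED."""
    return sorted(name for name, state in states.items() if state != "STARTED")


def fetch_service_states(base_url, cluster, user, password):
    """Query Ambari over HTTP basic auth (placeholder credentials assumed)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = Request(
        services_url(base_url, cluster),
        headers={"Authorization": f"Basic {token}"},
    )
    with urlopen(request, timeout=30) as response:
        return parse_service_states(json.load(response))
```

For example, `not_started(fetch_service_states("http://ambari.example.com:8080", "odp", "admin", "<password>"))` would list the services that still need attention. The parsing helpers are kept separate from the network call so they can be checked against a saved response without a running cluster.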

ODP follows a semi-annual release cycle. Each new version brings performance improvements, bug fixes, and new features based on community feedback.

Yes, ODP is 100% open source. Its code is freely available and can be used, modified, and redistributed under the terms of the Apache 2.0 license.

To contribute to the Open Source Big Data distribution, you can:

  • Submit issues or proposals via GitHub.
  • Share feedback or ideas on community forums.
  • Create pull requests to improve the code or documentation.

The source code of ODP components is available on GitHub. You can access it at the following address: https://github.com/clemlabprojects/hive-odp-release.

ODP provides Big Data components based on the Apache Hadoop ecosystem such as:

  • Hadoop for distributed storage.
  • Spark for fast data processing.
  • Kafka for real-time streaming.
  • Hive for SQL queries on large datasets.
  • NiFi for data flow automation.
  • Flink for stream processing.
  • Ambari for cluster management.
  • Atlas for data governance.
  • Ozone for scalable object storage.

Yes, ODP supports building an open data lakehouse by combining Hadoop's data storage capabilities with Spark's data processing capabilities and Hive's data management capabilities. This allows efficient and flexible management and analysis of large amounts of data. ODP 1.2.40 is compiled with Iceberg in Spark, Kafka, and Flink. ODP 1.2 comes with Apache Hive 3.1.3, but Hive is not yet compatible with Iceberg.
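As an illustration of the Iceberg-in-Spark setup mentioned above, the sketch below builds the Spark properties that register an Iceberg catalog. It is an assumption-laden example rather than ODP's documented configuration: the catalog name `lake` and the warehouse path are hypothetical, and a Hadoop-type (filesystem) catalog is chosen only because the text notes that Hive is not yet Iceberg-compatible. The property keys themselves are the standard ones from the Apache Iceberg Spark integration.

```python
def iceberg_spark_conf(catalog_name, warehouse_path):
    """Return Spark properties that register a filesystem-backed Iceberg catalog."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Enables Iceberg's SQL extensions (e.g. MERGE INTO, CALL procedures).
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        ),
        # Registers the catalog implementation under the chosen name.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # "hadoop" keeps table metadata on the filesystem, no Hive Metastore.
        f"{prefix}.type": "hadoop",
        # Root directory for tables; the path is a placeholder.
        f"{prefix}.warehouse": warehouse_path,
    }


def as_spark_submit_args(conf):
    """Flatten the properties into --conf arguments for spark-submit."""
    args = []
    for key in sorted(conf):
        args.extend(["--conf", f"{key}={conf[key]}"])
    return args
```

The same dictionary can be applied in PySpark by calling `SparkSession.builder.config(key, value)` for each entry before `getOrCreate()`, after which tables are addressed as `lake.<database>.<table>`.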