25 items with this tag.
Are there any lessons from Chaos Engineering that can be applied to data projects such as Data Engineering and Data Science? Can we use Chaos Engineering to make your end-to-end project fool-proof?
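As a minimal sketch of the idea (the `ChaosWrapper` class, stage names, and failure rate below are all hypothetical, not from any chaos tooling), one way to apply Chaos Engineering to a pipeline is to randomly inject failures into a stage and verify that the retry logic survives them:

```python
import random

class ChaosWrapper:
    """Wraps a pipeline stage and randomly injects failures,
    mimicking flaky sources, network errors, or corrupt batches."""

    def __init__(self, stage, failure_rate=0.2):
        self.stage = stage
        self.failure_rate = failure_rate

    def __call__(self, batch):
        if random.random() < self.failure_rate:
            raise RuntimeError("chaos: injected stage failure")
        return self.stage(batch)

def load(batch):
    print(f"loaded {len(batch)} records")

chaotic_load = ChaosWrapper(load, failure_rate=0.3)

# The pipeline under test should survive injected faults via retries.
for attempt in range(5):
    try:
        chaotic_load([1, 2, 3])
        break
    except RuntimeError as err:
        print(f"attempt {attempt + 1} failed: {err}")
```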
Big data refers to extremely large and complex datasets that require advanced tools and techniques for storage, processing, and analysis.
The process of ensuring that data is accurate, complete, reliable, and fit for its intended purpose throughout its lifecycle.
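A toy illustration of what measuring those dimensions can look like (the records, field names, and thresholds are invented for the example), computing completeness and accuracy scores over a small batch:

```python
records = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": None,            "age": 29},
    {"id": 3, "email": "c@example.com", "age": -5},  # invalid age
]

# Completeness: share of records with no missing fields.
complete = sum(all(v is not None for v in r.values()) for r in records)
completeness = complete / len(records)

# Accuracy: share of records whose values fall in a plausible range.
accurate = sum(r["age"] is not None and 0 <= r["age"] <= 120 for r in records)
accuracy = accurate / len(records)

print(f"completeness={completeness:.0%}, accuracy={accuracy:.0%}")
```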
Data validation ensures the accuracy and quality of data by checking its compliance with defined rules and constraints before processing or storing it.
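A minimal sketch of rule-based validation as a gate before ingestion (the `RULES` table and field names are hypothetical; real systems often use schema libraries or frameworks instead):

```python
RULES = {
    "id":    lambda v: isinstance(v, int) and v > 0,
    "email": lambda v: isinstance(v, str) and "@" in v,
    "age":   lambda v: isinstance(v, int) and 0 <= v <= 120,
}

def validate(record):
    """Return the list of fields that violate a rule; empty means valid."""
    return [field for field, rule in RULES.items()
            if not rule(record.get(field))]

record = {"id": 42, "email": "user@example.com", "age": 31}
errors = validate(record)
if errors:
    print(f"rejected, invalid fields: {errors}")
else:
    print("record accepted for ingestion")
```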
Databricks is a cloud-based platform that provides a unified environment for big data analytics and machine learning, built on Apache Spark.
An open-source cluster manager that abstracts resources across a cluster of machines, enabling efficient resource allocation and management for distributed applications.
Yet Another Resource Negotiator (YARN) is a resource management and job scheduling framework used in Apache Hadoop for managing resources and running distributed applications on a cluster of machines.
An open-source framework designed for high-performance columnar data processing and efficient data interchange between systems.
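A small example using the `pyarrow` library, which implements Arrow for Python, to build an in-memory columnar table and run a vectorized aggregation over it (the column names and values are illustrative):

```python
import pyarrow as pa
import pyarrow.compute as pc

# Each column is stored as a contiguous Arrow array in memory.
table = pa.table({
    "user_id": [1, 2, 3],
    "amount":  [9.99, 24.50, 3.75],
})

# The columnar layout makes vectorized operations cheap.
total = pc.sum(table["amount"])
print(table.schema)
print(f"total amount: {total.as_py()}")
```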
A data serialization system that provides compact, fast binary data format and rich data structures for serializing, transporting, and storing data in a language-neutral way.
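For instance, using the `fastavro` library (one of several Avro implementations for Python), records can be serialized against a JSON-defined schema and read back; the `Click` schema here is made up for the example:

```python
from io import BytesIO
from fastavro import writer, reader, parse_schema

# Avro schemas are defined in JSON and travel with the data.
schema = parse_schema({
    "type": "record",
    "name": "Click",
    "fields": [
        {"name": "user_id", "type": "long"},
        {"name": "page",    "type": "string"},
    ],
})

records = [{"user_id": 1, "page": "/home"}, {"user_id": 2, "page": "/docs"}]

# Serialize to a compact binary buffer, then read it back.
buf = BytesIO()
writer(buf, schema, records)
buf.seek(0)
for rec in reader(buf):
    print(rec)
```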
A highly efficient and optimized columnar storage file format used in the Hadoop ecosystem to improve performance in big data processing.
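As a sketch, ORC files can be written and read from Python via `pyarrow`'s ORC module (table contents and file name are invented; column pruning on read shows why the columnar layout pays off):

```python
import pyarrow as pa
import pyarrow.orc as orc

table = pa.table({
    "event": ["click", "view", "click"],
    "ms":    [120, 85, 200],
})

# Columns are stored contiguously with per-stripe statistics,
# so analytical scans can skip irrelevant data.
orc.write_table(table, "events.orc")

back = orc.read_table("events.orc", columns=["event"])  # column pruning
print(back)
```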
Change Data Capture (CDC) is a method used to automatically track and capture changes in data in a database, enabling real-time data integration and analysis.
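Production CDC usually tails the database's transaction log (as tools like Debezium do); a much simpler polling variant using a high-water-mark timestamp conveys the core idea (SQLite and the `orders` table stand in for a real source database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT, updated_at TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 'shipped', '2024-01-02T10:00:00')")
conn.execute("INSERT INTO orders VALUES (2, 'pending', '2024-01-02T11:30:00')")

last_synced = "2024-01-02T10:30:00"  # high-water mark from the previous run

# Pull only rows changed since the last sync, then advance the watermark.
changes = conn.execute(
    "SELECT id, status, updated_at FROM orders WHERE updated_at > ?",
    (last_synced,),
).fetchall()

for row in changes:
    print(f"captured change: {row}")
last_synced = max(r[2] for r in changes) if changes else last_synced
```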
A data lake is a centralized repository that stores large volumes of raw and unstructured data in its native format, enabling organizations to store diverse data types at scale and perform advanced analytics, machine learning, and other data processing tasks for insights and decision-making.
A data mart is a specialized subset of a data warehouse that focuses on specific business functions or departments, containing structured data optimized for analysis and reporting to support decision-making within those areas.
Data mesh is an architectural paradigm that advocates for a decentralized approach to data management, where data ownership, access, and governance are distributed across different domain-oriented teams, enabling scalability, flexibility, and agility in managing and leveraging data assets within organizations.
Iceberg is a high-performance, open table format for large analytic datasets that supports complex data management and enables ACID transactions.
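A sketch of what working with Iceberg from Spark SQL can look like, assuming a Spark session already configured with an Iceberg catalog named `demo` (the catalog and table names are placeholders, and the catalog/JAR setup is omitted, so this will not run without that configuration):

```python
from pyspark.sql import SparkSession

# Assumes Iceberg jars and a catalog named "demo" are already configured.
spark = SparkSession.builder.appName("iceberg-sketch").getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS demo.db.events (
        id BIGINT, kind STRING, ts TIMESTAMP
    ) USING iceberg
""")

# Writes are ACID: readers see either the old snapshot or the new one.
spark.sql("INSERT INTO demo.db.events VALUES (1, 'click', current_timestamp())")

# Each commit produces a snapshot, queryable via Iceberg's metadata tables.
spark.sql("SELECT * FROM demo.db.events.snapshots").show()
```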
A columnar storage file format designed for efficient data processing, optimized for use with big data processing frameworks like Apache Spark and Apache Hadoop.
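A short example of writing and reading Parquet with `pyarrow` (the table contents and file name are illustrative); reading a single column back shows the benefit of columnar storage:

```python
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({
    "city":   ["Oslo", "Lima", "Pune"],
    "temp_c": [4.5, 18.2, 31.0],
})

# Column chunks are compressed independently and carry min/max statistics.
pq.write_table(table, "weather.parquet", compression="snappy")

# Reading back only one column avoids scanning the rest of the file.
temps = pq.read_table("weather.parquet", columns=["temp_c"])
print(temps)
```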
The process of extracting data from a data warehouse and loading it into operational systems, enabling organizations to leverage analytical insights in day-to-day operations.
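A toy sketch of the reverse-ETL flow, with SQLite standing in for the warehouse and a printed request standing in for a call to an operational tool's API (the `user_scores` table, CRM endpoint, and threshold are all invented):

```python
import sqlite3
import json

# SQLite stands in for the warehouse; the CRM call is mocked with print.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE user_scores (user_id INTEGER, churn_risk REAL)")
warehouse.execute("INSERT INTO user_scores VALUES (1, 0.82), (2, 0.11)")

rows = warehouse.execute(
    "SELECT user_id, churn_risk FROM user_scores WHERE churn_risk > 0.5"
).fetchall()

# Push analytical insights back into an operational tool (e.g. a CRM).
for user_id, risk in rows:
    payload = json.dumps({"user_id": user_id, "churn_risk": risk})
    print(f"POST /crm/contacts/{user_id}/attributes {payload}")
```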
A concept in data warehousing that refers to how data in a database changes over time while preserving historical information.
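The classic example is a slowly changing dimension handled as Type 2: instead of overwriting, the current row is expired and a new version is inserted. A minimal in-memory sketch (the `dim_customer` rows and `apply_scd2` helper are hypothetical):

```python
from datetime import date

# Current dimension rows; is_current marks the active version.
dim_customer = [
    {"id": 7, "city": "Berlin", "valid_from": date(2022, 1, 1),
     "valid_to": None, "is_current": True},
]

def apply_scd2(rows, key, new_city, today):
    """Type 2 change: expire the current row, insert a new version."""
    for row in rows:
        if row["id"] == key and row["is_current"] and row["city"] != new_city:
            row["valid_to"] = today
            row["is_current"] = False
            rows.append({"id": key, "city": new_city, "valid_from": today,
                         "valid_to": None, "is_current": True})
            break

apply_scd2(dim_customer, key=7, new_city="Munich", today=date(2024, 6, 1))
for row in dim_customer:
    print(row)
```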
A powerful open-source unified analytics engine for large-scale data processing and machine learning, designed to handle both batch and streaming data efficiently.
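A small batch job in PySpark gives the flavor (the sales data and column names are made up for the example):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("spark-sketch").getOrCreate()

# A small batch job: aggregate sales per country.
df = spark.createDataFrame(
    [("DE", 120.0), ("DE", 80.0), ("FR", 45.5)],
    ["country", "amount"],
)

(df.groupBy("country")
   .agg(F.sum("amount").alias("total"))
   .orderBy(F.desc("total"))
   .show())

spark.stop()
```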
Data engineering involves designing, building, and maintaining the infrastructure and systems that enable the acquisition, storage, processing, and analysis of data at scale, ensuring data quality, reliability, and accessibility for downstream analytics and applications.
A data lakehouse combines the benefits of a data lake (scalability, flexibility, and cost-effectiveness for storing raw and unstructured data) with those of a data warehouse (structured querying, transactional integrity, and performance optimizations), providing a unified platform for both operational and analytical workloads in modern data architectures.
A data pipeline is a series of processes that automate the flow of data from source systems to storage or analytical tools.
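At its simplest, a pipeline is just the composition of extract, transform, and load stages; this stripped-down sketch (the stage bodies are stand-ins for real sources and sinks) shows the shape an orchestrator such as Airflow or cron would run on a schedule:

```python
def extract():
    # Stand-in for reading from an API, database, or file drop.
    return [{"id": 1, "amount": "19.99"}, {"id": 2, "amount": "5.00"}]

def transform(rows):
    # Normalize types and derive fields.
    return [{**r, "amount": float(r["amount"])} for r in rows]

def load(rows):
    # Stand-in for writing to a warehouse table.
    for row in rows:
        print(f"loaded: {row}")

# The pipeline is the composition of its stages.
load(transform(extract()))
```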
Learn about the inception of our unique framework, designed to streamline and democratize the Data Engineering process. Understand how this innovation in Data Engineering has enhanced our development workflow, promoting efficiency and collaboration. However, innovation isn't without its challenges.
Learn about the inception of our unique framework, designed to streamline and democratize the data engineering process. Understand how this innovation in data engineering has enhanced our development workflow, promoting efficiency and collaboration. However, innovation isn't without its challenges.
Explore the transformative potential of Low-Code/No-Code Data Engineering in this detailed blog post. Learn about the inception of our unique framework, designed to streamline and democratize the Data Engineering process. Understand how this innovation in Data Engineering has enhanced our development workflow, promoting efficiency and collaboration. However, innovation isn't without its challenges.