Structured vs. Semi-Structured vs. Unstructured Data
Learn to differentiate structured data stored in relational databases, semi-structured data formats like JSON and XML, and unstructured data including images, videos, and text. Understand how these types impact data science workflows and challenges.
Structured data
Structured data follows a predefined format and structure, typically rows and columns in which every column has a fixed name and type. It is usually stored in relational databases and is the easiest type of data to work with in data science.
Examples
- Here’s a simple table whose columns hold different types of values (numbers and text). It can be stored in a relational database, an Excel file, and so on.
| Sepal_length | Sepal_width | Petal_length | Petal_width | Species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | versicolor |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | virginica |
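To illustrate why a fixed schema makes structured data easy to work with, here is a minimal sketch that loads the rows from the table above into an in-memory SQLite database and queries them. The table name `flowers` and the lowercase column names are assumptions made for this example, not part of the lesson.

```python
import sqlite3

# Rows copied from the table above.
rows = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "versicolor"),
    (4.7, 3.2, 1.3, 0.2, "setosa"),
    (4.6, 3.1, 1.5, 0.2, "virginica"),
]

# A throwaway in-memory database; "flowers" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE flowers (
        sepal_length REAL,
        sepal_width  REAL,
        petal_length REAL,
        petal_width  REAL,
        species      TEXT
    )
    """
)
conn.executemany("INSERT INTO flowers VALUES (?, ?, ?, ?, ?)", rows)

# Every row follows the same schema, so filtering and aggregating is one query.
setosa_count, setosa_avg_sepal = conn.execute(
    "SELECT COUNT(*), AVG(sepal_length) FROM flowers WHERE species = ?",
    ("setosa",),
).fetchone()
print(setosa_count, setosa_avg_sepal)
conn.close()
```

Because the column names and types are declared up front, the database can answer the question without any custom parsing code.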
Semi-structured data
Semi-structured data has some organizing structure, such as tags or key-value pairs that label each piece of data, but it does not follow the rigid table schema of a relational database.
Examples
- JSON (JavaScript Object Notation)
- XML (Extensible Markup Language)
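As a rough sketch of how both formats are read in practice, the snippet below parses one small record written in JSON and in XML using only Python's standard library. The record itself (a flower with nested sepal measurements) and names such as `notes` are made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical record expressed in both formats.
json_text = '{"species": "setosa", "sepal": {"length": 5.1, "width": 3.5}}'
xml_text = '<flower species="setosa"><sepal length="5.1" width="3.5"/></flower>'

# JSON: keys label the values, and values can be nested.
record = json.loads(json_text)
sepal_length_json = record["sepal"]["length"]

# Fields are not guaranteed to exist, so code usually supplies a default.
notes = record.get("notes", "n/a")

# XML: tags and attributes label the values; attribute values arrive as strings.
root = ET.fromstring(xml_text)
sepal_length_xml = float(root.find("sepal").get("length"))

print(sepal_length_json, sepal_length_xml, notes)
```

The keys and tags make the data self-describing, but nothing forces two records to share the same fields, which is why the lookup for `notes` needs a fallback.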
Unstructured data
Unstructured data lacks a predefined format and structure. This type of data poses many challenges in data science, because it usually has to be processed before it can be analyzed.
Examples
- Images
- Speech
- An email or article such as the following is an example of unstructured data:
“Quantum computing uses quantum-mechanical phenomena such as superposition and entanglement, to perform computation. A quantum computer performs such computation, which can be implemented theoretically or physically. There are currently two main approaches to physically implementing a quantum computer: analog and digital. Analog approaches are further divided into the quantum simulation, quantum annealing, and adiabatic quantum computation.”
- Any webpage, such as the Educative home page.
- Videos: Movie clips or YouTube videos are examples of unstructured datasets.
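Part of the challenge is that structure has to be imposed on unstructured data before it can be analyzed. As a minimal sketch, the snippet below takes the first two sentences of the article excerpt above and turns them into word counts, one very simple form of structure. The lowercase, letters-only tokenization rule is an assumption chosen for brevity; real text pipelines are usually more careful.

```python
import re
from collections import Counter

# First two sentences of the article excerpt quoted above.
text = (
    "Quantum computing uses quantum-mechanical phenomena such as "
    "superposition and entanglement, to perform computation. "
    "A quantum computer performs such computation, which can be "
    "implemented theoretically or physically."
)

# Crude tokenization: lowercase the text and keep runs of letters only.
words = re.findall(r"[a-z]+", text.lower())

# Counting words converts free text into a structured (word, count) mapping.
word_counts = Counter(words)
print(word_counts.most_common(3))
```

The resulting counts could be stored as rows in a table, which is the kind of preprocessing step that unstructured data typically requires.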