Structured vs. Semi-Structured vs. Unstructured Data

Learn to differentiate structured data stored in relational databases, semi-structured data formats like JSON and XML, and unstructured data including images, videos, and text. Understand how these types impact data science workflows and challenges.

Structured Data

Structured data comes with a predefined format and structure, typically rows and columns in which every column holds values of one type. It is usually stored in relational databases, and it is the easiest type of data to work with in data science.

Examples

  • Here’s a simple table with columns of diverse types. It can be stored in any relational database, Excel file, etc.
Sepal_length  Sepal_width  Petal_length  Petal_width  Species
5.1           3.5          1.4           0.2          setosa
4.9           3.0          1.4           0.2          versicolor
4.7           3.2          1.3           0.2          setosa
4.6           3.1          1.5           0.2          virginica
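
To make the idea of a predefined structure concrete, here is a minimal sketch that stores the sample table in a relational database using Python's built-in sqlite3 module. The table name `flowers` and the lowercase column names are assumptions chosen for illustration; the rows are copied from the table above.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The schema is declared up front: every row must fit these typed columns.
# The table name "flowers" is a hypothetical choice for this sketch.
conn.execute(
    """
    CREATE TABLE flowers (
        sepal_length REAL,
        sepal_width  REAL,
        petal_length REAL,
        petal_width  REAL,
        species      TEXT
    )
    """
)

# Rows copied from the sample table.
rows = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "versicolor"),
    (4.7, 3.2, 1.3, 0.2, "setosa"),
    (4.6, 3.1, 1.5, 0.2, "virginica"),
]
conn.executemany("INSERT INTO flowers VALUES (?, ?, ?, ?, ?)", rows)

# Because every row follows the same schema, filtering and
# aggregating take a single query.
setosa_count, avg_sepal_length = conn.execute(
    "SELECT COUNT(*), AVG(sepal_length) FROM flowers WHERE species = ?",
    ("setosa",),
).fetchone()
print(setosa_count, avg_sepal_length)

conn.close()
```

The fixed schema is what makes the query possible without any preprocessing: the database already knows which values are numbers and which are labels.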

Semi-Structured Data

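Semi-structured data follows a defined format, using tags or keys to organize its elements, but it does not fit the fixed rows-and-columns schema of a relational table and is usually not stored in a relational database.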

Examples

  • JSON (JavaScript Object Notation)

  • XML (Extensible Markup Language)
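
As a rough illustration of both formats, the sketch below parses one record written as JSON and as XML using only Python's standard library. The record itself (field names such as `sepal`, `petal`, and `notes`) is hypothetical and was made up for this example.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record describing one flower, written in JSON.
json_text = """
{
  "species": "setosa",
  "sepal": {"length": 5.1, "width": 3.5},
  "petal": {"length": 1.4, "width": 0.2},
  "notes": ["collected in spring"]
}
"""

# A similar hypothetical record written in XML.
xml_text = """
<flower species="setosa">
  <sepal length="5.1" width="3.5"/>
  <petal length="1.4" width="0.2"/>
</flower>
"""

# JSON: keys give the data its structure, and values can be nested.
record = json.loads(json_text)
sepal_length_json = record["sepal"]["length"]
# Fields may be missing from some records, so read optional ones defensively.
notes = record.get("notes", [])

# XML: tags and attributes play the role that keys play in JSON.
root = ET.fromstring(xml_text.strip())
species_xml = root.get("species")
# Attribute values are always strings, so convert numbers explicitly.
sepal_length_xml = float(root.find("sepal").get("length"))

print(sepal_length_json, notes)
print(species_xml, sepal_length_xml)
```

Unlike a relational table, nothing forces two records to share the same fields, which is why the code checks for optional keys and converts types itself.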

Unstructured Data

Unstructured data lacks a predefined format or structure. This makes it the type of data that poses the most challenges in data science, because structure has to be extracted before the data can be analyzed.

Examples

  • Images
  • Speech
  • An email or article such as the following is an example of unstructured data:

“Quantum computing uses quantum-mechanical phenomena such as superposition and entanglement, to perform computation. A quantum computer performs such computation, which can be implemented theoretically or physically. There are currently two main approaches to physically implementing a quantum computer: analog and digital. Analog approaches are further divided into the quantum simulation, quantum annealing, and adiabatic quantum computation.”

  • Any webpage, such as the Educative home page.

  • Videos: Movie clips or YouTube videos are examples of unstructured datasets.
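
To give a feel for why unstructured data is harder to work with, here is a minimal sketch that imposes a very simple structure (word counts) on the first two sentences of the quoted article text. The tokenization rule is an assumption made for this example; real text-processing pipelines are considerably more involved.

```python
import re
from collections import Counter

# First two sentences of the quoted article text.
text = (
    "Quantum computing uses quantum-mechanical phenomena such as "
    "superposition and entanglement, to perform computation. "
    "A quantum computer performs such computation, which can be "
    "implemented theoretically or physically."
)

# Free text has no columns or keys, so we have to define our own units.
# Here a token is a run of letters, optionally joined by hyphens.
tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())

# Counting tokens turns the text into a small word-to-frequency table.
word_counts = Counter(tokens)
print(word_counts.most_common(3))
```

Even this small step required decisions (lowercasing, how to treat hyphens and punctuation) that structured data never asks for, and images, speech, and video need far heavier processing.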