Software Defined Storage

Gluster Overview

DaehanCNI 2020. 8. 10. 13:38

Overview

  • What is GlusterFS?

  • Terms

  • GlusterFS Volume Types

  • Accessing Data - Client

  • Other Features

 

What is GlusterFS?

  • Distributed scalable network filesystem

  • Automatic failover

  • Without a centralized metadata server – Fast file access

  • No Hot-Spots/Bottlenecks – Elastic Hashing Algorithm

  • XFS is the recommended brick file system for storing data (ext3, ext4, ZFS, and Btrfs can also be used)

  • Accessible via NFS, SMB/CIFS, or the native GlusterFS client (POSIX compatible)

  • How Does GlusterFS Work Without Metadata

    • All storage nodes have an algorithm built-in

    • All native clients have an algorithm built-in

    • Files are placed on a brick(s) in the cluster based on a calculation

    • Files can then be retrieved based on the same calculation

    • For non-native clients, the server handles retrieval and placement

 

Terms

Elastic Hashing Algorithm

Every folder in a volume is assigned an equal segment of the 32-bit number space across its bricks.
For example, with 12 bricks, each folder's range is split as follows:

 

brick1 = 0 – 357913941
brick2 = 357913942 – 715827883
brick3 = 715827884 – 1073741823
…
brick12 = 3937053354 – 4294967295


Every folder and sub-folder gets the full range, 0 – 4294967295.

The EHA hashes the name of the file being read or written.
ex) ………\GlusterFSrules.txt = 815827884


GlusterFS uses the Davies-Meyer hash.

The file is read or written on the brick whose range matches the hash of its path and filename.
The avalanche effect of the hashing algorithm prevents hotspots: regardless of how similar filenames or paths are, the hash results are sufficiently different. Additional, redundant steps are taken to prevent hotspots.
Adding a brick to a volume updates the EHA graph on each node.
Running a volume rebalance after adding a brick physically moves files to match the new EHA graph. Adding a brick (gluster volume add-brick) without running a rebalance results in new files being written to the new bricks and link files being created on first access for existing files; this can be slow and is not recommended.
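
As a rough illustration of the range lookup, the sketch below hashes a file name to a 32-bit value and maps it onto one of 12 equal brick ranges. It uses cksum's CRC purely as a stand-in hash; GlusterFS itself hashes with Davies-Meyer, and the file name and brick count here are only examples.

# Stand-in for the EHA lookup: hash the file name to a 32-bit value,
# then pick the brick whose range contains that value (12 bricks assumed).
file="GlusterFSrules.txt"
hash=$(printf '%s' "$file" | cksum | awk '{print $1}')   # 32-bit CRC, not the real Davies-Meyer hash
range=$(( (2**32 + 11) / 12 ))                           # ~357913942 hash values per brick
brick=$(( hash / range + 1 ))
echo "hash=$hash falls in brick$brick's range"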

Brick

A brick is the combination of a node and a file system.
ex) Hostname:/Directoryname


Each brick inherits the limits of the underlying file system (ext3/ext4/XFS).
There is no limit to the number of bricks per node.
Gluster operates at the brick level, not at the node level.
Ideally, each brick in a cluster should be the same size.
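
A brick is ultimately just a directory on a local file system. A minimal sketch of preparing one, assuming a spare device /dev/sdb1 and the brick path /data/brick1 (both hypothetical):

# Format the device with XFS; 512-byte inodes leave room for Gluster's extended attributes
mkfs.xfs -i size=512 /dev/sdb1
mkdir -p /data/brick1
mount /dev/sdb1 /data/brick1    # this node now provides the brick server1:/data/brick1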

 

Cluster

 A set of Gluster nodes in a trusted pool

Trusted Pool

 Storage nodes that are peers and associated in a single cluster.

Node

 A single Gluster node in a cluster.
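
The trusted pool is formed with the gluster peer commands. A minimal sketch, assuming three nodes named server1, server2, and server3 (hypothetical), run from server1:

gluster peer probe server2    # add server2 to the trusted pool
gluster peer probe server3
gluster peer status           # list peers and their connection state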

Self Heal

 Self-heal is a self-correcting mechanism built into GlusterFS. The self-heal process is initiated when a client detects discrepancies in directory structure, directory metadata, file metadata, file sizes, etc. This detection happens on the first access of such a corrupted directory or file.
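
Healing normally happens automatically through the self-heal daemon, but it can also be inspected and triggered from the CLI. A sketch, assuming a replicated volume named myvol (hypothetical):

gluster volume heal myvol info    # list entries that still need healing
gluster volume heal myvol         # trigger a heal of the pending entries
gluster volume heal myvol full    # crawl the entire volume and heal everything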

Split Brain

 Split-brain is a scenario that occurs in a replicated volume when a network partition (from the client's perspective) leaves conflicting attributes on files that Gluster relies on for volume consistency and availability. Resolving it requires manual intervention: the user identifies the stale copy and removes it from the back end, after which Gluster performs a self-heal to synchronize the copies.
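
Files in split-brain can be listed from the CLI, and newer releases can resolve them by choosing a winning copy instead of deleting the stale one by hand. A sketch, assuming a volume named myvol and example brick/file paths (all hypothetical):

gluster volume heal myvol info split-brain    # list files currently in split-brain
# where supported, resolve by declaring one brick's copy the source
gluster volume heal myvol split-brain source-brick server1:/data/brick1 /path/to/file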

 

GlusterFS Volume Types

Distributed

No data redundancy
Distributes files across the bricks in the volume
Failure of a brick or node results in loss of access to the data on those bricks
Writes destined to the failed brick will fail
Cuts hardware and software costs roughly in half compared to a replicated volume
Redundant hardware (RAID) underneath each brick is strongly recommended
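
Creating a plain distributed volume is just a matter of listing the bricks. A sketch, assuming two nodes and brick paths that are placeholders:

gluster volume create distvol server1:/data/brick1 server2:/data/brick1
gluster volume start distvol
gluster volume info distvol    # shows Type: Distribute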

 

Replicated / Distributed Replicated

Redundant at the brick level through synchronous writes
High availability
N replicas are supported
Replicates files across bricks in the volume
Failure of a brick or node does not affect data access; I/O continues against the surviving replicas
Writes destined to the failed brick succeed on the remaining replicas and are healed when the brick returns
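
A two-way replicated volume is created with replica 2; listing four bricks with replica 2 yields a distributed replicated volume made of two replica pairs. Host and brick names below are placeholders:

# pure replication: every file is stored on both bricks
gluster volume create repvol replica 2 server1:/data/brick1 server2:/data/brick1
# distributed replication: files are distributed across two replica pairs
gluster volume create distrepvol replica 2 server1:/data/brick1 server2:/data/brick1 server3:/data/brick1 server4:/data/brick1
gluster volume start repvol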

 

Example of a Two-way Replicated Volume

 

Example of a Two-way Distributed Replicated Volume

 

Arbitrated Replicated

High availability and less disk space required
N replicas are supported
Similar to a two-way replicated volume in that it contains two full copies of the files in the volume; however, the volume has an extra arbiter brick for every two data bricks
Arbiter bricks do not store file data; they store only file names, directory structure, and metadata
Arbiter bricks use client quorum to compare their metadata with the metadata of the other nodes, ensuring consistency in the volume and preventing split-brain conditions
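
An arbitrated replicated volume is created as replica 3 arbiter 1: of every three bricks listed, the first two hold data and the third holds only metadata. Volume, host, and path names are placeholders:

gluster volume create arbvol replica 3 arbiter 1 server1:/data/brick1 server2:/data/brick1 server3:/data/arbiter1
gluster volume start arbvol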

 

Example of a dedicated configuration

 

Example of a chained configuration

 

Dispersed / Distributed Dispersed

Erasure coding (EC)
Limited use cases (scratch space, very large files, some HPC workloads)
Performs poorly with small files
Disperses each file's data across the bricks in the volume
Based on erasure coding, which allows recovery of the data stored on one or more bricks in case of failure
n = k + m (n = total number of bricks, k = bricks required to read or recover the data, m = bricks that can fail without losing data)
Requires less storage space than a replicated volume
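
For example, a 4 + 2 dispersed volume (n = 6, k = 4, m = 2) tolerates two failed bricks while consuming only 1.5x the usable capacity, versus 3x for a three-way replica. The volume and brick names below are placeholders:

# 6 bricks in total: any 4 of them are enough to reconstruct the data (redundancy 2)
gluster volume create dispvol disperse 6 redundancy 2 \
    server{1..6}:/data/brick1
gluster volume start dispvol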

 

Example of a Dispersed Volume
Example of a Distributed Dispersed Volume

 

Accessing Data – Client

Native Client

FUSE-based client running in user space.

NFS

Gluster volumes can be exported over NFSv3; NFS ACL v3 is supported

SMB

The Server Message Block (SMB) protocol can be used to access Red Hat Gluster Storage volumes by exporting directories in GlusterFS volumes as SMB shares on the server
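
Mounting the same volume through the native client and through NFS looks like the sketch below; server1, myvol, and the mount points are placeholders:

# native FUSE client: fetches the volume layout and talks to all bricks directly
mount -t glusterfs server1:/myvol /mnt/glusterfs
# NFSv3: the server handles placement and retrieval on the client's behalf
mount -t nfs -o vers=3 server1:/myvol /mnt/nfs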

 

 

 

Other Features

Geo-replication

Geo-replication provides a distributed, continuous, asynchronous, and incremental replication service from one site to another over Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet
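
Geo-replication is configured per volume against a slave volume on the remote site. A minimal sketch, assuming a master volume mastervol, a slave host remote1, and a slave volume slavevol (all hypothetical), with passwordless SSH already in place:

gluster volume geo-replication mastervol remote1::slavevol create push-pem
gluster volume geo-replication mastervol remote1::slavevol start
gluster volume geo-replication mastervol remote1::slavevol status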

 

 

Managing Tiering

A volume can be split into a hot tier of fast bricks (e.g. SSD) and a cold tier of slower bricks; frequently accessed files are promoted to the hot tier and cold files are demoted

 

 

Security

Enabling Management Encryption
Enabling I/O Encryption for a Volume
Disk Encryption
Set up SELinux
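
I/O encryption (TLS on the data path) is switched on with ordinary volume options, while management encryption requires the secure-access touch file on every node. The volume name below is a placeholder, and TLS certificates are assumed to already be deployed:

gluster volume set myvol client.ssl on    # use TLS from the client side
gluster volume set myvol server.ssl on    # require TLS on connections to the bricks
touch /var/lib/glusterd/secure-access     # enable management encryption, then restart glusterd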

 

Managing Containerized Red Hat Gluster Storage

Red Hat Gluster Storage can run inside Docker containers

 

Managing Directory Quotas

Set limits on disk space used by directories or the volume
Hard limits and soft limits are supported
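
Quotas are enabled per volume and then applied per directory, with paths given relative to the volume root. The volume name and directory below are placeholders:

gluster volume quota myvol enable
gluster volume quota myvol limit-usage /projects 10GB    # hard limit; the soft limit defaults to a percentage of it
gluster volume quota myvol list                          # show current usage against the limits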

 

Snapshot

Enables you to create point-in-time copies
Users can directly access read-only snapshot copies to recover from accidental deletion, corruption, or modification of the data.
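
Snapshots are taken per volume; with user-serviceable snapshots enabled, clients can browse activated snapshots read-only under the hidden .snaps directory of the mount point. The names below are placeholders, and the stored snapshot name gets a timestamp suffix by default:

gluster snapshot create snap1 myvol             # creates the point-in-time copy
gluster snapshot list myvol                     # shows the full snapshot names
gluster snapshot activate <full-snap-name>      # a snapshot must be activated before it can be browsed
gluster volume set myvol features.uss enable    # expose activated snapshots at <mountpoint>/.snaps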
