# Connecting HDFS
This guide walks you through connecting an HDFS (Hadoop Distributed File System) storage destination to Pluton.
## Prerequisites
Before connecting HDFS, you need:
- A running Hadoop cluster with HDFS accessible from the Pluton server
- A valid Hadoop user account with read/write access to the target HDFS directories
- The namenode address(es) and port(s) for your cluster
- If using Kerberos authentication: a properly configured Kerberos environment and service principal
## Getting Your HDFS Credentials

### Step 1: Find the Namenode Address
The namenode is the master server in your Hadoop cluster that manages the filesystem metadata. You can find its address in your Hadoop configuration:
- On your Hadoop cluster, open the `core-site.xml` configuration file (typically located at `/etc/hadoop/conf/core-site.xml` or `$HADOOP_HOME/etc/hadoop/core-site.xml`)
- Look for the `fs.defaultFS` property; it contains the namenode address:

  ```xml
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode-host:8020</value>
  </property>
  ```

- The default namenode port is 8020. Note down the hostname and port.
For High Availability (HA) clusters with multiple namenodes, check `hdfs-site.xml` for the `dfs.namenode.rpc-address` properties to get all namenode addresses.
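The lookup above can also be scripted. A minimal sketch that extracts `fs.defaultFS` from a `core-site.xml` document (the embedded sample is illustrative; in practice you would read the file from one of the paths listed above):

```python
import xml.etree.ElementTree as ET

# Sample core-site.xml content; in practice, read this from
# /etc/hadoop/conf/core-site.xml or $HADOOP_HOME/etc/hadoop/core-site.xml.
CORE_SITE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode-host:8020</value>
  </property>
</configuration>"""

def default_fs(xml_text):
    """Return the value of the fs.defaultFS property, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == "fs.defaultFS":
            return prop.findtext("value")
    return None

print(default_fs(CORE_SITE))  # hdfs://namenode-host:8020
```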
### Step 2: Verify Your Hadoop User
- Confirm you have a valid Hadoop user account that can access HDFS:

  ```shell
  hdfs dfs -ls /
  ```

- If your cluster uses simple authentication, the username is typically your OS user or can be set via the `HADOOP_USER_NAME` environment variable.
- Ensure your user has the necessary permissions on the directories you plan to use for backups.
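The simple-authentication username resolution described above can be sketched as follows (a hedged illustration, not Pluton's internal logic; `backupuser` is a placeholder):

```python
import getpass
import os

def hadoop_user():
    """Simple-auth username: HADOOP_USER_NAME wins, else the OS login user."""
    return os.environ.get("HADOOP_USER_NAME") or getpass.getuser()

# Illustrative override, as you might set before running a Hadoop client:
os.environ["HADOOP_USER_NAME"] = "backupuser"
print(hadoop_user())  # backupuser
```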
### Step 3: Kerberos Configuration (If Applicable)
If your Hadoop cluster uses Kerberos authentication:
- Verify you have a valid Kerberos ticket:

  ```shell
  klist
  ```

- Note down the service principal name for the namenode (commonly `nn/_HOST@REALM` or `hdfs/_HOST@REALM`)
- Ensure the Kerberos client is configured on the machine running Pluton
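If you automate the ticket check, the expiry can be pulled out of `klist` output. A sketch assuming MIT-style output with `MM/DD/YYYY HH:MM:SS` timestamps (the exact format varies by platform and krb5 version, so treat this as illustrative):

```python
from datetime import datetime

# Sample `klist` output; real output varies by platform and krb5 version.
KLIST_OUTPUT = """\
Ticket cache: FILE:/tmp/krb5cc_1000
Default principal: backupuser@EXAMPLE.COM

Valid starting       Expires              Service principal
01/15/2025 09:00:00  01/15/2025 19:00:00  krbtgt/EXAMPLE.COM@EXAMPLE.COM
"""

def ticket_expiry(klist_text):
    """Parse the expiry timestamp of the first ticket line in klist output."""
    for line in klist_text.splitlines():
        parts = line.split()
        # A ticket line starts with two date/time column pairs.
        if len(parts) >= 5 and parts[0].count("/") == 2 and parts[2].count("/") == 2:
            return datetime.strptime(" ".join(parts[2:4]), "%m/%d/%Y %H:%M:%S")
    return None

print(ticket_expiry(KLIST_OUTPUT))  # 2025-01-15 19:00:00
```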
## Connecting to Pluton

### Step 1: Add Storage
- In Pluton, navigate to Storages
- Click the Add Storage button
- Select HDFS from the provider list
### Step 2: Configure Connection
Fill in the required fields:
| Field | Description | Example |
|---|---|---|
| Namenode Hosts | Comma-separated list of namenode hostnames and ports | `namenode-1:8020` or `namenode-1:8020,namenode-2:8020` |
| Username | Your Hadoop user name | `hdfs` or `backupuser` |
| Description | A friendly name to identify this storage in Pluton | Production HDFS Cluster |
- For a single namenode, enter one host and port (e.g., `namenode:8020`)
- For an HA cluster, list all namenodes separated by commas (e.g., `namenode-1:8020,namenode-2:8020`)
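A quick way to sanity-check the value before pasting it into the Namenode Hosts field is to parse it the same way a client would. A minimal sketch (hostnames are illustrative):

```python
def parse_namenode_hosts(value):
    """Split a comma-separated host:port list and sanity-check each entry.

    Returns a list of (host, port) tuples; raises ValueError on bad input.
    """
    pairs = []
    for entry in value.split(","):
        host, sep, port = entry.strip().partition(":")
        if not host or not sep or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        pairs.append((host, int(port)))
    return pairs

print(parse_namenode_hosts("namenode-1:8020,namenode-2:8020"))
# [('namenode-1', 8020), ('namenode-2', 8020)]
```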
### Step 3: Advanced Options (Optional)
If your cluster uses Kerberos authentication, configure these additional settings:
| Field | Description | Example |
|---|---|---|
| Service Principal Name | Kerberos service principal name for the namenode | `nn/_HOST@REALM` |
| Data Transfer Protection | Kerberos data transfer protection level | `authentication`, `integrity`, or `privacy` |
| Encoding | Backend encoding for special characters | Leave default unless needed |
Data Transfer Protection levels:
- `authentication`: verifies identities only (fastest)
- `integrity`: verifies identities and ensures data is not tampered with
- `privacy`: verifies identities, ensures integrity, and encrypts all data transfers (most secure, slowest)
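These three levels correspond to Hadoop's `dfs.data.transfer.protection` setting: the level you select in Pluton must be one the cluster accepts in its `hdfs-site.xml`. A typical cluster-side fragment (the chosen value is illustrative):

```xml
<!-- hdfs-site.xml on the cluster: accepted data transfer protection level -->
<property>
  <name>dfs.data.transfer.protection</name>
  <value>privacy</value>
</property>
```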
### Step 4: Test and Save
- Click Test Connection to verify your HDFS credentials and connectivity
- If successful, click Save
- Your HDFS storage is now ready for backup plans
## Common Issues

### Connection Refused
- Verify the namenode hostname and port are correct
- Ensure the namenode is running:

  ```shell
  hdfs dfsadmin -report
  ```

- Check that the Pluton server can reach the namenode over the network (port 8020 by default)
- Confirm there are no firewall rules blocking the connection
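The reachability check in the list above can be done from the Pluton server without any Hadoop tooling installed, for example with a plain TCP probe (the hostname below is illustrative):

```python
import socket

def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example usage (hostname is illustrative):
# can_reach("namenode-1", 8020)
```

Note that a successful TCP connection only proves network reachability; authentication and permissions are checked separately by Test Connection.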
### Permission Denied
- Verify the username has read/write access to the target HDFS directory
- Check HDFS permissions:

  ```shell
  hdfs dfs -ls -d /path/to/directory
  ```

- If using Kerberos, ensure the ticket is valid and not expired
### Kerberos Authentication Failures
- Confirm the Kerberos client is properly configured on the Pluton server
- Verify the service principal name matches your cluster configuration
- Check that the Kerberos ticket has not expired (run `klist`)
- Ensure the system clock is synchronized; Kerberos is sensitive to clock skew
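To see why clock synchronization matters, MIT Kerberos rejects requests when clocks differ by more than the configured `clockskew`, which defaults to 300 seconds. A small sketch of that check (timestamps are illustrative):

```python
from datetime import datetime, timedelta, timezone

MAX_SKEW = timedelta(seconds=300)  # MIT Kerberos default clockskew

def within_kerberos_skew(local, reference, max_skew=MAX_SKEW):
    """True if the two clocks differ by no more than the allowed skew."""
    return abs(local - reference) <= max_skew

kdc_time = datetime(2025, 1, 15, 12, 0, 0, tzinfo=timezone.utc)
client_time = kdc_time + timedelta(minutes=6)  # illustrative 6-minute drift
print(within_kerberos_skew(client_time, kdc_time))  # False
```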
### HA Failover Issues
- List all namenode addresses in the Namenode Hosts field
- Ensure both active and standby namenodes are reachable from the Pluton server
## Best Practices
- Use a dedicated Hadoop user for Pluton backups rather than sharing a general-purpose account. This makes it easier to manage permissions and audit backup activity.
- Store backups in a dedicated HDFS directory (e.g., `/backups/pluton`) to keep them organized and separate from other data.
- Set appropriate HDFS replication on your backup directories to balance durability and storage cost.
- Use Kerberos authentication in production environments for secure access to your cluster.
- Choose the right data transfer protection level: use `privacy` for sensitive data, or `authentication` when performance is a priority.
- Monitor HDFS capacity regularly. Use the Pluton storage dashboard or `hdfs dfs -df -h` to check available space.
- Note HDFS limitations: HDFS does not support server-side moves or erasure coding through Pluton. Time accuracy is limited to 1 second.