Connecting HDFS

This guide walks you through connecting an HDFS (Hadoop Distributed File System) storage destination to Pluton.

Prerequisites

Before connecting HDFS, you need:

  1. A running Hadoop cluster with HDFS accessible from the Pluton server
  2. A valid Hadoop user account with read/write access to the target HDFS directories
  3. The namenode address(es) and port(s) for your cluster
  4. If using Kerberos authentication: a properly configured Kerberos environment and service principal

Getting Your HDFS Credentials

Step 1: Find the Namenode Address

The namenode is the master server in your Hadoop cluster that manages the filesystem metadata. You can find its address in your Hadoop configuration:

  1. On your Hadoop cluster, open the core-site.xml configuration file (typically located at /etc/hadoop/conf/core-site.xml or $HADOOP_HOME/etc/hadoop/core-site.xml)
  2. Look for the fs.defaultFS property — it contains the namenode address:
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://namenode-host:8020</value>
    </property>
  3. The default namenode port is 8020. Note down the hostname and port.

For High Availability (HA) clusters with multiple namenodes, check hdfs-site.xml for the dfs.namenode.rpc-address properties to get all namenode addresses.
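As an illustration of steps 1–2, the property lookup can be scripted. This is a minimal sketch using only Python's standard library; the XML content below is the example from this guide, and the helper name is ours, not part of Hadoop or Pluton:

```python
import xml.etree.ElementTree as ET

def hadoop_properties(xml_text):
    """Parse a Hadoop *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name] = prop.findtext("value")
    return props

CORE_SITE = """
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode-host:8020</value>
  </property>
</configuration>
"""

print(hadoop_properties(CORE_SITE)["fs.defaultFS"])  # hdfs://namenode-host:8020
```

The same helper works on hdfs-site.xml for an HA cluster: load the file's text and look for keys starting with dfs.namenode.rpc-address.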

Step 2: Verify Your Hadoop User

  1. Confirm you have a valid Hadoop user account that can access HDFS:
    hdfs dfs -ls /
  2. If your cluster uses simple authentication, the username is typically your OS user or can be set via the HADOOP_USER_NAME environment variable.
  3. Ensure your user has the necessary permissions on the directories you plan to use for backups.
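The username resolution described in step 2 can be sketched as follows. With simple authentication, Hadoop clients take the user from the HADOOP_USER_NAME environment variable if it is set, falling back to the OS user; the helper name here is illustrative:

```python
import getpass
import os

def effective_hadoop_user():
    """Resolve the username a simple-auth Hadoop client would use:
    HADOOP_USER_NAME if set, otherwise the current OS user."""
    return os.environ.get("HADOOP_USER_NAME") or getpass.getuser()

os.environ["HADOOP_USER_NAME"] = "backupuser"
print(effective_hadoop_user())  # backupuser
```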

Step 3: Kerberos Configuration (If Applicable)

If your Hadoop cluster uses Kerberos authentication:

  1. Verify you have a valid Kerberos ticket:
    klist
  2. Note down the service principal name for the namenode (commonly nn/_HOST@REALM or hdfs/_HOST@REALM)
  3. Ensure the Kerberos client is configured on the machine running Pluton
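A quick sanity check on the principal from step 2: service principals follow the three-part service/host@REALM form, where Hadoop expands the _HOST placeholder to the namenode's actual hostname. This validator is a hypothetical helper for illustration, not part of Pluton:

```python
import re

# service/host@REALM, e.g. nn/_HOST@EXAMPLE.COM
PRINCIPAL_RE = re.compile(r"^(?P<service>[^/@]+)/(?P<host>[^/@]+)@(?P<realm>[^/@]+)$")

def parse_principal(principal):
    """Split a Kerberos service principal into its three parts."""
    m = PRINCIPAL_RE.match(principal)
    if not m:
        raise ValueError(f"not a service/host@REALM principal: {principal!r}")
    return m.groupdict()

print(parse_principal("nn/_HOST@EXAMPLE.COM"))
# {'service': 'nn', 'host': '_HOST', 'realm': 'EXAMPLE.COM'}
```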

Connecting to Pluton

Step 1: Add Storage

  1. In Pluton, navigate to Storages
  2. Click the Add Storage button
  3. Select HDFS from the provider list

Step 2: Configure Connection

Fill in the required fields:

| Field | Description | Example |
| --- | --- | --- |
| Namenode Hosts | Comma-separated list of namenode hostnames and ports | namenode-1:8020 or namenode-1:8020,namenode-2:8020 |
| Username | Your Hadoop user name | hdfs or backupuser |
| Description | A friendly name to identify this storage in Pluton | Production HDFS Cluster |
  • For a single namenode, enter one host and port (e.g., namenode:8020)
  • For an HA cluster, list all namenodes separated by commas (e.g., namenode-1:8020,namenode-2:8020)
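The expected shape of the Namenode Hosts field can be sketched with a small parser. This is illustrative only (Pluton's actual parsing may differ); it applies the default namenode port 8020 when an entry omits one:

```python
def parse_namenode_hosts(field, default_port=8020):
    """Split a comma-separated 'host:port' list into (host, port) pairs,
    applying the default HDFS namenode port when a port is omitted."""
    endpoints = []
    for entry in field.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.partition(":")
        endpoints.append((host, int(port) if sep else default_port))
    return endpoints

print(parse_namenode_hosts("namenode-1:8020,namenode-2:8020"))
# [('namenode-1', 8020), ('namenode-2', 8020)]
```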

Step 3: Advanced Options (Optional)

Configure these additional settings as needed. The two Kerberos fields apply only to secured clusters:

| Field | Description | Example |
| --- | --- | --- |
| Service Principal Name | Kerberos service principal name for the namenode | nn/[email protected] |
| Data Transfer Protection | Kerberos data transfer protection level | authentication, integrity, or privacy |
| Encoding | Backend encoding for special characters | Leave default unless needed |

Data Transfer Protection levels:

  • authentication — Verifies identities only (fastest)
  • integrity — Verifies identities and ensures data is not tampered with
  • privacy — Verifies identities, ensures integrity, and encrypts all data transfers (most secure, slowest)

Step 4: Test and Save

  1. Click Test Connection to verify your HDFS credentials and connectivity
  2. If successful, click Save
  3. Your HDFS storage is now ready for backup plans

Common Issues

Connection Refused

  • Verify the namenode hostname and port are correct
  • Ensure the namenode is running: hdfs dfsadmin -report
  • Check that the Pluton server can reach the namenode over the network (port 8020 by default)
  • Confirm there are no firewall rules blocking the connection
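To rule out network problems quickly, a plain TCP check from the Pluton server is often enough; if this fails, the issue is connectivity or a firewall rather than Hadoop itself. A minimal sketch with the standard library (hostname and port are examples):

```python
import socket

def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: check the default namenode RPC port
# can_reach("namenode-1", 8020)
```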

Permission Denied

  • Verify the username has read/write access to the target HDFS directory
  • Check HDFS permissions: hdfs dfs -ls -d /path/to/directory
  • If using Kerberos, ensure the ticket is valid and not expired

Kerberos Authentication Failures

  • Confirm the Kerberos client is properly configured on the Pluton server
  • Verify the service principal name matches your cluster configuration
  • Check that the Kerberos ticket has not expired (klist)
  • Ensure the system clock is synchronized — Kerberos is sensitive to clock skew

HA Failover Issues

  • List all namenode addresses in the Namenode Hosts field
  • Ensure both active and standby namenodes are reachable from the Pluton server

Best Practices

  • Use a dedicated Hadoop user for Pluton backups rather than sharing a general-purpose account. This makes it easier to manage permissions and audit backup activity.
  • Store backups in a dedicated HDFS directory (e.g., /backups/pluton) to keep them organized and separate from other data.
  • Set appropriate HDFS replication on your backup directories to balance durability and storage cost.
  • Use Kerberos authentication in production environments for secure access to your cluster.
  • Choose the right data transfer protection level — use privacy for sensitive data, or authentication when performance is a priority.
  • Monitor HDFS capacity regularly. Use the Pluton storage dashboard or hdfs dfs -df -h to check available space.
  • Note HDFS limitations — HDFS does not support server-side moves or erasure coding through Pluton, and modification time accuracy is limited to 1 second.