HPL/SQL has been included in Apache Hive since version 2.0.

COPY FROM FTP Statement

The COPY FROM FTP statement copies files from an FTP server to the local file system or to any Hadoop-compatible file system. For example, you can use it to copy FTP subdirectories into HDFS.

The NEW option helps you build an ETL process that downloads only new files from FTP.

Syntax:

COPY FROM FTP host [USER user [PWD password]] [DIR directory] [FILES files_wildcard] 
  [TO [LOCAL] target_directory] [options]
 
options:
  OVERWRITE | NEW
  SUBDIR
  SESSIONS num  

Notes:

  • host, user and password specify the FTP host name, user name and password. Each can be given as an identifier, string literal, variable or expression.
  • DIR specifies the FTP directory to get files from. It is optional; if omitted, the current working FTP directory is used.
  • FILES specifies a wildcard, written as a Java regular expression, that selects the files to transfer. By default, all files in the specified directory are transferred.
  • LOCAL means that files are copied to the local file system. By default, files are copied to an HDFS-compatible file system.
  • OVERWRITE means that existing files in the target are overwritten. This is the default.
  • NEW means that only new files are transferred, and existing files are skipped.
  • SUBDIR specifies that files in subdirectories are transferred as well. The directory structure is recreated in the target. By default, the command transfers files only from the directory specified by the DIR option.
  • SESSIONS specifies the number of concurrent FTP sessions used to transfer the files. Each session transfers whole files. By default, files are copied in a single session.
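
Because FILES takes a Java regular expression rather than a shell-style wildcard, it can help to try a pattern against sample file names before using it. The following is a minimal Java sketch, not HPL/SQL code: the class FilesOptionDemo, the helper select and the file names are hypothetical, and the sketch assumes that the pattern must match the whole file name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class FilesOptionDemo {
    // Hypothetical helper: keeps the names that the regular expression matches in full.
    // Assumption: the whole file name must match, not just a part of it.
    static List<String> select(String regex, List<String> names) {
        Pattern pattern = Pattern.compile(regex);
        return names.stream()
                .filter(name -> pattern.matcher(name).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample file names, invented for illustration only.
        List<String> names = Arrays.asList("sales_2016.csv", "sales_2016.csv.bak", "readme.txt");

        // '.*' selects every file.
        System.out.println(select(".*", names));
        // '.*\.csv' selects only names that end with .csv.
        System.out.println(select(".*\\.csv", names));
        // 'sales_.*' selects names that start with sales_.
        System.out.println(select("sales_.*", names));
    }
}
```

Note that a shell-style pattern such as *.csv is not a valid Java regular expression: Pattern.compile rejects it with a PatternSyntaxException, so the regular expression form .*\.csv is needed instead.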

Example:

Copy new files, including files in subdirectories, from an FTP server to an HDFS location using 3 concurrent connections:

copy from ftp 'ftp.myserver.com' user 'paul' pwd '***' dir data/sales/in subdir 
  files '.*' to /data/sales/raw sessions 3 new

Compatibility: HPL/SQL Extension

Version: HPL/SQL 0.3.17