Unix Commands for ETL Developers
This page contains practical Unix commands used regularly in ETL projects for file handling, job automation, log analysis, and scheduling.
🔹 1. Basic Commands
ls → list files
pwd → print current directory
cd → change directory
mkdir → create folder
rm → remove file
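A minimal sketch of how these basic commands fit together when preparing a working area for a job. The folder names (etl_demo, incoming, archive) are hypothetical, and everything is created under a temporary directory so nothing real is touched.

```shell
#!/bin/bash
# Hypothetical layout: a scratch work area under a fresh temp directory
BASE_DIR=$(mktemp -d)

mkdir -p "$BASE_DIR/etl_demo/incoming" "$BASE_DIR/etl_demo/archive"  # -p creates parents, no error if present
cd "$BASE_DIR/etl_demo" || exit 1   # stop if the directory cannot be entered
pwd                                 # prints the absolute path of the work area
touch incoming/old_extract.txt
ls -l incoming                      # long listing: permissions, owner, size, date
rm -f incoming/old_extract.txt      # -f: no error if the file is already gone
```

Checking the result of cd before continuing matters in scripts: without it, later commands such as rm would run in whatever directory the script happened to start in.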
🔹 2. File Handling Commands
cat file.txt
head -10 file.txt
tail -50 file.txt
grep "error" log.txt
Useful for checking source data or ETL job logs.
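A short sketch of a typical log check, using a small made-up log written to a temp file. It shows a few grep options that are handy on job logs: -i (ignore case), -c (count matching lines), and -n (show line numbers).

```shell
#!/bin/bash
# Hypothetical log content, written to a temp file for illustration
LOG_FILE=$(mktemp)
printf '%s\n' "INFO job started" "ERROR file not found" "WARN slow read" \
    "error retry failed" "INFO job ended" > "$LOG_FILE"

head -2 "$LOG_FILE"                          # first 2 lines
tail -1 "$LOG_FILE"                          # last line
grep -i "error" "$LOG_FILE"                  # -i: matches ERROR, Error, error
ERROR_COUNT=$(grep -ic "error" "$LOG_FILE")  # -c: count matching lines instead of printing them
grep -n "ERROR" "$LOG_FILE"                  # -n: prefix each match with its line number
echo "Error lines: $ERROR_COUNT"
```

For a log that is still being written, tail -f follows new lines as they arrive.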
🔹 3. Text Processing (Important for ETL)
cut -d',' -f1,3 filename.csv
awk -F',' '{print $2,$5}' file.csv
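sed -n '3p' file.txt
sort file.txt
sort file.txt | uniq
wc -l file.txt
Used to validate file formats and record counts. sed -n '3p' prints only line 3 (-n suppresses the default output of every line). uniq removes only adjacent duplicate lines, so sort the input first.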
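A minimal validation sketch combining these tools on a small hypothetical CSV (header plus 3 records). It counts records without the header, flags rows with the wrong number of columns, checks for duplicate keys, and computes a control total. It assumes a simple comma-separated file; quoted fields that contain commas would be split incorrectly by cut and awk.

```shell
#!/bin/bash
# Hypothetical sample file: header + 3 records, 3 columns each
CSV_FILE=$(mktemp)
printf '%s\n' "id,name,amount" "1,Alice,100" "2,Bob,250" "3,Carol,75" > "$CSV_FILE"
EXPECTED_COLS=3

# Record count excluding the header (tr strips the padding some wc versions add)
RECORD_COUNT=$(tail -n +2 "$CSV_FILE" | wc -l | tr -d ' ')

# Rows whose field count differs from the expected value (NR > 1 skips the header)
BAD_ROWS=$(awk -F',' -v n="$EXPECTED_COLS" 'NR > 1 && NF != n' "$CSV_FILE" | wc -l | tr -d ' ')

# Key values (column 1) that appear more than once
DUP_IDS=$(tail -n +2 "$CSV_FILE" | cut -d',' -f1 | sort | uniq -d | wc -l | tr -d ' ')

# Sum of column 3 as a quick control total to compare with the source system
TOTAL=$(awk -F',' 'NR > 1 {sum += $3} END {print sum}' "$CSV_FILE")

echo "Records: $RECORD_COUNT, bad rows: $BAD_ROWS, duplicate ids: $DUP_IDS, total: $TOTAL"
```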
🔹 4. Working With Files
cp source target
mv file1.txt archive/
chmod 755 script.sh
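A sketch of a common archiving pattern: keep a backup copy, move the processed file into an archive folder with a timestamp in its name, and make a script executable. The landing and archive folders and the file names are hypothetical and live under a temp directory.

```shell
#!/bin/bash
# Hypothetical landing/archive folders under a temp directory
WORK_DIR=$(mktemp -d)
mkdir -p "$WORK_DIR/landing" "$WORK_DIR/archive"
echo "1,Alice,100" > "$WORK_DIR/landing/sales.csv"

STAMP=$(date +%Y%m%d_%H%M%S)   # format YYYYMMDD_HHMMSS

# Backup copy first, then move the processed file into the archive with a timestamp suffix
cp "$WORK_DIR/landing/sales.csv" "$WORK_DIR/landing/sales.csv.bak"
mv "$WORK_DIR/landing/sales.csv" "$WORK_DIR/archive/sales_${STAMP}.csv"

# 755: owner can read/write/execute; group and others can read/execute
printf '%s\n' '#!/bin/bash' 'echo "running"' > "$WORK_DIR/run.sh"
chmod 755 "$WORK_DIR/run.sh"
```

The timestamp suffix keeps each day's file distinct, so a rerun does not overwrite an earlier archived copy.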
🔹 5. Create and Run Shell Scripts
#!/bin/bash
echo "Start ETL Job"
date
# your commands here
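Run using:
bash script.sh
Or make it executable once (chmod 755 script.sh) and run ./script.sh, so the #!/bin/bash line is honored. Running it with sh may start a different shell (on Debian and Ubuntu, sh is dash).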
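A slightly more defensive version of the skeleton above, offered as a sketch rather than a standard. set -euo pipefail stops the job at the first failing command, so a scheduler sees a non-zero exit status. The log path /tmp/etl_job.log is a hypothetical default; point LOG_FILE at your project's log directory.

```shell
#!/bin/bash
# -e: exit on the first failing command, -u: error on unset variables,
# pipefail: a pipeline fails if any command in it fails
set -euo pipefail

# Hypothetical default log location; override by exporting LOG_FILE
LOG_FILE=${LOG_FILE:-/tmp/etl_job.log}

log() {
    # Prefix every message with a timestamp and append it to the log
    echo "$(date '+%Y-%m-%d %H:%M:%S') $*" >> "$LOG_FILE"
}

# Record the failing line before set -e ends the script
trap 'log "ETL Job failed at line $LINENO"' ERR

log "Start ETL Job"
# your commands here; any failing command ends the script with a non-zero status
log "End ETL Job"
```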
🔹 6. Cron Jobs (Scheduling)
crontab -e
0 2 * * * sh /home/scripts/run_jobs.sh
(Runs the script every day at 2:00 AM. The five time fields are minute, hour, day of month, month, and day of week.)
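A few more crontab entries as a config sketch. Cron jobs run without a terminal and with a minimal environment, so use absolute paths and redirect output; otherwise it is mailed to the crontab owner (if mail is configured) or lost. The log path below is hypothetical, and the */15 step syntax is supported by the common Linux cron implementations but not by every cron. Use crontab -l to list the current entries.

```
# Field order: minute  hour  day-of-month  month  day-of-week  command

# Every day at 2:00 AM, appending stdout and stderr to a log (hypothetical path)
0 2 * * * sh /home/scripts/run_jobs.sh >> /home/scripts/logs/run_jobs.log 2>&1

# 1:30 AM, Monday to Friday only
30 1 * * 1-5 sh /home/scripts/run_jobs.sh

# Every 15 minutes
*/15 * * * * sh /home/scripts/run_jobs.sh
```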
🔹 7. Useful Commands for DataStage
dsjob -run -mode NORMAL -warn 0 project jobname → run a job
dsjob -jobinfo project jobname → show job status and run details
dsjob -logsum project jobname → show a summary of the job log
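A hedged sketch of a wrapper that runs a DataStage job and turns its result into a pass/fail exit status. It assumes that with -jobstatus, dsjob waits for the job and returns the job status as its exit code (1 = finished OK, 2 = finished with warnings, 3 = aborted); verify these codes against the documentation for your DataStage version. The function names describe_status and run_ds_job are hypothetical.

```shell
#!/bin/bash
# ASSUMPTION: dsjob -jobstatus returns 1 = OK, 2 = warnings, 3 = aborted.
# Check these codes for your DataStage version before relying on them.

describe_status() {
    case "$1" in
        1) echo "OK" ;;
        2) echo "WARNINGS" ;;
        3) echo "ABORTED" ;;
        *) echo "UNKNOWN($1)" ;;
    esac
}

run_ds_job() {
    local project=$1 job=$2 rc
    dsjob -run -mode NORMAL -warn 0 -jobstatus "$project" "$job"
    rc=$?
    echo "$job finished with status $(describe_status "$rc")"
    # Treat OK and WARNINGS as success, anything else as failure
    [ "$rc" -eq 1 ] || [ "$rc" -eq 2 ]
}

# Usage on a DataStage server:
#   run_ds_job project jobname || dsjob -logsum project jobname
```

Whether "finished with warnings" should count as success is a project decision; change the last test in run_ds_job if warnings must fail the batch.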
🔹 Sample Unix script to convert an Excel (.xlsx) file to CSV
excel_to_csv.sh
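#!/bin/bash
# Convert one .xlsx file from INPUT_DIR to CSV in OUTPUT_DIR (needs the xlsx2csv tool installed)
INPUT_DIR=/data/input
OUTPUT_DIR=/data/output
LOG_DIR=/data/logs
LOG_FILE="$LOG_DIR/excel_convert.log"
FILE_NAME=$1

# Stop early if no file name was passed
if [ -z "$FILE_NAME" ]; then
    echo "Usage: $0 <file.xlsx>" >&2
    exit 1
fi

mkdir -p "$OUTPUT_DIR" "$LOG_DIR"
echo "Started conversion at $(date)" >> "$LOG_FILE"

# Test xlsx2csv's exit status directly; quotes protect file names containing spaces
if ! xlsx2csv "$INPUT_DIR/$FILE_NAME" "$OUTPUT_DIR/${FILE_NAME%.xlsx}.csv"; then
    echo "Conversion failed for $FILE_NAME" >> "$LOG_FILE"
    exit 1
fi

echo "Conversion successful for $FILE_NAME" >> "$LOG_FILE"
exit 0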
Run the script:
bash excel_to_csv.sh sales.xlsx
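To convert every workbook in a folder, a small loop can call the converter once per file. This is a sketch; the function name convert_all is hypothetical, and it passes only the file name because excel_to_csv.sh prepends its own INPUT_DIR. The converter command is given as the remaining arguments, so it can be swapped out (for example, for testing).

```shell
#!/bin/bash
# Hypothetical batch wrapper: run a converter once for every .xlsx file in a directory.
# Usage: convert_all /data/input bash excel_to_csv.sh
shopt -s nullglob   # with no matches the loop runs zero times instead of seeing a literal '*.xlsx'

convert_all() {
    local input_dir=$1
    shift                       # the remaining arguments form the converter command
    local failed=0 path
    for path in "$input_dir"/*.xlsx; do
        # Pass only the file name: excel_to_csv.sh adds INPUT_DIR itself
        if ! "$@" "$(basename "$path")"; then
            echo "Failed: $path" >&2
            failed=$((failed + 1))
        fi
    done
    echo "Files failed: $failed"
    # Succeed only if every file converted
    [ "$failed" -eq 0 ]
}
```

The function reports the failure count on stdout and returns a plain success/failure status, because a shell exit status cannot hold counts above 255.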
______________________________________________________________________________________
🔹 8. Unix Interview Questions
- What is a soft link vs. a hard link?
- What is the difference between grep, egrep, and fgrep?
- What is AWK used for?
- How do you view large log files?
- How do you schedule jobs using cron?
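A quick hands-on sketch for the first two questions, run in a throwaway directory with hypothetical file names. A hard link is a second name for the same inode, so the data survives when the original name is removed; a soft (symbolic) link only stores a path and breaks when its target is deleted. egrep is equivalent to grep -E (extended regular expressions) and fgrep to grep -F (fixed strings); newer GNU grep releases warn that egrep and fgrep are obsolescent, so grep -E and grep -F are the safer spellings.

```shell
#!/bin/bash
# Throwaway directory for the demo
DEMO_DIR=$(mktemp -d)
cd "$DEMO_DIR" || exit 1

echo "source data" > original.txt
ln original.txt hard.txt        # hard link: same inode, second name
ln -s original.txt soft.txt     # soft link: separate file that stores the path
ls -li original.txt hard.txt soft.txt   # -i: hard.txt shows the same inode number as original.txt

rm original.txt
cat hard.txt                    # still prints "source data"
cat soft.txt 2>/dev/null || echo "soft.txt is now a dangling link"

# grep variants on a small sample
printf '%s\n' "ERROR 42" "WARN a.b" "WARN axb" > sample.log
grep -E "ERROR|WARN" sample.log   # alternation with | needs -E (egrep)
grep -F "a.b" sample.log          # literal "a.b" only (fgrep)
grep "a.b" sample.log             # '.' is a wildcard, so "axb" matches too
```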