Unix Commands for ETL Developers
This page contains practical Unix commands used regularly in ETL projects for file handling, job automation, log analysis, and scheduling.
🔹 1. Basic Commands
ls → list files
pwd → print current directory
cd → change directory
mkdir → create folder
rm → remove file
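As a small worked example of these basics, the sketch below sets up a working area for incoming files. The directory names (landing, archive, reject), the ETL_HOME variable and the setup_etl_dirs function are hypothetical. It uses mkdir -p, so rerunning it is harmless.

```shell
#!/bin/bash
# Sketch: create a hypothetical ETL working layout and inspect it.
# ETL_HOME and the subdirectory names are illustrative assumptions;
# by default a throwaway temp directory is used.
ETL_HOME="${ETL_HOME:-$(mktemp -d)}"

setup_etl_dirs() {
  # -p creates parent directories as needed and does not fail if they already exist.
  mkdir -p "$ETL_HOME/landing" "$ETL_HOME/archive" "$ETL_HOME/reject"
  cd "$ETL_HOME" || return 1   # move into the working area
  pwd                          # confirm the current directory
  ls                           # list the new directories
}

setup_etl_dirs
```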
🔹 2. File Handling Commands
cat file.txt
head -n 10 file.txt
tail -n 50 file.txt
grep "error" log.txt
Useful for checking source data or ETL job logs.
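A quick log triage that combines these commands might look like the sketch below. The summarize_log function and the sample log lines are made up for illustration. The -i flag makes the match case-insensitive, so both ERROR and error are counted.

```shell
#!/bin/bash
# Sketch: summarize errors in an ETL job log (summarize_log is a hypothetical name).
summarize_log() {
  local log_file=$1 count
  # grep -c prints the number of matching lines. It exits 1 when the count is 0,
  # so "|| true" keeps that case from being treated as a failure.
  count=$(grep -ci "error" "$log_file" || true)
  echo "Error lines: $count"
  # Show the last 5 matching lines for context.
  grep -i "error" "$log_file" | tail -n 5
}

# Usage with a throwaway sample log:
sample_log=$(mktemp)
printf '%s\n' "INFO job started" "ERROR file not found" "INFO retrying" "error: load failed" > "$sample_log"
summarize_log "$sample_log"
```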
🔹 3. Text Processing (Important for ETL)
cut -d',' -f1,3 filename.csv
awk -F',' '{print $2,$5}' file.csv
sed -n '3p' file.txt
sort file.txt
sort file.txt | uniq
wc -l file.txt
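Used to extract columns, find duplicates, and validate file layouts and record counts. Note that uniq only removes adjacent duplicate lines, so sort the input first.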
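Put together, these commands can cover the usual pre-load checks. The sketch below assumes a simple comma-delimited file with one header row and no quoted fields containing commas. The function names and the sample data are hypothetical.

```shell
#!/bin/bash
# Sketch: basic validation of a simple CSV with a header row.
# Assumes no quoted fields containing commas; function names are hypothetical.

# Print the record count, excluding the header line.
record_count() {
  awk 'NR > 1 { n++ } END { print n + 0 }' "$1"
}

# Report rows whose field count differs from the expected value.
# Returns non-zero if any row is malformed.
check_field_count() {
  local file=$1 expected=$2
  awk -F',' -v n="$expected" '
    NF != n { bad++; print "line " NR ": " NF " fields (expected " n ")" }
    END { exit (bad > 0) }
  ' "$file"
}

# List key values (first column) that appear more than once.
duplicate_keys() {
  tail -n +2 "$1" | cut -d',' -f1 | sort | uniq -d
}

# Usage with a small sample file:
sample_csv=$(mktemp)
printf '%s\n' "id,name,amount" "1,alice,10" "2,bob,20" "2,bob,20" "3,carol" > "$sample_csv"
record_count "$sample_csv"                               # prints 4
check_field_count "$sample_csv" 3 || echo "layout check failed"
duplicate_keys "$sample_csv"                             # prints 2
```

Returning a non-zero status from check_field_count lets a calling script stop the load with a plain `|| exit 1`.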
🔹 4. Working With Files
cp source target
mv file1.txt archive/
chmod 755 script.sh
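A common use of mv in ETL is archiving a processed file under a timestamped name, so reruns do not overwrite earlier files. The sketch below is one way to do it. The archive_file function and the sample paths are hypothetical.

```shell
#!/bin/bash
# Sketch: move a processed file into an archive directory with a timestamp
# in its name (archive_file is a hypothetical helper).
archive_file() {
  local src=$1 archive_dir=$2 base ts dest
  base=$(basename "$src")
  ts=$(date +%Y%m%d_%H%M%S)
  mkdir -p "$archive_dir"
  # Insert the timestamp before the extension when the name has one.
  case "$base" in
    *.*) dest="$archive_dir/${base%.*}_${ts}.${base##*.}" ;;
    *)   dest="$archive_dir/${base}_${ts}" ;;
  esac
  mv -- "$src" "$dest" && echo "$dest"
}

# Usage with a throwaway directory:
work=$(mktemp -d)
echo "1,alice,10" > "$work/sales.csv"
archive_file "$work/sales.csv" "$work/archive"   # prints the new path, ending in sales_<YYYYMMDD_HHMMSS>.csv
```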
🔹 5. Create and Run Shell Scripts
#!/bin/bash
echo "Start ETL Job"
date
# your commands here
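Run using:
bash script.sh
Or make it executable once with chmod 755 script.sh and then run ./script.sh. Avoid running it as sh script.sh: on many systems sh is not bash, and the #!/bin/bash line is ignored.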
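A slightly more defensive version of the script above might look like the sketch below. It is a template under assumptions, not a standard. set -euo pipefail stops on the first failing command, unset variable, or failed pipeline stage. The log() helper and the LOG_FILE location are hypothetical names.

```shell
#!/bin/bash
# Sketch of an ETL wrapper script; LOG_FILE and log() are illustrative names.
set -euo pipefail   # exit on errors, unset variables, and failures inside pipelines

LOG_FILE="${LOG_FILE:-/tmp/etl_job_$(date +%Y%m%d).log}"

log() {
  # Timestamp each message, print it, and append it to the log file.
  echo "$(date '+%Y-%m-%d %H:%M:%S') $*" | tee -a "$LOG_FILE"
}

# Record where the script failed before set -e stops it.
trap 'log "ETL Job failed at line $LINENO"' ERR

main() {
  log "Start ETL Job"
  # your commands here (extract, transform, load steps)
  log "End ETL Job"
}

main "$@"
```

Because every message goes through log(), the same file can later be searched with grep or followed with tail -f while the job runs.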
🔹 6. Cron Jobs (Scheduling)
crontab -e
0 2 * * * sh /home/scripts/run_jobs.sh
(Runs the script every day at 2:00 AM, server time.)
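Cron runs jobs with a minimal environment and does not check whether the previous run is still active. One common safeguard is a lock directory. mkdir either creates it or fails atomically, so only one run proceeds. The sketch below assumes /tmp is writable; run_with_lock and the lock path are hypothetical names. It is also common to capture output in the crontab entry itself, for example by appending `>> /home/scripts/run_jobs.log 2>&1` to the command (that log path is illustrative).

```shell
#!/bin/bash
# Sketch: prevent overlapping cron runs with a lock directory.
# run_with_lock and the lock path below are hypothetical names.
run_with_lock() {
  local lock_dir=$1 status=0
  shift
  # mkdir is atomic: it fails if the directory already exists,
  # so a second run started while the first is still active backs off.
  if ! mkdir "$lock_dir" 2>/dev/null; then
    echo "Lock $lock_dir exists; previous run still active, skipping." >&2
    return 1
  fi
  "$@" || status=$?     # run the job, remembering its exit status
  rmdir "$lock_dir"     # release the lock whether the job passed or failed
  return "$status"
}

# Usage: wrap the real job command.
run_with_lock /tmp/run_jobs.lock echo "job body runs here"
```

If the job is killed with kill -9, the lock directory is left behind and must be removed by hand. On Linux, flock(1) from util-linux is an alternative that avoids this problem.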
🔹 7. Useful Commands for DataStage
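dsjob -run -mode NORMAL -warn 0 project jobname → run a job in normal mode
dsjob -jobinfo project jobname → show the job's status and last run details
dsjob -logsum project jobname → show a summary of the job log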
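In a shell wrapper, you usually want the script to wait for the job and act on its final status. The sketch below assumes dsjob's -jobstatus option, which waits for the job and returns its status as the exit code. It also assumes the commonly documented codes 1 (finished OK) and 2 (finished with warnings). Check both against the documentation for your DataStage version. run_ds_job and the DSJOB variable are hypothetical. DSJOB exists only so the command can be replaced with a stub for testing.

```shell
#!/bin/bash
# Sketch: run a DataStage job, wait for it, and branch on the job status.
# DSJOB defaults to the dsjob CLI; run_ds_job is a hypothetical helper.
DSJOB="${DSJOB:-dsjob}"

run_ds_job() {
  local project=$1 job=$2 rc=0
  # -jobstatus waits for completion and returns the job status as the exit code.
  "$DSJOB" -run -mode NORMAL -warn 0 -jobstatus "$project" "$job" || rc=$?
  case "$rc" in
    1) echo "$job finished OK" ;;
    2) echo "$job finished with warnings" ;;
    *)
      echo "$job failed (status $rc)"
      # Print the log summary to help diagnose the failure.
      "$DSJOB" -logsum "$project" "$job"
      return 1
      ;;
  esac
}

# Usage (in a real script): run_ds_job project jobname || exit 1
```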
🔹 8. Unix Interview Questions
- What is a softlink vs. a hardlink?
- What is the difference between grep, egrep, and fgrep?
- What is AWK used for?
- How do you view large log files?
- How do you schedule jobs using cron?
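For the first question, a quick way to see the difference is to create both kinds of link and then delete the original file. A hard link is another directory entry for the same inode, so the data survives. A soft (symbolic) link stores only a path, so it dangles. This is an illustrative sketch in a throwaway temp directory; demo_links is a hypothetical name.

```shell
#!/bin/bash
# Sketch: compare a hard link and a soft (symbolic) link.
# demo_links is a hypothetical helper; everything happens in a temp directory.
demo_links() {
  local dir
  dir=$(mktemp -d)
  echo "data" > "$dir/original.txt"
  ln "$dir/original.txt" "$dir/hard.txt"     # hard link: second name for the same inode
  ln -s "$dir/original.txt" "$dir/soft.txt"  # soft link: a separate file that stores the path
  rm "$dir/original.txt"
  cat "$dir/hard.txt"                        # still prints "data": the inode has another name
  cat "$dir/soft.txt" 2>/dev/null || echo "soft link is dangling"
  rm -r "$dir"
}

demo_links
```

Hard links also cannot span filesystems and generally cannot point to directories. Soft links have neither restriction.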