Unix Commands for ETL Developers

This page collects practical Unix commands used regularly in ETL projects for file handling, job automation, log analysis, and scheduling.


🔹 1. Basic Commands

ls → list files
pwd → print current directory
cd → change directory
mkdir → create folder
rm → remove file

🔹 2. File Handling Commands

cat file.txt
head -n 10 file.txt
tail -n 50 file.txt
grep "error" log.txt

Useful for checking source data or ETL job logs.
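To make a log check like the one above repeatable, the grep and tail commands can be wrapped in small functions. This is only a minimal sketch: the names count_errors and last_errors are hypothetical helpers, not standard commands, and the search word "error" is taken from the example above.

```shell
# Count lines that mention "error" (case-insensitive) in a log file.
# count_errors is a hypothetical helper name.
count_errors() {
    # grep -c prints the number of matching lines. It exits with status 1
    # when nothing matches, so "|| true" keeps the function from failing.
    grep -ci "error" "$1" || true
}

# Show the last N matching lines (default 5) with their line numbers.
# last_errors is a hypothetical helper name.
last_errors() {
    grep -ni "error" "$1" | tail -n "${2:-5}"
}
```

For example, count_errors log.txt prints the number of matching lines, and last_errors log.txt 10 shows the ten most recent matches.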


🔹 3. Text Processing (Important for ETL)

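cut -d',' -f1,3 filename.csv → extract fields 1 and 3
awk -F',' '{print $2,$5}' file.csv → print fields 2 and 5
sed -n '3p' file.txt → print only line 3
sort file.txt → sort lines
sort file.txt | uniq → remove duplicate lines (uniq only collapses adjacent duplicates, so sort first)
wc -l file.txt → count lines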

Used to validate file formats & record counts.
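As an illustration of that kind of validation, the sketch below checks that every line of a comma-delimited file has the expected number of fields and counts the data records. The function names are hypothetical, the file is assumed to have exactly one header line, and quoted fields that contain commas are not handled.

```shell
# Succeed if every line has the expected number of comma-separated fields;
# otherwise print the offending line numbers and return a non-zero status.
# check_field_count is a hypothetical helper name.
check_field_count() {
    awk -F',' -v n="$2" '
        NF != n { print "line " NR ": " NF " fields"; bad = 1 }
        END     { exit bad + 0 }
    ' "$1"
}

# Count data records, skipping one header line (an assumption about the file).
# count_records is a hypothetical helper name.
count_records() {
    tail -n +2 "$1" | wc -l | tr -d ' '
}
```

A job could call check_field_count file.csv 5 before loading and compare the output of count_records file.csv with the count reported by the source system.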


🔹 4. Working With Files

cp source target
mv file1.txt archive/
chmod 755 script.sh
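A common use of mv in ETL jobs is archiving a file once it has been processed. The sketch below shows one possible way to do that with a date stamp so that repeated runs do not overwrite earlier copies. The function name archive_file and the naming scheme are assumptions for illustration.

```shell
# Move a processed file into an archive directory, appending today's date
# (YYYYMMDD) to the file name.
# archive_file and the naming scheme are hypothetical.
archive_file() {
    src=$1
    archive_dir=$2
    stamp=$(date +%Y%m%d)
    mkdir -p "$archive_dir"    # create the archive folder if it is missing
    mv "$src" "$archive_dir/$(basename "$src").$stamp"
}
```

For example, archive_file file1.txt archive moves the file to archive/file1.txt.<date>.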

🔹 5. Create and Run Shell Scripts

#!/bin/bash
echo "Start ETL Job"
date
# your commands here

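Run using:

bash script.sh

(Or make the script executable with chmod +x script.sh and run ./script.sh, so that the #!/bin/bash line decides which shell is used.)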
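In practice an ETL script usually needs timestamps in its output and a clear signal when a step fails. The following is a minimal sketch of that idea; log and run_step are hypothetical helper names, and the steps they wrap are placeholders for your own commands.

```shell
#!/bin/bash
# Sketch of helper functions for an ETL wrapper script.

# Print a message prefixed with the current date and time.
log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') $*"
}

# Run one command, log whether it succeeded, and return its exit status.
run_step() {
    log "START: $*"
    if "$@"; then
        log "OK: $*"
    else
        rc=$?    # exit status of the command that just failed
        log "FAILED (exit $rc): $*"
        return "$rc"
    fi
}
```

A script could then call run_step wc -l input.csv || exit 1 for each step, so that the job stops at the first failure and the log shows which step failed.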

🔹 6. Cron Jobs (Scheduling)

crontab -e

Then add this line in the editor:

0 2 * * * sh /home/scripts/run_jobs.sh

(Runs script every day at 2 AM.)
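Cron jobs run without a terminal, so it usually helps to capture the script's output in a log file. The entry below is a sketch of the same 2 AM schedule with output redirected; the log path is a hypothetical example and the directory must already exist.

```
# minute hour day-of-month month day-of-week  command
0 2 * * * sh /home/scripts/run_jobs.sh >> /home/scripts/logs/run_jobs.log 2>&1
```

Here >> appends to the log on every run, and 2>&1 sends error messages to the same file. Use crontab -l to confirm the entry was saved.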


🔹 7. Useful Commands for DataStage

dsjob -run -mode NORMAL -warn 0 project jobname
dsjob -jobinfo project jobname
dsjob -logsum project jobname
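When dsjob is called from a shell script, the script usually needs to know whether the job succeeded. The sketch below assumes the job is started with the -jobstatus option, which makes dsjob wait for the job and return an exit status derived from the job status, and it assumes the commonly documented values 1 (finished OK) and 2 (finished with warnings). Check these values against your own DataStage installation. describe_job_status is a hypothetical helper name.

```shell
# Translate a dsjob exit status into a short message.
# Assumption: the job was run with -jobstatus, so the exit status reflects
# the job status (1 = finished OK, 2 = finished with warnings).
# describe_job_status is a hypothetical helper name.
describe_job_status() {
    case "$1" in
        1) echo "finished OK" ;;
        2) echo "finished with warnings" ;;
        *) echo "not successful (status $1)" ;;
    esac
}
```

A wrapper script could run dsjob -run -mode NORMAL -jobstatus project jobname and then call describe_job_status $? on the next line.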

🔹 8. Unix Interview Questions

  • What is the difference between a soft link and a hard link?

  • What is the difference between grep, egrep, and fgrep?

  • What is awk used for?

  • How do you view large log files?

  • How do you schedule jobs using cron?


