Wednesday, April 16, 2008

Group By Counts in a File (Generally csv files)

You can use following snippet to get GROUP BY - COUNT as in SQL
query. Like

If you want number of students in a department:

SELECT department, COUNT(*) FROM Student GROUP BY department;

Suppose that this file contains:

Mitesh Singh Jat,CSA
Mahesh Singh Sonal,CSA
Paneendra B A,SSA
Shrikant Joshi,SERC
Rupesh Bajaj,CSA
Chandrakant,SSA

1 #!/bin/bash
2
3 if [[ $# <> 4 ]]
4 then
5 echo "Usage: $0 file delim field_pos
6 exit
7 fi
8
9 CMD=`echo $1 | sed 's/.*\.gz/zcat/'`
10
11 if [[ $CMD != "zcat" ]]
12 then
13 CMD="cat"
14 fi
15
16 echo "========================"
17 echo "Buckets for $1"
18 echo "========================"
19 echo "Bucket_id Count"
20 echo "------------------------"
21 $CMD $1 | cut -d$2 -f$3 | awk '{sum[$1]++} END {for (x in sum) {out+=sum[x]; print x" \t"sum[x];} print "========================"; print "Total records : " out;}'
22 echo "========================"

$ ./group_by_count.sh Student , 2
========================
Buckets for Student
========================
Bucket_id Count
------------------------
SSA 2
SERC 1
CSA 3
========================
Total records: 6
========================


No comments: