Friday, June 26, 2009

Finding Block Size of Filesystem

To find the block size of a filesystem on Linux:

$ sudo tune2fs -l /dev/sda1 | grep -i 'block size'
Block size: 4096

OR

$ echo "mitesh" > test && du -h test | awk '{print $1}' && rm -f test
4.0K
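
A third way, which needs neither root access nor a scratch file, is to ask stat about the filesystem that holds a given path. This is only a sketch: it assumes GNU coreutils stat (the -f and -c options), and fs_block_size is a helper name made up for this example.

```shell
# fs_block_size [PATH]: print the block size, in bytes, of the filesystem
# that holds PATH (defaults to the current directory).
fs_block_size() {
    # -f  : report on the filesystem instead of the file itself
    # %S  : fundamental block size, the unit used for block counts
    stat -f -c '%S' "${1:-.}"
}

fs_block_size /
```

On a typical ext3/ext4 filesystem this should agree with the value reported by tune2fs.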

Thursday, June 25, 2009

Editing Very Very Large File

Suppose we want to change a few lines in a very large file. Opening such a file in an editor may not be possible (say its size is several GB, more than RAM + swap). Even sed and awk take a long time when they have to match a pattern against every line; if we already know the line numbers, we can edit by line number instead. I have written a Perl script that edits multiple lines independently in one pass over the file. It applies a sed command to each listed line.

The format of the config file is (one entry per line):
line_number:sed_command


#!/usr/bin/perl -w
#===============================================================================
#
# FILE: ed_large_file.pl
#
# USAGE: ./ed_large_file.pl <config_file> <file_name> [overwrite]
#
# DESCRIPTION: Edit very[ very] large file
#
# OPTIONS: ---
# REQUIREMENTS: ---
# BUGS: ---
# NOTES: ---
# AUTHOR: Mitesh Singh Jat (mitesh), <mitesh[at]yahoo-inc[dot]com>
# VERSION: 1.0
# CREATED: Thursday 25 June 2009 02:32:37 IST
# REVISION: ---
#===============================================================================

use strict;
use warnings;

if (@ARGV < 2)
{
print STDERR "$0: <config_file> <file_name> [overwrite]\n";
print STDERR "!!!Be careful while using [overwrite] option,\n";
print STDERR "because original file will be deleted.\n";
exit(-1);
}

my $conf_file = $ARGV[0];
my $large_file = $ARGV[1];
my $overwrite = 0;
if (@ARGV >= 3 && $ARGV[2] eq "overwrite")
{
$overwrite = 1;
}

use File::Basename qw(dirname);    # core module; avoids passing the path through a shell
my $temp_file = dirname($large_file);
if ($temp_file eq "" || (!(-d $temp_file)))
{
print STDERR "$0: Cannot find dirname for temporary file.\n";
print STDERR "Please check path of file '$large_file'\n";
exit(-1);
}

$temp_file = $temp_file . "/temp";
print "Temporary file is '$temp_file'\n";

## Read config file
print "Reading config file '$conf_file'\n";
open(CFH, "<", $conf_file) or die("Cannot read Config file '$conf_file': $!\n");
my $line;
my %lineno_sedcmd;
while ($line = <CFH>)
{
chomp($line);
my ($lineno, $sedcmd) = split /:/, $line, 2;
if (defined($sedcmd))
{
$lineno_sedcmd{$lineno} = $sedcmd;
print "$lineno $lineno_sedcmd{$lineno}\n";
# Verifying sedcmd before running it;
# it gives a chance to reedit config file
my $cmd = "echo \"Mitesh Singh Jat\" | sed '$sedcmd' 1> /dev/null 2>&1";
if (!(system($cmd) == 0))
{
print STDERR "$0: sed command '$sedcmd' for line '$lineno' ";
print STDERR "has an error. Please recheck with \$ man sed\n";
close(CFH);
exit(-1);
}
}
}
close(CFH);

# Sort numerically; the default sort is lexical and would put line 10 before line 2
my @line_nos = sort { $a <=> $b } keys %lineno_sedcmd;

## Open large file
open(LFH, "<", $large_file) or die("$0: Cannot open file '$large_file': $!\n");
## Temporary File
open(OFH, ">", $temp_file) or die("$0: Cannot create temporary file '$temp_file': $!\n");
my $nline = 0;
my $i = 0;
my $end_idx = @line_nos - 1;
print "Processing...";
while ($line = <LFH>)
{
++$nline;
if (@line_nos && $line_nos[$i] == $nline) # now edit (guard against an empty config)
{
++$i; # This config line is over
if ($i > $end_idx)
{
$i = $end_idx;
}
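chomp($line);
# Hand the line to the shell through the environment, so that quotes,
# dollar signs and backticks inside it are not interpreted by the shell
local $ENV{ED_LARGE_LINE} = $line;
my $cmd = "printf '%s\\n' \"\$ED_LARGE_LINE\" | sed '$lineno_sedcmd{$nline}'";
#print "$cmd\n";
my $out_line = `$cmd`;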
print OFH "$out_line";
print " $nline"; #sleep 1; # to see progress :)
}
else
{
print OFH "$line";
}
}

print "\n";

close(OFH);
close(LFH);

if ($overwrite == 0)
{
print "done\n";
exit(0);
}

## Overwrite original file by deleting it and moving temp
print "Overwriting...\n";
my $cmd = "rm -f $large_file \&\& mv $temp_file $large_file";
print "$cmd\n";
system($cmd) == 0
or die("Problem in overwriting. '$cmd' failed: $?\n");
print "done\n";
exit(0);


Sample Run:


--(0 : 618)> ./ed_large_file.pl
./ed_large_file.pl: <config_file> <file_name> [overwrite]
!!!Be careful while using [overwrite] option,
because original file will be deleted.

--(mitesh@roundduck-lm)-(~/Programming/Perl/Editing_Large_Files)--
--(255 : 619)> cat large_file.txt
Shree Ganeshay Namah
Shri Bharat Singh Jat
Smt Amita Jat
Mitesh Jat
Shikha Jat
Shilpa Jat
This is garbage line. Please delete it.
--(mitesh@roundduck-lm)-(~/Programming/Perl/Editing_Large_Files)--
--(0 : 620)> cat large_file.conf
1:s/^.*$/!!&!!/
4:s/ / Singh /
7:/.*/d
--(mitesh@roundduck-lm)-(~/Programming/Perl/Editing_Large_Files)--
--(0 : 621)> ./ed_large_file.pl large_file.conf large_file.txt
Temporary file is './temp'
Reading config file 'large_file.conf'
1 s/^.*$/!!&!!/
4 s/ / Singh /
7 /.*/d
Processing... 1 4 7
done
--(mitesh@roundduck-lm)-(~/Programming/Perl/Editing_Large_Files)--
--(0 : 622)> cat ./temp
!!Shree Ganeshay Namah!!
Shri Bharat Singh Jat
Smt Amita Jat
Mitesh Singh Jat
Shikha Jat
Shilpa Jat
--(mitesh@roundduck-lm)-(~/Programming/Perl/Editing_Large_Files)--
--(0 : 623)> ./ed_large_file.pl large_file.conf large_file.txt overwrite
Temporary file is './temp'
Reading config file 'large_file.conf'
1 s/^.*$/!!&!!/
4 s/ / Singh /
7 /.*/d
Processing... 1 4 7
Overwriting...
rm -f large_file.txt && mv ./temp large_file.txt
done
--(mitesh@roundduck-lm)-(~/Programming/Perl/Editing_Large_Files)--
--(0 : 624)> cat large_file.txt
!!Shree Ganeshay Namah!!
Shri Bharat Singh Jat
Smt Amita Jat
Mitesh Singh Jat
Shikha Jat
Shilpa Jat
--(mitesh@roundduck-lm)-(~/Programming/Perl/Editing_Large_Files)--
--(0 : 625)>
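
For comparison, the same line_number:sed_command config can also be turned into one sed script and applied with a single sed process. This is not what the Perl script above does; it is an alternative sketch. The name conf_to_sed and the demo data are made up for illustration. It relies on sed's own line-number addresses, so each command is wrapped in a N{ ... } block.

```shell
# conf_to_sed: read "line_number:sed_command" lines on stdin and print an
# equivalent sed script, wrapping each command in a line-number address block.
conf_to_sed() {
    while IFS=: read -r lineno cmd; do
        case "$lineno" in
            ''|*[!0-9]*) continue ;;    # skip entries without a numeric line number
        esac
        [ -n "$cmd" ] || continue       # skip entries without a command
        printf '%s{\n%s\n}\n' "$lineno" "$cmd"
    done
}

# Demo on a three-line input kept in a scratch directory.
tmp=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$tmp/input.txt"
printf '1:s/one/ONE/\n3:/.*/d\n' | conf_to_sed > "$tmp/edits.sed"
sed -f "$tmp/edits.sed" "$tmp/input.txt"    # prints ONE and two; line 3 is deleted
rm -rf "$tmp"
```

With the sample files above, this would be: conf_to_sed < large_file.conf > edits.sed, followed by sed -f edits.sed large_file.txt > temp. Unlike the Perl script, a mistake in one command makes sed reject the whole script, so nothing is written until the config is clean.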

Tuesday, June 2, 2009

Web Access through Proxy Server by Terminal Applications

In many companies and universities, web access is granted through a proxy server (usually Squid, hence port 3128).
Many terminal applications (run from the command line) access the Internet. For example:
wget (to download files), ftp, lynx/links (to browse websites), and apt/yum (to download and install packages). Behind a
proxy, these applications do not work out of the box. The easy solution is to set some shell environment variables, explained below:

For accessing the web (lynx/links) through a non-authenticated proxy:
$ export http_proxy="http://proxy.yourcompany.com:3128"

Verify that the setting took effect:
$ echo $http_proxy
http://proxy.yourcompany.com:3128

For accessing the web (lynx/links) through an authenticated proxy:
$ export http_proxy="http://username:password@proxy.yourcompany.com:3128"
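
If the username or password contains characters such as @, : or /, they have to be percent-encoded before going into the proxy URL, otherwise the URL is split at the wrong place. A minimal sketch, assuming ASCII-only credentials; urlencode is a helper written for this example, and the username and password are placeholders.

```shell
# urlencode STRING: percent-encode everything except unreserved characters
# (letters, digits, '.', '_', '~', '-'). Handles ASCII input only.
urlencode() {
    s=$1
    out=
    while [ -n "$s" ]; do
        c=${s%"${s#?}"}     # first character of what is left
        s=${s#?}            # drop that character
        case $c in
            [A-Za-z0-9._~-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;   # "'c" gives the character code
        esac
    done
    printf '%s\n' "$out"
}

# Placeholder credentials, for illustration only.
proxy_user=username
proxy_pass='p@ss:word'
export http_proxy="http://$(urlencode "$proxy_user"):$(urlencode "$proxy_pass")@proxy.yourcompany.com:3128"
echo "$http_proxy"
```

Keep in mind that a password stored this way is visible to anyone who can read the environment or the shell startup file.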

If you want the change to be permanent (set each time you open a terminal),
add the export line to .bashrc in your home directory:
$ echo 'export http_proxy="http://proxy.yourcompany.com:3128"' >> ~/.bashrc
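
Running the echo line more than once leaves duplicate entries in .bashrc. A small sketch of an append that only happens once; persist_line is a name made up here, and the demo writes to a scratch file rather than the real ~/.bashrc.

```shell
# persist_line FILE LINE: append LINE to FILE unless an identical line exists.
persist_line() {
    file=$1
    line=$2
    touch "$file"
    # -x: match the whole line, -F: treat the text literally, -q: quiet
    grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demo with a scratch file; pass ~/.bashrc instead for real use.
rc_file=$(mktemp)
persist_line "$rc_file" 'export http_proxy="http://proxy.yourcompany.com:3128"'
persist_line "$rc_file" 'export http_proxy="http://proxy.yourcompany.com:3128"'
cat "$rc_file"    # the export line appears only once
```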

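Secure HTTP (over SSL) access. The scheme in the value stays http:// because it describes how to reach the proxy, and Squid on port 3128 normally accepts plain HTTP:
$ export https_proxy="http://proxy.yourcompany.com:3128"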

FTP access (the proxy itself is still reached over HTTP)
$ export ftp_proxy="http://proxy.yourcompany.com:3128"
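
To switch the proxy on and off in one step, the exports can be wrapped in two small shell functions. This is a sketch under the assumption that one HTTP proxy endpoint serves all three kinds of traffic, which is the usual Squid setup; use_proxy and proxy_off are names invented for this example.

```shell
# use_proxy URL: export the three proxy variables for the current shell.
use_proxy() {
    http_proxy=$1
    https_proxy=$1
    ftp_proxy=$1
    export http_proxy https_proxy ftp_proxy
}

# proxy_off: remove the variables again, e.g. when working from home.
proxy_off() {
    unset http_proxy https_proxy ftp_proxy
}

use_proxy "http://proxy.yourcompany.com:3128"
env | grep '_proxy='    # show what child processes will see
```

Putting both functions in ~/.bashrc makes them available in every new terminal.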