mog project: 2012.01

1.25.2012

Perl: Return value of glob() built-in function

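Perl: On the return value of the glob function

The glob function expands a string containing wildcards into a list of paths, the way a shell would. (In Perl 5.6 and later this is done internally by the File::Glob module; earlier versions ran an external shell.)

However, when the string contains no wildcard, no path expansion takes place and the string is stored in the returned list as is.
The same goes for an empty string: it is stored in the list unchanged.

In other words, a name appearing in the return value of glob is no guarantee that the file exists.
The list may even contain an empty string.

Therefore, whenever you stat (or otherwise use) the results of glob, you should always check the outcome.

・Assumption: no file whose name starts with x exists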

#!/usr/bin/env perl
use strict;
use warnings;

sub p {
    print '(', @_ ? "'" . join("', '" , @_) . "'" : q(), ")\n";
}

p(glob '');
p(glob ' ');
p(glob 'x');
p(glob 'x?');
p(glob 'x*');
p(glob '[x]');
p(glob '{x}');
p(glob 'x x? x* [x] {x} ""');

・Output

('')
('')
('x')
()
()
()
('x')
('x', 'x', '')

・stat example

use Carp;
use English qw( -no_match_vars );

my $path = 'x';

for my $file (glob $path) {
    my $result = stat $file
        or croak "Couldn't stat '$file': $OS_ERROR";
}

1.22.2012

Perl: How to get the month without leading zero in strftime

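Perl: Getting the month without zero padding from strftime

When strftime is called with the format specifier '%m', a single-digit result is zero-padded (01, 02, 03 ... 12).

In some environments, specifying '%-m' (UNIX) or '%#m' (Windows) yields the month without zero padding (1, 2, 3, ... 12),
but to do the conversion portably you need to write code like the following.

Code

・Extended strftime
Limitation: the '%-m' part is evaluated first, so when a format such as '%%-m' is given,
the result may differ from what is expected (the three literal characters %-m).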

use POSIX qw( strftime );
sub strftime_ext {
    my ($fmt, $sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst)
        = @_;
    my $m = $mon + 1;
    $fmt =~ s/ %-m /$m/gxms;
    return strftime($fmt, $sec, $min, $hour, $mday, $mon, $year, $wday, $yday,
                    $isdst);
}

・Caller

print strftime('%Y-%m-%d', localtime ), "\n";
print strftime('%Y-%-m-%d', localtime ), "\n";
print strftime_ext('%Y-%m-%d', localtime ), "\n";
print strftime_ext('%Y-%-m-%d', localtime ), "\n";

・Output (example)

2012-01-22
2012-%-m-22
2012-01-22
2012-1-22
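
As far as I know, Python's time.strftime also hands the format to the platform's C library, so the same pre-substitution idea applies there. The following is a minimal Python sketch, not a translation of the Perl above: the name strftime_ext is reused only for illustration, and the tokenizing regular expression is my own assumption. It scans '%%' and '%-m' in one pass, so '%%-m' stays a literal and the limitation noted for the Perl version does not arise in this form.

```python
import re
import time

# Matches "%%" (literal percent) or "%-m" (unpadded month).
# "%%" is tried first at each position, so "%%-m" is left alone.
_TOKEN = re.compile(r"%%|%-m")

def strftime_ext(fmt, t):
    """Illustrative helper: expand '%-m' ourselves, then call time.strftime."""
    def repl(match):
        if match.group(0) == "%-m":
            return str(t.tm_mon)   # month as 1..12, no padding
        return match.group(0)      # keep "%%" untouched for strftime
    return time.strftime(_TOKEN.sub(repl, fmt), t)

t = time.strptime("2012-01-22", "%Y-%m-%d")
print(strftime_ext("%Y-%m-%d", t))    # 2012-01-22
print(strftime_ext("%Y-%-m-%d", t))   # 2012-1-22
print(strftime_ext("%%-m", t))        # %-m
```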

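Measuring elapsed time

I also tried other implementations and measured how long each one takes.
For the measurement, gettimeofday and tv_interval from the Time::HiRes module are all you need.

・Extended strftime (alternative implementations)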

sub strftime_ext2 {
    my ($fmt, $sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst)
        = @_;
    $fmt =~ s{ %-m }{ $mon + 1 }gexms;
    return strftime($fmt, $sec, $min, $hour, $mday, $mon, $year, $wday, $yday,
                    $isdst);
}

sub strftime_ext3 {
    my ($fmt, @args) = @_;
    my $m = $args[4] + 1;
    $fmt =~ s/ %-m /$m/gxms;
    return strftime($fmt, @args);
}

sub strftime_ext4 {
    my ($fmt, @args) = @_;
    $fmt =~ s{ %-m }{ $args[4] + 1 }gexms;
    return strftime($fmt, @args);
}

・Measurement code

use Time::HiRes qw( gettimeofday tv_interval );

sub test {
    my ($fnc_ref) = @_;
    
    my @time = localtime;
    my $start = [gettimeofday];
    for my $i (1..100000) {
        $fnc_ref->('%-m', @time );
    }
    print tv_interval($start), "\n";
}

test \&strftime_ext;
test \&strftime_ext2;
test \&strftime_ext3;
test \&strftime_ext4;

・Results

Performance turned out to be best in the following order (fastest first).
strftime_ext > strftime_ext3 > strftime_ext2 > strftime_ext4

Reference:
http://jabnz.blog69.fc2.com/blog-entry-728.html

1.20.2012

Perl: Recognizing time from text with a specified format

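Perl: Recognizing a time written in a string

Given an arbitrary regular expression pattern and a time written according to that pattern,
I want to read the string and obtain the number of seconds elapsed since 0:00 a.m.

However, it should also run on Perl 5.6, so named captures (introduced in Perl 5.10) cannot be used.
It feels a bit like reinventing the wheel, but I implemented it anyway.

Examples

"%H:%M:%S", "01:02:03" ⇒ 1 h 2 min 3 s = 1 * 3600 + 2 * 60 + 3 = 3723
"^\d{8}%H%M%S", "20120119010203456789" ⇒ 1 h 2 min 3 s = 1 * 3600 + 2 * 60 + 3 = 3723
"%H : %M (?: : %S )?", "01:02" ⇒ 1 h 2 min = 1 * 3600 + 2 * 60 = 3720
"%S [ ] %H - %M", "03 01-02" ⇒ 1 h 2 min 3 s = 1 * 3600 + 2 * 60 + 3 = 3723

Code

For the i-th format specifier that appears in the pattern, w_i is set to the number of seconds per unit.
If v_i is the number read for that specifier, the value we want is Σ w_i × v_i.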

#!/usr/bin/env perl
use strict;
use warnings;

#-----------------------------------------------------------------------------
# Time Reader class
#-----------------------------------------------------------------------------
package TimeReader;

# Define format specifiers.
# hash to a list of regular expression pattern and how many seconds in a spec.
my %_SPEC_FOR = (
    '%H' => [ '\\d{2}', 60 * 60 ],
    '%M' => [ '\\d{2}',      60 ],
    '%S' => [ '\\d{2}',       1 ],
);

##############################################################################
# Constructor: new()
#
sub new {
    my ($class, $format) = @_;  # format pattern
    my @weights;

    SEARCH_SPEC:
    for my $i (0 .. length($format) - 2) {
        my $str = substr($format, $i, 2);
        next SEARCH_SPEC if !exists $_SPEC_FOR{$str};
        push @weights, $_SPEC_FOR{$str}->[1];
    }

    while (my ($key, $value) = each %_SPEC_FOR) {
        $format =~ s/ $key / ( $value->[0] ) /xms;
    }

    # Create instance.
    my $new_object = bless {}, $class;
    $new_object->{reg_exp} = $format;
    $new_object->{weights} = \@weights;
    return $new_object;
}

##############################################################################
# Method read()
#
# Read text, then returns elapsed seconds from 0:00 am.
#
sub read {
    my ($self, $text) = @_;  # text includes time representation

    my @ret = $text =~ m/ $self->{reg_exp} /xms or return;

    my $seconds = 0;
    for my $i (0 .. $#{ $self->{weights} }) {
        if ($ret[$i]) {
            $seconds += $ret[$i] * $self->{weights}[$i];
        }
    }
    return $seconds;
}

package main;

print TimeReader->new('%H:%M:%S')->read('01:02:03'), "\n";
print TimeReader->new('^\d{8}%H%M%S')->read('20120119010203456789'), "\n";
print TimeReader->new('%H : %M (?: : %S )?')->read('01:02'), "\n";
print TimeReader->new('%S [ ] %H - %M')->read('03 01-02'), "\n";

# unexpected result?
print TimeReader->new('%H:%H')->read('01:02'), "\n";
print TimeReader->new('%H*')->read('0102'), "\n";

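Limitations

・Make sure that a format specifier of the same kind is not matched more than once

  (a) If the same format specifier is used more than once in the pattern,
    only the first occurrence is replaced by the capturing pattern.

  "%H:%H", "01:02" ⇒ becomes ( \d{2} ) :%H ⇒ does not match, so undef is returned

  (b) If a quantifier that allows two or more repetitions is applied to a format specifier,
    the value of the part matched last is the one evaluated.

  "%H*", "0102" ⇒ becomes ( \d{2} )* ⇒ the last match is "02" ⇒ evaluated as 2 o'clock ⇒ 2 * 3600 = 7200

・A format specifier must be defined as exactly two characters (normally '%' plus one other character)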
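
For comparison, here is the same weighted-sum idea as a minimal Python sketch. It is not a line-by-line port: the class name TimeReader is reused only for illustration, SPEC_FOR and _SPEC_RE are names I made up, and re.VERBOSE stands in for Perl's /x. One deliberate difference is that every occurrence of a specifier is replaced, so case (a) above behaves differently here; case (b) still applies, and the format must not contain capturing groups of its own (use (?: ... ) instead).

```python
import re

# Two-character specifiers -> (regex fragment, seconds per unit).
SPEC_FOR = {
    "%H": (r"\d{2}", 3600),
    "%M": (r"\d{2}", 60),
    "%S": (r"\d{2}", 1),
}
_SPEC_RE = re.compile("|".join(re.escape(k) for k in SPEC_FOR))

class TimeReader:
    def __init__(self, fmt):
        # Weights are recorded in order of appearance, so weights[i]
        # belongs to the i-th capture group created below.
        self.weights = [SPEC_FOR[m.group(0)][1] for m in _SPEC_RE.finditer(fmt)]
        pattern = _SPEC_RE.sub(lambda m: "(" + SPEC_FOR[m.group(0)][0] + ")", fmt)
        # re.VERBOSE ignores whitespace outside character classes.
        self.regex = re.compile(pattern, re.VERBOSE)

    def read(self, text):
        """Return seconds since 0:00, or None when the text does not match."""
        m = self.regex.search(text)
        if m is None:
            return None
        # Sum of w_i * v_i; groups that did not take part in the match are None.
        return sum(w * int(v) for w, v in zip(self.weights, m.groups()) if v)

print(TimeReader("%H:%M:%S").read("01:02:03"))             # 3723
print(TimeReader("%H : %M (?: : %S )?").read("01:02"))     # 3720
print(TimeReader("%S [ ] %H - %M").read("03 01-02"))       # 3723
```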

1.15.2012

Roman numerals conversion in C++

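C++: Roman numeral conversion

I implemented this as an experiment.
The class has two constructors, one taking an integer and one taking a string, and performs no input validation.
It works correctly only for integers in the range 1 to 3,999,999, and only for strings that are valid Roman numerals.

・RomanNumeral class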

#include <string>
#include <cmath>

//------------------------------------------------------------------------------
// Roman numerals conversion
//------------------------------------------------------------------------------
std::string const kRomanNumeralTable = "IVXLCDMvxlcdm";
class RomanNumeral {
 public:
  RomanNumeral(int decimal) : _decimal(decimal) {
    std::string roman;
    int i = 0;
    while (decimal) {
      std::string s;
      int d = decimal % 10;
      while (d) {
        if ((d - 1) & 4) {
          s.push_back(kRomanNumeralTable[i * 2 + 1]);
          d -= 5;
          continue;
        }
        if (d == 10) {
          s.push_back(kRomanNumeralTable[i * 2 + 2]);
          break;
        }
        s.push_back(kRomanNumeralTable[i * 2]);
        d -= d % 5 == 4 ? -1 : 1;
      }
      roman = s + roman;
      ++i;
      decimal /= 10;
    }
    _roman = roman;
  }
  RomanNumeral(std::string roman) : _roman(roman){
    int decimal = 0;
    int prev = 1000000000;

    for (std::string::const_iterator i = roman.begin(); i != roman.end(); ++i) {
      int pos = kRomanNumeralTable.find(*i);
      int weight = static_cast<int>(pow(10.0, (pos + 1) / 2)) / (pos % 2 + 1);
      if (prev < weight) decimal -= prev * 2;
      decimal += prev = weight;
    }
    _decimal = decimal;
  }

  int decimal() const { return _decimal; }
  std::string roman() const { return _roman; }

 private:
  int _decimal;
  std::string _roman;
};

・Calling code

#include <iostream>
#include <cassert>

int main() {
  for (int i = 1; i <= 3999999; ++i) {
    std::string s = RomanNumeral(i).roman();
    int x = RomanNumeral(s).decimal();
    assert(i == x);
  }

  for (int i = 1; i <= 20; ++i) {
    std::string s = RomanNumeral(i).roman();
    int x = RomanNumeral(s).decimal();
    std::cout << x << ": " << s << std::endl;
  }

  return 0;
}

・Output

1: I
2: II
3: III
4: IV
5: V
6: VI
7: VII
8: VIII
9: IX
10: X
11: XI
12: XII
13: XIII
14: XIV
15: XV
16: XVI
17: XVII
18: XVIII
19: XIX
20: XX
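
To make the digit-by-digit logic of the class easier to follow, here is a small Python restatement. It is a sketch under my own assumptions: the function names to_roman, from_roman and weight are made up, and each digit case (4, 9, 5 to 8, 1 to 3) is spelled out explicitly instead of using the bit test from the C++ code. The symbol table and the weight formula are the same as above.

```python
TABLE = "IVXLCDMvxlcdm"  # same symbol table as the C++ code

def weight(pos):
    # Table position 0, 1, 2, 3, ... -> 1, 5, 10, 50, ...
    return 10 ** ((pos + 1) // 2) // (pos % 2 + 1)

def to_roman(n):
    out = []
    i = 0  # decimal place: 0 = ones, 1 = tens, ...
    while n:
        n, d = divmod(n, 10)
        one = TABLE[2 * i]
        five = TABLE[2 * i + 1] if 2 * i + 1 < len(TABLE) else ""
        ten = TABLE[2 * i + 2] if 2 * i + 2 < len(TABLE) else ""
        if d == 9:
            s = one + ten
        elif d == 4:
            s = one + five
        elif d >= 5:
            s = five + one * (d - 5)
        else:
            s = one * d
        out.append(s)
        i += 1
    return "".join(reversed(out))

def from_roman(s):
    total, prev = 0, 0
    for ch in s:
        w = weight(TABLE.index(ch))
        if prev and prev < w:
            total -= 2 * prev  # the previous symbol was subtractive
        total += w
        prev = w
    return total

print(to_roman(1994), from_roman("MCMXCIV"))  # MCMXCIV 1994
```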

Setting Eclipse CDT color theme

Eclipse: Setting the CDT color theme

Configuring colors one by one is laborious, but with the Eclipse plugin "Eclipse Color Theme" you can change the editor's color theme easily.
Installing it requires the Marketplace Client.

1. Installing the Marketplace Client

Open Help - Install New Software... and
connect to the main Eclipse site (http://download.eclipse.org/releases/xxxxxx).
Select General Purpose Tools - Marketplace Client and install it.

2. Installing Eclipse Color Theme

Open Help - Eclipse Marketplace... and search for "Eclipse Color Theme".
Press the "Install" button to install it.
A Security Warning (unsigned content) appears along the way; continue anyway.

3. Configuring Eclipse Color Theme

After restarting, open Window - Preferences; a new item General - Appearance - Color Theme has been added.
Pick whichever theme you like from there.

Additional themes can be obtained here.
http://www.eclipsecolorthemes.org/
However, simply applying a theme leaves a few parts looking slightly off. Those have to be adjusted individually.
The settings that were hard to find and tripped me up are these.

・Background color of occurrences
Preferences - General - Editors - Text Editors - Annotations - C/C++ Occurrences
* Note that CDT has its own separate setting

・Color of the squiggly underline for errors/warnings
Preferences - General - Editors - Text Editors - Annotations - Errors
Preferences - General - Editors - Text Editors - Annotations - Warnings

・Background color of inactive code
C/C++ - Editor, in the middle of the right pane: Appearance color options - Inactive code highlight

The settings files themselves live in the following directory under the workspace.
.metadata/.plugins/org.eclipse.core.runtime/.settings

Reference:
Eclipse Color Theme
http://marketplace.eclipse.org/content/eclipse-color-theme

1.11.2012

Oracle: How to find out the usage of tablespaces

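Oracle: Getting tablespace usage

An example SQL script that joins dba_data_files and dba_free_space to obtain the total size and the free space
of each tablespace (both in MB).

The output consists of three columns: tablespace name, total size, and free space.
The header is deliberately not displayed.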

set linesize     1000
set pagesize     0
set heading      off
set feedback     off
set serveroutput on

SELECT a,
       to_char(nvl(x / 1024 / 1024, 0), '999999990.00'),
       to_char(nvl(y / 1024 / 1024, 0), '999999990.00')
FROM   ( SELECT tablespace_name a, sum(bytes) x FROM dba_data_files
         GROUP BY tablespace_name ),
       ( SELECT tablespace_name b, sum(bytes) y FROM dba_free_space
         GROUP BY tablespace_name )
WHERE  a = b(+)
ORDER  BY a;

exit
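
The script prints the total size and the free space, so the usage rate itself still has to be derived as (total - free) / total. A minimal Python sketch of that arithmetic follows; the function name usage_percent and the sample rows are hypothetical and not taken from a real database.

```python
def usage_percent(total_mb, free_mb):
    """Used share of a tablespace in percent; 0.0 when the total is zero."""
    if total_mb <= 0:
        return 0.0
    return (total_mb - free_mb) / total_mb * 100.0

# Hypothetical rows in the script's output order: name, total MB, free MB.
rows = [
    ("SYSTEM", 500.00, 125.00),
    ("USERS", 100.00, 0.00),   # no free extents: the nvl() in the query yields 0
]
for name, total, free in rows:
    print("%-10s %6.2f%%" % (name, usage_percent(total, free)))
```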

Effects of the environment variables 'GZIP' and 'BZIP2'

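UNIX: The trap of the GZIP/BZIP2 environment variables

The environment variables GZIP and BZIP2 have a special effect: when they are defined, their values are evaluated as
default options each time the gzip or bzip2 command is run.

If you write a shell script like this without knowing that, and the variable ends up exported to the environment (for example because GZIP was already exported, or the script runs export GZIP)...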

GZIP=/usr/bin/gzip
$GZIP foo.txt

...it is actually the same as running this command.

/usr/bin/gzip /usr/bin/gzip foo.txt

Unintentionally, the result is an attempt to compress the gzip program itself.

Reference:
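
The effect can be pictured as the variable's value being placed in front of the command-line arguments. The Python sketch below only simulates that merge to make it visible: effective_args is a hypothetical helper, splitting with shlex is my approximation rather than gzip's actual parser, and GZIP_CMD is just an example of a name that does not clash.

```python
import shlex

def effective_args(env, argv):
    """Approximate the argument list gzip ends up processing (simulation only)."""
    defaults = shlex.split(env.get("GZIP", ""))
    return [argv[0]] + defaults + argv[1:]

env = {"GZIP": "/usr/bin/gzip"}            # the accidental definition
print(effective_args(env, ["/usr/bin/gzip", "foo.txt"]))
# ['/usr/bin/gzip', '/usr/bin/gzip', 'foo.txt']

safe_env = {"GZIP_CMD": "/usr/bin/gzip"}   # hypothetical non-clashing name
print(effective_args(safe_env, ["/usr/bin/gzip", "foo.txt"]))
# ['/usr/bin/gzip', 'foo.txt']
```

In a shell script, the simplest way to stay clear of the trap is to use a different variable name for the command path.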
http://www.gnu.org/software/gzip/manual/gzip.html#Environment

1.09.2012

Perl: How to verify 'system()' function on Windows

Perl: Checking errors from the system() built-in on Windows

The Perl Best Practices book says that, when checking for system errors, instead of doing this

system $cmd
    and croak "Couldn't run: $cmd ($OS_ERROR)";

you should do this.

use POSIX qw( WIFEXITED );

WIFEXITED(system $cmd)
    or croak "Couldn't run: $cmd ($OS_ERROR)";

However, running the latter code on Windows reports the following error.

POSIX::WIFEXITED not implemented on this architecture at ...

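I have not read through the source of the POSIX module yet, but is the former approach the only option after all...

Incidentally, in Perl 6 the returned Boolean is apparently reversed: true is returned when the status code is 0.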
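
One possible way around the missing macro is to take the status apart by hand, as perldoc -f system describes: -1 means the command could not be started, the low 7 bits hold a signal number, and the high byte holds the exit code. The Python sketch below only illustrates that arithmetic; decode_status is a hypothetical name and the status values are made up. I have not verified how far this carries over to Windows.

```python
def decode_status(status):
    """Split a wait status of the kind Perl's system() returns."""
    if status == -1:
        return ("failed to start", None)
    if status & 127:
        return ("killed by signal", status & 127)
    return ("exited", status >> 8)

print(decode_status(0))        # ('exited', 0)
print(decode_status(2 << 8))   # ('exited', 2)
print(decode_status(9))        # ('killed by signal', 9)
print(decode_status(-1))       # ('failed to start', None)
```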

1.07.2012

How to get full path to the script in HTA, etc.

HTA and others: How a script can get its own full path

A list of ways for a script file to obtain its own path and its parent directory.
Each one is just one possible approach.

・HTA (HTML Applications) - VBScript

path = unescape(window.location.pathname)
If window.location.host <> "" Then path = "\\" & window.location.host & path
dir = CreateObject("Scripting.FileSystemObject").GetParentFolderName(path)

  If the path contains spaces, they may remain percent-encoded as "%20"
  unless unescape is applied.
  Between <script> tags, "window." can be omitted.
  * Added 2012/06/16: refer to location.host to support UNC paths

・VBScript

path = WScript.ScriptFullName
dir = CreateObject("Scripting.FileSystemObject").GetParentFolderName(path)
WScript.Echo path & vbCrLf & dir

・Bourne Shell

path=`(cd \`dirname $0\` && pwd)`/`basename $0`
dir=`dirname $path`
echo "${path}\n${dir}"

  Reposted from the 2011/12/21 article.

・Perl

use File::Basename;
use File::Spec;
$path = File::Spec->rel2abs(__FILE__);
$dir = dirname $path;
print "$path\n$dir\n";

  $0 can be used in place of __FILE__. Whether either of them holds a full path directly appears to depend on the implementation.
  There are many other ways as well, for example using the Cwd module.

・Python

import os
path = os.path.abspath(__file__)
dir = os.path.dirname(path)
print('%s\n%s' % (path, dir))

  Whether __file__ holds a full path appears to depend on the implementation.
  Alternatively, import sys and refer to sys.argv[0].

・Windows batch file  * Added 2012/04/01

@echo off
set full_path=%~f0
set dir=%~dp0
echo %full_path%
echo %dir%

  Note that the directory has a trailing '\' appended.
  Details can also be checked with for /?.

Reference:
http://www.jazoka.info/index.php?db=so&id=84932

1.05.2012

Perl: Subroutines in a hash

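Perl: Subroutines inside a hash

When the processing itself should change depending on an arbitrary key, a hash is more elegant than a chain of if statements.

For simple operations, anonymous subroutines are enough.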

#!/usr/bin/env perl

use strict;
use warnings;

my %comp = (
    'eq' => sub{ return $_[0] == $_[1] },  # equality
    'lt' => sub{ return $_[0] <  $_[1] },  # less than
    'le' => sub{ return $_[0] <= $_[1] },  # less than or equal to
);

for my $op ('eq', 'lt', 'le') {
    for my $x (1 .. 3) {
        print "$x $op 2: ", $comp{$op}($x, 2) ? 'YES' : 'NO', "\n";
    }
}

・Output

1 eq 2: NO
2 eq 2: YES
3 eq 2: NO
1 lt 2: YES
2 lt 2: NO
3 lt 2: NO
1 le 2: YES
2 le 2: YES
3 le 2: NO

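Incidentally, if line 6 (the line that opens the hash) is written with braces as follows, the script fails at run time.
Braces create a reference to an anonymous hash, which is a single value, so the hash ends up being initialized with an odd number of elements.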

my %comp = {
    'eq' => sub{ return $_[0] == $_[1] },  # equality
    'lt' => sub{ return $_[0] <  $_[1] },  # less than
    'le' => sub{ return $_[0] <= $_[1] },  # less than or equal to
};

The messages are as follows (the first one is a warning).

Reference found where even-sized list expected at ... line 10.
Use of uninitialized value in subroutine entry at ... line 14.
Can't use string ("") as a subroutine ref while "strict refs" in use at ... line 14.
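
The same dispatch-table idea in Python, as a minimal sketch for comparison: a dict maps each key to a callable. The names comp and compare are chosen here for illustration, and the standard operator module supplies the comparison functions in place of anonymous subroutines.

```python
import operator

# Dispatch table: key -> callable, the dict counterpart of %comp.
comp = {
    "eq": operator.eq,   # equality
    "lt": operator.lt,   # less than
    "le": operator.le,   # less than or equal to
}

def compare(op, x, y):
    """Look up the callable for op and apply it (illustrative helper)."""
    return "YES" if comp[op](x, y) else "NO"

for op in ("eq", "lt", "le"):
    for x in range(1, 4):
        print("%d %s 2: %s" % (x, op, compare(op, x, 2)))
```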

1.03.2012

How to copy text to the clipboard in Python

Python: Copying a string to the clipboard

Copied from the site below. Supports Windows/Mac/Linux.
http://www.pasteall.org/14794/python

# Code from
# http://blenderscripts.googlecode.com/svn-history/r41/trunk/scripts/sketch_export.py
def copy_to_clipboard(text):
    """
    Copy text to the clipboard
    Returns True if successful. False otherwise.
    """
   
    # =============================================================================
    # win32 (Windows)
    try:
        import win32clipboard
        win32clipboard.OpenClipboard()
        win32clipboard.EmptyClipboard()
        win32clipboard.SetClipboardText(text)
        win32clipboard.CloseClipboard()
        return True
    except:
        pass
   
    # =============================================================================
    # clip (Windows)
    try:
        import subprocess
        p = subprocess.Popen(['clip'], stdin=subprocess.PIPE)
        p.stdin.write(text)
        p.stdin.close()
        retcode = p.wait()
        return True
    except:
        pass
       
    # =============================================================================
    # pbcopy (Mac OS X)
    try:
        import subprocess
        p = subprocess.Popen(['pbcopy'], stdin=subprocess.PIPE)
        p.stdin.write(text)
        p.stdin.close()
        retcode = p.wait()
        return True
    except:
        pass
       
    # =============================================================================
    # xclip (Linux)
    try:
        import subprocess
        p = subprocess.Popen(['xclip', '-selection', 'c'], stdin=subprocess.PIPE)
        p.stdin.write(text)
        p.stdin.close()
        retcode = p.wait()
        return True
    except:
        pass
       
    # =============================================================================
    # xsel (Linux)
    try:
        import subprocess
        p = subprocess.Popen(['xsel'], stdin=subprocess.PIPE)
        p.stdin.write(text)
        p.stdin.close()
        retcode = p.wait()
        return True
    except:
        pass
       
    # =============================================================================
    # pygtk
    try:
        # Code from
        # http://www.vector-seven.com/2007/06/27/passing-data-between-gtk-applications-with-gtkclipboard/
        import pygtk
        pygtk.require('2.0')
        import gtk
        # get the clipboard
        clipboard = gtk.clipboard_get()
        # set the clipboard text data
        clipboard.set_text(text)
        # make our data available to other applications
        clipboard.store()
        return True
    except:
        pass
       
    return False
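
The four command-based branches above differ only in the command line, so they can be folded into a loop. The following is a sketch for Python 3.5 or later under my own assumptions, not the quoted author's code: copy_via_commands and CLIPBOARD_COMMANDS are made-up names, the text is encoded to bytes before being piped (UTF-8 is assumed and may not suit every tool), and the exit status is checked instead of always returning True.

```python
import subprocess

# Candidate commands, tried in order; which ones exist depends on the OS.
CLIPBOARD_COMMANDS = [
    ["clip"],                        # Windows
    ["pbcopy"],                      # Mac OS X
    ["xclip", "-selection", "c"],    # Linux
    ["xsel"],                        # Linux
]

def copy_via_commands(text, commands=CLIPBOARD_COMMANDS, encoding="utf-8"):
    """Pipe text to the first command that runs and exits with status 0."""
    data = text.encode(encoding)     # Python 3 pipes expect bytes
    for cmd in commands:
        try:
            proc = subprocess.run(cmd, input=data)
        except OSError:
            continue                 # command not installed; try the next one
        if proc.returncode == 0:
            return True
    return False
```

Calling copy_via_commands("hello") tries each tool in turn; passing a different commands list makes it easy to add or reorder tools.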

1.02.2012

Refactoring punctuation variables in Perl

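Perl: Refactoring punctuation variables

Perl defines many so-called punctuation variables such as $! $? $@ $| $" $, and so on.
There is no doubt that they hurt the readability of code.

With the English module, named variables like the following become available.

・Examples of named variables

Built-in global variable	Named variable (English module)	Description
$!	$OS_ERROR $ERRNO	Error number or error message
$?	$CHILD_ERROR	Status of the last executed command
$@	$EVAL_ERROR	Error message from the most recent eval
$|	$OUTPUT_AUTOFLUSH	Automatic flushing of output
$"	$LIST_SEPARATOR	List separator
$,	$OUTPUT_FIELD_SEPARATOR $OFS	Output field separator

However, according to "Perl Best Practices" (O'Reilly), the English module should be loaded as follows.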

use English qw( -no_match_vars );

This avoids the harm caused by defining the match variables (degraded regular expression performance).

Reference:
"Perl Best Practices" (O'Reilly)
http://www.tutorialspoint.com/perl/perl_special_variables.htm
