<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Perl Swirl</title>
	<atom:link href="http://perlswirl.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://perlswirl.wordpress.com</link>
	<description>Just another WordPress.com weblog</description>
	<lastBuildDate>Tue, 03 Jun 2008 16:40:46 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='perlswirl.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Perl Swirl</title>
		<link>http://perlswirl.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://perlswirl.wordpress.com/osd.xml" title="Perl Swirl" />
	<atom:link rel='hub' href='http://perlswirl.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Updated Proposal (2)</title>
		<link>http://perlswirl.wordpress.com/2008/05/29/updated-proposal/</link>
		<comments>http://perlswirl.wordpress.com/2008/05/29/updated-proposal/#comments</comments>
		<pubDate>Thu, 29 May 2008 18:16:51 +0000</pubDate>
		<dc:creator>unni</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://perlswirl.wordpress.com/?p=10</guid>
		<description><![CDATA[I&#8217;ve updated my proposal just a bit. again Check it out. Proposal: Add full text-search support to Bricolage. KinoSearch is a search utility written in Perl. Currently at version 0.162, it supports full text search. Support for this utility will &#8230; <a href="http://perlswirl.wordpress.com/2008/05/29/updated-proposal/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=perlswirl.wordpress.com&amp;blog=3728920&amp;post=10&amp;subd=perlswirl&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve updated my proposal a bit, <strong>again</strong>.</p>
<p>Check it out.<br />
Proposal: Add full text-search support to Bricolage.</p>
<p>KinoSearch is a search utility written in Perl.<br />
Currently at version 0.162, it supports full text search.<br />
Support for this utility will be added to Bricolage.</p>
<p>Synopsis:</p>
<p>The aim of this project is to add full text-search support to the content management system, Bricolage. Bricolage already has search capabilities. Full text search will be integrated into the existing search API.</p>
<p>To implement this, four major requirements will have to be fulfilled:</p>
<p>An InvIndex will have to be generated for all the data placed in each table, or for the documents. An indexer function will be written which creates an InvIndex for all the documents. I intend to begin by adding full text search to objects like stories, media and templates.</p>
<p>Next, a method will have to be added to the search API which analyzes the search query using the same analyzer that was used to create the InvIndex for the documents. The complete text being searched for can then be recognized within any table simply by checking the analyzed query against the InvIndex.</p>
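<p>As a rough illustration of why the query has to pass through the same analyzer as the documents, here is a minimal sketch in plain Perl. It does not use KinoSearch: the analyze and lookup functions, the crude suffix stripping, and the sample documents are all assumptions made up for this example.</p>

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical analyzer: lowercase the text, split it on non-word
# characters, and strip a few common English suffixes (a crude
# stand-in for real stemming).
sub analyze {
    my ($text) = @_;
    my @tokens = grep { length } split /\W+/, lc $text;
    s/(?:ing|ed|es|s)$// for @tokens;
    return @tokens;
}

# Index side: map each analyzed term to the ids of the documents
# that contain it.
my %docs = (
    1 => 'Searching stories and templates',
    2 => 'Media uploads',
);
my %index;
for my $id ( sort keys %docs ) {
    $index{$_}{$id} = 1 for analyze( $docs{$id} );
}

# Return the sorted ids of all documents containing any of the terms.
sub lookup {
    my (@terms) = @_;
    my %hits;
    for my $term (@terms) {
        $hits{$_} = 1 for keys %{ $index{$term} || {} };
    }
    return sort keys %hits;
}

# Query side: the raw query is looked up as-is, the analyzed query
# goes through the same analyzer as the documents.
my @raw      = lookup('Searched');
my @analyzed = lookup( analyze('Searched') );
print "raw: @raw\n";
print "analyzed: @analyzed\n";
```

<p>In this sketch the raw query finds nothing because the index only holds analyzed terms, while the analyzed query is reduced to the same term that was stored for document 1.</p>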
<p>A UI widget already exists in Bricolage. The UI widget will have to be adapted<br />
to the modified search API. Callbacks already exist for all the basic functions<br />
of a search UI. A controller function will use these functions and add<br />
the additional full text search feature. The search will require an updated set of fields<br />
against which the search is executed. The name field should now also be able to take<br />
full text strings. When results are found, the contexts of the corresponding<br />
results will also need to be displayed.</p>
<p>Since Bricolage currently boasts an array of SOAP-integrated APIs, the modified<br />
search API, or rather the full-text search API, will naturally be integrated too.<br />
So the API should be extended with a function that encodes the user&#8217;s search<br />
request into XML and sends it to the server, and a function that receives the<br />
result from the server and decodes it before displaying it to the user.</p>
<p>The project will support not only PostgreSQL but also MySQL and other databases. The function that generates the InvIndex will be designed as a plug-in, so that someone who wants to use a search engine other than KinoSearch, such as Lucene or even tsearch, can supply an alternate indexer function which generates the necessary index. The search function could also be designed as a plug-in, or the parameters passed to the search function could vary according to the indexer function used.</p>
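<p>One possible shape for such a plug-in indexer is a registry of backends that share one calling convention. The sketch below is only an assumption about how this could look: the registry, the build_index function, and the backend names are hypothetical, and the backend bodies are placeholders that do not call KinoSearch, Lucene, or tsearch.</p>

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical registry: backend name => code ref that builds an index
# from a hash ref of { doc_id => text }. Each body is a placeholder
# that only models the shape of the plug-in interface.
my %indexer_for = (
    kinosearch => sub {
        my ($docs) = @_;
        return { backend => 'kinosearch', doc_count => scalar keys %$docs };
    },
    tsearch => sub {
        my ($docs) = @_;
        return { backend => 'tsearch', doc_count => scalar keys %$docs };
    },
);

# Pick the backend by name; fall back to KinoSearch when none is given.
sub build_index {
    my ( $docs, $backend ) = @_;
    $backend = 'kinosearch' unless defined $backend;
    my $indexer = $indexer_for{$backend}
        or die "Unknown indexer backend '$backend'";
    return $indexer->($docs);
}

my $docs  = { 1 => 'a story', 2 => 'a template' };
my $index = build_index( $docs, 'tsearch' );
print "$index->{backend} indexed $index->{doc_count} documents\n";
```

<p>With this layout, adding another search engine would mean registering one more code ref, and the caller would not have to change.</p>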
<p>Also, the installer script will have to be modified to recognize and use an already installed version of KinoSearch, or to install KinoSearch if it is not found.</p>
<p>Proposed Road Map:</p>
<p>milestone: Start project</p>
<p>&lt;May 28&gt; Begin research on the implementation of full text search and the current<br />
state of the search API in the Bricolage system</p>
<p>&lt;June 2&gt; Begin Designing the changes to the table structure and the changes to the<br />
search API</p>
<p>milestone: Submission of initial design and specification.</p>
<p>&lt;June 5&gt; The design of the planned additions and changes will<br />
be submitted.</p>
<p>Coding begins</p>
<p>milestone: Function for incrementing InvIndex</p>
<p>&lt;June 12&gt; Submission of code that generates the InvIndex for the tables holding data or documents</p>
<p>testing and verification of given code.</p>
<p>milestone: Modification of the Search API</p>
<p>&lt;June 20&gt; Submission of modifications and additions to the existing search API</p>
<p>testing and verification of submitted changes</p>
<p>milestone: Integration to the UI widget</p>
<p>&lt;June 25&gt; The Search UI widget is integrated with the modified API</p>
<p>&lt;June 27&gt; The UI widget is modified to hold the full text search results</p>
<p>testing and verification</p>
<p>milestone: SOAP Integration</p>
<p>&lt;July 7&gt; The modified search API will be integrated with SOAP</p>
<p>testing and verification</p>
<p>milestone: Final Project Submission</p>
<p>&lt;July 17&gt; The Final testing and verification process will come to an end and<br />
the full text search feature will be fully enabled.</p>
<p>################################################################################################</p>
]]></content:encoded>
			<wfw:commentRss>http://perlswirl.wordpress.com/2008/05/29/updated-proposal/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/e06142390bd11bed85e1e0e21b75ce59?s=96&#38;d=identicon" medium="image">
			<media:title type="html">unni</media:title>
		</media:content>
	</item>
		<item>
		<title>KinoSearch vs Tsearch2 contd.</title>
		<link>http://perlswirl.wordpress.com/2008/05/19/kinosearch-vs-tsearch2-contd/</link>
		<comments>http://perlswirl.wordpress.com/2008/05/19/kinosearch-vs-tsearch2-contd/#comments</comments>
		<pubDate>Mon, 19 May 2008 18:17:46 +0000</pubDate>
		<dc:creator>unni</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://perlswirl.wordpress.com/?p=7</guid>
		<description><![CDATA[Report2: The working principle of tsearch2 is thus: The document is reduced into a tsvector, which is a data type which holds words in the document and also positional information of each word in the document. As mentioned in my &#8230; <a href="http://perlswirl.wordpress.com/2008/05/19/kinosearch-vs-tsearch2-contd/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=perlswirl.wordpress.com&amp;blog=3728920&amp;post=7&amp;subd=perlswirl&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Report2:</p>
<p>The working principle of tsearch2 is thus:</p>
<p>The document is reduced to a tsvector, a data type which holds the words in the document along with the positional information of each word. As mentioned in my earlier report, this data type can hold only up to 256 positions per lexeme. Each word in the tsvector is represented as a <a href="http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsALexeme.htm">lexeme</a>. An index is then built over the list of lexemes.</p>
<p>The search to be performed is stored in a data type called tsquery, which again stores the search terms as lexemes. The tsquery is then matched against the tsvector column of the table, and the matching documents are returned.</p>
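<p>To make the idea concrete, here is a toy model of it in plain Perl. It is not how PostgreSQL implements tsvector or tsquery: the to_vector and matches functions are hypothetical, words are only lowercased (no stemming or stop words), and the query supports nothing beyond the | (OR) operator.</p>

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy model of a tsvector: each word maps to the list of positions
# (counted from 1) at which it occurs in the document.
sub to_vector {
    my ($doc) = @_;
    my %vector;
    my $pos = 0;
    for my $word ( grep { length } split /\W+/, lc $doc ) {
        push @{ $vector{$word} }, ++$pos;
    }
    return \%vector;
}

# Toy model of a tsquery with only the OR operator, e.g. 'text|search'.
# Returns the number of query terms found in the vector.
sub matches {
    my ( $vector, $query ) = @_;
    return scalar grep { exists $vector->{$_} } split /\|/, lc $query;
}

my @docs = (
    'This is a test document I made up.',
    'Which is to implement full text search for Bricolage.',
);
my @vectors = map { to_vector($_) } @docs;

# Indexes (into @docs) of the documents that match the query.
my @found = grep { matches( $vectors[$_], 'text|search' ) } 0 .. $#docs;
print "matching documents: @found\n";
print "positions of 'text': @{ $vectors[1]{text} }\n";
```

<p>Only the second document contains either query term, and the stored position tells us where in the document the term was found.</p>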
<p>I tried building a very simple search engine from an example given in the user&#8217;s guide to tsearch2.</p>
<pre>=# <strong>CREATE TABLE docs ( id SERIAL, doc TEXT, vector tsvector );</strong>
=# <strong>CREATE INDEX docs_index ON docs USING gist(vector);</strong>
=# <strong>CREATE FUNCTION insdoc(text) RETURNS void LANGUAGE sql AS
  'INSERT INTO docs (doc, vector) VALUES ($1, to_tsvector($1));';</strong>

=# <strong>SELECT insdoc('This is a test document I made up.');</strong>
=# <strong>SELECT insdoc('For my summer of code project for google');</strong>
=# <strong>SELECT insdoc('Which is to implement full text search for Bricolage.');</strong>
=# <strong>SELECT insdoc('I am testing tsearch2 right now');</strong>
=# <strong>SELECT insdoc('And I''m running out of sentences to put in.');</strong>
=# <strong>SELECT insdoc('So I guess I''m gonna stop here.');</strong>
=# <strong>SELECT insdoc('Okay, so maybe one more then... .');</strong>
=# <strong>CREATE TYPE finddoc_t AS (id INTEGER, headline TEXT, rank REAL);</strong>
=# <strong>CREATE FUNCTION finddoc(text) RETURNS SETOF finddoc_t LANGUAGE sql AS '
   SELECT id, headline(doc, q), rank(vector, q)
     FROM docs, to_tsquery($1) AS q
     WHERE vector @@ q ORDER BY rank(vector, q) DESC';</strong>

This was my test case:
=# <strong>SELECT * FROM finddoc('text|search');</strong>
 id |                       headline                        | rank
----+-------------------------------------------------------+------
  3 | Which is to implement full &lt;b&gt;text&lt;/b&gt; &lt;b&gt;search&lt;/b&gt;  | 0.19
  4 | &lt;b&gt;tsearch2&lt;/b&gt; right now                             |  0.1

(2 rows)
=# <strong>SELECT doc FROM docs WHERE id = 3;</strong>
                       doc
-------------------------------------------------
 Which is to implement full text search for Bricolage.
(1 row)</pre>
<p>Something I noticed was that tsearch returned only those lexemes which were exact matches. This has to be kept in mind when converting search terms into lexemes for the tsquery.</p>
<p>I ran into some problems when I first tried to install KinoSearch.</p>
<p>So I checked a list of KinoSearch users:<br />
I started out with <a href="http://www.evo.com">www.evo.com</a>. The search was user-friendly and fast.<br />
(I just thought I&#8217;d mention all the steps I followed.)</p>
<p>After I installed KinoSearch using cpan I tried out a few examples from this <a href="http://search.cpan.org/~creamyg/KinoSearch-0.162/lib/KinoSearch.pm#Getting_Started">page</a>.<br />
I made no real changes to the example search engine or the test case mentioned on the page.</p>
<p>Check the code I used:</p>
<pre><em><strong>invindex.plx

</strong></em>    #!/usr/bin/perl
    use strict;
    use warnings;

    use File::Spec;
    use KinoSearch::InvIndexer;
    use KinoSearch::Analysis::PolyAnalyzer;

    my $source_dir       = '';
    my $path_to_invindex = '';
    my $base_url         = '/us_constitution';

    opendir( my $source_dh, $source_dir )
        or die "Couldn't opendir '$source_dir': $!";
    my @filenames = grep {/\.html/} readdir $source_dh;
    closedir $source_dh or die "Couldn't closedir '$source_dir': $!";

    ### Analyzer.
    my $analyzer = KinoSearch::Analysis::PolyAnalyzer-&gt;new(
        language =&gt; 'en',
    );

    ### Create a InvIndexer object.
    my $invindexer = KinoSearch::InvIndexer-&gt;new(
        analyzer =&gt; $analyzer,
        invindex =&gt; $path_to_invindex,
        create   =&gt; 1,
    );

    ### fields.
    $invindexer-&gt;spec_field( name =&gt; 'title' );
    $invindexer-&gt;spec_field(
        name       =&gt; 'bodytext',
        vectorized =&gt; 1,
    );
    $invindexer-&gt;spec_field(
        name    =&gt; 'url',
        indexed =&gt; 0,
    );

    foreach my $filename (@filenames) {
        next if $filename eq 'index.html';
        my $filepath = File::Spec-&gt;catfile( $source_dir, $filename );
        open( my $fh, '&lt;', $filepath )
            or die "couldn't open file '$filepath': $!";
        my $content = do { local $/; &lt;$fh&gt; };

        ### new document.
        my $doc = $invindexer-&gt;new_doc;

        $content =~ m#&lt;title&gt;(.*?)&lt;/title&gt;#s
            or die "couldn't isolate title in '$filepath'";
        my $title = $1;
        $content =~ m#&lt;div id="bodytext"&gt;(.*?)&lt;/div&gt;&lt;!--bodytext--&gt;#s
            or die "couldn't isolate bodytext in '$filepath'";
        my $bodytext = $1;
        $bodytext =~ s/&lt;.*?&gt;/ /gsm;    

        ### value for each field.
        $doc-&gt;set_value( url      =&gt; "$base_url/$filename" );
        $doc-&gt;set_value( title    =&gt; $title );
        $doc-&gt;set_value( bodytext =&gt; $bodytext );

        ### Add the document to the invindex.
        $invindexer-&gt;add_doc($doc);

    }

    $invindexer-&gt;finish;

<em><strong>SEARCH.CGI</strong></em></pre>
<pre>    #!/usr/bin/perl -T
    use strict;
    use warnings;

    use CGI;
    use List::Util qw( max min );
    use POSIX qw( ceil );
    use KinoSearch::Searcher;
    use KinoSearch::Analysis::PolyAnalyzer;
    use KinoSearch::Highlight::Highlighter;

    my $cgi           = CGI-&gt;new;
    my $q             = $cgi-&gt;param('q');
    my $offset        = $cgi-&gt;param('offset');
    my $hits_per_page = 10;
    $q      = '' unless defined $q;
    $offset = 0  unless defined $offset;

    my $path_to_invindex = '';
    my $base_url         = '/us_constitution';

    ### specify Analyzer
    my $analyzer = KinoSearch::Analysis::PolyAnalyzer-&gt;new(
        language =&gt; 'en',
    );

    ### Searcher object.
    my $searcher = KinoSearch::Searcher-&gt;new(
        invindex =&gt; $path_to_invindex,
        analyzer =&gt; $analyzer,
    );

    ### query
    my $hits = $searcher-&gt;search($q);

    my $highlighter = KinoSearch::Highlight::Highlighter-&gt;new(
        excerpt_field =&gt; 'bodytext' );
    $hits-&gt;create_excerpts( highlighter =&gt; $highlighter );

    $hits-&gt;seek( $offset, $hits_per_page );

    # create result list
    my $report = '';
    while ( my $hit = $hits-&gt;fetch_hit_hashref ) {
        my $score = sprintf( "%0.3f", $hit-&gt;{score} );
        $report .= qq|
            &lt;p&gt;
                &lt;a href="$hit-&gt;{url}"&gt;&lt;strong&gt;$hit-&gt;{title}&lt;/strong&gt;&lt;/a&gt;
                &lt;em&gt;$score&lt;/em&gt;
                &lt;br&gt;
                $hit-&gt;{excerpt}
                &lt;br&gt;
                &lt;span class="excerptURL"&gt;$hit-&gt;{url}&lt;/span&gt;
            &lt;/p&gt;
            |;
    }

    $q = CGI::escapeHTML($q);

    my $total_hits = $hits-&gt;total_hits;
    my $num_hits_info;
    if ( !length $q ) {
        # no query, no display
        $num_hits_info = '';
    }
    elsif ( $total_hits == 0 ) {
        $num_hits_info = qq|&lt;p&gt;No matches for &lt;strong&gt;$q&lt;/strong&gt;&lt;/p&gt;|;
    }
    else {
        my $last_result = min( ( $offset + $hits_per_page ), $total_hits );
        my $first_result = min( ( $offset + 1 ), $last_result );

        $num_hits_info = qq|
            &lt;p&gt;
                Results &lt;strong&gt;$first_result-$last_result&lt;/strong&gt;
                of &lt;strong&gt;$total_hits&lt;/strong&gt; for &lt;strong&gt;$q&lt;/strong&gt;.
            &lt;/p&gt;
            &lt;p&gt;
                Results Page:
            |;

        my $current_page = int( $first_result / $hits_per_page ) + 1;
        my $last_page    = ceil( $total_hits / $hits_per_page );
        my $first_page   = max( 1, ( $current_page - 9 ) );
        $last_page = min( $last_page, ( $current_page + 10 ) );

        my $href = $cgi-&gt;url( -relative =&gt; 1 ) . "?" . $cgi-&gt;query_string;
        $href .= ";offset=0" unless $href =~ /offset=/;

        for my $page_num ( $first_page .. $last_page ) {
            if ( $page_num == $current_page ) {
                $num_hits_info .= qq|$page_num \n|;
            }
            else {
                my $new_offset = ( $page_num - 1 ) * $hits_per_page;
                $href =~ s/(?&lt;=offset=)\d+/$new_offset/;
                $num_hits_info .= qq|&lt;a href="$href"&gt;$page_num&lt;/a&gt;\n|;
            }
        }

        $num_hits_info .= "&lt;/p&gt;\n";
    }

    print "Content-type: text/html\n\n";
    print &lt;&lt;END_HTML;
    &lt;!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd"&gt;
    &lt;html&gt;
    &lt;head&gt;
        &lt;meta http-equiv="Content-type"
            content="text/html;charset=ISO-8859-1"&gt;
        &lt;link rel="stylesheet" type="text/css" href="$base_url/uscon.css"&gt;
        &lt;title&gt;KinoSearch: $q&lt;/title&gt;
    &lt;/head&gt;

    &lt;body&gt;

        &lt;div id="navigation"&gt;
            &lt;form id="usconSearch" action=""&gt;
                &lt;strong&gt;
                Search the &lt;a href="$base_url/index.html"&gt;US Constitution&lt;/a&gt;:
                &lt;/strong&gt;
                &lt;input type="text" name="q" id="q" value="$q"&gt;
                &lt;input type="submit" value="=&amp;gt;"&gt;
                &lt;input type="hidden" name="offset" value="0"&gt;
            &lt;/form&gt;
        &lt;/div&gt;&lt;!--navigation--&gt;

        &lt;div id="bodytext"&gt;

        $report

        $num_hits_info

        &lt;p style="font-size: smaller; color: #666"&gt;
            &lt;em&gt;Powered by
                &lt;a href="http://www.rectangular.com/kinosearch/"&gt;
                    KinoSearch
                &lt;/a&gt;
            &lt;/em&gt;
        &lt;/p&gt;
        &lt;/div&gt;&lt;!--bodytext--&gt;

    &lt;/body&gt;

    &lt;/html&gt;
    END_HTML
</pre>
<p>Details of this code are available <a href="http://search.cpan.org/~creamyg/KinoSearch-0.162/lib/KinoSearch/Docs/Tutorial.pod">here</a></p>
<p>Once the testing was over, I compared the two.</p>
<p>The first criterion I checked was speed; the tsearch engine I built performed much faster.<br />
I wrote a script to populate the tsvector table with the US Constitution as the document, which was my test case while using KinoSearch. I ran the same phrases through both searches, and KinoSearch returned detailed results faster, though for elementary phrases tsearch performed really well. A more sophisticated search engine might do a better job.</p>
<p>KinoSearch builds an invindex for the document body, and search queries are checked against that invindex.</p>
<p>That page lists a number of additional features of KinoSearch. I cite them here anyway:</p>
<ul>
<li>Incremental indexing (addition/deletion of documents to/from an existing index).</li>
<li>Full support for 12 Indo-European languages.</li>
<li>Support for boolean operators AND, OR, and AND NOT; parenthetical groupings, and prepended +plus and -minus</li>
<li>Algorithmic selection of relevant excerpts and highlighting of search terms within excerpts</li>
<li>Highly customizable query and indexing APIs</li>
</ul>
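<p>To show roughly what an invindex lookup with prepended +plus and -minus terms involves, here is a small sketch in plain Perl. It is not KinoSearch&#8217;s query parser, which is much richer; the search function, the mini query syntax, and the sample documents are assumptions for illustration only.</p>

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a small inverted index: term => { doc_id => 1 }.
my %docs = (
    1 => 'full text search for bricolage',
    2 => 'full text of the constitution',
    3 => 'search engine in perl',
);
my %index;
for my $id ( keys %docs ) {
    $index{$_}{$id} = 1 for split ' ', $docs{$id};
}

# Hypothetical mini query language: '+term' must be present, '-term'
# must be absent, and when bare terms are given at least one of them
# must be present.
sub search {
    my ($query) = @_;
    my ( @must, @must_not, @should );
    for my $token ( split ' ', lc $query ) {
        if    ( $token =~ /^\+(.+)/ ) { push @must,     $1 }
        elsif ( $token =~ /^-(.+)/ )  { push @must_not, $1 }
        else                          { push @should,   $token }
    }
    my @hits;
    for my $id ( sort keys %docs ) {
        # True when the given term occurs in the current document.
        my $has = sub { exists $index{ $_[0] } and exists $index{ $_[0] }{$id} };
        next if grep { !$has->($_) } @must;
        next if grep { $has->($_) } @must_not;
        next if @should and !grep { $has->($_) } @should;
        push @hits, $id;
    }
    return @hits;
}

my @hits = search('+text -constitution');
print "hits: @hits\n";
```

<p>Every check is a hash lookup in the index rather than a scan of the document text, which is the point of keeping an invindex.</p>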
<p>Building the invindex for the document in question was much easier when I used KinoSearch.<br />
The code, too, was more understandable, and it is in Perl.</p>
<p>Please check this report and let me know. The question to be decided now is whether or not we intend to support MySQL. If not, implementing tsearch2 would be much easier, as detailed in the original application.<br />
Implementing KinoSearch would mean more changes to the installer script and would bring in unwanted dependencies.</p>
]]></content:encoded>
			<wfw:commentRss>http://perlswirl.wordpress.com/2008/05/19/kinosearch-vs-tsearch2-contd/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/e06142390bd11bed85e1e0e21b75ce59?s=96&#38;d=identicon" medium="image">
			<media:title type="html">unni</media:title>
		</media:content>
	</item>
		<item>
		<title>KinoSearch vs TSearch2</title>
		<link>http://perlswirl.wordpress.com/2008/05/17/kinosearch-vs-tsearch2/</link>
		<comments>http://perlswirl.wordpress.com/2008/05/17/kinosearch-vs-tsearch2/#comments</comments>
		<pubDate>Sat, 17 May 2008 20:42:40 +0000</pubDate>
		<dc:creator>unni</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://perlswirl.wordpress.com/?p=5</guid>
		<description><![CDATA[Report 1: KinoSearch vs Tsearch2 The original application submitted to Google was to incorporate tsearch2 to the existing search API of Bricolage. KinoSearch seems to be another option. There has been a discussion on which of these searches would be &#8230; <a href="http://perlswirl.wordpress.com/2008/05/17/kinosearch-vs-tsearch2/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=perlswirl.wordpress.com&amp;blog=3728920&amp;post=5&amp;subd=perlswirl&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Report 1:</p>
<p>KinoSearch vs Tsearch2</p>
<p>The original application submitted to Google was to incorporate tsearch2 into the existing search API of Bricolage. KinoSearch seems to be another option. There has been a discussion on which of these would be better suited for integrating full text search into Bricolage.<br />
I read up on both KinoSearch and Tsearch2.</p>
<p>KinoSearch is currently at version 0.162, and its file format may undergo changes in the future. This might lead to a crash of the application if these changes are not taken into consideration when an upgrade happens.</p>
<p>Tsearch2 is compatible only with PostgreSQL, and it is built in only from version 8.3 onward.</p>
<p>KinoSearch is admittedly faster and has support for incremental indexing.<br />
It can handle a large number of documents. Tsearch has limitations when it comes to handling large documents.</p>
<p>Tsearch uses a data type called tsvector to represent a document; it holds a set of unique words and their positional information. But it can only hold up to 256 positions per lexeme.</p>
<p>On the other hand, KinoSearch is an adaptation of a completely Java-based search engine (Lucene). I don&#8217;t know if that is really a problem, but it would be better to avoid dependencies that are uncalled for.</p>
<p>If implemented, tsearch can provide stability. The only hint of instability with KinoSearch would arise if the file format changes when the alpha version (the current KinoSearch version) is upgraded. Though this is probable, it is not something that cannot be handled; the new file format can be checked and handled when we upgrade to a higher version of KinoSearch.</p>
<p>Both KinoSearch and Tsearch have communities that can provide enough support. Tsearch has complete online updates and support. The online support for KinoSearch also serves our purpose. The documentation for both projects is extensive, though I found KinoSearch&#8217;s to be a bit better, especially in terms of understandability.</p>
<p>KinoSearch is, after all, Perl-based, and after an initial encounter with both tsearch and KinoSearch, I am of the opinion that KinoSearch might lend itself to easier integration. Also, Tsearch does not provide support for MySQL and depends heavily on PostgreSQL, while KinoSearch has a core derived from the Java-based Lucene.</p>
<p>KinoSearch can provide more support than tsearch and requires fewer compromises from users, such as a mandatory upgrade of their PostgreSQL version. The KinoSearch API is much more open to customization and is more user-friendly.</p>
<p>Conclusion: I think we should go with KinoSearch.</p>
]]></content:encoded>
			<wfw:commentRss>http://perlswirl.wordpress.com/2008/05/17/kinosearch-vs-tsearch2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/e06142390bd11bed85e1e0e21b75ce59?s=96&#38;d=identicon" medium="image">
			<media:title type="html">unni</media:title>
		</media:content>
	</item>
		<item>
		<title>Full Text Search implementation for Bricolage</title>
		<link>http://perlswirl.wordpress.com/2008/05/17/full-text-search-implementation-for-bricolage/</link>
		<comments>http://perlswirl.wordpress.com/2008/05/17/full-text-search-implementation-for-bricolage/#comments</comments>
		<pubDate>Sat, 17 May 2008 20:40:58 +0000</pubDate>
		<dc:creator>unni</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://perlswirl.wordpress.com/?p=4</guid>
		<description><![CDATA[This is my Google Summer of Code project. My original Application to Google. Proposal: Add full text-search support to Bricolage. Postgresql versions 8.3 and above have built-in full text search support. Support for this utility will be added to Bricolage. &#8230; <a href="http://perlswirl.wordpress.com/2008/05/17/full-text-search-implementation-for-bricolage/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=perlswirl.wordpress.com&amp;blog=3728920&amp;post=4&amp;subd=perlswirl&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>This is my Google Summer of Code project.</p>
<p>My original application to Google.<br />
Proposal: Add full-text search support to Bricolage.</p>
<p>PostgreSQL versions 8.3 and above have built-in full-text search support.<br />
Support for this feature will be added to Bricolage.</p>
<p>Synopsis:</p>
<p>The aim of this project is to add full-text search support to the content management system Bricolage.<br />
Bricolage already has search capabilities. Full-text search will be integrated into the existing<br />
search API.</p>
<p>To implement this, four major requirements will have to be fulfilled:</p>
<p>A tsearch index will have to be generated for the data stored in each table.<br />
An additional column will have to be added to each table to hold the tsearch index for<br />
the data in that row. A row-level trigger function that generates the tsearch index and stores it in<br />
this column will have to be written. I intend to begin by adding full-text search to objects such as<br />
stories, media and templates.</p>
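<p>As a rough sketch of this first requirement, the Python helper below builds the SQL that adds a tsvector column, backfills it, indexes it and attaches a row-level trigger. The table name, the text column names and the <code>tsv</code> column name are assumptions made for illustration, not the actual Bricolage schema. <code>to_tsvector</code> and <code>tsvector_update_trigger</code> are the built-in PostgreSQL 8.3 functions.</p>

```python
def tsearch_ddl(table, text_columns, tsv_column="tsv", config="pg_catalog.english"):
    """Return SQL statements that add and maintain a tsvector column.

    The names passed in are hypothetical and are interpolated directly,
    so they must come from trusted code and never from user input.
    """
    cols = ", ".join(text_columns)
    # Expression that concatenates the text columns, treating NULL as empty.
    doc = " || ' ' || ".join(f"coalesce({c}, '')" for c in text_columns)
    return [
        # 1. The extra column that holds the tsearch index for each row.
        f"ALTER TABLE {table} ADD COLUMN {tsv_column} tsvector",
        # 2. Backfill rows that existed before the trigger was created.
        f"UPDATE {table} SET {tsv_column} = to_tsvector('{config}', {doc})",
        # 3. A GIN index so that searches do not scan the whole table.
        f"CREATE INDEX {table}_{tsv_column}_idx ON {table} USING gin({tsv_column})",
        # 4. Row-level trigger that keeps the column current on every write.
        f"CREATE TRIGGER {table}_{tsv_column}_update "
        f"BEFORE INSERT OR UPDATE ON {table} "
        f"FOR EACH ROW EXECUTE PROCEDURE "
        f"tsvector_update_trigger({tsv_column}, '{config}', {cols})",
    ]


if __name__ == "__main__":
    for statement in tsearch_ddl("story", ["name", "description"]):
        print(statement + ";")
```

<p>Generating the statements from one helper would let the same code serve stories, media and templates, with only the table and column names changing.</p>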
<p>Next, a method will have to be added to the search API that recognizes<br />
the tsearch index column added to each table, so that the<br />
text being searched for can be matched in any table simply by using<br />
the tsearch index.</p>
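<p>A minimal sketch of such a search method, again in Python and under assumed names (an <code>id</code> primary key and a <code>tsv</code> index column), could build a parameterized query like the one below. <code>plainto_tsquery</code>, <code>ts_rank</code> and the <code>@@</code> match operator are standard in PostgreSQL 8.3. The <code>%s</code> placeholders follow the DB-API "format" style used by drivers such as psycopg2.</p>

```python
def full_text_query(table, terms, tsv_column="tsv", limit=20):
    """Build a ranked full-text query against one table.

    Returns (sql, params). The table and column names are hypothetical
    and must come from trusted code; the search terms are passed as a
    bound parameter, never interpolated into the SQL text.
    """
    sql = (
        f"SELECT id, ts_rank({tsv_column}, query) AS rank "
        f"FROM {table}, plainto_tsquery(%s) AS query "
        f"WHERE {tsv_column} @@ query "
        f"ORDER BY rank DESC LIMIT %s"
    )
    return sql, (terms, limit)


if __name__ == "__main__":
    sql, params = full_text_query("story", "summer of code")
    print(sql)
    print(params)
```

<p>Because only the table name varies, the same method could be used for every object type that carries the index column.</p>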
<p>A search UI widget already exists in Bricolage. It will have to be adapted<br />
to the modified search API. Callbacks already exist for all the basic functions<br />
of a search UI. A controller function will use these functions and add<br />
the full-text search feature. The search will require an updated set of fields<br />
against which it is executed. The name field should now also accept<br />
full-text strings. When results are found, the context of each<br />
result will also need to be displayed.</p>
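<p>To illustrate what displaying the context of a result means, here is a small standalone Python sketch that cuts a snippet around the first match. It is only an illustration with a hypothetical function name. Inside the database, PostgreSQL 8.3 provides <code>ts_headline</code> for this purpose, which also understands stemming, something this plain substring search does not.</p>

```python
def context_snippet(text, term, width=30):
    """Return the text surrounding the first case-insensitive match.

    Returns None when the term does not occur. `width` is the number of
    characters of context kept on each side of the match.
    """
    pos = text.lower().find(term.lower())
    if pos == -1:
        return None
    start = max(0, pos - width)
    end = min(len(text), pos + len(term) + width)
    # Mark truncation so the reader can tell the snippet is partial.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end != len(text) else ""
    return prefix + text[start:end] + suffix


if __name__ == "__main__":
    body = "Bricolage is a content management system"
    print(context_snippet(body, "content", width=5))
```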
<p>Since Bricolage currently offers an array of SOAP-integrated APIs, the modified<br />
search API, or rather the full-text search API, will naturally be integrated too.<br />
So the API should be extended with a function that encodes the user&#8217;s search<br />
request into XML and sends it to the server, and a function that receives the<br />
result from the server and decodes it before displaying it to the user.</p>
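<p>The encode and decode pair could look roughly like the Python sketch below, which uses only the standard library. The search namespace and the <code>search</code>, <code>terms</code> and <code>id</code> element names are invented for illustration and are not taken from the existing Bricolage SOAP interface. Only the SOAP 1.1 envelope namespace is real.</p>

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical namespace for the full-text search messages.
SEARCH_NS = "urn:example:fulltext-search"


def encode_request(terms):
    """Wrap the search terms in a SOAP envelope and return it as a string."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    search = ET.SubElement(body, f"{{{SEARCH_NS}}}search")
    ET.SubElement(search, f"{{{SEARCH_NS}}}terms").text = terms
    return ET.tostring(envelope, encoding="unicode")


def decode_response(xml_text):
    """Return the text of every (hypothetical) id element in a response."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(f"{{{SEARCH_NS}}}id")]


if __name__ == "__main__":
    print(encode_request("summer of code"))
```

<p>Building the message with an XML library rather than string concatenation means that special characters in the search terms are escaped automatically.</p>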
<p>The project will support only PostgreSQL 8.3 and above, as full-text search is<br />
natively supported only from PostgreSQL 8.3 onwards. So there will have to be a provision<br />
to disable full-text search when lower versions are used. The installer code will<br />
have to be modified to recognize the RDBMS and its version, so that the<br />
full-text search option can be disabled for lower versions at install time. Users can change this<br />
setting if they upgrade. A migration script will be written that can be run to generate<br />
the tsearch index and populate the relevant tables, so that users can enable full-text<br />
search once they upgrade. The script will function in a similar manner to the trigger<br />
function.</p>
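<p>The version check itself is simple. As a sketch in Python, assuming the installer has the string returned by <code>SELECT version()</code> available (the function name here is hypothetical), it could be written as:</p>

```python
import re

# Full-text search is built in from PostgreSQL 8.3 onwards.
MIN_VERSION = (8, 3)


def supports_full_text_search(version_string):
    """Decide from a `SELECT version()` string whether to enable search.

    Anything that is not recognizably PostgreSQL is treated as
    unsupported, so the feature stays disabled by default.
    """
    match = re.search(r"PostgreSQL (\d+)(?:\.(\d+))?", version_string)
    if match is None:
        return False
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    # Tuple comparison handles both 8.x releases and later major versions.
    return (major, minor) >= MIN_VERSION


if __name__ == "__main__":
    print(supports_full_text_search("PostgreSQL 8.3.1 on i686-pc-linux-gnu"))
    print(supports_full_text_search("PostgreSQL 8.2.7 on i686-pc-linux-gnu"))
```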
<p>Proposed Road Map:</p>
<p>milestone: Start project</p>
<p>&lt;Day 1&gt; Begin research on the implementation of full-text search and the current<br />
state of the search API in the Bricolage system.</p>
<p>&lt;Day 4&gt; Begin designing the changes to the table structure and the changes to the<br />
search API.</p>
<p>milestone: Submission of initial design and specification.</p>
<p>&lt;Day 7&gt; The design of the planned additions and changes will<br />
be submitted.</p>
<p>Coding begins</p>
<p>milestone: Function for incrementing tsearch index</p>
<p>&lt;Day 15&gt; Submission of code that modifies the tables holding data to add a generated<br />
tsearch index for the corresponding data.</p>
<p>Testing and verification of the submitted code.</p>
<p>milestone: Modification of the Search API</p>
<p>&lt;Day 23&gt; Submission of modifications and additions to the existing search API</p>
<p>testing and verification of submitted changes</p>
<p>milestone: Integration to the UI widget</p>
<p>&lt;Day 28&gt; The Search UI widget is integrated with the modified API</p>
<p>&lt;Day 30&gt; The UI widget is modified to hold the full text search results</p>
<p>testing and verification</p>
<p>milestone: SOAP Integration</p>
<p>&lt;Day 40&gt; The modified search API will be integrated with SOAP</p>
<p>testing and verification</p>
<p>milestone: Disabling full-text search for PostgreSQL versions &lt; 8.3</p>
<p>&lt;Day 45&gt; Provision to disable the feature if a version of PostgreSQL lower than 8.3 is used</p>
<p>milestone: Final Project Submission</p>
<p>&lt;Day 50&gt; The Final testing and verification process will come to an end and<br />
the full text search feature will be fully enabled.</p>
<p>About Me:</p>
<p>I, Krishnanunni P.N., am a sixth-semester student of Computer Science and Engineering from<br />
Kerala, India. For the last two years I have been an active promoter and member of the<br />
Free and Open Source Software community.<br />
I am an avid Perl, Python and C programmer. I am well versed in Lisp, Java and x86 assembly<br />
language.<br />
Latest projects:<br />
- A PC suite for mobile phones using the Symbian 60 OS,<br />
with Bluetooth as the connecting medium.</p>
<p>- The development of an IDE that facilitates easy program development<br />
and debugging for various microcontrollers.</p>
]]></content:encoded>
			<wfw:commentRss>http://perlswirl.wordpress.com/2008/05/17/full-text-search-implementation-for-bricolage/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/e06142390bd11bed85e1e0e21b75ce59?s=96&#38;d=identicon" medium="image">
			<media:title type="html">unni</media:title>
		</media:content>
	</item>
		<item>
		<title>Howdy world !&#8230;&#8230;.</title>
		<link>http://perlswirl.wordpress.com/2008/05/14/hello-world/</link>
		<comments>http://perlswirl.wordpress.com/2008/05/14/hello-world/#comments</comments>
		<pubDate>Wed, 14 May 2008 21:08:45 +0000</pubDate>
		<dc:creator>unni</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false"></guid>
		<description><![CDATA[Perl makes easy jobs easy, without making the hard jobs impossible Well writing small hacks in Perl is a breeze. Though to be frank I must admit; I find the larger programs a bit of a hassle. I applied for &#8230; <a href="http://perlswirl.wordpress.com/2008/05/14/hello-world/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=perlswirl.wordpress.com&amp;blog=3728920&amp;post=1&amp;subd=perlswirl&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<blockquote><p>Perl makes easy jobs easy, without making the hard jobs impossible</p></blockquote>
<p>Well, writing small hacks in Perl is a breeze. Though, to be frank, I must admit I find the larger programs a bit of a hassle.</p>
<p>I applied for the <a href="http://code.google.com/soc/2008/">Google Summer of Code</a> this year.</p>
<p>And <a href="http://code.google.com/soc/2008/perl/appinfo.html?csaid=43EC72DAED270028">my application</a> to implement full-text search for <a href="http://www.bricolage.cc/">Bricolage</a> got accepted.</p>
<p>Hence this blog, which will initially contain all my Summer of Code work and whatever else I see fit to put in (related to Perl, of course).</p>
]]></content:encoded>
			<wfw:commentRss>http://perlswirl.wordpress.com/2008/05/14/hello-world/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/e06142390bd11bed85e1e0e21b75ce59?s=96&#38;d=identicon" medium="image">
			<media:title type="html">unni</media:title>
		</media:content>
	</item>
	</channel>
</rss>
