<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>yaw angle</title>
	<atom:link href="http://vinoduec.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://vinoduec.wordpress.com</link>
	<description>The right direction..</description>
	<lastBuildDate>Fri, 23 Feb 2007 04:35:55 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='vinoduec.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>yaw angle</title>
		<link>http://vinoduec.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://vinoduec.wordpress.com/osd.xml" title="yaw angle" />
	<atom:link rel='hub' href='http://vinoduec.wordpress.com/?pushpress=hub'/>
		<item>
		<title>cat data-extraction.techniques &#124; more</title>
		<link>http://vinoduec.wordpress.com/2007/02/23/cat-data-extractiontechniques-more/</link>
		<comments>http://vinoduec.wordpress.com/2007/02/23/cat-data-extractiontechniques-more/#comments</comments>
		<pubDate>Fri, 23 Feb 2007 04:35:55 +0000</pubDate>
		<dc:creator>lordoftheflame</dc:creator>
				<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://vinoduec.wordpress.com/2007/02/23/cat-data-extractiontechniques-more/</guid>
		<description><![CDATA[After my post on the tools and methods for screen scraping, I was on the wild internet again to find some more interesting and useful tools, given the travails I&#8217;ve gone through to obtain a machine-readable format of the ICD-9-CM codes. I still wonder why, even in this age, people think only of [...]]]></description>
			<content:encoded><![CDATA[<p>After my post on the tools and methods for screen scraping, I was on the wild internet again to find some more interesting and useful tools, given the travails I&#8217;ve gone through to obtain a machine-readable format of the ICD-9-CM codes. I still wonder why, even in this age, people think only of themselves and don&#8217;t even give a dime to the stupid machines working so hard to make our lives simpler. Anyway, here we go:</p>
<ul>
<li>No prizes for guessing this one: <a href="http://en.wikipedia.org/wiki/Screen_scraping" title="screen scraping at wikipedia" target="_blank">screen_scraping at wikipedia</a></li>
<li>Which takes me along some more threads:
<ul>
<li><a href="http://blog.screen-scraper.com" title="screen-scraper" target="_blank">The screen-scrapeable blog,</a> in particular <a href="http://blog.screen-scraper.com/2006/03/21/three-common-methods-for-data-extraction/" title="screen-scrapeable" target="_blank">this</a> post</li>
<li><a href="http://www.perl.com/pub/a/2003/01/22/mechanize.html" title="perl screen scraping" target="_blank">Screen-scraping in Perl</a> (I could not ask for more, given that I like to &#8216;speak&#8217; in Perl!)</li>
<li>And in <a href="http://www.iopus.com/imacros/tutorials/java.htm" title="java screen-scraping">Java</a></li>
<li>I digress a lot; see <a href="http://en.wikipedia.org/wiki/Web_mining" title="wikipedia web-mining" target="_blank">web-mining</a> and the threads from there.</li>
</ul>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://vinoduec.wordpress.com/2007/02/23/cat-data-extractiontechniques-more/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/58fa1ab664d113046c6e3df992f811bf?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">lordoftheflame</media:title>
		</media:content>
	</item>
		<item>
		<title>Accoutrements for document processing / text extraction</title>
		<link>http://vinoduec.wordpress.com/2007/02/19/accoutrements-for-document-processing-text-extraction/</link>
		<comments>http://vinoduec.wordpress.com/2007/02/19/accoutrements-for-document-processing-text-extraction/#comments</comments>
		<pubDate>Mon, 19 Feb 2007 19:52:42 +0000</pubDate>
		<dc:creator>lordoftheflame</dc:creator>
				<category><![CDATA[Blogroll]]></category>

		<guid isPermaLink="false">http://vinoduec.wordpress.com/2007/02/19/accoutrements-for-document-processing-text-extraction/</guid>
		<description><![CDATA[Here are some tools and things one should (read: I should) never forget while trying to extract text from documents of all kinds and file formats. Here they go: The survival armour: &#8220;lynx -dump&#8221;: The first phase of screen-scraping. Nothing else comes before this. The one that comes to mind next to this [...]]]></description>
			<content:encoded><![CDATA[<p>Here are some tools and things one should (read: I should) never forget while trying to extract text from documents of all kinds and file formats. Here they go:</p>
<p>The survival armour:</p>
<ul>
<li>&#8220;<em><strong>lynx -dump</strong></em>&#8221;: The first phase of <a href="http://en.wikipedia.org/wiki/Screen_scraping" title="Screen_scraping" target="_blank">screen-scraping</a>. Nothing else comes before this. The next thing that comes to mind is to convert the given file (of arbitrary format) to plain text and start climbing the hill from there.</li>
<li>Convert to <em><strong>XML</strong></em>: OpenOffice does an excellent job of converting several kinds of formats into standard XML documents. OpenOffice rocks, except for its memory appetite while<img src="http://www.gnu.org/graphics/badvista-trash.png" alt="vista, windows to trash" align="right" border="1" height="99" width="57" /> starting up.</li>
</ul>
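<p>To make the &#8220;convert to text first&#8221; step concrete, here is a minimal sketch in Python of a crude text dump, in the spirit of lynx -dump but nowhere near as capable. It uses only the standard library; the names TextDumper and dump_text are made up for this illustration, and the real lynx also handles layout, links and encodings, which this does not.</p>

```python
from html.parser import HTMLParser


class TextDumper(HTMLParser):
    """Collects visible text and skips the contents of script and style."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []      # pieces of visible text, in document order
        self.skip_depth = 0   # greater than zero while inside script/style

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only, with runs of whitespace collapsed.
        if not self.skip_depth and data.strip():
            self.chunks.append(" ".join(data.split()))


def dump_text(html):
    """Return the visible text of an HTML string, one chunk per line."""
    parser = TextDumper()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

<p>Calling dump_text on the contents of a saved page gives plain text that sed, grep and friends can take over from.</p>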
<p>Windows? You will disappear into antiquity. Need a suggestion? The <a href="http://www.gnu.org/" title="GNU is not UNIX" target="_blank">GNU homepage</a> will tell you.</p>
<ul>
<li><em><strong>HTML Parser</strong></em> : See <a href="http://htmlparser.sourceforge.net/" title="html parser" target="_blank">sourceforge</a>.</li>
<li><em><strong>XML Parsing</strong></em> : Parsers of all kinds are available in all flavours and in all languages. But when it comes to handling huge files, go for <a href="http://en.wikipedia.org/wiki/SAX_Parsing" title="SAX Parsing" target="_blank">SAX parsing</a>, which streams through the file instead of loading it all into memory; I prefer Java for this.</li>
<li><em><strong>Body Text Extraction</strong></em> : <a href="http://www.aidanf.net/software/bte-body-text-extraction" title="Body Text Extraction" target="_blank">This</a> is by far the best script (sorry.. program) I&#8217;ve ever come across for extracting the body (in its real sense) from any HTML page. It does have performance problems and cannot be used as is for real-time extraction, but a minor modification involving dynamic programming will make it ready for the race.</li>
<li>Along with that, we have up our sleeve <em><strong>the divine editor &#8220;Vi&#8221;</strong></em>, the <em><strong>swiss-army-knife &#8220;sed&#8221;</strong></em>, the dark horse <em><strong>&#8220;Grep&#8221;</strong></em> along with <em><strong>&#8220;tr&#8221;</strong></em> and many of its friends from GNU, to solve many of the problems that trouble us.</li>
<li>Anything beyond that, and the <em><strong>God &#8220;Perl&#8221;</strong></em> takes over the reins. The moment you get tired of any more hacks, <em><strong>Java</strong></em> shows its worth. The story should end there; otherwise you are making a huge mistake somewhere.</li>
</ul>
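<p>The SAX point above can be sketched in a few lines. The example is in Python rather than Java purely to keep it short; the event-driven idea is the same. It assumes, for illustration only, that we want the text of every title element in a large XML file; TitleCollector and collect_titles are hypothetical names, while xml.sax is part of the Python standard library.</p>

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Streams through a document and keeps only the text of title elements."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self.buffer = None  # None means we are outside a title element

    def startElement(self, name, attrs):
        if name == "title":
            self.buffer = []

    def characters(self, content):
        # The parser may deliver text in several pieces, so accumulate them.
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "title" and self.buffer is not None:
            self.titles.append("".join(self.buffer).strip())
            self.buffer = None


def collect_titles(source):
    """source is a file name, a binary file object, or raw bytes."""
    handler = TitleCollector()
    if isinstance(source, bytes):
        xml.sax.parseString(source, handler)
    else:
        xml.sax.parse(source, handler)
    return handler.titles
```

<p>Because the handler only reacts to events and never builds a tree, memory use stays roughly flat no matter how large the input file is, which is the whole reason to reach for SAX on huge files.</p>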
<p>That&#8217;s probably not everything. I can&#8217;t remember the others right away, partly because of my half-hearted interest in posting this and partly because of the sound sleep that&#8217;s taking over. Good night.</p>
]]></content:encoded>
			<wfw:commentRss>http://vinoduec.wordpress.com/2007/02/19/accoutrements-for-document-processing-text-extraction/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/58fa1ab664d113046c6e3df992f811bf?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">lordoftheflame</media:title>
		</media:content>

		<media:content url="http://www.gnu.org/graphics/badvista-trash.png" medium="image">
			<media:title type="html">vista, windows to trash</media:title>
		</media:content>
	</item>
		<item>
		<title>Challenge</title>
		<link>http://vinoduec.wordpress.com/2007/02/17/challenge/</link>
		<comments>http://vinoduec.wordpress.com/2007/02/17/challenge/#comments</comments>
		<pubDate>Sat, 17 Feb 2007 19:16:18 +0000</pubDate>
		<dc:creator>lordoftheflame</dc:creator>
				<category><![CDATA[Blogroll]]></category>
		<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://vinoduec.wordpress.com/2007/02/17/challenge/</guid>
		<description><![CDATA[After that TS (test series, for name&#8217;s sake), I am participating (read: working towards participation / planning to participate) in the International Challenge: Classifying Clinical Free Text Using Natural Language Processing, which involves assigning ICD-9-CM codes to clinical free text. I hope it will be a decent effort, if not an outright success.]]></description>
			<content:encoded><![CDATA[<p>After that TS (test series, for name&#8217;s sake), I am participating (read: working towards participation / planning to participate) in the <a href="http://www.computationalmedicine.org/challenge/index.php">International Challenge</a>: Classifying Clinical Free Text Using Natural Language Processing, which involves assigning ICD-9-CM codes to clinical free text.</p>
<p>I hope it will be a decent effort, if not an outright success.</p>
]]></content:encoded>
			<wfw:commentRss>http://vinoduec.wordpress.com/2007/02/17/challenge/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/58fa1ab664d113046c6e3df992f811bf?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">lordoftheflame</media:title>
		</media:content>
	</item>
		<item>
		<title>An idea</title>
		<link>http://vinoduec.wordpress.com/2007/02/08/7/</link>
		<comments>http://vinoduec.wordpress.com/2007/02/08/7/#comments</comments>
		<pubDate>Thu, 08 Feb 2007 17:48:31 +0000</pubDate>
		<dc:creator>lordoftheflame</dc:creator>
				<category><![CDATA[Blogroll]]></category>

		<guid isPermaLink="false">http://vinoduec.wordpress.com/2007/02/08/7/</guid>
		<description><![CDATA[It&#8217;s been quite some time. Some days back, I was trying to figure out ways to post to my blog by email, without even logging into it. Blogspot does give me this, but somehow I clearly find wp far better than blogspot; it lets me enjoy freedom, tons of it. But then, every time I [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s been quite some time. Some days back, I was trying to figure out ways to post to my blog by email, without even logging into it. Blogspot does give me this, but somehow I clearly find wp far better than blogspot; it lets me enjoy freedom, tons of it. But then, every time I try to log in to wp, I quickly realise how busy it is from its load time (don&#8217;t tell me its software is bad). Eventually I failed to figure out any such trick, which forced me back to posting through wp online.</p>
<p>And then the big idea! I don&#8217;t know for sure how many hours a day I spend online, but I am afraid that if I really started keeping statistics, I would have to face some music from my parents. Leaving that aside, I often feel this: yes, the web 2.0 way of social bookmarking is good. Pretty good. Equally good was the google psearch (personalized search; underline &#8216;was&#8217;, for me delicious is far better). But would it not be better if a firefox plugin (I don&#8217;t take the risk of suggesting new features for IE.. even after witnessing the &#8216;grand&#8217; release of Vista!) could <u><em><strong>observe me all the time, note down nicely which links I am traversing and store them in some cute data structure</strong></em></u>, so that I can come back later and figure out what it was that I was looking at and found so interesting that day after the Math class? (That one seems to run quite long.. the sentence, that is.) That would be one more aspect of personalizing search and the online experience. For example, I see that <u><em><strong>it would be helpful if a browser plugin noted down that I visited the wikipedia main page, went from there to the ACL wiki page, and from that page to the nlpers blog page, and then (after realising this idea) visited the wordpress login page to submit this post</strong></em></u>, and so on.</p>
<p>If current bookmarking, which gives no importance to the hyperlinks followed before and after a page, corresponds to the bag-of-words approach of state-of-the-art search engines, then my idea would map to something like discourse in natural language processing (&#8217;twas a boring analogy, perhaps).</p>
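<p>Just to pin the idea down, here is one way the &#8220;cute data structure&#8221; might look, sketched in Python. Everything in it is an assumption made for illustration: the BrowsingTrail class, its method names and the page labels are invented, and a real browser plugin would have to hook into the browser&#8217;s navigation events, which this sketch does not attempt.</p>

```python
from collections import defaultdict


class BrowsingTrail:
    """Records page visits together with the page each visit came from."""

    def __init__(self):
        self.visits = []                    # ordered (url, referrer) pairs
        self.followed = defaultdict(list)   # referrer -> pages opened from it

    def visit(self, url, referrer=None):
        """Note that url was opened, optionally by following a link on referrer."""
        self.visits.append((url, referrer))
        if referrer is not None:
            self.followed[referrer].append(url)

    def path_to(self, url):
        """Rebuild the chain of pages that led to the latest visit of url."""
        latest = {}
        for page, referrer in self.visits:
            latest[page] = referrer         # later visits overwrite earlier ones
        path, seen = [], set()
        while url is not None and url not in seen:
            seen.add(url)                   # guards against loops in the trail
            path.append(url)
            url = latest.get(url)
        return list(reversed(path))
```

<p>Recording the visits wikipedia-main, then acl-wiki (from wikipedia-main), then nlpers-blog (from acl-wiki), then wordpress-login (from nlpers-blog) and asking for path_to("wordpress-login") gives back the whole chain in order, which is exactly the context that a plain bookmark throws away.</p>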
<p>And recalling my age-old plans to contribute to the open source community with some decent project, here I sign off with a brand new, exciting plan to jump into open source software contribution, along with bolstering my research background. See ya.</p>
]]></content:encoded>
			<wfw:commentRss>http://vinoduec.wordpress.com/2007/02/08/7/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/58fa1ab664d113046c6e3df992f811bf?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">lordoftheflame</media:title>
		</media:content>
	</item>
		<item>
		<title>IR and language modelling</title>
		<link>http://vinoduec.wordpress.com/2007/01/18/ir-and-language-modelling/</link>
		<comments>http://vinoduec.wordpress.com/2007/01/18/ir-and-language-modelling/#comments</comments>
		<pubDate>Thu, 18 Jan 2007 07:08:01 +0000</pubDate>
		<dc:creator>lordoftheflame</dc:creator>
				<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://vinoduec.wordpress.com/2007/01/18/ir-and-language-modelling/</guid>
		<description><![CDATA[From now on I had better post all my notes in this category. Here is the first post in the Research category. My interest in NLP and Question Answering urges me to study Information Retrieval in depth as well. Here are some points that keep recurring (reference: wikipedia and the links from there) - Information Retrieval The [...]]]></description>
			<content:encoded><![CDATA[<p>From now on I had better post all my notes in this category. Here is the first post in the Research category.</p>
<p>My interest in NLP and Question Answering urges me to study Information Retrieval in depth as well. Here are some points that keep recurring (reference: wikipedia and the links from there) -</p>
<p><strong>Information Retrieval </strong></p>
<ul>
<li>The performance measures rely heavily on a collection of documents and a set of queries for which the relevance of each document is known.</li>
<li>And they assume binary relevance: a document is either relevant or completely irrelevant, which is different from what we face in practice.</li>
<li>Precision, recall, F-measure, fall-out and average precision are measures for IR. Question answering needs different measures, or modified versions of these.</li>
</ul>
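<p>As a quick illustration of the set-based measures listed above, here is a small Python sketch. It assumes binary relevance, as noted in the list, and takes the F-measure to be the balanced harmonic mean of precision and recall; the function name ir_metrics and its arguments are made up for this example.</p>

```python
def ir_metrics(retrieved, relevant, collection_size):
    """Set-based IR measures for a single query, assuming binary relevance."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved.intersection(relevant))

    # Precision: fraction of retrieved documents that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: fraction of relevant documents that were retrieved.
    recall = hits / len(relevant) if relevant else 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall) if hits else 0.0
    # Fall-out: fraction of non-relevant documents that were retrieved.
    non_relevant = collection_size - len(relevant)
    fall_out = (len(retrieved) - hits) / non_relevant if non_relevant else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "fall_out": fall_out,
    }
```

<p>For instance, with documents 1 to 4 retrieved, documents 2, 4 and 5 relevant and a collection of 10 documents, precision is 2/4 and recall is 2/3. Average precision additionally needs the ranked order of the results, so it is left out of this sketch.</p>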
<p><strong>Modelling the document for retrieval</strong></p>
<ul>
<li><em>Set-theoretic Models</em> represent documents as sets: the boolean and fuzzy models.</li>
<li><em>Algebraic Models</em> represent documents and queries usually as vectors, matrices or tuples. Those vectors, matrices or tuples are transformed, using a finite number of algebraic operations, into a one-dimensional similarity measurement: the vector space model and oh.. the <em>latent semantic analysis</em>. I&#8217;ve read quite a bit about it, but could never fit it into the big picture. Things now seem to arrange themselves into the divine order.</li>
<li><em>Probabilistic Models:</em> And this is my favourite, for reasons that remained unknown even to me till now. It&#8217;s all probabilities down the lane, which would probably justify my interest: <em>Bayesian inference</em>, whose points of application I studied hard to grasp, and of course yes, the <em>language models</em>, which kept haunting me, just like LSA, by never fitting into the big picture. <em>Conditional random fields</em>, I&#8217;ve heard, are the latest talk of the town, and are involved in all of this.</li>
</ul>
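<p>The &#8220;one-dimensional similarity measurement&#8221; of the algebraic models is easiest to see with the vector space model. The Python sketch below assumes raw term-frequency vectors and whitespace tokenisation, both simplifications chosen for brevity (real systems usually weight terms, for example with tf-idf); the function name cosine_similarity is just a label for this illustration.</p>

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two term-frequency vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())

    # Dot product over shared terms; Counter returns 0 for missing terms.
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))

    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

<p>Two texts with no words in common score 0, identical texts score 1 (up to floating-point rounding), and a query can be ranked against documents simply by sorting on this value.</p>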
<p><strong>Language modelling</strong></p>
<ul>
<li>A statistical language model assigns a <em>probability to a sequence of words P(w<sub>1..n</sub>)</em> by means of a probability distribution (wikipedia).</li>
<li>Estimating the probability of sequences can become difficult in corpora in which phrases/sentences can be arbitrarily long, and hence some sequences are not observed during training of the language model (<span class="new">data sparseness problem</span>). For that reason these models are often approximated using smoothed <em>N-gram models</em>.</li>
<li>It fits perfectly in speech recognition, in guessing the next few words.</li>
<li>And what does it signify when used in IR? A language model is then associated with a document in a collection. <em>With query Q as input, retrieved documents are ranked based on the probability that the document&#8217;s language model would generate the terms of the query, </em><em>P(Q|M<sub>d</sub>).</em></li>
</ul>
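<p>The query-likelihood ranking described in the last point can be sketched in Python as follows. Several things here are assumptions picked to keep the example small, not something taken from the sources above: the model is a unigram one (N = 1), smoothing is done by linear interpolation with the whole collection (often called Jelinek-Mercer smoothing) with weight lam, tokenisation is a plain whitespace split, and the function name is invented.</p>

```python
import math
from collections import Counter


def rank_by_query_likelihood(query, documents, lam=0.5):
    """Rank document names by log P(Q | M_d) under a smoothed unigram model.

    documents maps a document name to its text; lam is the weight given
    to the collection-wide model when smoothing.
    """
    tokenised = {name: text.lower().split() for name, text in documents.items()}
    collection = Counter(t for tokens in tokenised.values() for t in tokens)
    collection_size = sum(collection.values())

    scores = {}
    for name, tokens in tokenised.items():
        counts = Counter(tokens)
        log_p = 0.0
        for term in query.lower().split():
            p_doc = counts[term] / len(tokens) if tokens else 0.0
            p_coll = collection[term] / collection_size if collection_size else 0.0
            # Interpolation keeps unseen terms from zeroing out the product.
            p = (1 - lam) * p_doc + lam * p_coll
            log_p += math.log(p) if p > 0 else float("-inf")
        scores[name] = log_p

    return sorted(scores, key=scores.get, reverse=True)
```

<p>The smoothing step is what deals with the data sparseness problem here: a document that lacks one query term still gets a small, non-zero probability from the collection model instead of being ruled out entirely.</p>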
<p><span id="more-6"></span></p>
]]></content:encoded>
			<wfw:commentRss>http://vinoduec.wordpress.com/2007/01/18/ir-and-language-modelling/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/58fa1ab664d113046c6e3df992f811bf?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">lordoftheflame</media:title>
		</media:content>
	</item>
		<item>
		<title>My debut attempt</title>
		<link>http://vinoduec.wordpress.com/2006/08/18/my-debute-attempt/</link>
		<comments>http://vinoduec.wordpress.com/2006/08/18/my-debute-attempt/#comments</comments>
		<pubDate>Fri, 18 Aug 2006 18:06:05 +0000</pubDate>
		<dc:creator>lordoftheflame</dc:creator>
				<category><![CDATA[Blogroll]]></category>

		<guid isPermaLink="false">https://vinoduec.wordpress.com/2006/08/18/my-debute-attempt/</guid>
		<description><![CDATA[Here is my first academic work that can be dubbed a project (in my opinion). This is work I did during the summer of &#8217;06 at IIIT. Report]]></description>
			<content:encoded><![CDATA[<p>Here is my first academic work that can be dubbed a project (in my opinion). This is work I did during the summer of &#8217;06 at IIIT.</p>
<p><a href="http://vinoduec.files.wordpress.com/2006/08/final_report.pdf" title="summer'06">Report</a></p>
]]></content:encoded>
			<wfw:commentRss>http://vinoduec.wordpress.com/2006/08/18/my-debute-attempt/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/58fa1ab664d113046c6e3df992f811bf?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">lordoftheflame</media:title>
		</media:content>
	</item>
		<item>
		<title>Hello world!</title>
		<link>http://vinoduec.wordpress.com/2006/08/06/hello-world/</link>
		<comments>http://vinoduec.wordpress.com/2006/08/06/hello-world/#comments</comments>
		<pubDate>Sun, 06 Aug 2006 18:48:36 +0000</pubDate>
		<dc:creator>lordoftheflame</dc:creator>
				<category><![CDATA[Blogroll]]></category>

		<guid isPermaLink="false"></guid>
		<description><![CDATA[This is a record of my drop-in-the-ocean contributions to the betterment of this world.]]></description>
			<content:encoded><![CDATA[<p>This is a record of my drop-in-the-ocean contributions to the betterment of this world.</p>
]]></content:encoded>
			<wfw:commentRss>http://vinoduec.wordpress.com/2006/08/06/hello-world/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/58fa1ab664d113046c6e3df992f811bf?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">lordoftheflame</media:title>
		</media:content>
	</item>
	</channel>
</rss>
