Off topic: A New Twist in Word Comparisons | Thread poster: Vito Smolej
Vito Smolej | Germany | Member since 2004 | English to Slovenian + ... | SITE LOCALIZER
From Science / Random Samples vol 323
Sung-Hou Kim, a chemist at the University of California, Berkeley, usually explores gene and protein relationships. Now, Kim and colleagues are using similar algorithms to probe literary ties.
Books, genes, and proteins can all be represented as strings of letters, which Kim's software analyzes to tease out underlying patterns. It's filed the Koran with other religious texts rather than with philosophical tracts, as other literary comparison programs often do. Recently, the team cast fresh doubt on whether Shakespeare penned Pericles, Prince of Tyre, in which some scholars have detected the Bard's hand.
![](http://www.textnart.de/downloads/bard.jpg)
Unlike programs that simply compare word frequencies, Kim's approach first strips the text of punctuation and spaces, transforming the book into a single string of letters. The team's algorithm records the first eight letters in a string and then advances this "window" one letter and repeats. It then looks at the frequency with which two letters appear next to one another. "I'm just stunned" at how well it works, says Kim, who thinks that's because the eight-letter windows often span multiple words, thereby picking up common syntax patterns.
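The windowing step described above can be sketched in a few lines of Python. This is a minimal illustration, not Kim's software: it assumes English text reduced to the lowercase letters a-z, counts every overlapping 8-letter window, and, as a loudly flagged assumption, compares two texts with the Jensen-Shannon divergence, a common measure for frequency profiles that the Science note does not mention. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def letters_only(text: str) -> str:
    """Lowercase the text and drop everything except the letters a-z (no spaces, no punctuation)."""
    return re.sub(r"[^a-z]", "", text.lower())


def window_profile(text: str, k: int = 8) -> dict[str, float]:
    """Slide a k-letter window along the letter string one letter at a time and
    return the relative frequency of every window seen."""
    s = letters_only(text)
    counts = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def js_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits: 0.0 for identical profiles, 1.0 for disjoint ones.
    (Assumed comparison measure; the article does not say which one the team used.)"""
    def kl_to_mean(a: dict[str, float]) -> float:
        # KL(a || m) with m = (p + q) / 2; only windows where a > 0 contribute
        return sum(a[w] * math.log2(a[w] / ((p.get(w, 0.0) + q.get(w, 0.0)) / 2))
                   for w in a)
    return 0.5 * kl_to_mean(p) + 0.5 * kl_to_mean(q)


# Usage: punctuation and spacing differences vanish once the text is reduced to letters.
a = window_profile("To be, or not to be: that is the question.")
b = window_profile("To be or not to be, that is the question!")
c = window_profile("Books, genes, and proteins can all be strings of letters.")
print(js_divergence(a, b))  # 0.0
print(js_divergence(a, c))  # close to 1.0: the two strings share no 8-letter windows
```

Because the windows overlap and ignore word boundaries, a window such as "obethati" records how "to be" runs into "that is", which is one way to read Kim's remark about picking up syntax across words.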
The team has also found that the software can classify evolutionary relationships among hundreds of viruses, a feat that conventional tools struggle with because viruses share so few genes. Next, they hope to adapt the technique to analyze everything from musical patterns to ancient languages.
[Edited at 2009-03-11 19:33 GMT]
word'strings | Mar 18, 2009
hello'vito:
how'are'you!i'quite'like'this'idea'because'it'saves'a'lot'of'space'and'uses'fewer'keys。what'do'you'think?
lai'an
[Edited at 2009-03-18 04:22 GMT]
Vito Smolej | TOPIC STARTER | the metaphor is pretty straightforward | Mar 18, 2009
The method treats the text as genomic material, i.e. as a sequence of codons, without externally enforced semantics. Anyhow - whatever - as long as it works (g).
Regards
Vito
[Edited at 2009-03-18 06:16 GMT]
externally enforced semantics/syntax: punctuation, spaces | Mar 20, 2009
Hello Vito
Semantics/Syntax: Yes, you are right. AFAIK, in the old days Chinese and Hebrew were written without punctuation. Is that your understanding? Also, some scripts do not put spaces between words.
Codons: BTW, what is your understanding of a codon? One letter, or a string of 8? I wonder why Kim chose eight letters. There are 26 letters in the English alphabet. What if he had used Chinese or Korean characters as codons...
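One way to see what the window length does, sketched in Python under the assumption of a 26-letter alphabet with case and punctuation removed (the function names and the 500,000-letter book length are hypothetical): the number of possible k-letter strings grows as 26^k, so 8-letter windows are far too numerous to turn up by chance, and the ones that do appear mostly reflect real words and word sequences. This is an illustration only, not Kim's stated reason for choosing eight.

```python
def possible_windows(alphabet_size: int, k: int) -> int:
    """How many distinct k-letter strings an alphabet of the given size can form."""
    return alphabet_size ** k


def windows_in_text(n_letters: int, k: int) -> int:
    """How many overlapping k-letter windows a string of n letters yields (step of one)."""
    return max(n_letters - k + 1, 0)


for k in (1, 2, 3, 8):
    print(k, possible_windows(26, k))
# 1 26
# 2 676
# 3 17576
# 8 208827064576

# A hypothetical 500,000-letter book gives at most 499,993 windows of length 8,
# a tiny fraction of the roughly 2.1e11 possible 8-letter strings.
print(windows_in_text(500_000, 8))  # 499993
print(possible_windows(4, 3))       # 64, the number of codons (four bases, three at a time)
```

With a much larger character inventory, as with Chinese characters, long windows would be sparser still, so a shorter window might serve; that is speculation, not something the article tests.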
Applications: Would it work on prosody and metre, I wonder? For example, I am not convinced of the distinction between stress-timed and syllable-timed languages, and I wonder whether this method could verify the difference. To me, English (said to be 'stress-timed') and Chinese (so-called 'syllable-timed') seem much the same, but that could just be me imposing L1 patterns on L2.
Lesley
[Edited at 2009-03-20 03:40 GMT]
Vito Smolej | TOPIC STARTER | some explanations | Mar 20, 2009
...Codons: BTW what is your understanding of a codon?
I meant genetic codons (three bases > one codon > the corresponding amino acid, or a stop ("nonsense") signal; see http://en.wikipedia.org/wiki/Genetic_code).
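To make the analogy concrete: a codon is read as a non-overlapping triplet within a fixed reading frame, whereas the windows described in the Science piece overlap and advance one letter at a time. A minimal Python sketch of the contrast, using only a hand-picked handful of entries from the standard genetic code (the table, the "?" placeholder for missing codons, and the function names are illustrative, not a complete translator):

```python
# Only a small, hand-picked part of the standard genetic code (DNA codons -> amino acids);
# codons missing from this table are reported as "?".
PARTIAL_CODE = {
    "ATG": "Met", "TTT": "Phe", "TTC": "Phe", "TGG": "Trp",
    "AAA": "Lys", "AAG": "Lys", "GGC": "Gly", "GCC": "Ala",
    "TAA": "Stop", "TAG": "Stop", "TGA": "Stop",
}


def codons(dna: str) -> list[str]:
    """Split a DNA string into consecutive, non-overlapping triplets (one reading frame)."""
    return [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]


def translate(dna: str) -> list[str]:
    """Map each codon to its amino acid, stopping at the first stop codon."""
    protein = []
    for codon in codons(dna.upper()):
        amino = PARTIAL_CODE.get(codon, "?")
        if amino == "Stop":
            break
        protein.append(amino)
    return protein


def overlapping_windows(s: str, k: int) -> list[str]:
    """Windows like those in the Science note: step of one letter, so neighbours overlap."""
    return [s[i:i + k] for i in range(len(s) - k + 1)]


print(codons("ATGTTTGGCTAA"))            # ['ATG', 'TTT', 'GGC', 'TAA']
print(translate("ATGTTTGGCTAA"))         # ['Met', 'Phe', 'Gly']
print(overlapping_windows("ATGTTT", 3))  # ['ATG', 'TGT', 'GTT', 'TTT']
```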
I wonder why Kim chose eight letters.
My gut feeling is that it was just to speed up the analysis (subsampling of the complete text sequence).
...I am not convinced of the distinction between stress-timed and syllable-timed languages. I wonder whether this method can verify the difference.
Well, there's for sure one way to find this out (g).
Regards
Vito
[Edited at 2009-03-20 05:32 GMT]
text, fabric, literature, 'Pattern' | Mar 24, 2009
文 wen in 《文心雕龙》 ("The Literary Mind and the Carving of Dragons") can be translated as 'pattern', and apparently is sometimes interpreted as 'text'.
Then again, the word 'text' comes from Latin textus ('fabric'), the past participle of texere ('to weave').
Somehow, it doesn't surprise me that Kim is investigating patterns in literature in this way.
Link to post on Wen Xin Diao Long "The Literary Mind and the Carving of Dragons":
http://www.proz.com/forum/chinese/129977-question_about_tv_show_the_water_margin_水滸傳-page3.html#1087931
[Edited at 2009-03-24 22:16 GMT]