Off topic: A New Twist in Word Comparisons
Thread poster: Vito Smolej
Vito Smolej
Germany
Member since 2004
English to Slovenian
SITE LOCALIZER
Mar 11, 2009

From Science / Random Samples vol 323

Sung-Hou Kim, a chemist at the University of California, Berkeley, usually explores gene and protein relationships. Now, Kim and colleagues are using similar algorithms to probe literary ties.

Books, genes, and proteins can all be represented as strings of letters, which Kim's software analyzes to tease out underlying patterns. It's filed the Koran with other religious texts rather than with philosophical tracts as other literary comparison programs often do. Recently, the team cast fresh doubt on whether Shakespeare penned Pericles, Prince of Tyre, in which some scholars have detected the Bard's hand.



Unlike programs that simply compare word frequencies, Kim's approach first strips the text of punctuation and spaces, transforming the book into a single string of letters. Their algorithm records the first eight letters in a string and then advances this "window" one letter and repeats. It then looks at the frequency with which two letters appear next to one another. "I'm just stunned" at how well it works, says Kim, who thinks that's because the eight-letter windows often span multiple words, thereby picking up common syntax patterns.

The team has also found that the software can classify evolutionary relationships among hundreds of viruses, a feat that conventional tools struggle with because viruses share so few genes in common. Next, they hope to adapt the technique to analyze everything from musical patterns to ancient languages.

[Edited at 2009-03-11 19:33 GMT]
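
For readers who want to try the idea, here is a minimal sketch of the windowing step described in the quoted article. It is not Kim's actual software. The function names, the lowercasing, the restriction to ASCII letters, and the cosine comparison are all assumptions made for illustration; the article does not say how the resulting profiles are compared.

```python
import math
import re
from collections import Counter


def window_counts(text: str, width: int = 8) -> Counter:
    """Count every overlapping window of `width` letters in `text`."""
    # Assumption: keep ASCII letters only and lowercase them, which
    # drops punctuation, digits, and spaces as the article describes.
    letters = re.sub(r"[^A-Za-z]", "", text).lower()
    # Record a window, advance one letter, repeat.
    return Counter(letters[i:i + width] for i in range(len(letters) - width + 1))


def cosine_distance(a: Counter, b: Counter) -> float:
    """Assumed comparison metric: 0.0 for identical profiles, 1.0 for no overlap."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    first = window_counts("To be, or not to be, that is the question.")
    second = window_counts("Whether 'tis nobler in the mind to suffer.")
    print(first.most_common(3))
    print(cosine_distance(first, second))
```

Because the windows ignore word boundaries, a single window can straddle two or three short words, which is the effect Kim credits in the article.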


 
chica nueva
Chinese to English
word'strings Mar 18, 2009

hello'vito:

how'are'you!i'quite'like'this'idea'because'it'saves'a'lot'of'space'and'uses'fewer'keys。what'do'you'think?

lai'an

[Edited at 2009-03-18 04:22 GMT]


 
Vito Smolej
TOPIC STARTER
the metaphor is pretty straightforward Mar 18, 2009

The method looks at the text as the genomic material, i.e. as a sequence of codons, without externally enforced semantics. Anyhow - whatever - as long as it works (g).

Regards

Vito

[Edited at 2009-03-18 06:16 GMT]


 
chica nueva
externally enforced semantics/syntax: punctuation, spaces Mar 20, 2009

Hello Vito

Semantics/Syntax: Yes, you are right. AFAIK in the old days Chinese and Hebrew didn't use to have punctuation. Is that your understanding? Also, there are no spaces between words in some language scripts.

Codons: BTW what is your understanding of a codon? one letter, or a string of 8? I wonder why Kim chose eight letters. There are 26 letters in the English alphabet. What if he had used Chinese or Korean characters as codons ...

Applications: Would it work on prosody and metre, I wonder? For example, I am not convinced of the distinction between stress-timed and syllable-timed languages. I wonder whether this method can verify the difference. To me (English being 'stress-timed', and Chinese so-called 'syllable-timed') they seem much the same (but that could just be me, imposing L1 patterns on L2).

Lesley

[Edited at 2009-03-20 03:40 GMT]
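
On the question of why eight letters: a rough count of how many distinct windows are possible shows how quickly the space grows with alphabet size. This is only arithmetic, not a claim about Kim's reasoning. The function name is made up for illustration, and the figure of 3000 characters is a hypothetical stand-in for a character-based script, not a real inventory size.

```python
def possible_windows(alphabet_size: int, width: int = 8) -> int:
    """Number of distinct windows of `width` symbols over an alphabet."""
    return alphabet_size ** width


# 26 English letters, eight-letter windows.
print(possible_windows(26))  # 208827064576

# Hypothetical script with 3000 distinct characters (an assumption):
# even two-character windows already give millions of possibilities.
print(possible_windows(3000, width=2))  # 9000000
```

With a much larger symbol set, a far shorter window would already produce a comparably large space of possible features.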


 
Vito Smolej
TOPIC STARTER
some explanations Mar 20, 2009

...Codons: BTW what is your understanding of a codon?

I meant genetic codons (three bases > one codon > the corresponding amino acid, or a stop (nonsense) signal; see http://en.wikipedia.org/wiki/Genetic_code).

...I wonder why Kim chose eight letters.

My gut feeling is that it was just to speed up the analysis (subsampling of the complete text sequence).

...I am not convinced of the distinction between stress-timed and syllable-timed languages. I wonder whether this method can verify the difference.

Well, there's for sure one way to find this out (g).

Regards

Vito

[Edited at 2009-03-20 05:32 GMT]


 
chica nueva
text, fabric, literature, 'Pattern' Mar 24, 2009

文 wen in the 《文心雕龙》"The Literary Mind and the Carving of Dragons" can be translated as Pattern, sometimes interpreted as 'text' apparently.

Then again, the word 'text' comes from Latin textus = fabric, pp. of texere = to weave.

Somehow, it doesn't surprise me that Kim is investigating patterns in literature in this way.

Link to post on Wen Xin Diao Long "The Literary Mind and the Carving of Dragons":
http://www.proz.com/forum/chinese/129977-question_about_tv_show_the_water_margin_水滸傳-page3.html#1087931

[Edited at 2009-03-24 22:16 GMT]


 

