Welcome to Pittsburgh!
1 Welcome to Pittsburgh!
2 Overview of the IWSLT 2005 Evaluation Campaign. Matthias Eck and Chiori Hori, InterACT, Carnegie Mellon University
3 IWSLT 2005 Evaluation Campaign. Working on the same test bed. Training corpus release: May 20, 2005. Test corpus release: Aug 16, 2005. Result submission due: Aug 18, 2005. Camera-ready paper: Sep 25, 2005. Technical paper submission: July 25, 2005.
4 Translation target. Manual transcription (plain sentences in BTEC); ASR output of spoken BTEC sentences. Examples: Where would you like to go? Is there a discount for children? Did you have fun today? Sure. Can I have a receipt? I'd like to try some local wine. No discourse.
5 Scientific Questions. How well can ASR output be translated in the face of recognition errors? How much can MT performance be enhanced by considering multiple hypotheses? Which hypotheses can contribute to MT performance?
6 Translation target. ASR output of spoken BTEC sentences. Real evaluation conditions: read-aloud speech of the BTEC, no spontaneity. The difference between translating text and ASR output -> handling recognition errors. Multiple hypotheses provided: N-best lists, lattices (HTK format).
7 Directions and source input. Translation directions, from manual transcription and ASR output: Chinese-English, Japanese-English, Arabic-English, Korean-English, English-Chinese. (ASR by Dr. Chen, NLPR; Dr. Yamamoto, ATR; Mr. Paulik, UKA)
8 Provided Data. All data from the BTEC corpus. 2 development sets: C-STAR 2003 test set (506 sentences), IWSLT 2004 test set (500 sentences). Training data: 20K sentences. Test data: 506 sentences.
9 Data and tool restrictions
Track: Supplied | Supplied & Tools | Unrestricted | C-STAR
IWSLT05 corpus: yes | yes | yes | yes
Tagger/Chunker/Parser: - | yes | yes | yes
Public data: - | - | yes | yes
Proprietary data: - | - | - | yes
10 Details of the data and tool restrictions. Supplied Data Track: the supplied corpus only. Supplied Data + Tools Track: training data limited to the supplied corpus, but parser/chunker and tagger tools may be used. Unrestricted Data Track: all publicly available data, including data crawled from the web. C-STAR Track: no limitations on linguistic resources; full BTEC corpus and proprietary data.
11 Participants - 17 institutions / 16 groups. Institution (systems): RWTH Aachen University (RWTH); ITC-irst - Center for Scientific and Technological Research (ITC-irst); University of Edinburgh (EDINBURGH); Nagaoka University of Technology (NGKT); University of Southern California Information Sciences Institute (USC-ISI); University of Tokyo (TOKYO); ATR Spoken Language Communication Research Labs (ATR-ALEPH, ATR-SL)
12 Participants - 19 translation systems. Institution (systems): MIT Lincoln Laboratory / Air Force Research Laboratory (MIT-LL/AFRL); National Laboratory of Pattern Recognition (NLPR); NTT Cyber Space Laboratories (NTT); TALP Research Center (TALP-ngram, TALP-phrase); Microsoft Research (MICROSOFT); Carnegie Mellon University; Oki Electric Industry Co., Ltd. (OKI); Sehda Inc. (SEHDA)
13 Participants - Techniques. 19 translation systems, grouped into SMT, SMT+Syntax, EBMT, and ET: TALP-ngram, SEHDA, TOKYO, EDINBURGH, TALP-phrase, MICROSOFT, ATR-ALEPH, NGKT, OKI, ATR-SL, ITC-irst, MIT-LL/AFRL, NLPR
14 Translation systems. Techniques: SMT 12, SMT+Syntax 3, EBMT 3, ET 1. Country of origin: Japan 7 (5 groups), USA 6, Spain 2 (1 group), Italy 1, China 1, UK 1, Germany 1.
15 System participation - manual transcription. Tracks: Supplied (12), Supplied & Tools, Unrestricted, C-STAR. Directions: Chinese-English, Japanese-English, Arabic-English, Korean-English, English-Chinese.
16 System participation - ASR output. Tracks: Supplied, Supplied & Tools, Unrestricted, C-STAR. Directions: Chinese-English, Japanese-English, English-Chinese.
17 Results - Manual Transcription (Matthias)
18 BLEU. Geometric mean of the n-gram precisions of the hypothesis compared to the reference translations, with a length penalty for short translations. Scores: 0-1. Benefits: missing references can be covered by combining other references; correlates well with Fluency. Problems: re-combination of references can cause errors; all words are weighted equally; weak correlation with Adequacy.
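The BLEU computation described above can be sketched in a few lines. This is a sentence-level, single-reference illustration only; real BLEU is corpus-level and pools n-gram counts over multiple references:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    """Toy sentence-level BLEU against a single reference."""
    hyp, ref = hypothesis.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_n, ref_n = ngrams(hyp, n), ngrams(ref, n)
        # clipped n-gram matches: a hypothesis n-gram only counts as
        # often as it appears in the reference
        overlap = sum(min(c, ref_n[g]) for g, c in hyp_n.items())
        total = max(sum(hyp_n.values()), 1)
        log_prec += math.log(max(overlap, 1e-9) / total)
    # brevity penalty: the "length penalty for short translations"
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(log_prec / max_n)
```

A perfect match scores 1.0; any missing n-gram order or a short hypothesis pulls the score toward 0.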
19 NIST. Variant of BLEU using the arithmetic mean of weighted n-gram precision values. Benefits: considers information gain; uses up to 9-grams (usually 5-grams); good correlation with Adequacy. Problems: re-combination of references can cause errors; weak correlation with Fluency (human judgement).
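NIST replaces BLEU's geometric mean with an arithmetic mean and weights each matched n-gram by an information gain. A toy sketch (the `info` weighting is left as a parameter; the real metric derives it from reference-corpus n-gram frequencies and also applies a brevity factor omitted here):

```python
from collections import Counter

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def nist_toy(hypothesis, reference, max_n=5, info=None):
    """Arithmetic mean-style NIST sketch. With info=None every n-gram
    gets weight 1.0, so this reduces to summed plain n-gram precisions."""
    hyp, ref = hypothesis.split(), reference.split()
    weight = info if info is not None else (lambda g: 1.0)
    score = 0.0
    for n in range(1, max_n + 1):
        hyp_n, ref_n = ngram_counts(hyp, n), ngram_counts(ref, n)
        # each matched n-gram contributes its information-gain weight
        matched = sum(min(c, ref_n[g]) * weight(g) for g, c in hyp_n.items())
        score += matched / max(sum(hyp_n.values()), 1)
    return score
```

Because the per-order precisions are summed rather than multiplied, the score has no fixed upper bound of 1, matching the unbounded NIST scale.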
20 mWER, mPER. mWER: word error rate over multiple references; edit distance between the hypothesis and the closest reference. Scores: 0-1. mPER: mWER without considering word order. Benefits: correlates well with human judgement... Problems: ...if enough references are available.
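A minimal sketch of both metrics, scoring against the closest of multiple references (the exact normalization used in the campaign may differ slightly):

```python
from collections import Counter

def edit_distance(a, b):
    """Word-level Levenshtein distance (single rolling row)."""
    d = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, wb in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (wa != wb))
    return d[len(b)]

def mwer(hypothesis, references):
    """mWER: edit distance to the closest reference, normalized by
    that reference's length."""
    hyp = hypothesis.split()
    return min(edit_distance(hyp, r.split()) / len(r.split()) for r in references)

def mper(hypothesis, references):
    """mPER: position-independent variant; only bag-of-words mismatch
    counts, so reordering is free."""
    hyp = Counter(hypothesis.split())
    best = float("inf")
    for r in references:
        ref = Counter(r.split())
        errors = max(sum((hyp - ref).values()), sum((ref - hyp).values()))
        best = min(best, errors / sum(ref.values()))
    return best
```

A fully reordered but lexically correct hypothesis gets mPER 0 while mWER penalizes every misplaced word, which is exactly the difference the slide describes.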
21 GTM, METEOR. GTM: similarity between texts using a unigram-based F-measure. METEOR: considers exact matches, stem matches and synonym matches (using WordNet); case insensitive; cannot be used on Chinese output (yet). Scores: 0-1. Example: houses (exact match), house (stem match), home (synonym match).
22 Automatic Evaluation - specification for English outputs. Focus on speech-to-speech translation: punctuation marks and mixed casing are less relevant. Standard evaluation: case insensitive (all lowercase); punctuation marks . ? ! , : ; removed; '-' removed to split compounds. Optional evaluation: case sensitive (mixed case), separated punctuation marks; only done if the submitted data contained mixed-case characters. No numbers reported here; please refer to the overview paper.
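The standard-track normalization can be sketched as follows; the exact punctuation set is an assumption based on the characters listed on the slide:

```python
import re

def normalize(line, case_sensitive=False):
    """Standard-evaluation normalization sketch: lowercase, split
    hyphenated compounds, drop punctuation, collapse whitespace."""
    if not case_sensitive:
        line = line.lower()
    line = line.replace("-", " ")          # '-' removed to split compounds
    line = re.sub(r"[.?!,:;]", "", line)   # punctuation marks removed
    return " ".join(line.split())
```

Applying the same normalization to hypotheses and references before scoring keeps casing and punctuation from influencing any of the automatic metrics.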
23 Automatic Evaluation - specification for Chinese outputs. Evaluation 1: using the given (ASR) segmentation; punctuation marks removed. Evaluation 2: character segmented, which eliminates the influence of word segmentation; punctuation marks removed.
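Evaluation 2's character segmentation is straightforward to reproduce: every non-space character becomes its own token, so word-segmentation differences between systems disappear:

```python
def char_segment(line):
    """Character-level segmentation for Chinese evaluation: each
    non-space character becomes one token."""
    return " ".join(ch for ch in line if not ch.isspace())
```

The character-segmented strings can then be fed to any word-based metric (BLEU, NIST, mWER, ...) unchanged.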
24 Online Evaluation Server
25 Online Evaluation Server. Form fields: Language Pair, Data Track, File, Further Comments.
26 Evaluation Server Output
27 Evaluation Server Output. Mixed case!! Automatically detected. Number of lines?
28 Evaluation Server Output (2)
29 Subjective Evaluation - Fluency/Adequacy
Fluency: 4 Flawless English; 3 Good English; 2 Non-native English; 1 Disfluent English; 0 Incomprehensible
Adequacy: 4 All information; 3 Most information; 2 Much information; 1 Little information; 0 None
Typically used metrics: Fluency/Adequacy (e.g. IWSLT 2004). Here: 0-4 instead of 1-5.
30 Subjective Evaluation - Meaning Maintenance
Meaning Maintenance: 4 Exactly the same meaning; 3 Almost the same meaning; 2 Partially the same meaning and no new information; 1 Partially the same meaning but misleading information is introduced; 0 Totally different meaning
(Adequacy, for comparison: 4 All information; 3 Most information; 2 Much information; 1 Little information; 0 None)
31 Why Meaning Maintenance? Focus on comparing the meaning of the translation with the source: to what degree is misleading information introduced? Two types of errors: (1) Obvious error, no meaning change: the translation is still useful, and Adequacy and Meaning Maintenance scores are similar. (2) The error changes the meaning (e.g. negation): the translation is not useful. An Adequacy grader might ignore the change and judge only the correct parts; the explicit focus on meaning prevents this.
32 Subjective Evaluation procedure. All translations are shown at the same time, randomly ordered, allowing comparison among translations of the same sentence. No explicit reference is shown: the reference is included among the translations, so there is no bias from a displayed reference, and it yields an oracle score. The source is shown for Adequacy and Meaning Maintenance scoring. 5 bilingual graders (scores shown are for 3 graders). First all Fluency scores, then Adequacy, finally Meaning Maintenance.
33 Subjective Evaluation Tool - Fluency Part 1: Fluency
34 Subjective Evaluation Tool - Adequacy Part 2: Adequacy
35 Subjective Evaluation Tool - Meaning Maintenance. Part 3: Meaning Maintenance
36 Evaluation Results. Human evaluation was only done for the most popular track: Chinese-English translation of manual transcription (MT), Supplied Data Track. 11 submissions for this track were evaluated; an additional 10% of the translations were graded a second time by the same grader to measure inconsistencies.
37 Human Evaluation Results (top systems per metric)
Adequacy: MIT-LL/AFRL 2.71, ITC-irst, TALP-phrase, TALP-ngram 2.44, EDINBURGH 2.33
Fluency: ITC-irst 3.15, TALP-phrase, TALP-ngram, EDINBURGH 2.81, MIT-LL/AFRL 2.79
Meaning Maint.: MIT-LL/AFRL 2.63, ITC-irst 2.60, TALP-ngram 2.40, EDINBURGH, TALP-phrase
38 Human Evaluation Results - Adequacy. Upper bound: reference performance. [bar chart: Adequacy per system]
39 Adequacy - significance? [bar chart: Adequacy per system with significance ranges]
40 ...and Fluency [bar charts: Adequacy and Fluency per system]
41 ...and Meaning Maintenance [bar charts: Adequacy, Fluency and Meaning Maint. per system]
42 Analysis: Adequacy - Fluency [scatter plot: Fluency vs. Adequacy]
43 Analysis: Adequacy - Fluency [scatter plot with regions Fluency >> Adequacy, Fluency ~ Adequacy, Fluency << Adequacy]
44 Analysis: Adequacy - Fluency. Fluency 3 (Good English) is rare; scores cluster at 4 (Flawless) and 2 (Non-native). [scatter plot]
45 Consistency? (Inter-Grader). How consistent are the scores assigned by the 3 graders? Average differences between grades were computed for each grader pair (G1-G2, G1-G3, G2-G3) and averaged, for Adequacy, Fluency and Meaning Maint. All 3 graders agreed for about 40% of the sentences; 2 graders agreed for about 60%.
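The inter-grader numbers above are average absolute score differences per grader pair; a sketch of that computation (the grader names and scores below are hypothetical):

```python
from itertools import combinations

def avg_pairwise_difference(grades):
    """grades: dict mapping grader name -> list of scores, with all
    lists in the same sentence order. Returns the mean absolute score
    difference for every grader pair."""
    out = {}
    for g1, g2 in combinations(sorted(grades), 2):
        diffs = [abs(a - b) for a, b in zip(grades[g1], grades[g2])]
        out[(g1, g2)] = sum(diffs) / len(diffs)
    return out
```

Running it once per metric (Adequacy, Fluency, Meaning Maint.) reproduces the per-pair rows of the slide's table.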
46 Consistency? (Intra-Grader). How consistent are the scores assigned by each grader? (Based on the 10% of sentences graded twice.) Average differences between the first and second grade were computed for each grader (Grader 1, 2, 3) and averaged, for Adequacy, Fluency and Meaning Maint.
47 Do we need Meaning Maintenance? The difference to Adequacy is less than 2 in 91% of the cases, with a high overall correlation (Pearson: 0.82). But the correlation is low for low scores (Meaning Maint. 0, 1): avg. difference 0.75, Pearson 0.20. For high scores (3, 4): avg. difference 0.25, Pearson 0.65. Graders tend to use similar scores on good translations and diverge on bad ones. Grader inconsistency is lower for Meaning Maintenance. Conclusion: no additional score is necessary; instead, make graders aware of meaning in Adequacy scoring.
48 BLEU: Chinese-English MT, Supplied Data [bar chart: BLEU per system]
49 BLEU: Chinese-English MT, Supplied Data [same chart]
50 NIST: Chinese-English MT, Supplied Data [bar chart: NIST per system]
51 NIST: Chinese-English MT, Supplied Data [same chart]
52 mPER: Chinese-English MT, Supplied Data [bar chart: mPER per system]
53 mWER, mPER [bar chart: mWER and mPER per system]
54 Additional metric for Chinese-English, MT, Supplied: TER (Translation Error Rate). Newly introduced metric: measures error as the minimum number of edits needed to change the hypothesis so that it exactly matches one of the references. TER = <# of edits> / <avg. # of reference words>, calculated against the best (closest) reference. Edits include insertions, deletions, substitutions and shifts; every edit counts as 1 error (edit distance). A shift moves a sequence of words within the hypothesis; shifting any sequence of words over any distance is only 1 error. Scores: 0-1.
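The TER formula can be sketched without the shift operation (shifts require a search over word-sequence moves and are omitted here, so this version can only over-count edits relative to real TER):

```python
def ter_no_shifts(hypothesis, references):
    """TER sketch: edits against the closest reference, divided by the
    average reference length. No shift edits, so this upper-bounds TER."""
    def edit_distance(a, b):
        # word-level Levenshtein: insertions, deletions, substitutions
        d = list(range(len(b) + 1))
        for i, wa in enumerate(a, 1):
            prev, d[0] = d[0], i
            for j, wb in enumerate(b, 1):
                prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (wa != wb))
        return d[len(b)]
    hyp = hypothesis.split()
    edits = min(edit_distance(hyp, r.split()) for r in references)
    avg_len = sum(len(r.split()) for r in references) / len(references)
    return edits / avg_len
```

Real TER's shift operation can turn a long reordering (many substitutions here) into a single edit, which is why it was introduced alongside plain WER.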
55 mWER, mPER [bar chart: mWER and mPER per system]
56 mWER, mPER and TER [bar chart: mWER, mPER and TER per system]
57 Chinese-English Supplied Data - Rankings: BLEU, NIST, mWER, mPER, GTM, METEOR, TER vs. Adequacy, Fluency, Meaning Maint. [table: the top two ranks alternate between ITC-irst and MIT-LL/AFRL depending on the metric]
69 Correlation Human - Automatic Scores. Pearson correlation between the human scores (Adequacy, Fluency, Meaning Maint.) and the automatic scores (BLEU, NIST, mWER, mPER, GTM, METEOR, TER). [correlation table]
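The Pearson coefficients reported here are plain correlations between two lists of scores (e.g. per-system BLEU vs. per-system Adequacy):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A value near +1 means the automatic metric ranks systems like the human judges do; near 0 means no linear relationship.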
70 Chinese-English BLEU Scores: Supplied [bar chart per system]
71 Chinese-English BLEU Scores: Supplied, Supplied + Tools [bar chart]
72 Chinese-English BLEU Scores: Supplied, Supplied + Tools, Unrestricted [bar chart]
73 Chinese-English BLEU Scores: Supplied, Supplied + Tools, Unrestricted, C-STAR [bar chart]
74 Chinese-English NIST Scores: Supplied [bar chart per system]
75 Chinese-English NIST Scores: Supplied, Supplied + Tools [bar chart]
76 Chinese-English NIST Scores: Supplied, Supplied + Tools, Unrestricted [bar chart]
77 Chinese-English NIST Scores: Supplied, Supplied + Tools, Unrestricted, C-STAR [bar chart]
78 Japanese-English BLEU Scores: Supplied [bar chart per system]
79 Japanese-English BLEU Scores: Supplied, Supplied + Tools [bar chart]
80 Japanese-English BLEU Scores: Supplied, Supplied + Tools, Unrestricted [bar chart]
81 Japanese-English BLEU Scores: Supplied, Supplied + Tools, Unrestricted, C-STAR [bar chart]
82 Japanese-English NIST Scores: Supplied [bar chart per system]
83 Japanese-English NIST Scores: Supplied, Supplied + Tools [bar chart]
84 Japanese-English NIST Scores: Supplied, Supplied + Tools, Unrestricted [bar chart]
85 Japanese-English NIST Scores: Supplied, Supplied + Tools, Unrestricted, C-STAR [bar chart]
86 Japanese-English Supplied Data - Rankings: BLEU, NIST, mWER, mPER, GTM, METEOR [table: the top ranks alternate between ITC-irst and EDINBURGH]. Different metrics rank differently!
94 Arabic-English BLEU Scores: Supplied [bar chart per system]
95 Arabic-English BLEU Scores: Supplied, Supplied + Tools [bar chart]
96 Arabic-English BLEU Scores: Supplied, Supplied + Tools, Unrestricted [bar chart]
97 Arabic-English BLEU Scores: Supplied, Supplied + Tools, Unrestricted, C-STAR [bar chart]
98 Arabic-English NIST Scores: Supplied [bar chart per system]
99 Arabic-English NIST Scores: Supplied, Supplied + Tools [bar chart]
100 Arabic-English NIST Scores: Supplied, Supplied + Tools, Unrestricted [bar chart]
101 Arabic-English NIST Scores: Supplied, Supplied + Tools, Unrestricted, C-STAR [bar chart]
102 Korean-English BLEU Scores: Supplied [bar chart per system]
103 Korean-English BLEU Scores: Supplied, Supplied + Tools [bar chart]
104 Korean-English BLEU Scores: Supplied, Supplied + Tools, Unrestricted [bar chart]
105 Korean-English BLEU Scores: Supplied, Supplied + Tools, Unrestricted, C-STAR [bar chart]
106 Korean-English NIST Scores: Supplied [bar chart per system]
107 Korean-English NIST Scores: Supplied, Supplied + Tools [bar chart]
108 Korean-English NIST Scores: Supplied, Supplied + Tools, Unrestricted [bar chart]
109 Korean-English NIST Scores: Supplied, Supplied + Tools, Unrestricted, C-STAR [bar chart]
110 English-Chinese BLEU Scores: Supplied [bar chart: EDINBURGH, MICROSOFT, ATR-ALEPH]
111 English-Chinese BLEU Scores: Supplied, Supplied + Tools [bar chart]
112 English-Chinese BLEU Scores: Supplied, Supplied + Tools, Unrestricted [bar chart]
113 English-Chinese BLEU Scores: Supplied, Supplied + Tools, Unrestricted, C-STAR [bar chart]
114 English-Chinese NIST Scores: Supplied [bar chart: EDINBURGH, MICROSOFT, ATR-ALEPH]
115 English-Chinese NIST Scores: Supplied, Supplied + Tools [bar chart]
116 English-Chinese NIST Scores: Supplied, Supplied + Tools, Unrestricted [bar chart]
117 English-Chinese NIST Scores: Supplied, Supplied + Tools, Unrestricted, C-STAR [bar chart]
118 ASR Results (Chiori)
119 Directions and source input. Translation directions, from manual transcription and ASR output: Chinese-English, Japanese-English, Arabic-English, Korean-English, English-Chinese. (ASR by Dr. Chen, NLPR; Dr. Yamamoto, ATR; Mr. Paulik, UKA)
120 Japanese ASR performance. 506 utterances were recognized. [chart: word error rate (%) for 1-best, N-best and lattice on DEVSET1, DEVSET2, TESTSET]
123 Chinese ASR performance. 506 utterances were recognized. [chart: word error rate (%) for 1-best, 20-best and lattice on DEVSET1, DEVSET2, TESTSET]
126 How much is MT performance degraded by ASR recognition errors? Chinese-English track. [histogram: number of sentences per word error rate bucket (0-20%, 20-40%, 40-60%, 60-80%, 80-100%)]
127 BLEU score vs. word error rate (%) [scatter plot]
129 NIST score vs. word error rate (%) [scatter plot]
131 Multiple ASR hypotheses translation [plots: number of sentences vs. word error rate; CASIA BLEU and NIST scores vs. word error rate]
132 Acknowledgments. All participants; NLPR and ATR for ASR; BBN for TER; UKA. Thanks a lot!
More informationLanguage Models. Philipp Koehn. 11 September 2018
Language Models Philipp Koehn 11 September 2018 Language models 1 Language models answer the question: How likely is a string of English words good English? Help with reordering p LM (the house is small)
More informationChapter 3: Basics of Language Modelling
Chapter 3: Basics of Language Modelling Motivation Language Models are used in Speech Recognition Machine Translation Natural Language Generation Query completion For research and development: need a simple
More informationRecap: Language models. Foundations of Natural Language Processing Lecture 4 Language Models: Evaluation and Smoothing. Two types of evaluation in NLP
Recap: Language models Foundations of atural Language Processing Lecture 4 Language Models: Evaluation and Smoothing Alex Lascarides (Slides based on those from Alex Lascarides, Sharon Goldwater and Philipp
More information{ Jurafsky & Martin Ch. 6:! 6.6 incl.
N-grams Now Simple (Unsmoothed) N-grams Smoothing { Add-one Smoothing { Backo { Deleted Interpolation Reading: { Jurafsky & Martin Ch. 6:! 6.6 incl. 1 Word-prediction Applications Augmentative Communication
More informationN-gram Language Modeling Tutorial
N-gram Language Modeling Tutorial Dustin Hillard and Sarah Petersen Lecture notes courtesy of Prof. Mari Ostendorf Outline: Statistical Language Model (LM) Basics n-gram models Class LMs Cache LMs Mixtures
More informationSpeech and Language Processing. Chapter 9 of SLP Automatic Speech Recognition (II)
Speech and Language Processing Chapter 9 of SLP Automatic Speech Recognition (II) Outline for ASR ASR Architecture The Noisy Channel Model Five easy pieces of an ASR system 1) Language Model 2) Lexicon/Pronunciation
More informationNatural Language Processing SoSe Language Modelling. (based on the slides of Dr. Saeedeh Momtazi)
Natural Language Processing SoSe 2015 Language Modelling Dr. Mariana Neves April 20th, 2015 (based on the slides of Dr. Saeedeh Momtazi) Outline 2 Motivation Estimation Evaluation Smoothing Outline 3 Motivation
More informationMark Scheme (Results) Summer Pearson Edexcel GCE in Statistics 3R (6691/01R)
Mark Scheme (Results) Summer 2014 Pearson Edexcel GCE in Statistics 3R (6691/01R) Edexcel and BTEC Qualifications Edexcel and BTEC qualifications come from Pearson, the world s leading learning company.
More informationStatistical Phrase-Based Speech Translation
Statistical Phrase-Based Speech Translation Lambert Mathias 1 William Byrne 2 1 Center for Language and Speech Processing Department of Electrical and Computer Engineering Johns Hopkins University 2 Machine
More informationAutomated Summarisation for Evidence Based Medicine
Automated Summarisation for Evidence Based Medicine Diego Mollá Centre for Language Technology, Macquarie University HAIL, 22 March 2012 Contents Evidence Based Medicine Our Corpus for Summarisation Structure
More informationMark Scheme (Results) Summer International GCSE Mathematics (4MA0) Paper 4HR
Mark Scheme (Results) Summer 0 International GCSE Mathematics (4MA0) Paper 4HR Edexcel and BTEC Qualifications Edexcel and BTEC qualifications come from Pearson, the world s leading learning company. We
More informationMass Asset Additions. Overview. Effective mm/dd/yy Page 1 of 47 Rev 1. Copyright Oracle, All rights reserved.
Overview Effective mm/dd/yy Page 1 of 47 Rev 1 System References None Distribution Oracle Assets Job Title * Ownership The Job Title [list@yourcompany.com?subject=eduxxxxx] is responsible for ensuring
More informationHomework 4, Part B: Structured perceptron
Homework 4, Part B: Structured perceptron CS 585, UMass Amherst, Fall 2016 Overview Due Friday, Oct 28. Get starter code/data from the course website s schedule page. You should submit a zipped directory
More informationImproved Decipherment of Homophonic Ciphers
Improved Decipherment of Homophonic Ciphers Malte Nuhn and Julian Schamper and Hermann Ney Human Language Technology and Pattern Recognition Computer Science Department, RWTH Aachen University, Aachen,
More informationStatistical Machine Translation and Automatic Speech Recognition under Uncertainty
Statistical Machine Translation and Automatic Speech Recognition under Uncertainty Lambert Mathias A dissertation submitted to the Johns Hopkins University in conformity with the requirements for the degree
More informationElastic and Inelastic Collisions
Elastic and Inelastic Collisions - TA Version Physics Topics If necessary, review the following topics and relevant textbook sections from Serway / Jewett Physics for Scientists and Engineers, 9th Ed.
More informationPMT. Mark Scheme (Results) Summer Pearson Edexcel GCSE In Mathematics A (1MA0) Higher (Calculator) Paper 2H
Mark Scheme (Results) Summer 2014 Pearson Edexcel GCSE In Mathematics A (1MA0) Higher (Calculator) Paper 2H Edexcel and BTEC Qualifications Edexcel and BTEC qualifications are awarded by Pearson, the UK
More informationPart I: Web Structure Mining Chapter 1: Information Retrieval and Web Search
Part I: Web Structure Mining Chapter : Information Retrieval an Web Search The Web Challenges Crawling the Web Inexing an Keywor Search Evaluating Search Quality Similarity Search The Web Challenges Tim
More information1. Markov models. 1.1 Markov-chain
1. Markov models 1.1 Markov-chain Let X be a random variable X = (X 1,..., X t ) taking values in some set S = {s 1,..., s N }. The sequence is Markov chain if it has the following properties: 1. Limited
More informationMath.3336: Discrete Mathematics. Applications of Propositional Logic
Math.3336: Discrete Mathematics Applications of Propositional Logic Instructor: Dr. Blerina Xhabli Department of Mathematics, University of Houston https://www.math.uh.edu/ blerina Email: blerina@math.uh.edu
More informationLanguage Processing with Perl and Prolog
Language Processing with Perl and Prolog Chapter 5: Counting Words Pierre Nugues Lund University Pierre.Nugues@cs.lth.se http://cs.lth.se/pierre_nugues/ Pierre Nugues Language Processing with Perl and
More informationMachine Learning for natural language processing
Machine Learning for natural language processing N-grams and language models Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 25 Introduction Goals: Estimate the probability that a
More informationStatistical Machine Translation
Statistical Machine Translation Marcello Federico FBK-irst Trento, Italy Galileo Galilei PhD School University of Pisa Pisa, 7-19 May 2008 Part V: Language Modeling 1 Comparing ASR and statistical MT N-gram
More informationUnsupervised Vocabulary Induction
Infant Language Acquisition Unsupervised Vocabulary Induction MIT (Saffran et al., 1997) 8 month-old babies exposed to stream of syllables Stream composed of synthetic words (pabikumalikiwabufa) After
More informationTriplet Lexicon Models for Statistical Machine Translation
Triplet Lexicon Models for Statistical Machine Translation Saša Hasan, Juri Ganitkevitch, Hermann Ney and Jesús Andrés Ferrer lastname@cs.rwth-aachen.de CLSP Student Seminar February 6, 2009 Human Language
More informationStatistical Substring Reduction in Linear Time
Statistical Substring Reduction in Linear Time Xueqiang Lü Institute of Computational Linguistics Peking University, Beijing lxq@pku.edu.cn Le Zhang Institute of Computer Software & Theory Northeastern
More informationDiscriminative Learning in Speech Recognition
Discriminative Learning in Speech Recognition Yueng-Tien, Lo g96470198@csie.ntnu.edu.tw Speech Lab, CSIE Reference Xiaodong He and Li Deng. "Discriminative Learning in Speech Recognition, Technical Report
More informationPenn Treebank Parsing. Advanced Topics in Language Processing Stephen Clark
Penn Treebank Parsing Advanced Topics in Language Processing Stephen Clark 1 The Penn Treebank 40,000 sentences of WSJ newspaper text annotated with phrasestructure trees The trees contain some predicate-argument
More informationA L A BA M A L A W R E V IE W
A L A BA M A L A W R E V IE W Volume 52 Fall 2000 Number 1 B E F O R E D I S A B I L I T Y C I V I L R I G HT S : C I V I L W A R P E N S I O N S A N D TH E P O L I T I C S O F D I S A B I L I T Y I N
More informationKneser-Ney smoothing explained
foldl home blog contact feed Kneser-Ney smoothing explained 18 January 2014 Language models are an essential element of natural language processing, central to tasks ranging from spellchecking to machine
More informationLanguage Modeling. Michael Collins, Columbia University
Language Modeling Michael Collins, Columbia University Overview The language modeling problem Trigram models Evaluating language models: perplexity Estimation techniques: Linear interpolation Discounting
More informationOut of GIZA Efficient Word Alignment Models for SMT
Out of GIZA Efficient Word Alignment Models for SMT Yanjun Ma National Centre for Language Technology School of Computing Dublin City University NCLT Seminar Series March 4, 2009 Y. Ma (DCU) Out of Giza
More informationA Convolutional Neural Network-based
A Convolutional Neural Network-based Model for Knowledge Base Completion Dat Quoc Nguyen Joint work with: Dai Quoc Nguyen, Tu Dinh Nguyen and Dinh Phung April 16, 2018 Introduction Word vectors learned
More informationAn Algorithm for Fast Calculation of Back-off N-gram Probabilities with Unigram Rescaling
An Algorithm for Fast Calculation of Back-off N-gram Probabilities with Unigram Rescaling Masaharu Kato, Tetsuo Kosaka, Akinori Ito and Shozo Makino Abstract Topic-based stochastic models such as the probabilistic
More informationAccelerated Natural Language Processing Lecture 3 Morphology and Finite State Machines; Edit Distance
Accelerated Natural Language Processing Lecture 3 Morphology and Finite State Machines; Edit Distance Sharon Goldwater (based on slides by Philipp Koehn) 20 September 2018 Sharon Goldwater ANLP Lecture
More informationSpatial Role Labeling CS365 Course Project
Spatial Role Labeling CS365 Course Project Amit Kumar, akkumar@iitk.ac.in Chandra Sekhar, gchandra@iitk.ac.in Supervisor : Dr.Amitabha Mukerjee ABSTRACT In natural language processing one of the important
More informationACS Introduction to NLP Lecture 3: Language Modelling and Smoothing
ACS Introduction to NLP Lecture 3: Language Modelling and Smoothing Stephen Clark Natural Language and Information Processing (NLIP) Group sc609@cam.ac.uk Language Modelling 2 A language model is a probability
More informationElastic and Inelastic Collisions
Physics Topics Elastic and Inelastic Collisions If necessary, review the following topics and relevant textbook sections from Serway / Jewett Physics for Scientists and Engineers, 9th Ed. Kinetic Energy
More informationDecoding Revisited: Easy-Part-First & MERT. February 26, 2015
Decoding Revisited: Easy-Part-First & MERT February 26, 2015 Translating the Easy Part First? the tourism initiative addresses this for the first time the die tm:-0.19,lm:-0.4, d:0, all:-0.65 tourism touristische
More informationStatistical Machine Translation
Statistical Machine Translation -tree-based models (cont.)- Artem Sokolov Computerlinguistik Universität Heidelberg Sommersemester 2015 material from P. Koehn, S. Riezler, D. Altshuler Bottom-Up Decoding
More informationConditional Language Modeling. Chris Dyer
Conditional Language Modeling Chris Dyer Unconditional LMs A language model assigns probabilities to sequences of words,. w =(w 1,w 2,...,w`) It is convenient to decompose this probability using the chain
More informationStatistical methods for NLP Estimation
Statistical methods for NLP Estimation UNIVERSITY OF Richard Johansson January 29, 2015 why does the teacher care so much about the coin-tossing experiment? because it can model many situations: I pick
More informationCS446: Machine Learning Spring Problem Set 4
CS446: Machine Learning Spring 2017 Problem Set 4 Handed Out: February 27 th, 2017 Due: March 11 th, 2017 Feel free to talk to other members of the class in doing the homework. I am more concerned that
More informationDecoding in Statistical Machine Translation. Mid-course Evaluation. Decoding. Christian Hardmeier
Decoding in Statistical Machine Translation Christian Hardmeier 2016-05-04 Mid-course Evaluation http://stp.lingfil.uu.se/~sara/kurser/mt16/ mid-course-eval.html Decoding The decoder is the part of the
More informationExperiments with a Gaussian Merging-Splitting Algorithm for HMM Training for Speech Recognition
Experiments with a Gaussian Merging-Splitting Algorithm for HMM Training for Speech Recognition ABSTRACT It is well known that the expectation-maximization (EM) algorithm, commonly used to estimate hidden
More informationSemantics with Dense Vectors. Reference: D. Jurafsky and J. Martin, Speech and Language Processing
Semantics with Dense Vectors Reference: D. Jurafsky and J. Martin, Speech and Language Processing 1 Semantics with Dense Vectors We saw how to represent a word as a sparse vector with dimensions corresponding
More informationAcoustic Modeling for Speech Recognition
Acoustic Modeling for Speech Recognition Berlin Chen 2004 References:. X. Huang et. al. Spoken Language Processing. Chapter 8 2. S. Young. The HTK Book (HTK Version 3.2) Introduction For the given acoustic
More informationThe statement calculus and logic
Chapter 2 Contrariwise, continued Tweedledee, if it was so, it might be; and if it were so, it would be; but as it isn t, it ain t. That s logic. Lewis Carroll You will have encountered several languages
More informationFROM QUERIES TO TOP-K RESULTS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS
FROM QUERIES TO TOP-K RESULTS Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Retrieval models Retrieval evaluation Link
More information] Automatic Speech Recognition (CS753)
] Automatic Speech Recognition (CS753) Lecture 17: Discriminative Training for HMMs Instructor: Preethi Jyothi Sep 28, 2017 Discriminative Training Recall: MLE for HMMs Maximum likelihood estimation (MLE)
More informationMark Scheme (Results) November Pearson Edexcel GCSE in Mathematics Linear (1MA0) Higher (Non-Calculator) Paper 1H
Mark Scheme (Results) November 2013 Pearson Edexcel GCSE in Mathematics Linear (1MA0) Higher (Non-Calculator) Paper 1H Edexcel and BTEC Qualifications Edexcel and BTEC qualifications are awarded by Pearson,
More informationCollaborative NLP-aided ontology modelling
Collaborative NLP-aided ontology modelling Chiara Ghidini ghidini@fbk.eu Marco Rospocher rospocher@fbk.eu International Winter School on Language and Data/Knowledge Technologies TrentoRISE Trento, 24 th
More informationLanguage Modeling. Introduction to N-grams. Many Slides are adapted from slides by Dan Jurafsky
Language Modeling Introduction to N-grams Many Slides are adapted from slides by Dan Jurafsky Probabilistic Language Models Today s goal: assign a probability to a sentence Why? Machine Translation: P(high
More informationDARPA ATIS Test Results June 1990
DARPA ATIS Test Results June 1990 D. S. Pallett, W. M. Fisher, J. G. Fiscus, and J. S. Garofolo Room A 216 Technology Building National Institute of Standards and Technology (NIST) Gaithersburg, MD 20899
More informationThe distribution of characters, bi- and trigrams in the Uppsala 70 million words Swedish newspaper corpus
Uppsala University Department of Linguistics The distribution of characters, bi- and trigrams in the Uppsala 70 million words Swedish newspaper corpus Bengt Dahlqvist Abstract The paper describes some
More informationModeling Norms of Turn-Taking in Multi-Party Conversation
Modeling Norms of Turn-Taking in Multi-Party Conversation Kornel Laskowski Carnegie Mellon University Pittsburgh PA, USA 13 July, 2010 Laskowski ACL 1010, Uppsala, Sweden 1/29 Comparing Written Documents
More informationMachine Translation: Examples. Statistical NLP Spring Levels of Transfer. Corpus-Based MT. World-Level MT: Examples
Statistical NLP Spring 2009 Machine Translation: Examples Lecture 17: Word Alignment Dan Klein UC Berkeley Corpus-Based MT Levels of Transfer Modeling correspondences between languages Sentence-aligned
More informationSparse vectors recap. ANLP Lecture 22 Lexical Semantics with Dense Vectors. Before density, another approach to normalisation.
ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Previous lectures: Sparse vectors recap How to represent
More informationANLP Lecture 22 Lexical Semantics with Dense Vectors
ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Henry S. Thompson ANLP Lecture 22 5 November 2018 Previous
More informationAutomatic Speech Recognition and Statistical Machine Translation under Uncertainty
Outlines Automatic Speech Recognition and Statistical Machine Translation under Uncertainty Lambert Mathias Advisor: Prof. William Byrne Thesis Committee: Prof. Gerard Meyer, Prof. Trac Tran and Prof.
More informationAn Introduction to Bioinformatics Algorithms Hidden Markov Models
Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training
More information