Page xix: "Wed 2.0" --> "Web 2.0"
Page 82: p_ir --> P_iR
Page 92: the computation of "sim(d_1,q)" should be restricted to just the termsets with minimum frequency of 2. Thus, the correct formula is as follows:
* sim(d_1,q) = (W_{d,1} x W_{d,q} + W_{ad,1} x W_{ad,q} + W_{bd,1} x W_{bd,q}) / |vec(d_1)|
* sim(d_1,q) = (2.00 x 1.00 + 3.17 x 1.58 + 2.44 x 1.22) / 7.35 = 1.35
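To sanity-check the corrected arithmetic, a quick sketch (weights taken from the example above; the variable names are illustrative, not from the book):

```python
# Numeric check of the corrected sim(d_1, q) computation, restricted to
# the termsets with minimum frequency 2, using the weights quoted above.
weights_d1 = [2.00, 3.17, 2.44]   # W_{d,1}, W_{ad,1}, W_{bd,1}
weights_q  = [1.00, 1.58, 1.22]   # W_{d,q}, W_{ad,q}, W_{bd,q}
norm_d1 = 7.35                    # |vec(d_1)|

sim = sum(wd * wq for wd, wq in zip(weights_d1, weights_q)) / norm_d1
print(f"{sim:.2f}")  # close to the 1.35 quoted above (1.35 vs 1.36 depends on rounding)
```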
Page 101: "The matrix D^T is the matrix of eigenvectors derived from the transpose of the document-document matrix given by M^T · M" --> "The matrix D^T is the transpose of the matrix eigenvectors derived from the document-document matrix given by M^T · M"
Page 107: Section 3.5.2 on Language Models: subsection on "Language Model based on a Bernoulli Process" should come *before* subsection on "Language Model based on a Multinomial Process"
Page 123: the second on(i,k) --> on(j,k)
Page 141: the R-precision value for R2 --> the R-precision value for q2
Page 143: Formula 4.8: instead of $MRR(Q) = \sum_{i=1}^{N_q} \frac{1}{S_{correct}({\cal R}_i)}$ it should be $MRR(Q) = \frac{1}{N_q} \sum_{i=1}^{N_q} \frac{1}{S_{correct}({\cal R}_i)}$
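The correction adds the missing 1/N_q normalization to the mean reciprocal rank. A minimal sketch with made-up ranks:

```python
# Corrected MRR: average (not just sum) of reciprocal ranks over N_q queries.
# The ranks below are invented example data, one per query.
ranks = [1, 2, 4]  # rank at which the first correct answer appears

mrr = sum(1.0 / r for r in ranks) / len(ranks)  # the 1/N_q factor is the fix
print(mrr)  # (1 + 1/2 + 1/4) / 3
```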
Page 171: "turkey" --> "turker" (twice)
Page 192: c_{v,ym} --> c_{v,yn} (since s_u and s_v are of the same length)
Page 192: s_u = (c_{u,x1} , s_{u, x2} ...s_{u, xn} ) --> s_u = (c_{u,x1} , c_{u, x2} ...c_{u, xn} )
Page 196: defined by equation 5.10 --> defined by equation 5.12
Page 199: only the lower frequency documents --> only the lower frequency terms
Page 202: "used to estimate an initial query using relevance feedback techniques" --> "used to estimate an expanded query using relevance feedback techniques"
Page 223: "very similar documents will have a similarity value close to 0 while very different documents will have similarity close to 1" --> the values should be swapped: very similar documents have similarity close to 1, very different documents close to 0
Page 315: Because the number of classifiers increases exponentially with the number of classes –> quadratically
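The correction from "exponentially" to "quadratically" matches the usual one-versus-one scheme (assuming that is the setup on page 315): one binary classifier per unordered pair of classes, i.e. n(n-1)/2 classifiers. A quick illustration:

```python
from itertools import combinations

# One binary classifier per unordered pair of classes in a one-vs-one
# multi-class scheme: n(n-1)/2 pairs, i.e. quadratic growth, not exponential.
def num_pairwise_classifiers(n_classes):
    return len(list(combinations(range(n_classes), 2)))

for n in (3, 5, 10):
    print(n, num_pairwise_classifiers(n), n * (n - 1) // 2)  # both counts agree
```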
Page 318: whose weights are the error estimates of each classifier –> whose weights depend on the error estimates of each classifier
Page 319: The class that receives the highest sum of weights --> the highest score
Page 323: the maximum term information –> the maximum term mutual information
Page 343: The full inversion –> Addressing words
Page 345: "being n the text collection size" --> "being n the vocabulary size"
Page 347: "explained at the end of section 9.2.3" --> "explained before in this subsection".
Page 349, rows 10 and -6: Figure 3.3 --> Table 3.3
Page 353: "next takes O(n/M) I/O time" --> "next takes O(n) I/O time"
Page 356: q=1 and r=1 –> q=1 and r=3
Page 371: "left child" --> "right child"
Page 377: "the i-th bit set" --> "see Figure 9.20 where B is complemented"
Page 381: We do in line(6) --> line(8)
Page 460: In the paragraph beginning "Each search cluster …", change "In this figure, we show an index partitioned into n clusters with m replicas." to "In this figure, we show an index partitioned into m servers forming a cluster with n replicas."
Page 461: Fig. 11.7 "n replicas of the whole index" --> "m"
Page 471: H(b) --> H(p)
Page 479: "90% percent" --> "90%"
Page 489: WEST and EAST are switched in Figure 11.12
Page 491: "SiteMonkey" --> "SearchMonkey"
Page 513: "Lui" --> "Liu"
Page 517: "to to" --> "to"
Page 519: "It is critical than…" --> "It is critical that..."
Page 521: "by issuing a query" --> "or by issuing a query"
Page 523: Figure 12.6: there should be a wait icon beside S3
Page 525: "of they way Web..." --> "of the way Web..."
Page 530: "A possible predictor are..." --> "A possible predictor is..."
Page 576: "red" --> "red wine"
Page 637: to build tress or forests --> trees or forests
Page 742: where is (g)? --> Disregard label (g)
Page 751: Test D --> A.4.4 Test D