Friday, February 21, 2014

Charting the Un-Discovered Country: Discovery About Discovery

Technology Assisted Review (“TAR”), also known as predictive coding, is increasingly popular as a means of controlling discovery costs, especially with large organizational defendants faced with the mounting costs of reliably searching and producing hundreds of gigabytes – or even terabytes – of potentially discoverable information.  TAR proponents tout studies showing its ability to simultaneously lower costs and increase search recall and precision (both these terms of art are discussed below).  See Maura R. Grossman & Gordon V. Cormack, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, Rich. J.L. & Tech., Spring 2011, at 8-9.  Indeed, TAR technology has advanced to the point that, when properly employed, predictive coding algorithms can deal effectively with situations that complicated the use of TAR in the past, such as low-richness data sets (data sets with a low percentage of responsive documents), unrepresentative initial seed sets, and other potential complications.

The numbers, however, tell only a part of the story.  TAR is not a one-size-fits-all solution, and each search process must be individually tailored to the data set under review.  This tailoring (or “training”) can be conducted in a number of different ways, but generally involves creating one or more “seed set(s)” of documents, having a human reviewer review and code the “seed” or “training” documents, and then feeding those coding decisions back to the algorithm so it can learn the distinction between relevant and non-relevant documents.  Most training processes are iterative, and some TAR tools involve the correction of errors made by the algorithm.  The training process is repeated until the algorithm’s recall (the percentage of responsive documents retrieved) and precision (the percentage of retrieved documents which are responsive) are within acceptable limits.  The “seed sets” themselves can be generated in several ways, including hand-picking documents using keyword searches or otherwise, reviewing a random sample of documents, and/or through the use of “intelligent learning” algorithms (“active learning”) to select documents that would assist the algorithm in learning, independent of direct human intervention.  These techniques are not mutually exclusive, and different predictive coding algorithms may make use of them either singly or in combination, depending on the needs of the production and the characteristics of the data set.  As noted above, not all TAR tools and protocols are created equal, so some diligence is needed in selecting a tool and implementing a reasonable and defensible process.
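For readers who want to see the arithmetic behind recall and precision, the following is a minimal sketch in Python.  It is not drawn from any particular TAR tool: the function names, the toy document identifiers, and the 75% recall and 50% precision thresholds are all assumptions chosen for illustration, and what counts as “acceptable limits” in a real matter is for the parties or the court to settle.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ReviewMetrics:
    recall: float     # share of responsive documents that were retrieved
    precision: float  # share of retrieved documents that are responsive


def evaluate(retrieved: set[str], responsive: set[str]) -> ReviewMetrics:
    """Compare the algorithm's retrieved set with human coding decisions."""
    correctly_retrieved = len(retrieved & responsive)
    recall = correctly_retrieved / len(responsive) if responsive else 0.0
    precision = correctly_retrieved / len(retrieved) if retrieved else 0.0
    return ReviewMetrics(recall=recall, precision=precision)


def within_limits(metrics: ReviewMetrics,
                  min_recall: float = 0.75,      # hypothetical threshold
                  min_precision: float = 0.50,   # hypothetical threshold
                  ) -> bool:
    """Return True when training could stop under the assumed thresholds."""
    return metrics.recall >= min_recall and metrics.precision >= min_precision


# Toy validation sample: a human reviewer coded four documents as responsive;
# the algorithm retrieved six documents, three of which are responsive.
responsive = {"doc1", "doc2", "doc3", "doc4"}
retrieved = {"doc1", "doc2", "doc3", "doc7", "doc8", "doc9"}

metrics = evaluate(retrieved, responsive)
print(f"recall={metrics.recall:.2f}, precision={metrics.precision:.2f}")
print("within limits:", within_limits(metrics))
```

In this toy sample the algorithm found three of the four responsive documents (recall of 75%), and three of the six documents it retrieved were responsive (precision of 50%).  In an iterative protocol, a check of this kind would be repeated after each training round against a freshly coded sample.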

With TAR’s increasing popularity, the training process is being subjected to increasing scrutiny, with a growing number of attorneys seeking “discovery about discovery” to ascertain the methods used to train the TAR process – and to identify any errors, omissions, or flaws in that process.  In particular, the creation and composition of the iterative “seed sets” used to train the algorithm are often the subject of great interest to plaintiffs, who may wish to review, and potentially provide input into, the generation of the seed set and/or the conduct of the training process. 

The courts are still struggling to determine how to approach the thorny issues present at the intersection of broad discovery, the work product doctrine, the attorney-client privilege, and the cooperation protocols enunciated by the Sedona Conference.  Some have expressed hesitation regarding discovery about discovery, while others argue in favor of broad discoverability.  Notably, Judge Paul Grimm, writing in a law review, stated simply that, at least in the context of record preservation, “[i]t is axiomatic that an opponent may routinely obtain discovery of a client’s actions taken to implement the duty to preserve information[,]” explaining that “[i]t is of no moment that the…search was conducted at the direction of counsel.  Parties are permitted to inquire into an opponent’s efforts to preserve relevant information[.]”  Hon. Paul W. Grimm, et al., “Discovery About Discovery: Does the Attorney-Client Privilege Protect All Attorney-Client Communications Relating to the Preservation of Potentially Relevant Information?”, 37 Balt. L.R. 413 (2008).  While TAR is currently used principally in support of document review and production efforts, it is hard to see why the processes used by a party to identify responsive documents should be provided greater protection than the processes used to identify the location of potentially responsive documents.

A number of courts have relied on Rule 26(f) and the Sedona Conference’s Cooperation Protocol to permit discovery about discovery.  For example, courts have compelled disclosure of the data repositories (e.g., custodians and sources) searched, as well as the search terms used to conduct that search.  See Am. Home Assurance Co. v. Greater Omaha Packing Co., Inc., No. 8:11-cv-270, 2013 U.S. Dist. LEXIS 129638, 2013 WL 4875997 (D. Neb. Sept. 11, 2013); Apple Inc. v. Samsung Electronics Co. Ltd., No. 12-cv-0630, 2013 U.S. Dist. LEXIS 67085 (N.D. Cal. May 9, 2013); Uelian de Abadia-Peixoto v. U.S. Dept. of Homeland Sec., Civ. No. 11-04001 (N.D. Cal. Aug. 23, 2013) (all compelling production of search terms); see also Ralph Losey, More Courts Are Requiring Disclosure of Keywords, E-Discovery Law Today (May 28, 2013).  Other courts have permitted wider-ranging discovery on discovery in appropriate circumstances.  For example, in Ruiz-Bueno, III v. Scott, the federal district court for the Southern District of Ohio found that, by refusing to provide discovery on discovery, defendants had “fail[ed] to acknowledge the nuanced nature of discovery.”  No. 2:12-cv-0809, 2013 U.S. Dist. LEXIS 162953, 2013 WL 6055402 (S.D. Ohio Nov. 15, 2013).  While noting that, ideally, the need for discovery on discovery should be obviated by the Rule 26(f) planning process, the Court nevertheless held that “[s]imply put, when plaintiffs expressed some skepticism about the sufficiency of defendants’ efforts to produce…defendants’ counsel should have been forthcoming with information…[t]hat did not happen.  The Court has the power…to make that happen now.”

The debate about discovery of the discovery process, as it relates to TAR, primarily revolves around the “seed set”.  While this terminology implies a single “set” of documents, TAR programs are trained in many different ways, and often make use of an iterative process with multiple “sets” of documents coupled with human review and correction.  As such, discovery about the seed set should be viewed as discovery of the documents and processes used to “train” the algorithm to recognize responsive and non-responsive documents.

However, rather than address a deficiency after the fact as in Ruiz-Bueno, potentially after the expenditure of substantial time and expense by both parties, plaintiffs may be better off trying to obtain transparency and cooperation in advance – either through agreement with defendants or by use of a motion to compel cooperation.  In Moore v. Publicis Groupe, Magistrate Judge Andrew Peck avoided the need for “discovery about discovery” by encouraging disclosure of the seed set to plaintiffs as part of the discovery protocol.  287 F.R.D. 182 (S.D.N.Y. 2012); see also William A. Gross Constr. Assocs., Inc. v. Am. Mfrs. Mut. Ins. Co., 256 F.R.D. 134 (S.D.N.Y. 2009) (Peck, M.J.) (holding that the parties must cooperate in selecting appropriate key-words to facilitate computerized search for relevant e-mails).  Indeed, under the protocol outlined in Moore, the parties agreed to participate cooperatively in an iterative process which included conferring several times regarding the composition of the seed set, and on the training process in general.  Id.  In that case, defendant committed to provide to plaintiff all non-privileged documents used as part of the seed set, regardless of final relevancy.  Id. at 185, 192 (while not entering a ruling on the subject, the Court noted that “[i]f you do predictive coding, you are going to have to give your seed set, including the seed documents marked as nonresponsive to the plaintiffs counsel[.]”).

Other courts also appear to have contemplated the establishment of the same sort of collaborative effort envisioned by the Sedona Conference and established in Moore.  See, e.g., Gordon v. Kaleida Health, No. 08-cv-378S, 2013 U.S. Dist. LEXIS 73330 (W.D.N.Y. May 21, 2013) (Foschio, M.J.); Hinterberger v. Catholic Health Sys., No. 08-cv-380S, 2013 U.S. Dist. LEXIS 73141 (W.D.N.Y. May 21, 2013) (Foschio, M.J.); see also Sedona Conference, The Sedona Conference Cooperation Proclamation, 10 Sedona Conf. J. 331 (2009).  But see H. Christopher Boehning & Daniel J. Toal, ‘Seed Set’ Documents Should Not Be Discoverable, New York Law Journal (Feb. 4, 2014).  In Gordon and Hinterberger, the plaintiff moved to compel Defendants to “engage in meaningful meet and confer discussions regarding an ESI protocol” and, if an agreement could not be reached, to compel the submission by each party of a proposed protocol for adoption by the Court.  Id.  The Court denied plaintiff’s motion in each case without prejudice, explaining that “Defendants state they are prepared to meet and confer with Plaintiffs… regarding Defendants’ ESI production using predictive coding… [a]ccordingly, it is not necessary for the court to further address the merits of Plaintiffs’ motion at this time.”  Gordon, 2013 U.S. Dist. LEXIS at *11; see also Hinterberger, 2013 U.S. Dist. LEXIS at *10 (same).  While the Court in Gordon and Hinterberger did not find any need to enter an order, given defendants’ expressed willingness to cooperate, where defendants prove unwilling to cooperate – or where there is reason to doubt the sufficiency of their production – a court may prove more amenable to either compelling cooperation or permitting discovery on discovery, as did the court in Ruiz-Bueno, above.  2013 U.S. Dist. LEXIS 162953, 2013 WL 6055402.

At least one court, however, has taken a more restrictive view about the discoverability of the seed set.  This position is well summarized by the federal district court for the Northern District of Indiana in In Re: Biomet M2A Magnum Hip Implant Prods. Liability Litig., No. 3:12-MD-2391, 2013 U.S. Dist. LEXIS 172570 (N.D. Ind. Aug. 21, 2013).  In that case, plaintiff requested that defendant produce “the discoverable documents used in the training of the ‘predictive coding’ algorithm.”  Defendants disclosed only that the discoverable documents used in the training had already been provided, without specifically identifying those documents.  Id. at *2.  After first noting that it was “self evident” that plaintiff did not have a right to discover the entirety of the “seed set” used to train the algorithm, the Court addressed whether defendant was required to disclose which of the admittedly responsive documents were used in training the algorithm.  Id. at *3.  The Court held that Rule 26(b)(1) does not permit discovery into the use to which defendant put discoverable documents prior to their production.  Id. at *4.  Nevertheless, the Court called defendant’s refusal to cooperate “troubling” and indicated that, although it could not compel production of the seed set, “[defendant]’s cooperation falls below what the Sedona Conference endorses[,]” going on to state that “[a]n unexplained lack of cooperation in discovery can lead a court to question why the uncooperative party is hiding something, and such questions can affect the exercise of discretion.”  Id. at *5.

Given the concerns identified by Biomet, plaintiff’s counsel should work assiduously with defense counsel to arrive at an agreeable protocol whereby, to the extent practicable, they are able to review and participate in the creation of the “seed set” and the training of the TAR algorithm used.  This cooperation is desirable not only for its potential to resolve this issue with a minimum of time and expense, but also to ensure that the real goal – maximal production of responsive information in an efficient and timely fashion – can be achieved with a minimum of collateral litigation.  It is worth noting that even the Court in Biomet found defendant’s lack of transparency “troubling”.  As such, it may be that courts would be more open to mandating such cooperation than to mandating after-the-fact discovery on discovery.  That said, if such a compromise cannot be reached, plaintiff should consider moving to compel cooperation and/or to compel entry of a cooperative discovery protocol.
