/docs/cs/ Directory Listing

Directories

Files

  • 1955-nash-nsacryptography.pdf: ⁠, John Nash (1955-01; backlinks):

    [In 1955, well-known mathematician John Nash was in correspondence with the United States National Security Agency. In these letters, Nash proposes a novel enciphering scheme. He also sets forth an important cryptographic principle that now underpins modern computational complexity theory and cryptography. In particular, he proposes a natural definition for “[security] in a practical sense”—that exponential computational effort is required for an enemy to recover a secret key. Nash further conjectures that this property holds for any suitable enciphering mechanism.

    These correspondences were recently declassified by the NSA [1]; this document is the NSA scan, with original images and drawings of Nash’s handwritten letters.]

  • 1955-nash-typeset.pdf: ⁠, John Nash, Mike Rosulek (2012-02-20; backlinks):

    In 1955, well-known mathematician John Nash was in correspondence with the United States National Security Agency. In these letters, Nash proposes a novel enciphering scheme. He also sets forth an important cryptographic principle that now underpins modern computational complexity theory and cryptography. In particular, he proposes a natural definition for “[security] in a practical sense”—that exponential computational effort is required for an enemy to recover a secret key. Nash further conjectures that this property holds for any suitable enciphering mechanism.

    These correspondences, recently declassified by the NSA [1], have been transcribed and typeset in this document. [Typeset by Mike Rosulek: Department of Computer Science, University of Montana, ]

  • 1959-shannon.pdf: ⁠, Claude Shannon (1959; backlinks):

    Consider a discrete source producing a sequence of message letters from a finite alphabet. A single-letter distortion measure is given by a non-negative matrix (d~ij~). The entry d~ij~ measures the “cost” or “distortion” if letter i is reproduced at the receiver as letter j. The average distortion of a communications system (source-coder-noisy channel-decoder) is taken to be d̄ = Σ~i,j~ P~ij~d~ij~, where P~ij~ is the probability of i being reproduced as j. It is shown that there is a function R(d) that measures the “equivalent rate” of the source for a given level of distortion. For coding purposes where a level d of distortion can be tolerated, the source acts like one with information rate R(d). Methods are given for calculating R(d), and various properties are discussed. Finally, generalizations to ergodic sources, to continuous sources, and to distortion measures involving blocks of letters are developed.
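
    For reference, the rate-distortion function R(d) can be written in modern notation (a standard textbook formulation, not quoted from the paper itself) as:

    $$ R(d) \;=\; \min_{\,Q(j \mid i)\,:\ \sum_{i,j} p_i\, Q(j \mid i)\, d_{ij} \,\le\, d\,} I(X; Y) $$

    where p~i~ is the source distribution, Q(j|i) is a test channel, and I(X;Y) is the mutual information between source letter and reproduction.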

  • 1961-rao.pdf: ⁠, C. Radhakrishna Rao (1961-08-01; backlinks):

    A general method is given for generating random permutations of integers using a table of random sampling numbers and without wasting the random numbers read. This is more convenient in practice, especially when random permutations of large numbers of elements are needed. It is suggested that even for permutations of small numbers, the method offers greater scope than consulting a table of a limited number of random permutations. [See also ⁠, hence the description of this as “Rao-Sandelius shuffling”.]

  • 1962-rado.pdf: ⁠, T. Rado (1962-05-01; backlinks):

    [On the ⁠.] The construction of non-computable functions used in this paper is based on the principle that a finite, non-empty set of non-negative integers has a largest element. Also, this principle is used only for sets which are exceptionally well-defined by current standards. No enumeration of computable functions is used, and in this sense the diagonal process is not employed. Thus, it appears that an apparently self-evident principle, of constant use in every area of mathematics, yields non-constructive entities.

  • 1962-sandelius.pdf: ⁠, Martin Sandelius (1962-07-01; backlinks):

    The paper describes a randomization procedure consisting in distributing a deck of cards into 10 decks using random decimal digits and repeating this step with each deck consisting of three or more cards. One random digit is used for randomizing a deck of two cards. This procedure, which is essentially a particular case of a general procedure described by ⁠, is called the “multistage randomization procedure”, or MRP. Some applications are described. A recursive formula is given for the expected number of random digits required by MRP for the randomization of n symbols. A measure of the efficiency of a randomization procedure is presented. The efficiency of MRP is compared with the efficiencies of two other randomization procedures, and it is proved that MRP has an asymptotic efficiency of 100%.
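
    A minimal Python sketch of the multistage idea described above (the pile assignment and the 2-card rule are simplified assumptions, not Sandelius’s exact procedure):

    ```python
    import random

    def mrp_shuffle(items):
        """Multistage randomization procedure (MRP), sketched:
        deal items into 10 piles by random decimal digits, recurse on
        piles of 3+ items, and use a single digit to order a 2-item pile."""
        n = len(items)
        if n <= 1:
            return list(items)
        if n == 2:
            # one random decimal digit decides the order of a 2-item pile
            return list(items) if random.randrange(10) < 5 else [items[1], items[0]]
        piles = [[] for _ in range(10)]
        for x in items:
            piles[random.randrange(10)].append(x)   # one random digit per item
        result = []
        for pile in piles:
            result.extend(mrp_shuffle(pile))
        return result

    print(mrp_shuffle(list(range(10))))
    ```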

  • 1964-gluskin.pdf: “FLODAC - A Pure Fluid Digital Computer”⁠, R. S. Gluskin, M. Jacoby, T. D. Reader

  • 1979-may.pdf: ⁠, Timothy C. May, M. H. Woods (1979-01-01):

    A new physical soft error mechanism in dynamic RAM’s and CCD’s is the upset of stored data by the passage of alpha particles through the memory array area. The alpha particles are emitted by the radioactive decay of uranium and thorium which are present in parts-per-million levels in packaging materials. When an alpha particle penetrates the die surface, it can create enough electron-hole pairs near a storage node to cause a random, single-bit error. Results of experiments and measurements of alpha activity of materials are reported and a physical model for the soft error is developed. Implications for the future of dynamic memories are also discussed.

  • 1980-rytter.pdf: ⁠, Wojciech Rytter (1980-01-01):

    We present the correction to Knuth’s algorithm [2] for computing the table of pattern shifts later used in the Boyer-Moore algorithm for pattern matching.

  • 1981-cohen.pdf: ⁠, Daniel Cohen (1981-10-01; backlinks):

    Which bit should travel first? The bit from the big end or the bit from the little end? Can a war between Big Endians and Little Endians be avoided?

    This article was written in an attempt to stop a war. I hope it is not too late for peace to prevail again. Many believe that the central question of this war is, What is the proper byte order in messages? More specifically, the question is, Which bit should travel first: the bit from the little end of the word or the bit from the big end of the word? Followers of the former approach are called Little Endians, or Lilliputians; followers of the latter are called Big Endians, or Blefuscuians. I employ these Swiftian terms because this modern conflict is so reminiscent of the holy war described in Gulliver’s Travels.

    …To sum it all up, there are two camps, each with its own language. These languages are as compatible with each other as any Semitic and Latin languages. All Big Endians can talk only to each other. So can all the Little Endians, although there are some differences among the dialects used by different tribes. There is no middle ground—only one end can go first. As in all the religious wars of the past, power—not logic—will be the decisive factor. This is not the first holy war, and will probably not be the last. The “Reasonable, do it my way” approach does not work. Neither does the Esperanto approach of switching to yet another new language. Lest our communications world split along these lines, we should take note of a certain book (not mentioned in the references), which has an interesting story about a similar phenomenon: the Tower of Babel. Lilliput and Blefuscu will never come to terms of their own free will. We need some Gulliver between the two islands to force a unified communication regime on all of us.

    Of course, I hope that my way will be chosen, but it is not really critical. Agreement upon an order is more important than the order agreed upon.

    Shall we toss a coin?
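
    As a concrete illustration of the byte-order question (not from the article itself), a few lines of Python show the two conventions side by side:

    ```python
    import struct
    import sys

    value = 0x0A0B0C0D
    print(sys.byteorder)                   # native byte order of this machine
    print(struct.pack(">I", value).hex())  # big-endian:    0a0b0c0d
    print(struct.pack("<I", value).hex())  # little-endian: 0d0c0b0a
    print(struct.pack("!I", value).hex())  # "network byte order" is big-endian
    ```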

  • 1982-perlis.pdf: ⁠, Alan J. Perlis (1982-09-01; backlinks):

    [130 epigrams on computer science and technology, published in 1982, for ACM’s SIGPLAN journal, by noted computer scientist and programming language researcher ⁠. The epigrams are a series of short, programming-language-neutral, humorous statements about computers and programming, distilling lessons he had learned over his career, which are widely quoted.]

    8. A programming language is low level when its programs require attention to the irrelevant….19. A language that doesn’t affect the way you think about programming, is not worth knowing….54. Beware of the Turing tar-pit in which everything is possible but nothing of interest is easy.

    15. Everything should be built top-down, except the first time….30. In programming, everything we do is a special case of something more general—and often we know it too quickly….31. Simplicity does not precede complexity, but follows it….58. Fools ignore complexity. Pragmatists suffer it. Some can avoid it. Geniuses remove it….65. Make no mistake about it: Computers process numbers—not symbols. We measure our understanding (and control) by the extent to which we can arithmetize an activity….56. Software is under a constant tension. Being symbolic it is arbitrarily perfectible; but also it is arbitrarily changeable.

    1. One man’s constant is another man’s variable. 34. The string is a stark data structure and everywhere it is passed there is much duplication of process. It is a perfect vehicle for hiding information.

    36. The use of a program to prove the 4-color theorem will not change mathematics—it merely demonstrates that the theorem, a challenge for a century, is probably not important to mathematics.

    39. Re graphics: A picture is worth 10K words—but only those to describe the picture. Hardly any sets of 10K words can be adequately described with pictures.

    48. The best book on programming for the layman is Alice in Wonderland; but that’s because it’s the best book on anything for the layman.

    77. The cybernetic exchange between man, computer and algorithm is like a game of musical chairs: The frantic search for balance always leaves one of the 3 standing ill at ease….79. A year spent in artificial intelligence is enough to make one believe in God….84. Motto for a research laboratory: What we work on today, others will first think of tomorrow.

    91. The computer reminds one of Lon Chaney—it is the machine of a thousand faces.

    7. It is easier to write an incorrect program than understand a correct one….93. When someone says “I want a programming language in which I need only say what I wish done,” give him a lollipop….102. One can’t proceed from the informal to the formal by formal means.

    100. We will never run out of things to program as long as there is a single program around.

    108. Whenever 2 programmers meet to criticize their programs, both are silent….112. Computer Science is embarrassed by the computer….115. Most people find the concept of programming obvious, but the doing impossible. 116. You think you know when you can learn, are more sure when you can write, even more when you can teach, but certain when you can program. 117. It goes against the grain of modern education to teach children to program. What fun is there in making plans, acquiring discipline in organizing thoughts, devoting attention to detail and learning to be self-critical?

  • 1983-brady.pdf: ⁠, Allen H. Brady (1983-04-01; backlinks):

    The well-defined but noncomputable functions Σ(k) and S(k) given by as the “score” and “shift number” for the k-state Turing machine “ Game” were previously known only for k ≤ 3. The largest known lower bounds yielding the relations Σ(4) ≥ 13 and S(4) ≥ 107, reported by this author, supported the conjecture that these lower bounds are the actual particular values of the functions for k = 4.

    The four-state case has previously been reduced to solving the blank input tape halting problem of only 5,820 individual machines. In this final stage of the k = 4 case, one appears to move into a heuristic level of higher order where it is necessary to treat each machine as representing a distinct theorem.

    The remaining set consists of two primary classes in which a machine and its tape are viewed as the representation of a growing string of cellular automata. The proof techniques, embodied in programs, are entirely heuristic, while the inductive proofs, once established by the computer, are completely rigorous and become the key to the proof of the new and original mathematical results: Σ(4) = 13 and S(4) = 107.
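
    A minimal Python sketch of the Σ (“score”) and S (“shift number”) definitions, using the well-known 2-state champion machine rather than anything from Brady’s proof programs:

    ```python
    def run_turing_machine(delta, max_steps=10_000):
        """Simulate a 2-symbol Turing machine started on a blank tape.
        delta maps (state, symbol) -> (write, move, next_state); next_state None halts."""
        tape, pos, state, steps = {}, 0, "A", 0
        while state is not None and steps < max_steps:
            write, move, state = delta[(state, tape.get(pos, 0))]
            tape[pos] = write
            pos += 1 if move == "R" else -1
            steps += 1
        return sum(tape.values()), steps   # (score Sigma, shift number S)

    # Known 2-state, 2-symbol busy beaver: Sigma(2) = 4 ones, S(2) = 6 steps.
    bb2 = {
        ("A", 0): (1, "R", "B"), ("A", 1): (1, "L", "B"),
        ("B", 0): (1, "L", "A"), ("B", 1): (1, "R", None),
    }
    print(run_turing_machine(bb2))   # -> (4, 6)
    ```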

  • 1985-feynman-surelyyourejokingmrfeynman-ch18-safecrackermeetsafecracker.pdf: ⁠, Richard Feynman (1985; backlinks):

    [In this chapter, Feynman discusses his fascination with locks and safes, and the ways he learns to crack different locks and safes. He also talks about the locks and safes that were at Los Alamos (the ones that held the secrets of the atomic bomb). Generally speaking, the safes containing this material, which is (understandably) considered top secret, are startlingly easy to crack. When Feynman and his colleagues first arrive at Los Alamos, the construction of the facility is not even complete yet, and there is almost no security at all for the “top secret” information. Even later, when the information is stored in safes and locked file cabinets, the locks are generally cheap, and it is easy for Feynman to determine what the combination is, as combinations were often reused. He can memorize common passwords, or watch people dialing carelessly to deduce several digits, and then brute-force the rest; often, people do not even change the factory default.]

  • 1985-lehman-programevolution.pdf

  • 1986-bobrow.pdf

  • 1987-baker.pdf: “The Symbolics Ivory Processor: A 40 Bit Tagged Architecture Lisp Microprocessor”⁠, Clark Baker, David Chan, Jim Cherry, Alan Corry, Greg Efland, Bruce Edwards, Mark Matson, Henry Minsky, Eric Nestler, Kalman Reti, David Sarrazin, Charles Sommer, David Tan, Neil Weste

  • 1987-conway.pdf: ⁠, John H. Conway (1987-01-01; backlinks):

    To play the fraction game corresponding to a given list

    f1, f2, …, fk

    of fractions and starting integer N, you repeatedly multiply the integer you have at any stage (initially N) by the earliest fi in the list for which the answer is integral. Whenever there is no such fi, the game stops.
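
    A minimal sketch of the game loop in Python (the single-fraction example is an illustration, not one of Conway’s published programs):

    ```python
    from fractions import Fraction

    def fraction_game(fractions, n, max_steps=1000):
        """Repeatedly multiply n by the first fraction giving an integer result;
        stop when no fraction does (or after max_steps)."""
        for _ in range(max_steps):
            for f in fractions:
                product = n * f
                if product.denominator == 1:
                    n = product
                    break
            else:
                return n        # no fraction gave an integer: the game stops
        return n

    # Toy list: the single fraction 3/2 turns 2^a into 3^a.
    print(fraction_game([Fraction(3, 2)], Fraction(8)))   # 8 = 2^3  ->  27 = 3^3
    ```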

  • 1987-moon.pdf

  • 1988-borenstein.pdf

  • 1989-devore.pdf: ⁠, Ronald A. DeVore, Ralph Howard, Charles Micchelli (1989):

    We introduce a definition of nonlinear n-widths and then determine the n-widths of the unit ball of the Sobolev space W[^r^~p~]{.supsub} in Lq. We prove that in the sense of these widths the manifold of splines of fixed degree with n free knots is optimal for approximating functions in these Sobolev spaces.

  • 1989-robson.pdf: “Separating strings with small automata”⁠, J. M. Robson

  • 1990-miller.pdf: ⁠, Barton P. Miller, Louis Fredriksen, Bryan So (1990; backlinks):

    Operating system facilities, such as the kernel and utility programs, are typically assumed to be reliable. In our recent experiments, we have been able to crash 25–33% of the utility programs on any version of UNIX that was tested. This report describes these tests and the program bugs that caused the crashes…The following section describes the tools we built to test the utilities. These tools include the fuzz (random character) generator, ptyjig (to test interactive utilities), and scripts to automate the testing process. Next, we will describe the tests we performed, giving the types of input we presented to the utilities. Results from the tests will follow along with an analysis of the results, including identification and classification of the program bugs that caused the crashes. The final section presents concluding remarks, including suggestions for avoiding the types of problems detected by our study and some commentary on the bugs we found. We include an Appendix with the user manual pages for fuzz and ptyjig.
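
    A minimal Python sketch of the random-input idea (not the paper’s fuzz or ptyjig tools; the target command is just an example):

    ```python
    import random
    import subprocess

    def fuzz_once(command, length=1000, seed=None):
        """Feed one buffer of random bytes to a command's stdin and report how it exited."""
        rng = random.Random(seed)
        data = bytes(rng.randrange(256) for _ in range(length))
        try:
            proc = subprocess.run(command, input=data, capture_output=True, timeout=5)
            return proc.returncode     # a negative code means the process was killed by a signal
        except subprocess.TimeoutExpired:
            return "hang"

    print(fuzz_once(["sort"], length=500, seed=0))
    ```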

  • 1993-marcus.pdf: ⁠, Mitchell Marcus, Beatrice Santorini, Mary Ann Marcinkiewicz (1993-10-01; backlinks):

    In this paper, we review our experience with constructing one such large annotated corpus—the Penn Treebank, a corpus consisting of over 4.5 million words of American English. During the first three-year phase of the Penn Treebank Project (1989–1992), this corpus has been annotated for part-of-speech (POS) information. In addition, over half of it has been annotated for skeletal syntactic structure.

  • 1994-binstock-lucidlost.pdf

  • 1994-burke-informationandsecrecyvannevarbushultrandtheothermemex.pdf: “Information and Secrecy: Vannevar Bush, Ultra, and the Other Memex”⁠, Colin Burke

  • 1994-levitin.pdf: ⁠, Lev B. Levitin, Zeev Reingold (1994-05-01):

    The concept of the entropy of natural languages, first introduced by Shannon in 1948, and its importance are discussed. A review of various known approaches to and results of previous studies of language entropy is presented. A new improved method for evaluation of both lower and upper bounds of the entropy of printed texts is developed. This method is a refinement of Shannon 1951’s prediction (guessing) method. The evaluation of the lower bound is shown to be a classical linear programming problem. Statistical analysis of the estimation of the bounds is given and procedures for the statistical treatment of the experimental data (including verification of statistical validity and statistical significance) are elaborated. The method has been applied to printed Hebrew texts in a large experiment (1000 independent samples) in order to evaluate entropy and other information-theoretical characteristics of the Hebrew language. The results have demonstrated the efficiency of the new method: the gap between the upper and lower bounds of entropy has been reduced by a factor of 2.25 compared to the original Shannon approach. Comparison with other languages is given. Possible applications of the method are briefly discussed.

  • 1996-11-15-usenix-ithorrorstories.html

  • 1997-bejan.pdf

  • 1998-reeds.pdf: ⁠, Jim Reeds (1998):

    Book III of Trithemius’s Steganographia (written ca. 1500) contains hidden cipher messages within what is ostensibly a work on magic. After almost 500 years these cryptograms have been detected and solved. (Since 1606 it was known that similar ciphers were present in Books I and II.) As a result the Steganographia can no longer be regarded as one of the main early modern demonological treatises but instead stands unambiguously revealed as the first book-length treatment of cryptography in Europe. [Keywords: Trithemius, Steganographia, history of cryptography.]

  • 2000-mao.pdf: “Time-Lock Puzzle with Examinable Evidence of Unlocking Time”⁠, Wenbo Mao (backlinks)

  • 2001-12-02-treginaldgibbons-isyoursonacomputerhacker.html

  • 2001-demarco-peopleware-whymeasureperformance.pdf (backlinks)

  • 2001-pawson.pdf: ⁠, Richard Pawson, Robert Matthews (2001-12-01):

    Naked objects is an approach to systems design in which core business objects show directly through to the user interface, and in which all interaction consists of invoking methods on those objects in the noun-verb style. One advantage of this approach is that it results in systems that are more expressive from the viewpoint of the user: they treat the user like a problem solver, not as merely a process-follower. Another advantage is that the 1:1 mapping between the user’s representation and the underlying model means that it is possible to auto-generate the former from the latter, which yields benefits to the development process. The authors have designed a Java-based, open source toolkit called Naked Objects which facilitates this style of development. This paper describes the design and operation of the toolkit and its application to the prototyping of a core business system. Some initial feedback from the project is provided, together with a list of future research directions both for the toolkit and for a methodology to apply the naked objects approach.

  • 2001-scott.pdf: ⁠, Kevin Scott (2001-03-01; backlinks):

    In 1965 Gordon Moore observed that the capacity of semiconductor ICs doubled every 18 to 24 months. This trend, now known as Moore’s Law, has held for over 25 years and is responsible for the exponential increase in microprocessor performance over this period. In 1998 Todd Proebsting made a similar-in-spirit, but altogether less optimistic observation about optimizing compilers. His observation, henceforth known as ⁠, is that improvements to compiler technology double the performance of typical programs every 18 years. Proebsting has suggested an experiment to evaluate the veracity of his observation. This paper presents the results of this experiment and some comments on what Proebsting’s Law portends for compiler research.
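
    A quick back-of-the-envelope comparison of the two doubling periods mentioned above (taking 18 months for hardware):

    ```python
    # Implied annual improvement rates for the two "laws".
    moore = 2 ** (1 / 1.5) - 1        # doubling every 1.5 years  -> ~59%/year
    proebsting = 2 ** (1 / 18) - 1    # doubling every 18 years   -> ~4%/year
    print(f"hardware: {moore:.1%}/year, compilers: {proebsting:.1%}/year")
    ```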

  • 2002-magkos.pdf (backlinks)

  • 2003-candea.pdf: ⁠, George Candea, Armando Fox (2003-05-01; backlinks):

    Crash-only programs crash safely and recover quickly. There is only one way to stop such software—by crashing it—and only one way to bring it up—by initiating recovery. Crash-only systems are built from crash-only components, and the use of transparent component-level retries hides intra-system component crashes from end users. In this paper we advocate a crash-only design for Internet systems, showing that it can lead to more reliable, predictable code and faster, more effective recovery. We present ideas on how to build such crash-only Internet services, taking successful techniques to their logical extreme.

  • 2003-fortnow.pdf: “A Short History of Computational Complexity”⁠, Lance Fortnow, Steve Homer (backlinks)

  • 2003-ross.pdf: “5 Commandments: Some technology laws and rules of thumb have stood the test of time, but not all. Spectrum looks at five to see how they have fared, starting with the granddaddy of them all”⁠, IEEE Spectrum (backlinks)

  • 2005-09-30-smith-whyihateframeworks.html

  • 2005-knuth-taocp-v4-prefascicle4b.pdf: “History of Combinatorial Generation (The Art of Computer Programming: Volume 4: Pre-Fascicle 4B: Section 7.2.1.7)”⁠, Donald E. Knuth (backlinks)

  • 2006-jared-wikimediaprovesgreenspunstenthlaw.html (backlinks)

  • 2007-drepper.pdf: ⁠, Ulrich Drepper (2007-11-12; backlinks):

    As CPU cores become increasingly faster, the limiting factor for most programs is now, and will be for some time, memory access. Hardware designers have come up with ever-more sophisticated memory-handling and memory-acceleration techniques—such as CPU caches—but these cannot work optimally without some help from the programmer. Unfortunately, neither the structure nor the cost of using the memory subsystem of a computer or the caches on CPUs is well understood by most programmers. This paper explains the structure of memory subsystems in use on modern commodity hardware, illustrating why CPU caches were developed, how they work, and what programs should do to achieve optimal performance by utilizing them.

    [Some parts are obsolete as of 2017.]

  • 2008-changizi.pdf: ⁠, Mark Changizi (2008-01-01; backlinks):

    Might it be possible to harness the visual system to carry out artificial computations, somewhat akin to how DNA has been harnessed to carry out computation? I provide the beginnings of a research programme attempting to do this. In particular, new techniques are described for building ‘visual circuits’ (or ‘visual software’) using wire, NOT, OR, and AND gates in a visual modality such that our visual system acts as ‘visual hardware’ computing the circuit, and generating a resultant perception which is the output.

  • 2009-gluck.pdf: “Is There A Fourth Futamura Projection?”⁠, Robert Glück

  • 2009-mytkowicz.pdf: ⁠, Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, Peter F. Sweeney (2009-03-07; backlinks):

    This paper presents a surprising result: changing a seemingly innocuous aspect of an experimental setup can cause a systems researcher to draw wrong conclusions from an experiment. What appears to be an innocuous aspect in the experimental setup may in fact introduce a substantial bias in an evaluation. This phenomenon is called measurement bias in the natural and social sciences.

    Our results demonstrate that measurement bias is substantial and commonplace in computer system evaluation. By substantial we mean that measurement bias can lead to a performance analysis that either over-states an effect or even yields an incorrect conclusion. By commonplace we mean that measurement bias occurs in all architectures that we tried (Pentium 4, Core 2, and m5 O3CPU), both compilers that we tried (gcc and Intel’s C compiler), and most of the SPEC CPU2006 C programs. Thus, we cannot ignore measurement bias. Nevertheless, in a literature survey of 133 recent papers from ASPLOS, PACT, PLDI, and CGO, we determined that none of the papers with experimental results adequately consider measurement bias.

    Inspired by similar problems and their solutions in other sciences, we describe and demonstrate two methods, one for detecting (causal analysis) and one for avoiding (setup randomization) measurement bias. [Keywords: experimentation, measurement, performance, bias]

  • 2009-narayanan.pdf: ⁠, Arvind Narayanan, Vitaly Shmatikov (2009-05-17; backlinks):

    Operators of online social networks are increasingly sharing potentially sensitive information about users and their relationships with advertisers, application developers, and data-mining researchers. Privacy is typically protected by anonymization, i.e., removing names, addresses, etc.

    We present a framework for analyzing privacy and anonymity in social networks and develop a new re-identification algorithm targeting anonymized social-network graphs. To demonstrate its effectiveness on real-world networks, we show that a third of the users who can be verified to have accounts on both Twitter, a popular microblogging service, and Flickr, an online photo-sharing site, can be re-identified in the anonymous Twitter graph with only a 12% error rate.

    Our de-anonymization algorithm is based purely on the network topology, does not require creation of a large number of dummy “sybil” nodes, is robust to noise and all existing defenses, and works even when the overlap between the target network and the adversary’s auxiliary information is small.

  • 2010-aldubaee.pdf: ⁠, Shawki A. Al-Dubaee, Nesar Ahmad (2010-08-05):

    This paper proposes a new strategy for the compression of text files based on signal-processing tools, namely the wavelet transform and the Fourier transform.

    The influence of compression size and threshold of the wavelet filters and the Fourier transform, as well as 2 parameters (families of wavelet filters and decomposition levels), on the compression factor of text files is investigated. The experimental results show that the wavelet and the Fourier transforms are suitable for lossy text compression of non-stationary text signal files. In addition, the Fourier transform is the most suitable for files which have the same characters, such as the aaa.txt and aaaa.txt files. However, with stationary text signal files (the aaa.txt and aaaa.txt files), both the wavelet and Fourier transforms give lossless text compression.

    This research also represents a step towards dealing with both image and text compression, i.e. multimedia compression. [Keywords: Wavelet transforms, Fourier transforms, ASCII, lossy, text compression]
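
    A rough Python/NumPy sketch of the Fourier-thresholding idea (an illustration only; it is not the paper’s wavelet/Fourier pipeline and ignores how the retained coefficients would actually be stored):

    ```python
    import numpy as np

    def fourier_compress(text, keep=0.5):
        """Treat ASCII codes as a signal, keep only the largest Fourier
        coefficients, and reconstruct the text (lossy in general)."""
        signal = np.frombuffer(text.encode("ascii"), dtype=np.uint8).astype(float)
        coeffs = np.fft.rfft(signal)
        k = max(1, int(keep * len(coeffs)))
        smallest = np.argsort(np.abs(coeffs))[:-k]
        coeffs[smallest] = 0                      # threshold: drop small coefficients
        restored = np.rint(np.fft.irfft(coeffs, n=len(signal))).clip(0, 127)
        return restored.astype(np.uint8).tobytes().decode("ascii", errors="replace")

    print(fourier_compress("aaaaaaaaaaaaaaaa", keep=0.1))    # stationary: recovered exactly
    print(fourier_compress("the quick brown fox", keep=0.5)) # non-stationary: lossy
    ```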

  • 2010-bates.pdf: “BatesTalk”⁠, Joseph Bates (backlinks)

  • 2010-nordhaus-nordhaus2007twocenturiesofproductivitygrowthincomputing-appendix.xlsx (backlinks)

  • 2010-pcast-nitrd.pdf (backlinks)

  • 2010-ren.pdf#google: ⁠, Gang Ren, Eric Tune, Tipp Moseley, Yixin Shi, Silvius Rus, Robert Hundt (2010-08-19; backlinks):

    Google-Wide Profiling (GWP), a continuous profiling infrastructure for data centers, provides performance insights for cloud applications. With negligible overhead, GWP provides stable, accurate profiles and a datacenter-scale tool for traditional performance analyses. Furthermore, GWP introduces novel applications of its profiles, such as application-platform affinity measurements and identification of platform-specific, microarchitectural peculiarities.

  • 2011-pellegrino.pdf: ⁠, François Pellegrino, Christophe Coupé, Egidio Marsico (2011-09; backlinks):

    This article is a cross-linguistic investigation of the hypothesis that the average information rate conveyed during speech communication results from a trade-off between average information density and speech rate. The study, based on seven languages, shows a negative correlation between density and rate, indicating the existence of several encoding strategies. However, these strategies do not necessarily lead to a constant information rate. These results are further investigated in relation to the notion of syllabic complexity.

  • 2011-yelick.pdf: “Exascale Computing: Technical Challenges”⁠, Kathy Yelick (backlinks)

  • 2012-02-12-arvindnarayanan-iswritingstylesufficienttodeanonymizematerialonline.html

  • 2012-jarvisalo.pdf: ⁠, Matti Järvisalo, Daniel Le Berre, Olivier Roussel, Laurent Simon (2012-03-15; backlinks):

    The International SAT Solver Competition is today an established series of competitive events aiming at objectively evaluating the progress in state-of-the-art procedures for solving Boolean satisfiability (SAT) instances.

    Over the years, the competitions have substantially contributed to the fast progress in SAT solver technology that has made SAT a practical success story of computer science. This short article provides an overview of the SAT solver competitions.

    Figure 2: Performance Evolution of the Best SAT Solvers from 2002 to 2011. The farther to the right the data points are, the better the solver.

  • 2012-johnson.pdf: ⁠, David S. Johnson (2012-01-01; backlinks):

    The year 2012 marks the 40th anniversary of the publication of the influential paper “Reducibility among combinatorial problems” by Richard Karp. This paper was the first to demonstrate the wide applicability of the concept now known as NP-completeness, which had been introduced the previous year by Stephen Cook and Leonid Levin, independently. 2012 also marks the 100th anniversary of the birth of Alan Turing, whose invention of what is now known as the “Turing machine” underlay that concept. In this chapter, I shall briefly sketch the history and pre-history of NP-completeness (with pictures), and provide a brief personal survey of the developments in the theory over the last 40 years and their impact (or lack thereof) on the practice and theory of optimization. I assume the reader is familiar with the basic concepts of NP-completeness, P, and NP, although I hope the story will still be interesting to those with only a fuzzy recollection of the definitions.

  • 2012-terencetao-anonymity.html (backlinks)

  • 2013-geravand.pdf: ⁠, Shahabeddin Geravand, Mahmood Ahmadi (2013-12-24):

    Undoubtedly, dealing with security issues is one of the most important and complex tasks various networks face today. A large number of security algorithms have been proposed to enhance security in various types of networks.

    Many of these solutions are either directly or indirectly based on the Bloom filter (BF), a space-efficient and time-efficient probabilistic data structure introduced by Burton Bloom in 1970. Obviously, Bloom filters and their variants are getting more and more consideration in the network security area.

    This paper provides an up-to-date survey of the application of BFs and their variants to improve performance of the approaches proposed to address security problems with different types of networks.
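
    For readers unfamiliar with the data structure, a minimal Python sketch of a Bloom filter (generic, not any of the surveyed network-security variants):

    ```python
    import hashlib

    class BloomFilter:
        """m-bit array with k hash functions: membership tests may give
        false positives but never false negatives."""
        def __init__(self, m=1024, k=3):
            self.m, self.k = m, k
            self.bits = bytearray(m // 8 + 1)

        def _positions(self, item):
            for i in range(self.k):
                h = hashlib.sha256(f"{i}:{item}".encode()).digest()
                yield int.from_bytes(h[:8], "big") % self.m

        def add(self, item):
            for p in self._positions(item):
                self.bits[p // 8] |= 1 << (p % 8)

        def __contains__(self, item):
            return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

    bf = BloomFilter()
    bf.add("10.0.0.1")
    print("10.0.0.1" in bf, "10.0.0.2" in bf)   # True, almost certainly False
    ```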

  • 2015-brooks.pdf: ⁠, Fred Brooks, Len Shustek (2015-01-01):

    ACM Fellow and A.M. Turing Award recipient Fred Brooks reflects on his career.

  • 2015-kanev.pdf#google: ⁠, Svilen Kanev, Juan Pablo Darago, Kim M. Hazelwood, Parthasarathy Ranganathan, Tipp J. Moseley, Gu-Yeon Wei, David Michael Brooks (2015-06-01; backlinks):

    With the increasing prevalence of warehouse-scale (WSC) and cloud computing, understanding the interactions of server applications with the underlying microarchitecture becomes ever more important in order to extract maximum performance out of server hardware. To aid such understanding, this paper presents a detailed microarchitectural analysis of live datacenter jobs, measured on more than 20,000 Google machines over a three year period, and comprising thousands of different applications.

    We first find that WSC workloads are extremely diverse, breeding the need for architectures that can tolerate application variability without performance loss. However, some patterns emerge, offering opportunities for co-optimization of hardware and software. For example, we identify common building blocks in the lower levels of the software stack. This “datacenter tax” can comprise nearly 30% of cycles across jobs running in the fleet, which makes its constituents prime candidates for hardware specialization in future server systems-on-chips. We also uncover opportunities for classic microarchitectural optimizations for server processors, especially in the cache hierarchy. Typical workloads place substantial stress on instruction caches and prefer memory latency over bandwidth. They also stall cores often, but compute heavily in bursts. These observations motivate several interesting directions for future warehouse-scale computers.

  • 2015-kuppusamy.pdf (backlinks)

  • 2015-mcsherry.pdf: ⁠, Frank McSherry, Michael Isard, Derek G. Murray (2015-05; backlinks):

    We offer a new metric for big data platforms, COST, or the Configuration that Outperforms a Single Thread. The COST of a given platform for a given problem is the hardware configuration required before the platform outperforms a competent single-threaded implementation. COST weighs a system’s scalability against the overheads introduced by the system, and indicates the actual performance gains of the system, without rewarding systems that bring substantial but parallelizable overheads.

    We survey measurements of data-parallel systems [for graph processing] recently reported in SOSP and OSDI, and find that many systems have either a surprisingly large COST, often hundreds of cores, or simply underperform one thread for all of their reported configurations.

  • 2017-denning.pdf: ⁠, Peter J. Denning, Ted G. Lewis (2017-01-01):

    Moore’s Law is one small component in an exponentially growing planetary computing ecosystem.

  • 2017-hazel.pdf: “From Punched Cards to Flat Screens: A Technical Autobiography”⁠, Philip Hazel

  • 2017-ridel.pdf: “Mechanical Turing Machine in Wood Explained”⁠, Richard J. Ridel

  • 2017-zhou.pdf: ⁠, Junfeng Zhou, Shijie Zhou, Jeffrey Xu Yu, Hao Wei, Ziyang Chen, Xian Tang (2017-05-01):

    Answering reachability queries is one of the fundamental graph operations. The existing approaches build indexes and answer reachability queries on a directed acyclic graph (DAG) G′, which is constructed by coalescing each strongly connected component of the given directed graph G into a node of G′. Considering that G′ can still be too large to be processed efficiently, there are studies to further reduce G′ to a smaller graph. However, these approaches suffer from either inefficiency in answering reachability queries, or cannot scale to large graphs.

    In this paper, we study DAG reduction to accelerate reachability query processing, which reduces the size of G′ by computing transitive reduction (TR) followed by computing equivalence reduction (ER). For ER, we propose a divide-and-conquer algorithm, namely linear-ER. Given the result G~t~ of TR, linear-ER gets a smaller DAG G~ε~ in linear time based on the equivalence relationship between nodes in G′. Our DAG reduction approaches (TR and ER) substantially improve the cost of time and space, and can be scaled to large graphs. We confirm the efficiency of our approaches by extensive experimental studies for TR, ER, and reachability query processing using 20 real datasets.
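
    A small sketch of the SCC-condensation plus transitive-reduction preprocessing using networkx (standard library calls, not the paper’s linear-ER algorithm):

    ```python
    import networkx as nx

    G = nx.DiGraph([(1, 2), (2, 1), (2, 3), (1, 3), (3, 4)])   # nodes 1 and 2 form an SCC

    dag = nx.condensation(G)                # coalesce each SCC into one DAG node
    tr = nx.transitive_reduction(dag)       # drop edges implied by other paths
    scc_of = {v: c for c, data in dag.nodes(data=True) for v in data["members"]}

    def reachable(u, v):
        """Answer a reachability query on the reduced DAG."""
        return scc_of[u] == scc_of[v] or nx.has_path(tr, scc_of[u], scc_of[v])

    print(reachable(1, 4), reachable(4, 2))   # True False
    ```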

  • 2019-07-15-danielbali-citiesskylineisturingcomplete.html (backlinks)

  • 2019-aaronson.pdf: ⁠, Scott Aaronson (2019-08-28; backlinks):

    The ⁠, with its incomprehensibly rapid growth, has captivated generations of computer scientists, mathematicians, and hobbyists. In this survey, I offer a personal view of the BB function 58 years after ⁠, emphasizing lesser-known insights, recent progress, and especially favorite open problems.

    Examples of such problems include: when does the BB function first exceed the ? Is the value of BB(20) independent of set theory? Can we prove that BB(n + 1) > 2^BB(n)^ for large enough n? Given BB(n), how many advice bits are needed to compute BB(n + 1)? Do all Busy Beavers halt on all inputs, not just the 0 input? Is it decidable, given n, whether BB(n) is even or odd?

  • 2019-kleppmann.pdf: ⁠, Martin Kleppmann, Adam Wiggins, Peter van Hardenberg, Mark McGranaghan (Ink & Switch) (2019-10-23; backlinks):

    Cloud apps like Google Docs and Trello are popular because they enable real-time collaboration with colleagues, and they make it easy for us to access our work from all of our devices. However, by centralizing data storage on servers, cloud apps also take away ownership and agency from users. If a service shuts down, the software stops functioning, and data created with that software is lost.

    In this article we propose local-first software, a set of principles for software that enables both collaboration and ownership for users. Local-first ideals include the ability to work offline and collaborate across multiple devices, while also improving the security, privacy, long-term preservation, and user control of data.

    We survey existing approaches to data storage and sharing, ranging from email attachments to web apps to Firebase-backed mobile apps, and we examine the trade-offs of each. We look at Conflict-free Replicated Data Types (CRDTs): data structures that are multi-user from the ground up while also being fundamentally local and private. CRDTs have the potential to be a foundational technology for realizing local-first software.

    We share some of our findings from developing local-first software prototypes at the Ink & Switch research lab over the course of several years. These experiments test the viability of CRDTs in practice, and explore the user interface challenges for this new data model. Lastly, we suggest some next steps for moving towards local-first software: for researchers, for app developers, and a startup opportunity for entrepreneurs. [Keywords: collaboration software, mobile computing, data ownership, CRDTs, peer-to-peer communication]

  • 2020-fleder.pdf: ⁠, Michael Fleder, Devavrat Shah (2020-12-01):

    We consider the question of identifying which set of products are purchased and at what prices in a given transaction by observing only the total amount spent in the transaction, and nothing more. The ability to solve such an inverse problem can lead to refined information about consumer spending by simply observing anonymized credit card transactions data.

    Indeed, when considered in isolation, it is impossible to identify the products purchased and their prices from a given transaction just based on the transaction total. However, given a large number of transactions, there may be a hope. As the main contribution of this work, we provide a robust estimation algorithm for decomposing transaction totals into the underlying, individual product(s) purchased by utilizing a large corpus of transactions.

    Our method recovers a (product prices) vector _p_ ∈ ℝ[^*N*^~\>0~]{.supsub} of unknown dimension (number of products) N as well as matrix A ∈ ℤ[^*M*×*N*^~≥0~]{.supsub} simply from M observations (transaction totals) y ∈ ℝ[^*M*^~\>0~]{.supsub} such that y = Ap + η with η ∈ ℝ^*M*^ representing noise (taxes, discounts, etc). We formally establish that our algorithm identifies N, A precisely and p approximately, as long as each product is purchased individually at least once, i.e. M ≥ N and A has rank N. Computationally, the algorithm runs in polynomial time (with respect to problem parameters), and thus we provide a computationally efficient and statistically robust method for solving such inverse problems.

    We apply the algorithm to a large corpus of anonymized consumer credit card transactions in the period 2016–2019, with data obtained from a commercial data vendor. The transactions are associated with spending at Apple, Chipotle, Netflix, and Spotify. From just transactions data, our algorithm identifies (1) key price points (without access to the listed prices), (2) products purchased within a transaction, (3) product launches, and (4) evidence of a new ‘secret’ product from Netflix—rumored to be in limited release. [Keywords: blind compressed sensing; alternative data; finance; consumer credit card transactions]

  • 2020-leiserson.pdf: ⁠, Charles E. Leiserson, Neil C. Thompson, Joel S. Emer, Bradley C. Kuszmaul, Butler W. Lampson, Daniel Sanchez, Tao B. Schardl (2020-06-05; backlinks):

    From bottom to top: The doubling of the number of transistors on a chip every 2 years, a seemingly inevitable trend that has been called Moore’s law, has contributed immensely to improvements in computer performance. However, silicon-based transistors cannot get much smaller than they are today, and other approaches should be explored to keep performance growing. Leiserson et al. review recent examples and argue that the most promising place to look is at the top of the computing stack, where improvements in software, algorithms, and hardware architecture can bring the much-needed boost…The miniaturization of semiconductor transistors has driven the growth in computer performance for more than 50 years. As miniaturization approaches its limits, bringing an end to Moore’s law, performance gains will need to come from software, algorithms, and hardware. We refer to these technologies as the “Top” of the computing stack to distinguish them from the traditional technologies at the “Bottom”: semiconductor physics and silicon-fabrication technology. In the post-Moore era, the Top will provide substantial performance gains, but these gains will be opportunistic, uneven, and sporadic, and they will suffer from the law of diminishing returns. Big system components offer a promising context for tackling the challenges of working at the Top.

    …Unfortunately, semiconductor miniaturization is running out of steam as a viable way to grow computer performance—there isn’t much more room at the “Bottom.” If growth in computing power stalls, practically all industries will face challenges to their productivity. Nevertheless, opportunities for growth in computing performance will still be available, especially at the “Top” of the computing-technology stack: software, algorithms, and hardware architecture.

    Advances: Software can be made more efficient by performance engineering: restructuring software to make it run faster. Performance engineering can remove inefficiencies in programs, known as software bloat, arising from traditional software-development strategies that aim to minimize an application’s development time rather than the time it takes to run. Performance engineering can also tailor software to the hardware on which it runs, for example, to take advantage of parallel processors and vector units.

    Algorithms offer more-efficient ways to solve problems. Indeed, since the late 1970s, the time to solve the maximum-flow problem improved nearly as much from algorithmic advances as from hardware speedups. But progress on a given algorithmic problem occurs unevenly and sporadically and must ultimately face diminishing returns. As such, we see the biggest benefits coming from algorithms for new problem domains (e.g., machine learning) and from developing new theoretical machine models that better reflect emerging hardware.

    Hardware architectures can be streamlined—for instance, through processor simplification, where a complex processing core is replaced with a simpler core that requires fewer transistors. The freed-up transistor budget can then be redeployed in other ways—for example, by increasing the number of processor cores running in parallel, which can lead to large efficiency gains for problems that can exploit parallelism. Another form of streamlining is domain specialization, where hardware is customized for a particular application domain. This type of specialization jettisons processor functionality that is not needed for the domain. It can also allow more customization to the specific characteristics of the domain, for instance, by decreasing floating-point precision for machine-learning applications.

    In the post-Moore era, performance improvements from software, algorithms, and hardware architecture will increasingly require concurrent changes across other levels of the stack. These changes will be easier to implement, from engineering-management and economic points of view, if they occur within big system components: reusable software with typically more than a million lines of code or hardware of comparable complexity. When a single organization or company controls a big component, modularity can be more easily re-engineered to obtain performance gains. Moreover, costs and benefits can be pooled so that important but costly changes in one part of the big component can be justified by benefits elsewhere in the same component.

    OUTLOOK: As miniaturization wanes, the silicon-fabrication improvements at the Bottom will no longer provide the predictable, broad-based gains in computer performance that society has enjoyed for more than 50 years. Software performance engineering, development of algorithms, and hardware streamlining at the Top can continue to make computer applications faster in the post-Moore era. Unlike the historical gains at the Bottom, however, gains at the Top will be opportunistic, uneven, and sporadic. Moreover, they will be subject to diminishing returns as specific computations become better explored.

  • 2020-masanet.pdf: “Recalibrating global data center energy-use estimates”⁠, Eric Masanet, Arman Shehabi, Nuoa Lei, Sarah Smith, Jonathan Koomey

  • 2020-thyagarajan.pdf: ⁠, Sri Aravinda Krishnan Thyagarajan, Adithya Bhat, Giulio Malavolta, Nico Döttling, Aniket Kate, Dominique Schröder (2020-10-01; backlinks):

    A verifiable timed signature (VTS) scheme allows one to time-lock a signature on a known message for a given amount of time T such that after performing a sequential computation for time T anyone can extract the signature from the time-lock. Verifiability ensures that anyone can publicly check if a time-lock contains a valid signature on the message without solving it first, and that the signature can be obtained by solving the same for time T.

    This work formalizes VTS, presents efficient constructions compatible with BLS, Schnorr, and ECDSA signatures, and experimentally demonstrates that these constructions can be employed in practice. On a technical level, we design an efficient cut-and-choose protocol based on the homomorphic time-lock puzzles to prove the validity of a signature encapsulated in a time-lock puzzle. We also present a new efficient range proof protocol that substantially improves upon existing proposals in terms of the proof size, and is also of independent interest.

    While VTS is a versatile tool with numerous existing applications, we demonstrate VTS’s applicability to resolve three novel challenging issues in the space of cryptocurrencies. Specifically, we show how VTS is the cryptographic cornerstone to construct: (1) Payment channel networks with improved on-chain unlinkability of users involved in a transaction, (2) multi-party signing of transactions for cryptocurrencies without any on-chain notion of time and (3) cryptocurrency-enabled fair multi-party computation protocol. [Keywords: timed signatures, time lock puzzles, payment channel network, multi-party signing]

  • 2020-troise.pdf: ⁠, Blake Troise (2020-01-01; backlinks):

    The 1-bit sonic environment (perhaps most famously musically employed on the ZX Spectrum) is defined by extreme limitation. Yet, belying these restrictions, there is a surprisingly expressive instrumental versatility. This article explores the theory behind the primary, idiosyncratically 1-bit techniques available to the composer-programmer, those that are essential when designing “instruments” in 1-bit environments. These techniques include pulse width modulation for timbral manipulation and means of generating virtual polyphony in software, such as the pin pulse and pulse interleaving techniques. These methodologies are considered in respect to their compositional implications and instrumental applications. [Keywords: chiptune, 1-bit, one-bit, ZX Spectrum, pulse pin method, pulse interleaving, timbre, polyphony, history]
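
    A minimal Python sketch of the pulse-width idea (an illustration of the principle, not code from the article):

    ```python
    def one_bit_pwm(freq=440.0, duty=0.25, seconds=0.5, sample_rate=44100):
        """Generate a 1-bit pulse train (every sample is 0 or 1).
        Changing the duty cycle (pulse width) changes timbre, not pitch."""
        period = sample_rate / freq
        return [1 if (i % period) < duty * period else 0
                for i in range(int(seconds * sample_rate))]

    print(one_bit_pwm(duty=0.125)[:40])   # a narrower pulse sounds thinner/brighter
    ```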

  • 2021-hochschild.pdf: ⁠, Peter H. Hochschild, Paul Turner, Jeffrey C. Mogul, Rama Govindaraju, Parthasarathy Ranganathan, David E. Culler, Amin Vahdat (2021-05-31; backlinks):

    We are accustomed to thinking of computers as fail-stop, especially the cores that execute instructions, and most system software implicitly relies on that assumption. During most of the VLSI era, processors that passed manufacturing tests and were operated within specifications have insulated us from this fiction. As fabrication pushes towards smaller feature sizes and more elaborate computational structures, and as increasingly specialized instruction-silicon pairings are introduced to improve performance, we have observed ephemeral computational errors that were not detected during manufacturing tests. These defects cannot always be mitigated by techniques such as microcode updates, and may be correlated to specific components within the processor, allowing small code changes to effect large shifts in reliability. Worse, these failures are often “silent”—the only symptom is an erroneous computation.

    We refer to a core that develops such behavior as “mercurial.” Mercurial cores are extremely rare, but in a large fleet of servers we can observe the disruption they cause, often enough to see them as a distinct problem—one that will require collaboration between hardware designers, processor vendors, and systems software architects.

    This paper is a call-to-action for a new focus in systems research; we speculate about several software-based approaches to mercurial cores, ranging from better detection and isolating mechanisms, to methods for tolerating the silent data corruption they cause.

    …Because CEEs may be correlated with specific execution units within a core, they expose us to large risks appearing suddenly and unpredictably for several reasons, including seemingly-minor software changes. Hyperscalers have a responsibility to customers to protect them against such risks. For business reasons, we are unable to reveal exact CEE rates, but we observe on the order of a few mercurial cores per several thousand machines—similar to the rate reported by Facebook ⁠. The problem is serious enough for us to have applied many engineer-decades to it.

    …We have observed defects scattered across many functions, though there are some general patterns, along with many examples that (so far) seem to be outliers. Failures mostly appear non-deterministically at variable rate. Faulty cores typically fail repeatedly and intermittently, and often get worse with time; we have some evidence that aging is a factor. In a multi-core processor, typically just one core fails, often consistently. CEEs appear to be an industry-wide problem, not specific to any vendor, but the rate is not uniform across CPU products.

    Corruption rates vary by many orders of magnitude (given a particular workload or test) across defective cores, and for any given core can be highly dependent on workload and on frequency, voltage, temperature. In just a few cases, we can reproduce the errors deterministically; usually the implementation-level and environmental details have to line up. Data patterns can affect corruption rates, but it’s often hard for us to tell. Some specific examples where we have seen CEE:

    • Violations of lock semantics leading to application data corruption and crashes.
    • Data corruptions exhibited by various load, store, vector, and coherence operations.
    • A deterministic AES miscomputation, which was “self-inverting”: encrypting and decrypting on the same core yielded the identity function, but decryption elsewhere yielded gibberish.
    • Corruption affecting garbage collection, in a storage system, causing live data to be lost.
    • Database index corruption leading to some queries, depending on which replica (core) serves them, being non-deterministically corrupted.
    • Repeated bit-flips in strings, at a particular bit position (which stuck out as unlikely to be coding bugs).
    • Corruption of kernel state resulting in process and kernel crashes and application malfunctions.

    CEEs are harder to root-cause than software bugs, which we usually assume we can debug by reproducing on a different machine.

  • 2021-ranganathan.pdf#google: ⁠, Parthasarathy Ranganathan, Daniel Stodolsky, Jeff Calow, Jeremy Dorfman, Marisabel Guevara, Clinton Wills Smullen IV, Aki Kuusela, Raghu Balasubramanian, Sandeep Bhatia, Prakash Chauhan, Anna Cheung, In Suk Chong, Niranjani Dasharathi, Jia Feng, Brian Fosco, Samuel Foss, Ben Gelb, Sara J. Gwin, Yoshiaki Hase, Da-ke He, C. Richard Ho, Roy W. Huffman Jr., Elisha Indupalli, Indira Jayaram, Poonacha Kongetira, Cho Mon Kyaw, Aaron Laursen, Yuan Li, Fong Lou, Kyle A. Lucke, JP Maaninen, Ramon Macias, Maire Mahony, David Alexander Munday, Srikanth Muroor, Narayana Penukonda, Eric Perkins-Argueta, Devin Persaud, Alex Ramirez, Ville-Mikko Rautio, Yolanda Ripley, Amir Salek, Sathish Sekar, Sergey N. Sokolov, Rob Springer, Don Stark, Mercedes Tan, Mark S. Wachsler, Andrew C. Walton, David A. Wickeraad, Alvin Wijaya, Hon Kwan Wu (2021-02-27; backlinks):

    Video sharing (e.g., YouTube, Vimeo, Facebook, TikTok) accounts for the majority of internet traffic, and video processing is also foundational to several other key workloads (video conferencing, virtual/augmented reality, cloud gaming, video in Internet-of-Things devices, etc.). The importance of these workloads motivates larger video processing infrastructures and—with the slowing of Moore’s law—specialized hardware accelerators to deliver more computing at higher efficiencies.

    This paper describes the design and deployment, at scale, of a new accelerator targeted at warehouse-scale video transcoding. We present our hardware design including a new accelerator building block—the video coding unit (VCU)—and discuss key design trade-offs for balanced systems at data center scale and co-designing accelerators with large-scale distributed software systems. We evaluate these accelerators “in the wild” serving live data center jobs, demonstrating 20–33× improved efficiency over our prior well-tuned non-accelerated baseline. Our design also enables effective adaptation to changing bottlenecks and improved failure management, and new workload capabilities not otherwise possible with prior systems.

    To the best of our knowledge, this is the first work to discuss video acceleration at scale in large warehouse-scale environments. [Keywords: video transcoding, warehouse-scale computing, domain-specific accelerators, hardware-software co-design]

    Table 1: Offline 2-pass single output  throughput in   and  systems. · Encoding Throughput: Table 1 shows throughput and  (performance per total cost of ownership) for the 4 systems and is normalized to the  of the  system. The performance is shown for offline 2-pass  encoding for H.264 and VP9. For H.264, the  has 3.5× higher throughput, and the  and  provide 8.4× and 20.9× more throughput, respectively. For VP9, the  system has 99.4× the throughput of the  baseline. The 2 orders of magnitude increase in performance clearly demonstrates the benefits of our  system.

    The VCU package is a full-length PCI-E card and looks a lot like a graphics card. A board has 2 Argos ASIC chips buried under a gigantic, passively cooled aluminum heat sink. There’s even what looks like an 8-pin power connector on the end because PCI-E just isn’t enough power.

    Google provided a lovely chip diagram that lists 10 “encoder cores” on each chip, with Google’s white paper adding that “all other elements are off-the-shelf IP blocks.” Google says that “each encoder core can encode 2160p in realtime, up to 60 FPS (frames per second) using 3 reference frames.”

    The cards are specifically designed to slot into Google’s warehouse-scale computing system. Each compute cluster in YouTube’s system will house a section of dedicated “VCU machines” loaded with the new cards, saving Google from having to crack open every server and load it with a new card. Google says the cards resemble GPUs because they are what fit in its existing accelerator trays. CNET reports that “thousands of the chips are running in Google data centers right now,” and thanks to the cards, individual video workloads like 4K video “can be available to watch in hours instead of the days it previously took.”

    Factoring in the research and development on the chips, Google says this VCU plan will save the company a ton of money, even given the below benchmark showing the TCO (total cost of ownership) of the setup compared to running its algorithm on Intel Skylake chips and Nvidia T4 Tensor core GPUs.

  • zend-manual-pdfmetadata.html (backlinks)