/docs/cs/ Directory Listing

Annotated bibliography of files in the directory /docs/cs/.
index
2009-01-01–2021-04-07 in progress certainty: log importance: 0


Files

  • ⁠, John Nash (2012-02-22):

    1955 letters of John Nash and the NSA on a cryptosystem and Nash’s belief that near-perfect cryptography could exploit exponential difficulties.

    [Keywords: cryptography⁠, history]

  • ⁠, John Nash (1955-01):

    [In 1955, well-known mathematician John Nash was in correspondence with the United States National Security Agency. In these letters, Nash proposes a novel enciphering scheme. He also sets forth an important cryptographic principle that now underpins modern computational complexity theory and cryptography. In particular, he proposes a natural definition for “[security] in a practical sense”—that exponential computational effort is required for an enemy to recover a secret key. Nash further conjectures that this property holds for any suitable enciphering mechanism.

    These correspondences were recently declassified by the NSA [1]; this document is the NSA scan, with the original images and drawings of Nash’s handwritten letters.]

  • ⁠, John Nash, Mike Rosulek (2012-02-20):

    In 1955, well-known mathematician John Nash was in correspondence with the United States National Security Agency. In these letters, Nash proposes a novel enciphering scheme. He also sets forth an important cryptographic principle that now underpins modern computational complexity theory and cryptography. In particular, he proposes a natural definition for “[security] in a practical sense”—that exponential computational effort is required for an enemy to recover a secret key. Nash further conjectures that this property holds for any suitable enciphering mechanism.

    These correspondences, recently declassified by the NSA [1], have been transcribed and typeset in this document. [Typeset by Mike Rosulek: Department of Computer Science, University of Montana, ]

  • ⁠, Claude Shannon (1959):

    Consider a discrete source producing a sequence of message letters from a finite alphabet. A single-letter distortion measure is given by a non-negative matrix (dij). The entry dij measures the “cost” or “distortion” if letter i is reproduced at the receiver as letter j. The average distortion of a communications system (source-coder-noisy channel-decoder) is taken to be d = Σij Pij dij, where Pij is the probability of i being reproduced as j. It is shown that there is a function R(d) that measures the “equivalent rate” of the source for a given level of distortion. For coding purposes where a level d of distortion can be tolerated, the source acts like one with information rate R(d). Methods are given for calculating R(d), and various properties discussed. Finally, generalizations to ergodic sources, to continuous sources, and to distortion measures involving blocks of letters are developed.
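
    A minimal numeric illustration of the average-distortion formula from the abstract (my example; the alphabet, distortion matrix, and probabilities are made up):

    ```python
    import numpy as np

    # Single-letter distortion: d[i][j] is the cost of reproducing source letter i
    # as letter j; the system's average distortion is the expectation sum_ij P_ij * d_ij.
    d = np.array([[0.0, 1.0],        # Hamming-style distortion for a binary alphabet
                  [1.0, 0.0]])
    P = np.array([[0.45, 0.05],      # joint probability that letter i is reproduced as j
                  [0.05, 0.45]])     # (uniform source, 10% of letters flipped)

    print(np.sum(P * d))             # 0.1 -- the system operates at distortion level 0.1
    ```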

  • ⁠, C. Radhakrishna Rao (1961-08-01):

    A general method is given for generating random permutations of integers using a table of random sampling numbers and without wasting the random numbers read. This is more convenient in practice, especially when random permutations of large numbers of elements are needed. It is suggested that even for permutations of small numbers, the method offers greater scope than consulting a table of a limited number of random permutations. [See also ⁠, hence the description of this as “Rao-Sandelius shuffling”.]

  • ⁠, T. Rado (1962-05-01):

    [On the ⁠.] The construction of non-computable functions used in this paper is based on the principle that a finite, non-empty set of non-negative integers has a largest element. Also, this principle is used only for sets which are exceptionally well-defined by current standards. No enumeration of computable functions is used, and in this sense the diagonal process is not employed. Thus, it appears that an apparently self-evident principle, of constant use in every area of mathematics, yields non-constructive entities.

  • ⁠, Martin Sandelius (1962-07-01):

    The paper describes a randomization procedure consisting in distributing a deck of cards into 10 decks using random decimal digits and repeating this step with each deck consisting of three or more cards. One random digit is used for randomizing a deck of two cards. This procedure, which is essentially a particular case of a general procedure described by ⁠, is called the “multistage randomization procedure”, or MRP. Some applications are described. A recursive formula is given for the expected number of random digits required by MRP for the randomization of n symbols. A measure of the efficiency of a randomization procedure is presented. The efficiency of MRP is compared with the efficiencies of two other randomization procedures, and it is proved that MRP has an asymptotic efficiency of 100%.
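
    A short sketch of the multistage randomization procedure as described in the abstract (my reading of it, not Sandelius’s own specification; in particular, how a single digit orders a two-card pile is my guess):

    ```python
    import random

    def mrp_shuffle(items):
        """Deal the items into 10 piles using random decimal digits, recurse on piles
        of three or more items, and use one digit to order a two-item pile."""
        items = list(items)
        if len(items) <= 1:
            return items
        if len(items) == 2:
            return items if random.randrange(10) < 5 else items[::-1]
        piles = [[] for _ in range(10)]
        for card in items:
            piles[random.randrange(10)].append(card)   # one random decimal digit per card
        return [card for pile in piles for card in mrp_shuffle(pile)]

    print(mrp_shuffle(range(10)))
    ```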

  • 1964-gluskin.pdf

  • ⁠, Timothy C. May, M. H. Woods (1979-01-01):

    A new physical soft error mechanism in dynamic RAM’s and CCD’s is the upset of stored data by the passage of alpha particles through the memory array area. The alpha particles are emitted by the radioactive decay of uranium and thorium which are present in parts-per-million levels in packaging materials. When an alpha particle penetrates the die surface, it can create enough electron-hole pairs near a storage node to cause a random, single-bit error. Results of experiments and measurements of alpha activity of materials are reported and a physical model for the soft error is developed. Implications for the future of dynamic memories are also discussed.

  • 1980-rytter.pdf: “A Correct Preprocessing Algorithm for Boyer-Moore String-Searching”⁠, Wojciech Rytter (1980-01-01):

    We present the correction to Knuth’s algorithm [2] for computing the table of pattern shifts later used in the Boyer-Moore algorithm for pattern matching.
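
    For concreteness, a now-standard form of the corrected good-suffix preprocessing (the table of pattern shifts the paper is about); this is the textbook two-phase formulation, not necessarily Rytter’s exact presentation:

    ```python
    def good_suffix_shifts(pattern):
        """shift[i] = how far to slide the pattern when its suffix pattern[i:] has
        matched but position i-1 mismatches (shift[0] is the slide after a full match)."""
        m = len(pattern)
        shift, border = [0] * (m + 1), [0] * (m + 1)
        # Phase 1: the matched suffix reoccurs elsewhere, preceded by a different character.
        i, j = m, m + 1
        border[i] = j
        while i > 0:
            while j <= m and pattern[i - 1] != pattern[j - 1]:
                if shift[j] == 0:
                    shift[j] = j - i
                j = border[j]
            i, j = i - 1, j - 1
            border[i] = j
        # Phase 2: only a prefix of the pattern matches part of the suffix.
        j = border[0]
        for i in range(m + 1):
            if shift[i] == 0:
                shift[i] = j
            if i == j:
                j = border[j]
        return shift

    print(good_suffix_shifts("abab"))   # [2, 2, 2, 4, 1]
    ```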

  • ⁠, Daniel Cohen (1981-10-01):

    Which bit should travel first? The bit from the big end or the bit from the little end? Can a war between Big Endians and Little Endians be avoided?

    This article was written in an attempt to stop a war. I hope it is not too late for peace to prevail again. Many believe that the central question of this war is, What is the proper byte order in messages? More specifically, the question is, Which bit should travel first-the bit from the little end of the word or the bit from the big end of the word? Followers of the former approach are called Little Endians, or Lilliputians; followers of the latter are called Big Endians, or Blefuscuians. I employ these Swiftian terms because this modern conflict is so reminiscent of the holy war described in Gulliver’s Travels.

    …To sum it all up, there are two camps, each with its own language. These languages are as compatible with each other as any Semitic and Latin languages. All Big Endians can talk only to each other. So can all the Little Endians, although there are some differences among the dialects used by different tribes. There is no middle ground—only one end can go first. As in all the religious wars of the past, power—not logic—will be the decisive factor. This is not the first holy war, and will probably not be the last. The “Reasonable, do it my way” approach does not work. Neither does the Esperanto approach of switching to yet another new language. Lest our communications world split along these lines, we should take note of a certain book (not mentioned in the references), which has an interesting story about a similar phenomenon: the Tower of Babel. Lilliput and Blefuscu will never come to terms of their own free will. We need some Gulliver between the two islands to force a unified communication regime on all of us.

    Of course, I hope that my way will be chosen, but it is not really critical. Agreement upon an order is more important than the order agreed upon.

    Shall we toss a coin?
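
    A small modern illustration of the byte-order question (not from the article, which predates these tools):

    ```python
    import struct

    n = 0x0A0B0C0D                        # one 32-bit word, serialized both ways
    big    = struct.pack('>I', n)         # the "big end" of the word travels first
    little = struct.pack('<I', n)         # the "little end" travels first

    print(big.hex(), little.hex())        # 0a0b0c0d 0d0c0b0a

    # A receiver assuming the wrong convention silently reads a different number:
    print(hex(struct.unpack('>I', little)[0]))   # 0xd0c0b0a
    ```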

  • ⁠, Alan J. Perlis (1982-09-01):

    [130 epigrams on computer science and technology, published in 1982, for ACM’s SIGPLAN journal, by noted computer scientist and programming language researcher ⁠. The epigrams are a series of short, programming-language-neutral, humorous statements about computers and programming, distilling lessons he had learned over his career, which are widely quoted.]

    8. A programming language is low level when its programs require attention to the irrelevant….19. A language that doesn’t affect the way you think about programming, is not worth knowing….54. Beware of the Turing tar-pit in which everything is possible but nothing of interest is easy.

    15. Everything should be built top-down, except the first time….30. In programming, everything we do is a special case of something more general—and often we know it too quickly….31. Simplicity does not precede complexity, but follows it….58. Fools ignore complexity. Pragmatists suffer it. Some can avoid it. Geniuses remove it….65. Make no mistake about it: Computers process numbers—not symbols. We measure our understanding (and control) by the extent to which we can arithmetize an activity….56. Software is under a constant tension. Being symbolic it is arbitrarily perfectible; but also it is arbitrarily changeable.

    1. One man’s constant is another man’s variable. 34. The string is a stark data structure and everywhere it is passed there is much duplication of process. It is a perfect vehicle for hiding information.

    36. The use of a program to prove the 4-color theorem will not change mathematics—it merely demonstrates that the theorem, a challenge for a century, is probably not important to mathematics.

    39. Re graphics: A picture is worth 10K words—but only those to describe the picture. Hardly any sets of 10K words can be adequately described with pictures.

    48. The best book on programming for the layman is Alice in Wonderland; but that’s because it’s the best book on anything for the layman.

    77. The cybernetic exchange between man, computer and algorithm is like a game of musical chairs: The frantic search for balance always leaves one of the 3 standing ill at ease….79. A year spent in artificial intelligence is enough to make one believe in God….84. Motto for a research laboratory: What we work on today, others will first think of tomorrow.

    91. The computer reminds one of Lon Chaney—it is the machine of a thousand faces.

    7. It is easier to write an incorrect program than understand a correct one….93. When someone says “I want a programming language in which I need only say what I wish done,” give him a lollipop….102. One can’t proceed from the informal to the formal by formal means.

    100. We will never run out of things to program as long as there is a single program around.

    108. Whenever 2 programmers meet to criticize their programs, both are silent….112. Computer Science is embarrassed by the computer….115. Most people find the concept of programming obvious, but the doing impossible. 116. You think you know when you can learn, are more sure when you can write, even more when you can teach, but certain when you can program. 117. It goes against the grain of modern education to teach children to program. What fun is there in making plans, acquiring discipline in organizing thoughts, devoting attention to detail and learning to be self-critical?

  • ⁠, Allen H. Brady (1983-04-01):

    The well-defined but noncomputable functions Σ(k) and S(k), given by Rado as the “score” and “shift number” for the k-state Turing machine “Busy Beaver Game”, were previously known only for k ≤ 3. The largest known lower bounds yielding the relations Σ(4) ≥ 13 and S(4) ≥ 107, reported by this author, supported the conjecture that these lower bounds are the actual particular values of the functions for k = 4.

    The four-state case has previously been reduced to solving the blank input tape halting problem of only 5,820 individual machines. In this final stage of the k = 4 case, one appears to move into a heuristic level of higher order where it is necessary to treat each machine as representing a distinct theorem.

    The remaining set consists of two primary classes in which a machine and its tape are viewed as the representation of a growing string of cellular automata. The proof techniques, embodied in programs, are entirely heuristic, while the inductive proofs, once established by the computer, are completely rigorous and become the key to the proof of the new and original mathematical results: Σ(4) = 13 and S(4) = 107.
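
    To make the Σ/S definitions concrete, here is a minimal Turing-machine simulator run on the standard 2-state, 2-symbol champion, which halts after S(2) = 6 shifts leaving Σ(2) = 4 ones (the 4-state result above required analyzing thousands of machines, not a single run like this):

    ```python
    def run(machine, max_steps=10_000):
        tape, pos, state, steps = {}, 0, 'A', 0
        while state != 'H' and steps < max_steps:
            write, move, state = machine[(state, tape.get(pos, 0))]
            tape[pos] = write
            pos += 1 if move == 'R' else -1
            steps += 1
        return steps, sum(tape.values())          # (shift number, score)

    bb2 = {  # (state, symbol read) -> (symbol written, head move, next state)
        ('A', 0): (1, 'R', 'B'), ('A', 1): (1, 'L', 'B'),
        ('B', 0): (1, 'L', 'A'), ('B', 1): (1, 'R', 'H'),
    }
    print(run(bb2))   # (6, 4)
    ```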

  • ⁠, Richard Feynman (1985):

    [In this chapter, Feynman discusses his fascination with locks and safes, and the ways he learns to crack different locks and safes. He also talks about the locks and safes that were at Los Alamos (the ones that held the secrets of the atomic bomb). Generally speaking, the safes containing this material, which is (understandably) considered top secret, are startlingly easy to crack. When Feynman and his colleagues first arrive at Los Alamos, the construction of the facility is not even complete yet, and there is almost no security at all for the “top secret” information. Even later, when the information is stored in safes and locked file cabinets, the locks are generally cheap, and it is easy for Feynman to determine what the combination is, as combinations were often reused. He can memorize common passwords, or watch people dialing carelessly to deduce several digits, and then brute-force the rest; often, people do not even change the factory default.]

  • 1985-lehman-programevolution.pdf

  • 1986-bobrow.pdf

  • 1987-baker.pdf

  • ⁠, John H. Conway (1987-01-01):

    To play the fraction game corresponding to a given list

    f1, f2, …, fk

    of fractions and starting integer N, you repeatedly multiply the integer you have at any stage (initially N) by the earliest fi in the list for which the answer is integral. Whenever there is no such fi, the game stops.
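
    The game is easy to transcribe directly. Below is a toy example of my own (not one from the paper): with the single fraction 3⁄2 and starting integer 2^a·3^b, the game halts at 3^(a+b), i.e. it adds.

    ```python
    from fractions import Fraction

    def fraction_game(fracs, n, max_steps=1000):
        """Repeatedly multiply n by the earliest fraction giving an integer result;
        stop when no fraction does (max_steps guards against non-halting games)."""
        fracs = [Fraction(f) for f in fracs]
        for _ in range(max_steps):
            for f in fracs:
                if (n * f).denominator == 1:
                    n = int(n * f)
                    break
            else:
                return n                    # no usable fraction: the game stops
        return n

    print(fraction_game(["3/2"], 2**3 * 3**1))   # 81 == 3**(3+1)
    ```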

  • 1987-moon.pdf

  • 1988-borenstein.pdf

  • ⁠, Ronald A. DeVore, Ralph Howard, Charles Micchelli (1989):

    We introduce a definition of nonlinear n-widths and then determine the n-widths of the unit ball of the Sobolev space W[^r^~p~]{.supsub} in Lq. We prove that in the sense of these widths the manifold of splines of fixed degree with n free knots is optimal for approximating functions in these Sobolev spaces.

  • 1989-robson.pdf

  • ⁠, Barton P. Miller, Louis Fredriksen, Bryan So (1990):

    Operating system facilities, such as the kernel and utility programs, are typically assumed to be reliable. In our recent experiments, we have been able to crash 25–33% of the utility programs on any version of UNIX that was tested. This report describes these tests and the program bugs that caused the crashes…The following section describes the tools we built to test the utilities. These tools include the fuzz (random character) generator, ptyjig (to test interactive utilities), and scripts to automate the testing process. Next, we will describe the tests we performed, giving the types of input we presented to the utilities. Results from the tests will follow along with an analysis of the results, including identification and classification of the program bugs that caused the crashes. The final section presents concluding remarks, including suggestions for avoiding the types of problems detected by our study and some commentary on the bugs we found. We include an Appendix with the user manual pages for fuzz and ptyjig.
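
    The core idea is simple enough to sketch in a few lines (a minimal modern rendering, not the original C fuzz/ptyjig tools; the target command is hypothetical):

    ```python
    import random, subprocess

    def fuzz_input(max_len=10_000):
        """Random bytes, including NULs and non-printable characters."""
        return bytes(random.randrange(256) for _ in range(random.randrange(1, max_len)))

    def fuzz_utility(cmd, trials=100):
        """Feed random input to a command-line utility; on POSIX a negative return
        code means the process died on a signal (e.g. SIGSEGV), i.e. it crashed."""
        failures = 0
        for _ in range(trials):
            try:
                proc = subprocess.run(cmd, input=fuzz_input(),
                                      capture_output=True, timeout=5)
                if proc.returncode < 0:
                    failures += 1
            except subprocess.TimeoutExpired:
                failures += 1                      # count hangs as failures too
        return failures

    # e.g. fuzz_utility(["/usr/bin/sort"])  # hypothetical target
    ```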

  • ⁠, Mitchell Marcus, Beatrice Santorini, Mary Ann Marcinkiewicz (1993-10-01):

    In this paper, we review our experience with constructing one such large annotated corpus—the Penn Treebank, a corpus consisting of over 4.5 million words of American English. During the first three-year phase of the Penn Treebank Project (1989–1992), this corpus has been annotated for part-of-speech (POS) information. In addition, over half of it has been annotated for skeletal syntactic structure.

  • 1994-binstock-lucidlost.pdf

  • 1994-burke-informationandsecrecyvannevarbushultrandtheothermemex.pdf

  • ⁠, Lev B. Levitin, Zeev Reingold (1994-05-01):

    The concept of the entropy of natural languages, first introduced by Shannon 1948, and its importance are discussed. A review of various known approaches to and results of previous studies of language entropy is presented. A new improved method for evaluation of both lower and upper bounds of the entropy of printed texts is developed. This method is a refinement of Shannon 1951’s prediction (guessing) method. The evaluation of the lower bound is shown to be a classical linear programming problem. Statistical analysis of the estimation of the bounds is given and procedures for the statistical treatment of the experimental data (including verification of statistical validity and statistical significance) are elaborated. The method has been applied to printed Hebrew texts in a large experiment (1000 independent samples) in order to evaluate entropy and other information-theoretical characteristics of the Hebrew language. The results have demonstrated the efficiency of the new method: the gap between the upper and lower bounds of entropy has been reduced by a factor of 2.25 compared to the original Shannon approach. Comparison with other languages is given. Possible applications of the method are briefly discussed.
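
    As a much cruder baseline than the guessing-method bounds discussed above, the zeroth-order (unigram) entropy of a text is a one-liner, and (for a stationary source, given enough text) upper-bounds the per-character entropy rate:

    ```python
    import math
    from collections import Counter

    def unigram_entropy(text):
        """Bits per character under a character-frequency model of the text."""
        counts, n = Counter(text), len(text)
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    print(round(unigram_entropy("the quick brown fox jumps over the lazy dog"), 2))
    ```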

  • 1996-11-15-usenix-ithorrorstories.html

  • 1997-bejan.pdf

  • ⁠, Jim Reeds (1998):

    Book III of (written ca. 1500) contains hidden cipher messages within what is ostensibly a work on magic. After almost 500 years these cryptograms have been detected and solved. (Since 1606 it was known that similar ciphers were present in Books I and II.) As a result the Steganographia can no longer be regarded as one of the main early modern demonological treatises but instead stands unambiguously revealed as the first book-length treatment of cryptography in Europe. [Keywords: Trithemius, Steganographia, history of cryptography.]

  • 2000-mao.pdf

  • 2001-12-02-treginaldgibbons-isyoursonacomputerhacker.html

  • 2001-demarco-peopleware-whymeasureperformance.pdf

  • ⁠, Richard Pawson, Robert Matthews (2001-12-01):

    Naked objects is an approach to systems design in which core business objects show directly through to the user interface, and in which all interaction consists of invoking methods on those objects in the noun-verb style. One advantage of this approach is that it results in systems that are more expressive from the viewpoint of the user: they treat the user like a problem solver, not as merely a process-follower. Another advantage is that the 1:1 mapping between the user’s representation and the underlying model means that it is possible to auto-generate the former from the latter, which yields benefits to the development process. The authors have designed a Java-based, open source toolkit called Naked Objects which facilitates this style of development. This paper describes the design and operation of the toolkit and its application to the prototyping of a core business system. Some initial feedback from the project is provided, together with a list of future research directions both for the toolkit and for a methodology to apply the naked objects approach.
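
    A toy sketch of the 1:1 model-to-UI idea (my illustration, not the Naked Objects toolkit, which is Java): reflect over a business object and expose its public methods as the user-visible verbs.

    ```python
    import inspect

    class Customer:
        """A core business object shown directly to the user."""
        def __init__(self, name):
            self.name, self.orders = name, []
        def place_order(self, item):
            self.orders.append(item)
            return f"{self.name} ordered {item}"
        def cancel_order(self, item):
            self.orders.remove(item)
            return f"{self.name} cancelled {item}"

    def noun_verb_menu(obj):
        """Auto-generate the noun-verb 'menu' from the object's public methods."""
        verbs = [name for name, _ in inspect.getmembers(obj, inspect.ismethod)
                 if not name.startswith('_')]
        return {type(obj).__name__: verbs}

    print(noun_verb_menu(Customer("Alice")))   # {'Customer': ['cancel_order', 'place_order']}
    ```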

  • ⁠, Kevin Scott (2001-03-01):

    In 1965 Gordon Moore observed that the capacity of semiconductor ICs doubled every 18 to 24 months. This trend, now known as Moore’s Law, has held for over 25 years and is responsible for the exponential increase in microprocessor performance over this period. In 1998 Todd Proebsting made a similar-in-spirit, but altogether less optimistic observation about optimizing compilers. His observation, henceforth known as ⁠, is that improvements to compiler technology double the performance of typical programs every 18 years. Proebsting has suggested an experiment to evaluate the veracity of his observation. This paper presents the results of this experiment and some comments on what Proebsting’s Law portends for compiler research.
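
    The arithmetic behind the contrast is worth spelling out (illustrative numbers only):

    ```python
    # Moore's Law: performance doubles every ~18 months; Proebsting's Law: compiler
    # optimizations double typical program performance every ~18 years.
    moore_annual      = 2 ** (12 / 18) - 1   # ~58.7% per year from hardware
    proebsting_annual = 2 ** (1 / 18) - 1    # ~3.9% per year from compilers

    print(f"hardware:  {moore_annual:.1%}/year")
    print(f"compilers: {proebsting_annual:.1%}/year")
    print(f"compiler gain over a 4-year project: {(1 + proebsting_annual)**4 - 1:.1%}")
    ```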

  • 2002-magkos.pdf

  • ⁠, George Candea, Armando Fox (2003-05-01):

    Crash-only programs crash safely and recover quickly. There is only one way to stop such software—by crashing it—and only one way to bring it up—by initiating recovery. Crash-only systems are built from crash-only components, and the use of transparent component-level retries hides intra-system component crashes from end users. In this paper we advocate a crash-only design for Internet systems, showing that it can lead to more reliable, predictable code and faster, more effective recovery. We present ideas on how to build such crash-only Internet services, taking successful techniques to their logical extreme.
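
    A toy flavor of the design (my sketch, not the paper’s systems): the component has no clean-shutdown path, every start goes through recovery, and a component-level retry hides the crash from the caller.

    ```python
    class CrashOnlyComponent:
        calls = 0
        def __init__(self):
            self.recover()                         # the ONLY way up is through recovery
        def recover(self):
            self.state = "ready"                   # rebuild state from durable storage here
        def handle(self, request):
            CrashOnlyComponent.calls += 1
            if CrashOnlyComponent.calls % 3 == 0:  # injected fault: every third call crashes
                raise RuntimeError("component crashed")
            return f"handled {request}"

    def call_with_retry(ref, request, retries=2):
        """Transparent component-level retry: crash the component away and recover."""
        for _ in range(retries + 1):
            try:
                return ref[0].handle(request)
            except RuntimeError:
                ref[0] = CrashOnlyComponent()      # "stop" == crash, "start" == recover
        raise RuntimeError("gave up")

    ref = [CrashOnlyComponent()]
    print([call_with_retry(ref, i) for i in range(5)])
    ```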

  • 2003-fortnow.pdf

  • 2003-ross.pdf

  • 2005-09-30-smith-whyihateframeworks.html

  • 2005-knuth-taocp-v4-prefascicle4b.pdf#page=22

  • 2006-jared-wikimediaprovesgreenspunstenthlaw.html

  • ⁠, Ulrich Drepper (2007-11-12):

    As CPU cores become increasingly faster, the limiting factor for most programs is now, and will be for some time, memory access. Hardware designers have come up with ever-more sophisticated memory-handling and memory-acceleration techniques—such as CPU caches—but these cannot work optimally without some help from the programmer. Unfortunately, neither the structure nor the cost of using the memory subsystem of a computer or the caches on CPUs is well understood by most programmers. This paper explains the structure of memory subsystems in use on modern commodity hardware, illustrating why CPU caches were developed, how they work, and what programs should do to achieve optimal performance by utilizing them.

    [Some parts are obsolete as of 2017.]
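
    A quick way to see the kind of effect the paper explains: sum the same number of bytes with contiguous versus strided access; the strided walk wastes most of each cache line (exact numbers are machine-dependent; this only illustrates the idea):

    ```python
    import time
    import numpy as np

    a = np.ones(64_000_000, dtype=np.int8)   # 64 MB, far larger than the CPU caches
    contiguous = a[:4_000_000]               # 4M adjacent bytes
    strided    = a[::16]                     # 4M bytes spaced 16 bytes apart

    for name, view in [("contiguous", contiguous), ("strided", strided)]:
        t0 = time.perf_counter()
        for _ in range(20):
            view.sum()
        print(name, f"{time.perf_counter() - t0:.3f}s")
    ```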

  • ⁠, Mark Changizi (2008-01-01):

    Might it be possible to harness the visual system to carry out artificial computations, somewhat akin to how DNA has been harnessed to carry out computation? I provide the beginnings of a research programme attempting to do this. In particular, new techniques are described for building ‘visual circuits’ (or ‘visual software’) using wire, NOT, OR, and AND gates in a visual modality such that our visual system acts as ‘visual hardware’ computing the circuit, and generating a resultant perception which is the output.

  • 2009-gluck.pdf

  • ⁠, Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, Peter F. Sweeney (2009-03-07):

    This paper presents a surprising result: changing a seemingly innocuous aspect of an experimental setup can cause a systems researcher to draw wrong conclusions from an experiment. What appears to be an innocuous aspect in the experimental setup may in fact introduce a substantial bias in an evaluation. This phenomenon is called measurement bias in the natural and social sciences.

    Our results demonstrate that measurement bias is significant and commonplace in computer system evaluation. By significant we mean that measurement bias can lead to a performance analysis that either over-states an effect or even yields an incorrect conclusion. By commonplace we mean that measurement bias occurs in all architectures that we tried (Pentium 4, Core 2, and m5 O3CPU), both compilers that we tried (gcc and Intel’s C compiler), and most of the SPEC CPU2006 C programs. Thus, we cannot ignore measurement bias. Nevertheless, in a literature survey of 133 recent papers from ASPLOS, PACT, PLDI, and CGO, we determined that none of the papers with experimental results adequately consider measurement bias.

    Inspired by similar problems and their solutions in other sciences, we describe and demonstrate two methods, one for detecting (causal analysis) and one for avoiding (setup randomization) measurement bias. [Keywords: experimentation, measurement, performance, bias]
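
    A sketch of the setup-randomization remedy (the harness shape only, not the paper’s tooling): re-run the measurement across many randomized experimental setups, here random UNIX environment sizes, one of the bias sources the paper identifies, and report the spread instead of trusting one fixed setup. (For a Python child process the environment-size effect will likely be lost in noise; the point is the harness structure.)

    ```python
    import random, statistics, subprocess, sys

    BENCH = ("import time; t0 = time.perf_counter(); "
             "sum(i * i for i in range(300_000)); "
             "print(time.perf_counter() - t0)")

    def timed_run(env_padding):
        env = {"PATH": "/usr/bin:/bin", "PADDING": "x" * env_padding}
        out = subprocess.run([sys.executable, "-c", BENCH],
                             env=env, capture_output=True, text=True, check=True)
        return float(out.stdout)

    times = [timed_run(random.randrange(0, 8192)) for _ in range(30)]
    print(f"{statistics.mean(times) * 1e3:.2f} ms ± {statistics.stdev(times) * 1e3:.2f} ms "
          f"over 30 randomized environment sizes")
    ```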

  • ⁠, Arvind Narayanan, Vitaly Shmatikov (2009-05-17):

    Operators of online social networks are increasingly sharing potentially sensitive information about users and their relationships with advertisers, application developers, and data-mining researchers. Privacy is typically protected by anonymization, i.e., removing names, addresses, etc.

    We present a framework for analyzing privacy and anonymity in social networks and develop a new re-identification algorithm targeting anonymized social-network graphs. To demonstrate its effectiveness on real-world networks, we show that a third of the users who can be verified to have accounts on both Twitter, a popular microblogging service, and Flickr, an online photo-sharing site, can be re-identified in the anonymous Twitter graph with only a 12% error rate.

    Our de-anonymization algorithm is based purely on the network topology, does not require creation of a large number of dummy “sybil” nodes, is robust to noise and all existing defenses, and works even when the overlap between the target network and the adversary’s auxiliary information is small.

  • 2010-bates.pdf

  • 2010-nordhaus-nordhaus2007twocenturiesofproductivitygrowthincomputing-appendix.xlsx

  • 2010-pcast-nitrd.pdf

  • ⁠, Gang Ren, Eric Tune, Tipp Moseley, Yixin Shi, Silvius Rus, Robert Hundt (2010-08-19):

    Google-Wide Profiling (GWP), a continuous profiling infrastructure for data centers, provides performance insights for cloud applications. With negligible overhead, GWP provides stable, accurate profiles and a datacenter-scale tool for traditional performance analyses. Furthermore, GWP introduces novel applications of its profiles, such as application-platform affinity measurements and identification of platform-specific, microarchitectural peculiarities.

  • ⁠, François Pellegrino, Christophe Coupé, Egidio Marsico (2011-09):

    This article is a cross-linguistic investigation of the hypothesis that the average information rate conveyed during speech communication results from a trade-off between average information density and speech rate. The study, based on seven languages, shows a negative correlation between density and rate, indicating the existence of several encoding strategies. However, these strategies do not necessarily lead to a constant information rate. These results are further investigated in relation to the notion of syllabic complexity.

  • 2011-yelick.pdf

  • 2012-02-12-arvindnarayanan-iswritingstylesufficienttodeanonymizematerialonline.html

  • ⁠, David S. Johnson (2012-01-01):

    The year 2012 marks the 40th anniversary of the publication of the influential paper “Reducibility among combinatorial problems” by Richard Karp. This paper was the first to demonstrate the wide applicability of the concept now known as NP-completeness, which had been introduced the previous year by Stephen Cook and Leonid Levin, independently. 2012 also marks the 100th anniversary of the birth of Alan Turing, whose invention of what is now known as the “Turing machine” underlay that concept. In this chapter, I shall briefly sketch the history and pre-history of NP-completeness (with pictures), and provide a brief personal survey of the developments in the theory over the last 40 years and their impact (or lack thereof) on the practice and theory of optimization. I assume the reader is familiar with the basic concepts of NP-completeness, P, and NP, although I hope the story will still be interesting to those with only a fuzzy recollection of the definitions.

  • 2012-terencetao-anonymity.html

  • 2015-brooks.pdf: “An Interview with Fred Brooks”⁠, Fred Brooks, Len Shustek (2015):

    ACM Fellow and A.M. Turing Award recipient Fred Brooks reflects on his career.

  • ⁠, Svilen Kanev, Juan Pablo Darago, Kim M. Hazelwood, Parthasarathy Ranganathan, Tipp J. Moseley, Gu-Yeon Wei, David Michael Brooks (2015-06-01):

    With the increasing prevalence of warehouse-scale (WSC) and cloud computing, understanding the interactions of server applications with the underlying microarchitecture becomes ever more important in order to extract maximum performance out of server hardware. To aid such understanding, this paper presents a detailed microarchitectural analysis of live datacenter jobs, measured on more than 20,000 Google machines over a three year period, and comprising thousands of different applications.

    We first find that WSC workloads are extremely diverse, breeding the need for architectures that can tolerate application variability without performance loss. However, some patterns emerge, offering opportunities for co-optimization of hardware and software. For example, we identify common building blocks in the lower levels of the software stack. This “datacenter tax” can comprise nearly 30% of cycles across jobs running in the fleet, which makes its constituents prime candidates for hardware specialization in future server systems-on-chips. We also uncover opportunities for classic microarchitectural optimizations for server processors, especially in the cache hierarchy. Typical workloads place substantial stress on instruction caches and prefer memory latency over bandwidth. They also stall cores often, but compute heavily in bursts. These observations motivate several interesting directions for future warehouse-scale computers.

  • 2015-kuppusamy.pdf

  • ⁠, Frank McSherry, Michael Isard, Derek G. Murray (2015-05):

    We offer a new metric for big data platforms, COST, or the Configuration that Outperforms a Single Thread. The COST of a given platform for a given problem is the hardware configuration required before the platform outperforms a competent single-threaded implementation. COST weighs a system’s scalability against the overheads introduced by the system, and indicates the actual performance gains of the system, without rewarding systems that bring substantial but parallelizable overheads.

    We survey measurements of data-parallel systems [for graph processing] recently reported in SOSP and OSDI, and find that many systems have either a surprisingly large COST, often hundreds of cores, or simply underperform one thread for all of their reported configurations.
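
    A worked example of the metric (numbers made up): the COST is the first configuration at which the system beats a competent single-threaded baseline.

    ```python
    single_thread_seconds = 100.0

    # Hypothetical runtimes of some data-parallel system at increasing core counts.
    system_runtimes = {1: 950.0, 2: 480.0, 4: 250.0, 8: 130.0, 16: 70.0, 32: 40.0}

    cost = next((cores for cores, t in sorted(system_runtimes.items())
                 if t < single_thread_seconds), None)
    print(f"COST = {cost} cores" if cost else "COST unbounded: never beats one thread")
    # -> COST = 16 cores
    ```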

  • 2017-denning.pdf: “Exponential Laws of Computing Growth: Moore's Law is one small component in an exponentially growing planetary computing ecosystem”⁠, Peter J. Denning, Ted G. Lewis (2017):

    Moore’s Law is one small component in an exponentially growing planetary computing ecosystem.

  • 2017-hazel.pdf

  • ⁠, Junfeng Zhou, Shijie Zhou, Jeffrey Xu Yu, Hao Wei, Ziyang Chen, Xian Tang (2017-05-01):

    Answering reachability queries is one of the fundamental graph operations. The existing approaches build indexes and answer reachability queries on a directed acyclic graph (DAG), constructed by coalescing each strongly connected component of the given directed graph G into a single node. Considering that this DAG can still be too large to be processed efficiently, there are studies to further reduce it to a smaller graph. However, these approaches suffer from either inefficiency in answering reachability queries, or cannot scale to large graphs.

    In this paper, we study DAG reduction to accelerate reachability query processing, which reduces the size of the DAG by computing transitive reduction (TR) followed by equivalence reduction (ER). For ER, we propose a divide-and-conquer algorithm, namely linear-ER. Given the result Gt of TR, linear-ER gets a smaller DAG Gε in linear time based on the equivalence relationship between nodes. Our DAG reduction approaches (TR and ER) substantially improve the cost of time and space, and can be scaled to large graphs. We confirm the efficiency of our approaches by extensive experimental studies for TR, ER, and reachability query processing using 20 real datasets.
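
    For reference, the query being accelerated is plain DAG reachability; a naive memoized-DFS baseline looks like this (the paper’s contribution is shrinking the DAG via TR and ER so that index-based schemes scale, not this approach):

    ```python
    from functools import lru_cache

    dag = {                 # adjacency list of a small DAG (made-up example)
        'a': ['b', 'c'],
        'b': ['d'],
        'c': ['d'],
        'd': [],
    }

    @lru_cache(maxsize=None)
    def reaches(u, v):
        """Can v be reached from u by following edges of the DAG?"""
        return u == v or any(reaches(w, v) for w in dag[u])

    print(reaches('a', 'd'), reaches('d', 'a'))   # True False
    ```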

  • 2019-07-15-danielbali-citiesskylineisturingcomplete.html

  • ⁠, Scott Aaronson (2019-08-28):

    The ⁠, with its incomprehensibly rapid growth, has captivated generations of computer scientists, mathematicians, and hobbyists. In this survey, I offer a personal view of the BB function 58 years after ⁠, emphasizing lesser-known insights, recent progress, and especially favorite open problems.

    Examples of such problems include: when does the BB function first exceed the Ackermann function? Is the value of BB(20) independent of set theory? Can we prove that BB(n + 1) > 2^BB(n) for large enough n? Given BB(n), how many advice bits are needed to compute BB(n + 1)? Do all Busy Beavers halt on all inputs, not just the 0 input? Is it decidable, given n, whether BB(n) is even or odd?

  • ⁠, Martin Kleppmann, Adam Wiggins, Peter van Hardenberg, Mark McGranaghan (Ink & Switch) (2019-10-23):

    Cloud apps like Google Docs and Trello are popular because they enable real-time collaboration with colleagues, and they make it easy for us to access our work from all of our devices. However, by centralizing data storage on servers, cloud apps also take away ownership and agency from users. If a service shuts down, the software stops functioning, and data created with that software is lost.

    In this article we propose local-first software, a set of principles for software that enables both collaboration and ownership for users. Local-first ideals include the ability to work offline and collaborate across multiple devices, while also improving the security, privacy, long-term preservation, and user control of data.

    We survey existing approaches to data storage and sharing, ranging from email attachments to web apps to Firebase-backed mobile apps, and we examine the trade-offs of each. We look at Conflict-free Replicated Data Types (CRDTs): data structures that are multi-user from the ground up while also being fundamentally local and private. CRDTs have the potential to be a foundational technology for realizing local-first software.

    We share some of our findings from developing local-first software prototypes at the Ink & Switch research lab over the course of several years. These experiments test the viability of CRDTs in practice, and explore the user interface challenges for this new data model. Lastly, we suggest some next steps for moving towards local-first software: for researchers, for app developers, and a startup opportunity for entrepreneurs. [Keywords: collaboration software, mobile computing, data ownership, CRDTs, peer-to-peer communication]
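
    A minimal CRDT in the spirit of the essay (a grow-only counter; real local-first apps use much richer CRDTs such as Automerge’s): each device increments its own slot, and merging takes element-wise maxima, so replicas converge no matter how syncs are ordered.

    ```python
    class GCounter:
        def __init__(self, replica_id):
            self.replica_id, self.counts = replica_id, {}    # replica id -> increments

        def increment(self, n=1):
            self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

        def merge(self, other):
            for rid, c in other.counts.items():
                self.counts[rid] = max(self.counts.get(rid, 0), c)

        def value(self):
            return sum(self.counts.values())

    laptop, phone = GCounter("laptop"), GCounter("phone")
    laptop.increment(3)                      # offline edits on the laptop
    phone.increment(2)                       # offline edits on the phone
    laptop.merge(phone); phone.merge(laptop)
    print(laptop.value(), phone.value())     # 5 5 -- both replicas converge
    ```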

  • ⁠, Michael Fleder, Devavrat Shah (2020-12-01):

    We consider the question of identifying which set of products are purchased and at what prices in a given transaction by observing only the total amount spent in the transaction, and nothing more. The ability to solve such an inverse problem can lead to refined information about consumer spending by simply observing anonymized credit card transactions data.

    Indeed, when considered in isolation, it is impossible to identify the products purchased and their prices from a given transaction just based on the transaction total. However, given a large number of transactions, there may be a hope. As the main contribution of this work, we provide a robust estimation algorithm for decomposing transaction totals into the underlying, individual product(s) purchased by utilizing a large corpus of transactions.

    Our method recovers a vector p of product prices, of unknown dimension N (the number of products), as well as a matrix A, simply from M observations (transaction totals) that take the form Ap plus noise (taxes, discounts, etc.). We formally establish that our algorithm identifies N and A precisely and p approximately, as long as each product is purchased individually at least once and A has rank N. Computationally, the algorithm runs in polynomial time (with respect to problem parameters), and thus we provide a computationally efficient and statistically robust method for solving such inverse problems.

    We apply the algorithm to a large corpus of anonymized consumer credit card transactions in the period 2016–2019, with data obtained from a commercial data vendor. The transactions are associated with spending at Apple, Chipotle, Netflix, and Spotify. From just transactions data, our algorithm identifies (1) key price points (without access to the listed prices), (2) products purchased within a transaction, (3) product launches, and (4) evidence of a new ‘secret’ product from Netflix—rumored to be in limited release. [Keywords: blind compressed sensing; alternative data; finance; consumer credit card transactions]

  • ⁠, Charles E. Leiserson, Neil C. Thompson, Joel S. Emer, Bradley C. Kuszmaul, Butler W. Lampson, Daniel Sanchez, Tao B. Schardl (2020-06-05):

    From bottom to top: The doubling of the number of transistors on a chip every 2 years, a seemingly inevitable trend that has been called Moore’s law, has contributed immensely to improvements in computer performance. However, silicon-based transistors cannot get much smaller than they are today, and other approaches should be explored to keep performance growing. Leiserson et al. review recent examples and argue that the most promising place to look is at the top of the computing stack, where improvements in software, algorithms, and hardware architecture can bring the much-needed boost…The miniaturization of semiconductor transistors has driven the growth in computer performance for more than 50 years. As miniaturization approaches its limits, bringing an end to Moore’s law, performance gains will need to come from software, algorithms, and hardware. We refer to these technologies as the “Top” of the computing stack to distinguish them from the traditional technologies at the “Bottom”: semiconductor physics and silicon-fabrication technology. In the post-Moore era, the Top will provide substantial performance gains, but these gains will be opportunistic, uneven, and sporadic, and they will suffer from the law of diminishing returns. Big system components offer a promising context for tackling the challenges of working at the Top.

    …Unfortunately, semiconductor miniaturization is running out of steam as a viable way to grow computer performance—there isn’t much more room at the “Bottom.” If growth in computing power stalls, practically all industries will face challenges to their productivity. Nevertheless, opportunities for growth in computing performance will still be available, especially at the “Top” of the computing-technology stack: software, algorithms, and hardware architecture.

    Advances: Software can be made more efficient by performance engineering: restructuring software to make it run faster. Performance engineering can remove inefficiencies in programs, known as software bloat, arising from traditional software-development strategies that aim to minimize an application’s development time rather than the time it takes to run. Performance engineering can also tailor software to the hardware on which it runs, for example, to take advantage of parallel processors and vector units.

    Algorithms offer more-efficient ways to solve problems. Indeed, since the late 1970s, the time to solve the maximum-flow problem improved nearly as much from algorithmic advances as from hardware speedups. But progress on a given algorithmic problem occurs unevenly and sporadically and must ultimately face diminishing returns. As such, we see the biggest benefits coming from algorithms for new problem domains (e.g., machine learning) and from developing new theoretical machine models that better reflect emerging hardware.

    Hardware architectures can be streamlined—for instance, through processor simplification, where a complex processing core is replaced with a simpler core that requires fewer transistors. The freed-up transistor budget can then be redeployed in other ways—for example, by increasing the number of processor cores running in parallel, which can lead to large efficiency gains for problems that can exploit parallelism. Another form of streamlining is domain specialization, where hardware is customized for a particular application domain. This type of specialization jettisons processor functionality that is not needed for the domain. It can also allow more customization to the specific characteristics of the domain, for instance, by decreasing floating-point precision for machine-learning applications.

    In the post-Moore era, performance improvements from software, algorithms, and hardware architecture will increasingly require concurrent changes across other levels of the stack. These changes will be easier to implement, from engineering-management and economic points of view, if they occur within big system components: reusable software with typically more than a million lines of code or hardware of comparable complexity. When a single organization or company controls a big component, modularity can be more easily re-engineered to obtain performance gains. Moreover, costs and benefits can be pooled so that important but costly changes in one part of the big component can be justified by benefits elsewhere in the same component.

    Outlook: As miniaturization wanes, the silicon-fabrication improvements at the Bottom will no longer provide the predictable, broad-based gains in computer performance that society has enjoyed for more than 50 years. Software performance engineering, development of algorithms, and hardware streamlining at the Top can continue to make computer applications faster in the post-Moore era. Unlike the historical gains at the Bottom, however, gains at the Top will be opportunistic, uneven, and sporadic. Moreover, they will be subject to diminishing returns as specific computations become better explored.
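
    A small taste of performance engineering at the Top, in the spirit of the paper’s matrix-multiplication example: the same computation as naive interpreted loops versus a tuned library routine (speedups are machine-dependent):

    ```python
    import time
    import numpy as np

    n = 128
    A, B = np.random.rand(n, n), np.random.rand(n, n)

    t0 = time.perf_counter()
    C_naive = [[sum(A[i, k] * B[k, j] for k in range(n)) for j in range(n)]
               for i in range(n)]
    naive = time.perf_counter() - t0

    t0 = time.perf_counter()
    C_blas = A @ B                        # same multiply via an optimized BLAS routine
    tuned = time.perf_counter() - t0

    print(f"naive loops: {naive:.2f}s, BLAS: {tuned:.4f}s, ~{naive / tuned:.0f}x faster")
    ```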

  • 2020-masanet.pdf

  • ⁠, Sri Aravinda Krishnan Thyagarajan, Adithya Bhat, Giulio Malavolta, Nico Döttling, Aniket Kate, Dominique Schröder (2020-10-01):

    A verifiable timed signature (VTS) scheme allows one to time-lock a signature on a known message for a given amount of time T such that after performing a sequential computation for time T anyone can extract the signature from the time-lock. Verifiability ensures that anyone can publicly check if a time-lock contains a valid signature on the message without solving it first, and that the signature can be obtained by solving the same for time T.

    This work formalizes VTS, presents efficient constructions compatible with BLS, Schnorr, and ECDSA signatures, and experimentally demonstrates that these constructions can be employed in practice. On a technical level, we design an efficient cut-and-choose protocol based on the homomorphic time-lock puzzles to prove the validity of a signature encapsulated in a time-lock puzzle. We also present a new efficient range proof protocol that substantially improves upon existing proposals in terms of the proof size, and is also of independent interest.

    While VTS is a versatile tool with numerous existing applications, we demonstrate VTS’s applicability to resolve three novel challenging issues in the space of cryptocurrencies. Specifically, we show how VTS is the cryptographic cornerstone to construct: (1) Payment channel networks with improved on-chain unlinkability of users involved in a transaction, (2) multi-party signing of transactions for cryptocurrencies without any on-chain notion of time and (3) cryptocurrency-enabled fair multi-party computation protocol. [Keywords: timed signatures, time lock puzzles, payment channel network, multi-party signing]
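
    The primitive underneath (a time-lock puzzle forcing sequential squarings) fits in a few lines. This toy Rivest–Shamir–Wagner-style sketch is neither homomorphic nor a VTS, and the parameters are insecure, but it shows the asymmetry the paper builds on: the creator has a trapdoor shortcut, while the solver must perform the T squarings one after another.

    ```python
    p, q = 104_729, 1_299_709            # toy primes; in practice, large random primes
    n, phi = p * q, (p - 1) * (q - 1)
    t = 100_000                          # number of forced sequential squarings
    x = 42                               # the value being time-locked

    # Creator (knows phi): shortcut by reducing the exponent 2^t modulo phi.
    expected = pow(x, pow(2, t, phi), n)

    # Solver (does not know phi): must do the t squarings sequentially.
    y = x % n
    for _ in range(t):
        y = y * y % n

    assert y == expected
    print(f"recovered the time-locked value after {t} sequential squarings")
    ```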

  • ⁠, Blake Troise (2020-01-01):

    The 1-bit sonic environment (perhaps most famously musically employed on the ZX Spectrum) is defined by extreme limitation. Yet, belying these restrictions, there is a surprisingly expressive instrumental versatility. This article explores the theory behind the primary, idiosyncratically 1-bit techniques available to the composer-programmer, those that are essential when designing “instruments” in 1-bit environments. These techniques include pulse width modulation for timbral manipulation and means of generating virtual polyphony in software, such as the pin pulse and pulse interleaving techniques. These methodologies are considered in respect to their compositional implications and instrumental applications. [Keywords: chiptune, 1-bit, one-bit, ZX Spectrum, pulse pin method, pulse interleaving, timbre, polyphony, history]
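
    A rough sketch of two of the techniques described (my reconstruction, not the author’s code): pulse width sets the timbre of a 1-bit square voice, and a crude per-sample “pulse interleaving” lets two voices share the single on/off output.

    ```python
    import wave

    RATE = 44_100

    def square(freq, duty, n_samples, rate=RATE):
        """1-bit square wave: 1 while inside the pulse, 0 otherwise."""
        period = rate / freq
        return [1 if (i % period) < duty * period else 0 for i in range(n_samples)]

    def interleave(voice_a, voice_b):
        """Alternate which voice drives the 1-bit output on successive samples."""
        return [a if i % 2 == 0 else b for i, (a, b) in enumerate(zip(voice_a, voice_b))]

    n = RATE * 2                                        # two seconds
    mix = interleave(square(440, 0.50, n),              # voice 1: full square
                     square(660, 0.25, n))              # voice 2: thinner timbre

    with wave.open("onebit.wav", "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(1)
        f.setframerate(RATE)
        f.writeframes(bytes(255 if s else 0 for s in mix))   # map 1-bit samples to 8-bit PCM
    ```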

  • zend-manual-pdfmetadata.html