Recently in Colloquium Category


THURSDAY, SEPTEMBER 1, 2011 - 4:00 pm - 201 Thomas Bldg.
Refreshments: 3:30 pm - 330 Thomas Bldg.

YANG FENG, Columbia University

"Nonparametric Independence Screening in Sparse Ultra-High Dimensional Additive Models"

A variable screening procedure via correlation learning was proposed in Fan and Lv (2008) to reduce dimensionality in sparse ultra-high dimensional models. Even when the true model is linear, the marginal regression can be highly nonlinear. To address this issue, we further extend the correlation learning to marginal nonparametric learning. Our nonparametric independence screening, called NIS, is a specific type of sure independence screening. Several closely related variable screening procedures are proposed. For general nonparametric models, it is shown that, under some mild technical conditions, the proposed independence screening methods enjoy a sure screening property. The extent to which the dimensionality can be reduced by independence screening is also explicitly quantified. As a methodological extension, a data-driven thresholding and an iterative nonparametric independence screening (INIS) are also proposed to enhance the finite sample performance for fitting sparse additive models. The simulation results and a real data analysis demonstrate that the proposed procedure works well with moderate sample size and large dimension and performs better than competing methods. This is joint work with Jianqing Fan and Rui Song.
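
As an illustration of the correlation-learning idea that the abstract starts from, here is a minimal Python sketch of marginal screening: rank each predictor by the absolute value of its sample correlation with the response and keep the top few. This is only the linear ranking step, not the nonparametric NIS or INIS procedures from the talk. The function names `pearson` and `screen`, the `keep` parameter, and the toy data are assumptions made for this example.

```python
def pearson(x, y):
    """Sample Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def screen(columns, y, keep):
    """Return the sorted indices of the `keep` columns with the largest
    absolute marginal correlation with the response y."""
    scores = [abs(pearson(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


# Toy data (hypothetical): three predictors, five observations.
columns = [
    [1, 2, 3, 4, 5],    # perfectly correlated with y
    [1, -1, 1, -1, 1],  # uncorrelated with y
    [5, 3, 4, 1, 2],    # correlation -0.8 with y
]
y = [2, 4, 6, 8, 10]

print(screen(columns, y, keep=2))  # -> [0, 2]
```

Ranking by marginal correlation can miss predictors whose marginal relationship with the response is nonlinear, which is the gap the abstract says marginal nonparametric learning is meant to close.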


MICHAEL NEWTON - University of Wisconsin, Madison

Time and location: Wednesday, April 27, 2011 - 4:00 pm - 201 Thomas Bldg.
Coffee: 3:30 pm - 330 Thomas Bldg.


A problem in statistical genomics is to examine the points of contact between genomic data generated experimentally and exogenous information about gene function. The purpose of such data integration may be to summarize extensive gene-level data into manageable units, or it may be to enhance the signal-to-noise ratio through set-level averaging. In either case there are unique statistical problems with such data integration. I will review several statistical approaches and examine them in examples from cancer virology and flu replication genomics. Included will be a discussion of a new "role model", which aims to address pleiotropy and the spurious association of functional categories with changes in cellular state. When inferring non-null behavior of a functional category, role-model computations incorporate not only data on genes in that category, but also other functional attributes of these genes. Some difficult computational challenges emerge from this approach.

More information about speaker

Our Wednesday speaker (note that this is different from our regular time slot) is Michael Newton from the University of Wisconsin, Madison. Dr. Newton received the prestigious COPSS award in 2004. His many contributions include Bayesian methods, the bootstrap, multiple hypothesis testing, tree reconstruction, and applications of statistical methodology in genomics. Here is a link to his publications:

Abstract :


Gaussian graphical models explore dependence relationships between random variables, through estimation of the corresponding inverse covariance (precision) matrices. We develop an estimator for such models appropriate for heterogeneous data; specifically, data obtained from different categories that share some common structure, but also exhibit differences. We propose a method which jointly estimates several graphical models corresponding to the different categories present in the data. The method aims to preserve the common structure, while allowing for differences between the categories. This is achieved through a hierarchical penalty that targets the removal of common zeroes in the precision matrices across categories. We establish the asymptotic consistency and persistency of the proposed estimator in the high-dimensional case, and illustrate its superior performance on a number of simulated networks. An application to learning semantic connections between terms from webpages collected from computer science departments is also included. Some extensions to Markov networks (suitable for binary/categorical variables) are also discussed.

More information about speaker :

Our Thursday speaker is Professor Michailidis from the University of Michigan.
Dr. Michailidis's research interests include machine learning, change-point problems, applied probability, computational statistics, bioinformatics, networks, etc. He has numerous publications, including one book, "Introduction to Machine Learning and Bioinformatics". Here is a link for more information

Time:  Thursday, March 24, 2011 at 4:00 PM
Place:  117 Osmond Lab
Speaker:  Dr. Edward Seidel, Deputy Director of the Mathematical and Physical Science Division of the National Science Foundation

"The Data and Compute-Driven Transformation of Modern Science"

Modern science is undergoing a profound transformation as it aims to tackle the complex problems of the 21st century. It is becoming highly collaborative; problems as diverse as climate change, renewable energy or the origin of gamma ray bursts require understanding processes that no single group or community has the skills to address. At the same time, after centuries of little change, compute, data, and network environments have grown by 12 orders of magnitude in the last few decades. Cyberinfrastructure - the comprehensive set of deployable hardware, software and algorithmic tools and environments supporting research, education, and increasingly collaboration across disciplines - is transforming all research disciplines and society itself. Motivating with examples ranging from astrophysics to emergency forecasting, Dr. Seidel will describe new trends in science and the need, the potential, and the transformative impact of cyberinfrastructure. He will also discuss current and planned future efforts at the National Science Foundation to address them.

Time: 11:30AM on Friday April 1
Place: 108 Wartik
Speaker: Nick Chia (from Illinois)


The biological world, especially its majority microbial component, is strongly interacting and dominated by collective effects. I will provide a brief introduction of how living cells communicate genetically through transferred genes and the ways in which they can reorganize their genomes in response to environmental pressure. I will show how ideas from statistical physics can impact our understanding of microbial genome dynamics as they specifically relate to:

  1. phenotype switching and specialization in closely-knit microbial communities known as biofilms
  2. environmental drivers of genome variation, as may arise when a microbe becomes symbiotic to a host, or through spatial and temporal variation in the world's oceans

Finally, I describe how my ongoing analyses of modern high-throughput genomics data are beginning to shed light on the complexity of real microbial communities and their evolutionary dynamics.

Time:  Tuesday March 29 at 4pm
Place: Berg Auditorium, 100 Life Sciences Building

Speaker:  Rahul Kulkarni (Virginia Tech)

Tuesday, March 22, 2011 - 4:00 pm - 201 Thomas Bldg.
Coffee: 3:30 pm -  330 Thomas Bldg.
JING LEI - Google, Inc.
"Differential Privacy in Statistics"
The goal of statistical disclosure control is to release accurate statistics from a data set while preserving the privacy of individuals. This problem has received attention in several disciplines, including statistics, theoretical computer science, security, and databases. A particular challenge in privacy-preserving data analysis is to achieve mathematically rigorous privacy guarantees that will hold independently of any possible "auxiliary information" the privacy attackers might have. Recently, the notion of differential privacy has been proposed; this protects individual information independently of the computational power and auxiliary information available to the attacker. In this talk I will introduce the notion of differential privacy and apply it to classical point estimation problems in statistics, such as location-scale estimation and linear models. This talk will be self-contained and no prior knowledge of differential privacy is required.
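
For readers new to the idea, the following Python sketch shows the standard textbook Laplace mechanism applied to a sample mean. It is not taken from the talk. The function name `private_mean`, its parameters, and the example data are assumptions for illustration, and the sketch assumes that neighboring data sets differ by replacing one record, with the sample size treated as public.

```python
import random


def private_mean(values, lower, upper, epsilon, rng=random):
    """Release an epsilon-differentially private mean of `values`.

    Values are clipped to [lower, upper], so replacing one record changes
    the mean by at most (upper - lower) / n. Laplace noise with scale
    sensitivity / epsilon is then added to the clipped mean.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if upper <= lower:
        raise ValueError("upper must exceed lower")

    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    sensitivity = (upper - lower) / n
    scale = sensitivity / epsilon

    # The difference of two independent exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return sum(clipped) / n + noise


# Hypothetical data: 1000 values in [0, 1] whose true mean is 0.5.
data = [0.2, 0.4, 0.6, 0.8] * 250
print(private_mean(data, lower=0.0, upper=1.0, epsilon=1.0, rng=random.Random(0)))
```

A smaller `epsilon` gives a stronger privacy guarantee at the cost of more noise, while a larger sample shrinks the sensitivity and therefore the noise.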

Probability Seminar [Feb 25]

Friday, Feb. 25 at 2:20pm in McAllister 106:

Speaker: Jun Masamune, Penn State University Altoona

Title: On stochastic completeness of a jump process and its application to graphs

Abstract: In 1986, A. Grigor'yan discovered a sharp volume-growth condition for the Brownian motion on a Riemannian manifold to be stochastically complete; namely, non-explosive. This condition was extended to a diffusion process on a metric measure space by K.Th. Sturm in 1994. Quite recently, R.K. Wojciechowski constructed a stochastically incomplete graph that satisfies this condition with respect to the graph distance. In this talk, I will introduce a new volume-growth condition for a jump process on a metric measure space to be stochastically complete, then I will apply that result to a graph. Wojciechowski's example confirms that our condition for a graph is sharp. This result was obtained in collaboration with Alexander Grigor'yan and Xueping Huang (Bielefeld, Germany).

Time and Location:  Tuesday, March 1, 2011 - 4:00 pm - 201 Thomas Bldg.
                                 Coffee: 3:30 pm - 330 Thomas Bldg.

Speaker:    BOAZ NADLER - Weizmann Institute of Science, Israel
                  More about speaker:


Detection of signals in noise from multivariate observations is a fundamental problem in many different fields. In this talk I'll review several such problems and their connections to various classical multivariate statistical problems, including principal component analysis and MANOVA. I'll then show how these can be studied by a combination of matrix perturbation with random matrix theory.


The talk will be self-contained; no prior knowledge of any of the above subjects will be assumed.

|More about speaker|

Boaz Nadler is from the Weizmann Institute of Science, Israel. He is currently working with Peter Bickel as a postdoc at Berkeley.

Dr. Nadler has a broad range of research interests, including random matrix theory, nonparametric modeling, high-dimensional problems, information theory and signal processing, wavelets, etc. Here is a link to his publications

[Time and location:]  Wednesday, February 23, 2011 - 12:15-1:30 - 302 Pond
                   A light lunch will be provided.


Group disparity, i.e., to what extent two groups differ from each other on a given measure, has been the focus of most sociological inquiries since the very beginning of the discipline. The standard approach to model group disparity is to compare the mean scores of the outcome variable between groups, with and without controlling for covariates. While this "average-person" approach informs us about the typical persons in the sample of interest with certain characteristics, it is inadequate to capture the overall structure of inequality which consists of not only typical persons but also non-typical people. This inadequacy is particularly salient in sociology, as sociologists are intrinsically interested in the relative status of individuals within social structures, not the single position that an individual occupies.

In this study I shall show that it is necessary to enlarge our scope and advance the modeling of group disparity through distribution-based comparisons. In particular, quantile regression allows us to examine the relationship between the covariates and the dependent variable throughout the distribution of the dependent variable. An empirical study on the earnings assimilation of Hispanic immigrants in the United States will be provided for further illustration.
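
To make the contrast with mean comparisons concrete, here is a small Python sketch of the check (pinball) loss that quantile regression minimizes. With no covariates, the minimizer is simply a sample quantile, so the sketch compares two groups at several quantiles. It is not the speaker's analysis. The function names `check_loss` and `sample_quantile` and the two toy groups are assumptions made for this example.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: tau * r for r >= 0 and (tau - 1) * r for r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def sample_quantile(values, tau):
    """Return a data point minimizing the total check loss at level tau.

    A minimizer of the total check loss always exists among the data
    points, so it is enough to search over them.
    """
    return min(
        sorted(values),
        key=lambda q: sum(check_loss(v - q, tau) for v in values),
    )


# Hypothetical outcome scores for two groups.
group_a = [10, 20, 30, 40, 50]
group_b = [10, 20, 30, 40, 200]

# The means differ (30 vs 60) ...
print(sum(group_a) / len(group_a), sum(group_b) / len(group_b))  # -> 30.0 60.0

# ... but the medians agree, and the gap sits entirely in the upper tail.
print(sample_quantile(group_a, 0.5), sample_quantile(group_b, 0.5))  # -> 30 30
print(sample_quantile(group_a, 0.9), sample_quantile(group_b, 0.9))  # -> 50 200
```

Quantile regression extends this by replacing the single value `q` with a linear function of the covariates, fitted separately for each level `tau`.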

[More about speaker:]
Xiaozhou Wang
Department of Sociology and QuaSSI Predoctoral Fellow