BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Date iCal//NONSGML kigkonsult.se iCalcreator 2.20.2//
METHOD:PUBLISH
X-WR-CALNAME;VALUE=TEXT:Eventi DIAG
BEGIN:VTIMEZONE
TZID:Europe/Paris
BEGIN:STANDARD
DTSTART:20191027T030000
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
RDATE:20201025T030000
TZNAME:CET
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20200329T020000
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:calendar.19896.field_data.0@www.diag.uniroma1.it
DTSTAMP:20211025T223304Z
CREATED:20200511T063258Z
DESCRIPTION:The lectures of the Data Science PhD course on Computational
and Statistical Methods of Data Reduction will be held this week online at
meet.google.com/hkg-kgxg-azp with the following program: I. Computationa
 l methods: sampling and inferential issues (May 11th - 12th 2020\, 09:00-1
 3:00) Prof. Serena Arima (Università del Salento) II. Dimensionali
 ty Reduct
ion in Clustering and Streaming (May 14th - 15th 2020\, 09:00-13:00) Prof.
Chris Schwiegelshohn (La Sapienza) I. Computational methods: sampling and
inferential issues (Prof.ssa Arima) 1. Random number generation algorithm
 s: - Acceptance-rejection algorithm\; - Monte Carlo methods\; - Importan
 ce sampling\; - Gibbs sampling\; - Antithetic variables. 2. Numerical me
 thods for likelihood inference: - EM algorithm\; - Bootstrap\; - Jackkni
 fe. 3. Monte Carlo and Markov chain Monte Carlo. II. Dimensionality Redu
 ction in Clusteri
 ng and Streaming (Prof. Schwiegelshohn) First Day: The curse of dimensio
 nality is a common occurrence when working with large data sets. In lo
 w dimen
sions (such as the Euclidean plane)\, we visualize problems very well and
can often find interesting properties of a data set just by hand. In more
than three dimensions\, our ability to visualize a problem is already seve
rely impacted and our intuition from the Euclidean plane may lead us compl
 etely astray. Moreover\, algorithms often scale poorly: Finding neares
 t neighbors in 2D can be done in nearly linear time. In high dimensio
 ns\, it becomes very difficult to improve over either n^2. Geometric dat
 a structures and decompositions become hard to implement. Line swee
 ps\, Voronoi diagrams\, grids\, and nets usually scale by at least a fac
 tor of 2^d\, where d is the dimension. In some cases\, it may be eve
 n worse. Many problems that are easy to solve in 2D\, such as clusteri
 ng\, become computationally intractable in high dimensions. Often\, exa
 ct solutions require running times that are exponential in the numbe
 r of dimensions. Unfortunately\, high-dimensional data sets are not th
 e exception\, but rather the norm in modern data analy
 sis. As such\, much of computational data analysis has been devoted to f
inding ways to reduce the dimension. In this course\, we will study two po
pular methods\, namely principal component analysis (PCA) and random proje
ctions. Principal component analysis originated in statistics\, but is als
 o known under various other names\, depending on the field (e.g. eigenvec
 tor problem\, low-rank approximation\, etc.). We will illustrate the me
 thod
\, highlighting the problem that is solved and the underlying assumptions
of PCA. Next\, we will see a powerful tool for dimension reduction known a
 s the Johnson-Lindenstrauss lemma. The Johnson-Lindenstrauss lemma stat
 es that\, given a point set A in an arbitrarily high dimension\, we c
 an transform A into a point set A' in dimension O(log |A|)\, while appro
 ximately preserving all pairwis
e distances. For both of these problems\, we will see applications\, inclu
 ding k-nearest neighbor classification and k-means. Second Day: Large da
 ta sets form a sister topic to dimension reduction. While the benefits o
 f hav
ing a small dimension are immediately understood\, reducing the size of th
e data is a comparatively recent paradigm. There are many reasons for data
compression. Aside from data storage and retrieval\, we want to minimize
the amount of communication in distributed computing\, enable online and s
 treaming algorithms\, or simply run an accurate (but expensive) algorit
 hm on a smaller dataset. A key concept in large-scale data analysis is t
 hat of cores
ets. We view coresets as a succinct summary of a data set that behaves\, f
or any candidate solution\, like the original data set. The surprising suc
cess story of data compression is that for many problems\, we can construc
 t coresets of size independent of the input size. For example\, linear re
 gressi
on in d dimensions admits coresets of size O(d)\, k-means has coresets of
size O(k)\, irrespective of the number of data points of the original data
set. In our course\, we will describe the coreset paradigm formally. More
over\, we will give an overview of methods to construct coresets for vario
us problems. Examples include constructing coresets from random projection
s\, by analyzing gradients\, or via sampling. We will further highlight a
number of applications.
DTSTART;TZID=Europe/Paris:20200511T083000
DTEND;TZID=Europe/Paris:20200511T083000
LAST-MODIFIED:20200528T094447Z
LOCATION:DIAG - Sapienza
SUMMARY:Data Science PhD course on Computational and Statistical Methods o
 f Data Reduction - Prof. Serena Arima and Prof. Chris Schwiegelshohn
URL;TYPE=URI:http://www.diag.uniroma1.it/node/19896
END:VEVENT
END:VCALENDAR