# The map shows the distribution of the language family that has the largest number of speakers in the world. Which statement correctly identifies the language family shown and the method of diffusion that best explains the pattern?

### James

Guys, does anyone know the answer?


## Unit 3 Culture Test


## The ecological drivers of variation in global language diversity

Language diversity is distributed unevenly over the globe. Intriguingly, patterns of language diversity resemble biodiversity patterns, leading to suggestions that similar mechanisms may underlie both linguistic and biological diversification. Here we ...

## Methods

### Data collection

All analyses are based on grid cells for a global equal-area projection at three different resolutions: low resolution with grid cell size 1000 × 1000 km, medium resolution with grid cell size 500 × 500 km, and high resolution with grid cell size 200 × 200 km. We excluded grid cells where no language distributions overlap that cell.

Language distributions were compiled from the World Language Mapping System v16 (http://worldgeodatasets.com), which includes information on the geographic distribution of 6425 languages^{1}. These distribution data represent the likely spatial range of the traditional linguistic homelands of languages. A language polygon does not include speakers in other areas, such as migrant populations, nor does it include official languages defined by political boundaries.

Language diversity was calculated by overlaying the language range polygons with the global grid using the R packages “sp” and “raster”^{47–49} and counting the number of languages whose distribution overlaps all or part of a grid cell. We included the percentage of a grid cell covered by land as an independent variable in each regression model. For islands that cover < 1% of the area of the grid cell, land coverage was set to 0.01 unless the exact number could be derived from the language data.
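The overlay-and-count step can be illustrated with a minimal Python sketch. It is not the authors' R workflow with "sp" and "raster": language ranges are simplified to axis-aligned bounding boxes instead of polygons, and all names and coordinates (`lang_a`, the 1000 km cells) are hypothetical toy values.

```python
from typing import Dict, List, Tuple

# (xmin, ymin, xmax, ymax) in km on an equal-area projection (assumed units)
Box = Tuple[float, float, float, float]


def boxes_overlap(a: Box, b: Box) -> bool:
    """Return True if two axis-aligned boxes share any interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def language_diversity(ranges: Dict[str, Box], cells: List[Box]) -> List[int]:
    """Count, for each grid cell, the languages whose range overlaps it."""
    return [sum(boxes_overlap(r, c) for r in ranges.values()) for c in cells]


# Hypothetical toy data: three 1000 x 1000 km cells in a row.
cell_size = 1000.0
cells = [(i * cell_size, 0.0, (i + 1) * cell_size, cell_size) for i in range(3)]
ranges = {
    "lang_a": (100.0, 100.0, 400.0, 400.0),   # lies inside cell 0
    "lang_b": (900.0, 200.0, 1200.0, 500.0),  # straddles cells 0 and 1
    "lang_c": (1500.0, 0.0, 2500.0, 300.0),   # straddles cells 1 and 2
}

diversity = language_diversity(ranges, cells)

# Cells that no language range overlaps would be excluded from the analysis.
kept_cells = [c for c, n in zip(cells, diversity) if n > 0]
```

A language is counted once per cell whether it covers all or only part of that cell, which matches the counting rule described above.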

Both the ecological risk and isolation hypotheses make predictions about the relationship between speaker population size and climate or landscape factors. We included both the minimum and the average size of speaker populations of all the languages present in each grid cell, based on current available estimates of the number of native speakers of a language resident in a country as recorded in WLMS (the number of L1 speakers in the country that spans a grid cell). These data do not capture change in number of speakers over time or historical changes in the geographic distribution of L1 speakers, but instead represent a snapshot of current speaker population distribution, which should provide a reflection of general patterns associated with population differences between languages^{50}. To control for regional variations in number of people per grid cell, we used total human population density from the Gridded Population of the World database^{51}.
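As a small illustration of the two speaker-population predictors, the following sketch computes the minimum and the average speaker population among the languages present in a cell. The language names, speaker counts, and cell membership are hypothetical, not WLMS values.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Hypothetical L1 speaker counts per language (not real WLMS figures).
speakers: Dict[str, int] = {"lang_a": 1200, "lang_b": 45000, "lang_c": 300}

# Hypothetical mapping from grid cell to the languages overlapping it.
cell_languages: Dict[str, List[str]] = {
    "cell_0": ["lang_a", "lang_b"],
    "cell_1": ["lang_b", "lang_c"],
    "cell_2": [],
}


def speaker_summary(
    languages: List[str], counts: Dict[str, int]
) -> Optional[Tuple[int, float]]:
    """Return (minimum, average) speaker population, or None for an empty cell."""
    if not languages:
        return None
    sizes = [counts[name] for name in languages]
    return min(sizes), mean(sizes)


summaries = {
    cell: speaker_summary(langs, speakers) for cell, langs in cell_languages.items()
}
```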

We included four climatic variables in our analysis: annual temperature, annual precipitation, temperature seasonality, and precipitation seasonality, averaged over each grid cell. We also included two variables derived from eco-climatic factors: net primary productivity and mean growing season of a grid cell. Net primary productivity data were derived from the Socioeconomic Data and Applications Center^{52}. Data on growing season were obtained from the Global Agro-ecological Zones Data Portal version v3.0^{53}, which is calculated as the number of days per year suitable for growing crops based on precipitation, evapotranspiration and soil moisture holding capacity. The other climatic variables were obtained from the Worldclim global climate data set v1.4^{54}.

We included four variables describing landscape factors that have been previously suggested to influence population movement and range expansion: average altitude, altitudinal range, landscape roughness, and river density in each grid cell. Altitude data were obtained from Worldclim^{54}. Landscape roughness data were calculated as the autocorrelation in altitude^{13} (at every 1 km along 100 km length transects, averaged over eight different directions) derived from the SRTM30 elevation data set^{55}. River density was calculated as the number of river branches within each grid cell^{13}, derived from the Global Self-consistent, Hierarchical, High-resolution Shoreline database^{56}.

Vascular plant richness data were from Kreft & Jetz^{57}. Species richness of all the amphibian, mammal, and bird taxa was from BiodiversityMapping^{58,59}, which produces maps of species richness from species distribution data obtained from the IUCN, BirdLife International, and NatureServe databases. These maps were resampled to the grid resolutions we used in our analyses. To capture broad-scale variation in ecosystem structure and composition, we also compiled data on the world’s biomes from WWF^{60}. Biomes are discrete regions with a distinct ecological character that is determined by a combination of climate, geomorphology and vegetation types^{60}.

### Statistical analysis

We applied generalized least squares (GLS) analysis, implemented in the R package *nlme*^{61}, to fit regression models to log-transformed language diversity. We did not transform predictors, because residuals in language diversity after accounting for all the untransformed climate and landscape predictors do not violate normality under any resolution according to the Shapiro-Wilk normality test (low: *p* = 0.72; medium: *p* = 0.19; high: *p* = 0.07). We accounted for spatial autocorrelation and phylogenetic relatedness by constraining the residual correlation in language diversity between each pair of grid cells to be a linear function of the spatial proximity and phylogenetic similarity between the two cells. The correlation matrix has the form: (1 − *α*)*I* + *α*[*βP* + (1 − *β*)*D*], where *I* is an identity matrix, *P* is the phylogenetic similarity matrix, *D* is the spatial proximity matrix, *α* represents the relative contribution of spatial and phylogenetic versus other residual effects, and *β* represents the relative contribution of spatial versus phylogenetic effects^{62}. Because our analysis controls for non-independence of grid cells, we can be more confident that the results are not driven by pseudoreplication. For example, without such correction, grid cells in the Arctic that repeatedly sample the same widely distributed languages (e.g., Russian and Yakut) may have a disproportionate influence on global language diversity correlations (Supplementary Figure 1).
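The structure of the correlation matrix can be made concrete with a short Python sketch. It only assembles (1 − *α*)*I* + *α*[*βP* + (1 − *β*)*D*] from given inputs and is not the *nlme* fitting procedure; the 2 × 2 matrices and the values of `alpha` and `beta` are hypothetical. Following the formula as written, `beta` multiplies *P* and (1 − `beta`) multiplies *D*.

```python
from typing import List

Matrix = List[List[float]]


def combined_correlation(P: Matrix, D: Matrix, alpha: float, beta: float) -> Matrix:
    """Build (1 - alpha) * I + alpha * (beta * P + (1 - beta) * D) entry by entry."""
    n = len(P)
    result = []
    for i in range(n):
        row = []
        for j in range(n):
            identity = 1.0 if i == j else 0.0
            structured = beta * P[i][j] + (1.0 - beta) * D[i][j]
            row.append((1.0 - alpha) * identity + alpha * structured)
        result.append(row)
    return result


# Hypothetical similarity matrices for two grid cells (unit diagonals).
P = [[1.0, 0.5], [0.5, 1.0]]  # phylogenetic similarity
D = [[1.0, 0.2], [0.2, 1.0]]  # spatial proximity

C = combined_correlation(P, D, alpha=0.6, beta=0.25)
```

Because *P* and *D* have unit diagonals, the combined matrix also has a unit diagonal, and setting `alpha` to 0 recovers the identity matrix, i.e., the independence assumed by ordinary least squares.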

In order to correct for non-independence owing to descent, we need a matrix of covariation representing expected patterns of similarity. There is no accepted universal phylogeny for the world’s languages, so we constructed a global hierarchy of language relationships from the World Language Mapping System^{1} taxonomy using the Python library Treemaker^{63}. This hierarchy is a proxy for the expected patterns of similarity due to relatedness and does not represent a phylogenetic history of descent. It is a representation of the relationships within language families and therefore provides a way to generate a matrix of expected similarity due to descent^{33}. The global language taxonomy is only resolved to the language family level, so we assume that any pair of languages from different families represent the maximum distance from each other. This hierarchy is therefore unresolved at the base. The expected similarity due to relatedness of languages was calculated for each pair of grid cells using the *PhyloSor* metric^{64}. This measure compares the sum of distances on the language hierarchy that connect all the languages that occur in a pair of grid cells to the sum of distances that connect all the languages occurring in each grid cell. This measure ranges from 0 to 1, with 0 for two grid cells that do not share any language families in common and 1 for two grid cells that have an identical set of languages.
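A minimal Python sketch of a PhyloSor-style calculation on a toy hierarchy follows. It assumes unit branch lengths and a hierarchy stored as a child-to-parent map, and uses the common form 2 × shared branch length / (branch length of cell A + branch length of cell B). The family and language labels are hypothetical, and this is not the Treemaker-based pipeline used in the study.

```python
from typing import Dict, Iterable, Set

# Hypothetical hierarchy: two families joined only at an unresolved root.
parent: Dict[str, str] = {
    "x1": "FamX",
    "x2": "FamX",
    "y1": "FamY",
    "FamX": "root",
    "FamY": "root",
}


def branches(languages: Iterable[str], parent: Dict[str, str]) -> Set[str]:
    """Collect every branch from each language up to the root.

    Each branch is identified by its child node and has unit length.
    """
    edges: Set[str] = set()
    for node in languages:
        while node in parent:
            edges.add(node)
            node = parent[node]
    return edges


def phylosor(cell_a: Iterable[str], cell_b: Iterable[str],
             parent: Dict[str, str]) -> float:
    """Return the fraction of branch length shared between two cells."""
    a = branches(cell_a, parent)
    b = branches(cell_b, parent)
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    return 2.0 * len(a & b) / total


same_family = phylosor({"x1"}, {"x2"}, parent)
different_family = phylosor({"x1"}, {"y1"}, parent)
identical = phylosor({"x1", "x2"}, {"x1", "x2"}, parent)
```

In this toy setup, cells that hold different languages from the same family share only the family-level branch, cells from different families share nothing, and identical cells share everything, which reproduces the 0 to 1 range described above.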

The spatial proximity matrix was derived from the great-circle distances between the centroids of each pair of grid cells. We modeled the decay in similarity of language diversity with distance as the Gaussian function $e^{-(d/\gamma)^{2}}$, where $d$ is the great-circle distance between the two grid cells and $\gamma$ is the coefficient describing how fast similarity decays over the distance between grid cells.
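The distance-decay calculation can be sketched in Python as follows. The haversine formula and the mean Earth radius of 6371 km are assumptions made for this illustration (the text does not say how great-circle distances were computed), and the centroid coordinates and decay coefficient are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in degrees (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def spatial_similarity(distance_km: float, gamma_km: float) -> float:
    """Gaussian decay exp(-(d / gamma)^2) of similarity with distance."""
    return math.exp(-((distance_km / gamma_km) ** 2))


# Hypothetical centroids: two points on the equator, 90 degrees apart.
d = great_circle_km(0.0, 0.0, 0.0, 90.0)
similarity = spatial_similarity(d, gamma_km=5000.0)
```

Similarity is 1 at zero distance and falls to exp(−1) when the distance equals the decay coefficient, so a larger coefficient means slower decay.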

Adjacent grid cells can share similar or identical values for environmental variables, as well as many of the same species and languages, making their correlation coefficient at or close to 1. A large number of self-similar values leads to degeneracy of the matrix (it carries much less information than its number of entries). Under medium and high resolutions, correlation between adjacent grid cells is so high that the correlation matrix is nearly singular, leading to a high level of error when inverting a large matrix. We limit self-similarity across the correlation matrix by subsampling grid cells to avoid adjacent cells with highly similar values. For medium resolution, we avoided sampling adjacent cells by first removing the nine surrounding grid cells, i.e., sampling a grid cell every two rows and columns. This was insufficient to allow convergence of likelihood estimation for the high-resolution grids, so we then removed the 24 surrounding grid cells, i.e., sampling a grid cell every three rows and columns (Supplementary Figure 6). This resulted in 216 grid cells under low resolution, 192 grid cells under medium resolution, and 366 grid cells under high resolution. This subsampling procedure also reduces the disparity in the number of data points between resolutions. We repeated analyses under high resolution using subsampling that starts with different rows and columns. For example, starting with row 1, we sample rows 4, 7, etc., while starting with row 2, we sample rows 5, 8, etc. In total, there are nine subsampling regimes. Analyses under different subsampling regimes generate qualitatively the same results (Supplementary Tables 3 and 4).
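The nine subsampling regimes for a step of three rows and columns can be enumerated with a short Python sketch. The 9 × 9 grid is a hypothetical toy size, and indices are 0-based here, whereas the text counts rows from 1.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (row index, column index), 0-based


def subsample(n_rows: int, n_cols: int, step: int,
              row_start: int = 0, col_start: int = 0) -> List[Cell]:
    """Keep every `step`-th row and column, starting at the given offsets."""
    return [
        (r, c)
        for r in range(row_start, n_rows, step)
        for c in range(col_start, n_cols, step)
    ]


# One regime per combination of starting row and starting column.
step = 3
regimes = [
    subsample(9, 9, step, row_start=r0, col_start=c0)
    for r0 in range(step)
    for c0 in range(step)
]

# Together the regimes partition the grid: every cell appears in exactly one.
all_cells = {cell for regime in regimes for cell in regime}
```

With a step of three there are 3 × 3 = 9 possible starting offsets, which is where the nine regimes come from.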

We used the *subplex* method in the R package *nloptr*^{65} to find the maximum-likelihood estimates for the coefficients in our regression models. To test whether a variable is associated with language diversity above its covariation with the other variables, the variable was dropped from the full model that included all the variables, then a likelihood ratio test was used to test whether dropping the variable significantly decreased model fit. To assess how much variance in language diversity can be explained by the climatic and landscape variables, we calculated the predicted *R*^{2} of the regression model that included all the climatic and landscape variables as predictors. To evaluate the contribution of phylogenetic non-independence and spatial autocorrelation, we refitted the regression model using the method of ordinary least squares (OLS), which does not account for correlation structure in language diversity among grid cells. The difference in the predicted *R*^{2} between the GLS method and the OLS method quantifies the impact of spatial autocorrelation and phylogenetic non-independence on the results.
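The drop-one-variable likelihood ratio test can be illustrated with a minimal Python sketch. It assumes the reduced model differs from the full model by a single coefficient, so the statistic is compared with a chi-square distribution with one degree of freedom. The log-likelihood values are hypothetical, and this is not the authors' R code.

```python
import math
from typing import Tuple


def likelihood_ratio_test(loglik_full: float, loglik_reduced: float) -> Tuple[float, float]:
    """Return (statistic, p-value) for dropping one variable from the full model.

    The statistic is 2 * (loglik_full - loglik_reduced). For one degree of
    freedom, the chi-square upper tail equals erfc(sqrt(statistic / 2)).
    """
    statistic = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical maximized log-likelihoods of the full and reduced models.
statistic, p_value = likelihood_ratio_test(loglik_full=-100.0, loglik_reduced=-101.92)
```

A small p-value indicates that dropping the variable significantly worsens the fit, i.e., the variable is associated with language diversity beyond its covariation with the other predictors.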

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.
