Introduction
At the local scale, the number of species is related to the sampling effort by species-accumulation curves (Gotelli & Colwell 2001). The number of sampled species is a matter of well-known statistics based on independent and identically distributed (iid) samples, and estimators of the total number of species of a homogeneous community are available, among which the best known are Chao’s (Chao 1984) and the jackknife (Burnham & Overton 1978). These estimators can be applied to incidence data (i.e. the number of sampled plots that contain a given species) as well as abundance data (the number of sampled individuals of a given species). Yet, these tools fail to estimate regional diversity because increasing the sampled area implies including new, different communities, preventing iid sampling in practice.
Nevertheless, Cazzolla Gatti et al. (2022) successfully applied the incidence-based Chao estimator to 100- by 100-km cells (each cell considered as a plot) covering all forests in the world to assess the number of tree species at the scale of continents. The method requires huge datasets to avoid undersampling and sampling biases.
At very large scales, the unified neutral theory of biodiversity and biogeography (Hubbell 2001) implies that the distribution of the metacommunity’s species abundances is in log-series (Fisher et al. 1943), allowing the extrapolation of the rank-abundance curve of sampled species up to the rarest one, represented by a single individual, and counting the number of necessary species. Based on this method, the diversity of tree species has been estimated in Amazonia (ter Steege et al. 2013; ter Steege et al. 2020) and at the world scale (Slik et al. 2015).
Regional diversity, i.e. diversity at intermediate scales between single communities and the metacommunity, has received less attention. The large and spatially uniform datasets necessary to apply incidence data extrapolation are not easy to gather, so alternative methods must be considered. This motivated the present study, along with a particular interest in the forest of French Guiana.
The main contribution of this paper is to estimate the number of tree species at the regional scale, in French Guiana (8 million hectares of tropical moist forest with no ecological boundary to distinguish it from the rest of Amazonia) and to demonstrate which method is valid to do so. We build on Harte’s self-similarity model (Harte et al. 1999a) that implies the power-law relationship of Arrhenius (1921) and provides a technique to evaluate its parameters (Harte et al. 1999b), previously applied by Krishnamani et al. (2004) in the Western Ghats, India, a 60,000-ha tropical forest with around 1,000 tree species. The current checklist for French Guiana contains close to 1800 tree species (Molino et al. 2022). Our estimate is around 2200.
We also compare our work to all methods reviewed above and to the lesser-known, scale-independent universal species–area relationship based on maximum entropy (Harte et al. 2009). We discuss in depth which method may be applied according to the spatial scale addressed.
Methods
Data
To apply the methods detailed below, a large enough inventory is necessary along with a set of small, widely spread forest plots. We gathered three large local inventories to account for environmental variability and a network of plots covering the whole region.
Our plot network is GuyaDiv (Engel 2015). Since the installation of the first plots in 1986, the GuyaDiv network has grown continuously. It now consists of 243 plots of various sizes and shapes, distributed in various forest types, in 30 sites across French Guiana. We took into account the 68 one-hectare plots of the network (Figure 1). They are located in 21 sites, which provides fairly good coverage of the variability of the forest. They contain 43081 trees, of which 415 were removed from the analyses because they could not be assigned to a species or morphospecies.
The Paracou research station (Gourlet-Fleury et al. 2004) contains six 6.25-ha and one 25-ha plots of primary rainforest. Nine 6.25-ha plots were logged between 1986 and 1988 in a forestry experiment that temporarily increased the recruitment of light-demanding species (Mirabel et al. 2021) and the functional diversity (Mirabel et al. 2020).
In a rather conservative approach, we retained only the well-identified trees of the permanent plots (571 species) and added available data from the GuyaDiv network: transects from Molino & Sabatier (2001) and ten 0.49-ha plots around the Guyaflux tower (Bonal et al. 2008) contain 575 species, including 132 new ones. Thirty-seven more species held at the French Guiana IRD Herbarium (CAY: Gonzalez et al. 2022) were collected in the area but outside the plots. The total number of species is thus 740, included in a 4.84-km2 convex envelope.
The Piste de Saint-Elie site has been intensively sampled for 50 years. It encompasses nineteen 1-ha and one half-hectare plots in GuyaDiv and a few small plots added for various studies. Moreover, many herbarium specimens were collected from the site. As a whole, we gathered 763 species in a 3-km2 area.
The Nouragues research station (Bongers et al. 2001) provides 22 hectares of permanent plots. We applied the same protocol, adding 11 GuyaDiv plots and herbarium collections, for a total of 850 species in a 2.5-km2 area.
Self-similarity
Self-similarity (Harte et al. 1999a) is a property based on scale invariance. Consider a species that is present in an area ${A_0}$ , say French Guiana. The probability to find it in half the whole area, denoted ${A_1}$ , is $a$ . Then, if it is present in ${A_1}$ , the probability to find it in turn in half ${A_1}$ , denoted ${A_2}$ , is also $a$ , and so on. The probability to find the species in ${A_n}$ is thus ${a^n}$ . In other words, the conditional probability to find a species in a subarea, given that it is present in the area containing it, is constant: it does not depend either on the observation scale or on the species considered.
The Arrhenius power law (Arrhenius 1921) both implies and is a consequence of the self-similarity property (Harte et al. 1999a). The number of species ${\rm{S}}\left( A \right)$ observed in an area $A$ is
$${\rm{S}}\left( A \right) = c{A^z} \tag{2.1}$$
where $z$ is the power parameter and $c$ is the number of species in an area of size 1. In fact, $a = {2^{ - z}}$ . This is a classical relation in macroecology, with long empirical and theoretical support (Gárcia Martín & Goldenfeld 2006; Williamson et al. 2001).
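The link between the halving probability $a$ and the exponent $z$ can be spelled out in a few lines. The derivation below is a sketch that reads $a$ as the expected fraction of species retained each time the area is halved, so that ${A_n} = {A_0}/{2^n}$ ; this reading is an assumption made for illustration.

```latex
\begin{align*}
S(A_n) &= a^{n}\, S(A_0), \qquad A_n = A_0 / 2^{n} \\
S(A_n) &= c A_n^{z} = c A_0^{z}\, 2^{-nz} = \left(2^{-z}\right)^{n} S(A_0) \\
\Rightarrow\quad a &= 2^{-z}, \qquad z = -\log_2 a
\end{align*}
```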
If $z$ is known, the inventory of a reasonably large area $b$ allows computing $c = {\rm{S}}\left( b \right)/{b^z}$ . Then, ${\rm{S}}\left( A \right)$ can be calculated for any value of $A$ .
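The two-step calculation can be illustrated with a minimal Python sketch. It assumes the rounded values reported in the Results ( $z = 0.104$ , the three inventories and their areas in km2) and averages $c$ over the three sites; the function and variable names are hypothetical, and because the inputs are rounded the output only approximates the published figures.

```python
# Sketch of the extrapolation S(A) = c * A**z, with areas in km2.
# Inputs are the rounded values reported in the Results section.

Z = 0.104                     # estimated power-law exponent
TARGET_AREA_KM2 = 80_000      # 8 million hectares of forest

# site -> (observed species S(b), inventoried area b in km2)
INVENTORIES = {
    "Paracou": (740, 4.84),
    "Piste de Saint-Elie": (763, 3.0),
    "Nouragues": (850, 2.5),
}


def species_per_unit_area(s_b, b, z):
    """Return c = S(b) / b**z, the number of species in an area of size 1."""
    return s_b / b ** z


def extrapolate(c, area, z):
    """Return S(A) = c * A**z."""
    return c * area ** z


c_values = {
    site: species_per_unit_area(s_b, b, Z)
    for site, (s_b, b) in INVENTORIES.items()
}
c_mean = sum(c_values.values()) / len(c_values)
estimate = extrapolate(c_mean, TARGET_AREA_KM2, Z)

for site, c in c_values.items():
    print(f"{site}: c = {c:.0f} species per km2")
print(f"mean c = {c_mean:.0f}; extrapolated richness = {estimate:.0f}")
```

With these rounded inputs the extrapolated richness falls within about one percent of the value reported in the Results; the gap presumably comes from rounding $z$ to three decimals.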
Harte et al. (1999b) showed that under the assumption of self-similarity, $z$ can be inferred from the dissimilarity between small and distant plots of equal size distributed across the area. The Sørensen (1948) similarity between two plots is
$$\chi = \frac{{2\left( {{S_1} \cap {S_2}} \right)}}{{{S_1} + {S_2}}} \tag{2.2}$$
where ${S_1}$ (respectively ${S_2}$ ) is the number of species in plot 1 (resp. plot 2) and ${S_1} \cap {S_2}$ is the number of common species.
Applied to plots of the same size separated by distance $d$ , Sørensen’s similarity decreases with distance following the relation $\chi \sim {d^{ - 2z}}$ (Harte et al. 1999b) that can be estimated by the linear model
$${\rm{log}}\left( \chi \right) = \beta - 2z\,{\rm{log}}\left( d \right) \tag{2.3}$$
where the intercept $\beta $ is a constant. The logarithm of the Sørensen similarity between pairs of plots can be regressed against the logarithm of the distance between the plots: the slope of the regression is $ - 2z$ .
The relation (2.3) holds at the same scale as the power law, i.e. at the regional scale (Grilli et al. 2012). Krishnamani et al. (2004) estimated $z \approx 0.12$ with a very good fit to the linear model at distances of 1 km and above, but not below.
The number of plots varies across locations so the estimation of $z$ must be made with care. We sampled one random plot at each location to obtain $21 \times 20/2 = 210$ pairs of plots. We calculated the Sørensen similarity $\chi $ and the geographic distance $d$ between each pair of plots. We estimated $z$ as minus half the coefficient of the distance variable in the linear model ${\rm{log}}\left( \chi \right) \sim {\rm{log}}\left( d \right)$ . We repeated these steps 1000 times to obtain a distribution of estimated $z$ values depending on the plots drawn in each location. $z$ was estimated as the empirical mean of the distribution and its 95% confidence interval was obtained by eliminating the 2.5% extreme values on both tails.
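The resampling procedure can be sketched as follows. This is a minimal illustration in Python (the analyses themselves were run in R), not the code used for the paper: the data structure, the function names and the toy plots are all hypothetical, and the regression is a plain least-squares fit of ${\rm{log}}\left( \chi \right)$ on ${\rm{log}}\left( d \right)$ .

```python
import math
import random
from itertools import combinations


def sorensen(a, b):
    """Sørensen similarity between two species sets."""
    return 2 * len(a & b) / (len(a) + len(b))


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def estimate_z_once(sites, rng):
    """Draw one plot per site and return z = -slope / 2."""
    drawn = [rng.choice(plots) for plots in sites.values()]
    log_d, log_chi = [], []
    for (x1, y1, s1), (x2, y2, s2) in combinations(drawn, 2):
        chi = sorensen(s1, s2)
        d = math.hypot(x1 - x2, y1 - y2)
        if chi > 0 and d > 0:  # logarithms are undefined otherwise
            log_d.append(math.log(d))
            log_chi.append(math.log(chi))
    return -ols_slope(log_d, log_chi) / 2


def estimate_z(sites, n_draws=1000, seed=1):
    """Mean of z over repeated draws, with a 95% percentile interval."""
    rng = random.Random(seed)
    zs = sorted(estimate_z_once(sites, rng) for _ in range(n_draws))
    k = n_draws * 25 // 1000  # number of values dropped in each tail
    return sum(zs) / n_draws, (zs[k], zs[n_draws - 1 - k])


# Hypothetical toy data: site -> list of (x_km, y_km, species set).
TOY_SITES = {
    "A": [(0.0, 0.0, set(range(1, 11))),
          (0.0, 0.5, set(range(1, 10)) | {19})],
    "B": [(1.0, 0.0, set(range(1, 9)) | {11, 12})],
    "C": [(10.0, 0.0, set(range(1, 5)) | set(range(13, 19)))],
}

z_mean, (z_low, z_high) = estimate_z(TOY_SITES, n_draws=200)
print(f"z = {z_mean:.3f} [{z_low:.3f}, {z_high:.3f}]")
```

Drawing a single plot per site keeps every pair of plots at a between-site distance and prevents heavily sampled sites from dominating the regression.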
The confidence interval of the estimation of the number of species was assessed by combining the uncertainty in $c$ and ${A^z}$ . The variance of $c$ was estimated by the empirical variance of the values calculated at Paracou, Piste de Saint-Elie and Nouragues. That of ${A^z}$ was obtained from the empirical distribution of $z$ . The variance of their product was calculated (the formula and its derivation are in the appendix). Finally, we assumed the normality of the distribution of the product of the estimates to retain an approximate 95% confidence interval of $ \pm $ 2 standard deviations.
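For reference, when two estimates are independent, the variance of their product has a closed form. The identity below is the standard result for independent random variables; treating $c$ and ${A^z}$ as independent is an assumption made here for illustration, and the appendix remains the authority on the formula actually used.

```latex
\begin{align*}
\mathrm{Var}\left(c\,A^{z}\right)
  &= \mathrm{E}\left(c^{2}\right)\mathrm{E}\left(A^{2z}\right)
     - \mathrm{E}(c)^{2}\,\mathrm{E}\left(A^{z}\right)^{2} \\
  &= \mathrm{Var}(c)\,\mathrm{Var}\left(A^{z}\right)
     + \mathrm{Var}(c)\,\mathrm{E}\left(A^{z}\right)^{2}
     + \mathrm{Var}\left(A^{z}\right)\mathrm{E}(c)^{2}
\end{align*}
```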
All analyses were made with R (R Core Team 2023) v. 4.3.1.
Nonparametric estimators
At smaller scales, i.e. inside a single community, the relation between area and number of species is described by species accumulation curves (SAC: Gotelli & Colwell 2001). It is driven by statistical models that address incomplete sampling (Béguinot 2015; Shen et al. 2003). After replacing the sampled area by the number of individuals it contains, well-known estimators of richness such as Chao’s (Chao 1984) or the jackknife (Burnham & Overton 1978) apply.
The Chao1 estimator is
$${\hat S_{Chao1}} = {s_{obs}} + \frac{{\left( {n - 1} \right)f_1^2}}{{2n{f_2}}}$$
where ${s_{obs}}$ is the number of observed species, $n$ is the sample size, ${f_1}$ and ${f_2}$ are the number of species observed once and twice. Since $n$ is large, $\left( {n - 1} \right)/2n$ can be approximated by $1/2$ .
The jackknife estimator depends on the sampling level of the data. The estimator of order $k$ includes $f_1, f_2, ..., f_k$ , the number of species observed up to $k$ times. Increasing the order implies increasing both the estimate and its uncertainty: starting from order 1, the order is incremented as long as the new estimator is significantly higher than the previous one (Burnham & Overton 1978). For large $n$ , the jackknife estimator of order 3, used below, is
$${\hat S_{J3}} = {s_{obs}} + 3{f_1} - 3{f_2} + {f_3}$$
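Both estimators reduce to simple arithmetic on the counts of rare species. The sketch below uses hypothetical function names and the large- $n$ forms of the Chao1 and order-3 jackknife estimators; the Chao1 call uses the counts reported in the Results, whereas the jackknife call uses made-up counts because the number of species observed three times is not given in the text.

```python
def chao1(s_obs, f1, f2, n=None):
    """Chao1 lower-bound richness estimate (requires f2 > 0).

    If the sample size n is omitted, (n - 1) / (2 n) is approximated by 1/2.
    """
    factor = 0.5 if n is None else (n - 1) / (2 * n)
    return s_obs + factor * f1 ** 2 / f2


def jackknife3(s_obs, f1, f2, f3):
    """Order-3 jackknife richness estimate in its large-n form."""
    return s_obs + 3 * f1 - 3 * f2 + f3


# Counts reported in the Results: 1314 observed species,
# 204 observed once and 119 observed twice.
print(round(chao1(1314, 204, 119)))

# Made-up counts, for illustration only.
print(jackknife3(10, 3, 2, 1))
```

The first call returns the Chao1 value of 1489 reported in the Results.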
An alternative, following Cazzolla Gatti et al. (2022), consists of paving the territory with a grid whose cell size does not change the estimation, say 100 km. In each 100 by 100 km cell of the grid, all available data are aggregated to obtain an incidence dataset. The Chao2 estimator (whose formula is identical to that of Chao1, with $n$ equal to the number of grid cells) is finally applied: it combines the number of species observed in only one or two cells to estimate the number of unobserved species.
The variance of the Chao and jackknife estimators can be estimated, so confidence intervals are available (Burnham & Overton 1978; Chao 1987).
Log-series extrapolation
Assuming that the plots are samples of a metacommunity that follows a log-series distribution, the rank-abundance curve can be extrapolated following ter Steege et al. (2013).
First, the total number of trees is estimated by extrapolation of the average number of trees per 1-ha plot of the GuyaDiv network to the 8 million hectares of the French Guiana forest.
The probability for one of these trees to belong to a given species is obtained by averaging the frequency of the species among plots. Each plot is a sample of a local community whose composition is not completely known: many rare species are not in the sample. The observed frequency of a species in a plot is therefore not the probability of the species in the community: frequencies sum to 1, whereas the actual probabilities of the observed species sum to the sample coverage (Good 1953), i.e. 1 minus the total probability of the unobserved species. The actual probabilities of observed species can be estimated following Chao & Jost (2015), with the entropart package (Marcon & Hérault 2015).
The number of trees per species is then obtained by multiplying the total number of trees by the probability of each species. A rank-abundance curve is produced. Its center part is a straight line (see Figure 3) that can be extrapolated down to the last species, represented by a single tree. The number of species is finally counted.
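The extrapolation step can be sketched in a few lines of Python. The example is a simplified illustration, not the procedure of ter Steege et al. (2013): it fits a straight line to the log-abundances of the species between the 25th and 75th percentiles of rank and returns the rank at which the line reaches a single tree. The species probabilities are synthetic, and all names are hypothetical; only the tree density of 627 per hectare and the 8 million hectares come from the text.

```python
import math


def fit_line(x, y):
    """Ordinary least squares; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def extrapolated_richness(abundances, lower=0.25, upper=0.75):
    """Rank at which the fitted log-abundance line reaches one tree."""
    ranked = sorted(abundances, reverse=True)
    first, last = int(lower * len(ranked)), int(upper * len(ranked))
    ranks = list(range(first + 1, last + 1))  # ranks start at 1
    logs = [math.log(a) for a in ranked[first:last]]
    intercept, slope = fit_line(ranks, logs)
    return -intercept / slope  # log(1) = 0


# Total number of trees: mean density times forest area.
TOTAL_TREES = 627 * 8_000_000

# Synthetic probabilities for 1000 observed species (illustration only).
DECAY = 0.005
weights = [math.exp(-DECAY * rank) for rank in range(1, 1001)]
total_weight = sum(weights)
probabilities = [w / total_weight for w in weights]

abundances = [TOTAL_TREES * p for p in probabilities]
richness = extrapolated_richness(abundances)
print(f"extrapolated number of species: {richness:.0f}")
```

Because the synthetic abundances are exactly log-linear in rank, the fitted line recovers the generating curve; with real data only the central part of the curve is expected to be straight.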
Its confidence interval is not available: the extrapolation of the curve is very robust, but the estimation of the total number of trees and of the probabilities of species are sources of uncertainty.
Universal species–area relationship
Harte et al. (2008) derived a universal species–area relationship based on the maximum entropy theory. Assuming only that the area, the total numbers of species and individuals, and the summed metabolic energy rate of all individuals are fixed, many features of the species distribution at any scale can be predicted. Of particular interest is the possibility to derive the number of species in a doubled area from the number of species in a sampled, reference area (Harte et al. 2009; Xu et al. 2012). Starting from a local sample, which may be a single 1-ha plot or one of our large inventories, the area can be doubled until the target size is reached.
The number of trees per hectare is estimated from the GuyaDiv network to obtain a single starting point rather than a different one for each plot. To be consistent with the model, the geometric mean is applied: its logarithm equals the average logarithm of the number of trees in all 1-ha plots.
Each step of the estimation consists of doubling the area and calculating the new number of species. This operation is repeated until the target area (8 Mha) is reached, i.e. between 15 times for Paracou (the largest inventory: 484 ha) and 24 times for the 1-ha plots.
Results
Self-similarity
The relation between Sørensen’s similarity and distance is presented in Figure 2. All pairs of plots more than 1 km apart (the scale of Paracou’s 0.625-km2 inventory) are shown, and the regression line of the figure illustrates the relation. Actually, the estimation of $z$ was made as explained in the methods by 1000 random draws of sets of a single random plot per location.
The estimated value of $z$ is 0.104 with a 95% confidence interval between 0.088 and 0.120.
The estimated number of species per square kilometer, $c$ , is respectively 629, 681 and 773 in Paracou, Piste de Saint-Elie and Nouragues. The average value is 694.
Finally, the estimated number of species is 2234. Taking into account the uncertainty about $c$ and $z$ , its 95% confidence interval is between 1587 and 2882.
Species accumulation
The observed number of species is 1314 among which 204 and 119 are sampled once and twice. The lower-bound estimation of the number of species by the Chao1 estimator is 1489. The best jackknife estimator (of order 3) is 1677. Its confidence interval is between 1563 and 1791 at the 5% risk level.
The Chao2 estimator applied to the same plots aggregated into 100 × 100 km cells is 1643. Its confidence interval is between 1436 and 1564 at the 5% risk level.
Log-series extrapolation
The mean number of trees per ha in the GuyaDiv 1-ha plots is 627. Extrapolated to 8 million hectares, this gives close to 5 billion trees in French Guiana.
Figure 3 is the rank-abundance curve of the species. The most abundant tree species is Eperua falcata with around 151 million trees. The log-abundances of the species between the 25th and 75th percentiles are linearly related to the rank, allowing the extrapolation of the curve (the red line).
The estimated number of species according to this model is 4368.
Universal species–area relationship
The method from Harte et al. (2009) is applied to our data. The geometric mean number of trees per hectare estimated from the GuyaDiv network is 602 trees/ha.
Initial inventories, e.g. 740 tree species in 484 ha in Paracou and the geometric mean number of species in GuyaDiv plots, are the starting points of the estimation. Figure 4 shows the species–area curves obtained by successive doubling of the areas.
The curves are almost perfectly fitted by a Michaelis–Menten model, estimated by the linear model (Lineweaver & Burk 1934) ${1 \over {{\rm{log}}S}}\sim{1 \over {{\rm{log}}n}}$ , where $S$ is the number of species and $n$ is the number of trees, allowing a very accurate interpolation at any number of trees. The estimated number of species is thus obtained for $n$ equal to 8 Mha times 602 trees per ha:
- From Nouragues: 3238 species.
- From Piste de Saint-Elie: 2739 species.
- From Paracou: 2385 species.
The extrapolation from the average 1-ha plot is 4737 species. Since it is far less reliable than those from the wide inventories, with 7 to 9 more doubling steps, we do not retain it to produce the average estimate of the universal SAR. Finally, we obtain 2787 species.
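The Lineweaver–Burk linearization used to read the curves at a given number of trees can be sketched as follows. The sketch assumes natural logarithms and a Michaelis–Menten form on the log scale, i.e. $1/{\rm{log}}S = \alpha + \beta /{\rm{log}}n$ ; the symbols $\alpha $ and $\beta $ , the function names and the data points are hypothetical. The points are generated from made-up parameters rather than from the maximum entropy model, so the printed number is not an estimate for French Guiana.

```python
import math


def fit_line(x, y):
    """Ordinary least squares; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_lineweaver_burk(n_trees, n_species):
    """Fit 1 / log(S) = intercept + slope / log(n)."""
    x = [1 / math.log(n) for n in n_trees]
    y = [1 / math.log(s) for s in n_species]
    return fit_line(x, y)


def predict_species(n, intercept, slope):
    """Invert the linear model to get S at a given number of trees."""
    return math.exp(1 / (intercept + slope / math.log(n)))


TREES_PER_HA = 602                      # geometric mean reported above
TARGET_TREES = 8_000_000 * TREES_PER_HA

# Made-up parameters used only to generate a synthetic doubling curve.
TRUE_INTERCEPT, TRUE_SLOPE = 0.1, 0.5
areas_ha = [484 * 2 ** k for k in range(16)]  # successive doublings
n_trees = [area * TREES_PER_HA for area in areas_ha]
n_species = [predict_species(n, TRUE_INTERCEPT, TRUE_SLOPE) for n in n_trees]

intercept, slope = fit_lineweaver_burk(n_trees, n_species)
estimate = predict_species(TARGET_TREES, intercept, slope)
print(f"species at the target number of trees: {estimate:.0f}")
```

Fitting on the reciprocal scale turns the saturating curve into a straight line, so the target value can be read between two doubling steps instead of stopping at the nearest power of two.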
Summary
The estimated number of species according to the different methods is summarized in Table 1.
Discussion
The species–area relationship varies across scales
We consider three different spatial scales where different models apply.
At the local scale, i.e. inside a single community, the relation between the area and the number of species is described by species accumulation curves (SAC: Gotelli & Colwell 2001). It is driven by statistical models that address incomplete sampling (Béguinot 2015; Shen et al. 2003). Local SACs have been extensively studied and are out of the scope of this paper, but a few results are important here. The distributions of local, tropical moist forest communities are often approximately log-normal. This has been shown empirically (e.g. Duque et al. 2017) and theoretically (May 1975; Preston 1948, 1962). In the framework of the neutral theory (Hubbell 2001), the local community follows a zero-sum multinomial distribution, derived by Volkov et al. (2003) but challenged empirically by McGill (2003), in favor of the log-normal distribution.
The SAC, plotted as the number of species against the number of individuals in natural scale, is concave downwards since its slope is the probability for the next individual to belong to a new species (Chao et al. 2013; Grabchak et al. 2017), which decreases with the sampling effort. This means that the Arrhenius power law does not apply at the local scale. The power law can be estimated empirically (Condit et al. 1996; Plotkin et al. 2000) but then the value of $z$ depends on the distance, which contradicts the model, since it relies on a constant $z$ .
At the regional scale, the mixture of local communities makes a new pattern emerge, namely the power law of Arrhenius (1921). Its origin is empirical, with a lot of support (e.g. Dengler 2009; Triantis et al. 2012; Williamson et al. 2001). Theoretically, Hubbell (2001) showed that the power law applies at intermediate scales in the neutral theory and Grilli et al. (2012) derived it from a spatially-explicit model only based on the clustering of species. Preston (1962) showed that local, log-normal communities imply the power law at the regional scale. At this scale, the species–area relationship (SAR) properly speaking is not just a matter of accumulation due to sampling (SAC) but the consequence of the inclusion of different communities.
A long empirical controversy (Connor & McCoy 1979) opposed Arrhenius and Gleason (1922), who argued that the number of species predicted by the power law was far too high and proposed ${\rm{S}}\left( A \right) = z{\rm{ln}}A + {c_g}$ rather than the equivalent of eq. (2.1), i.e. ${\rm{lnS}}\left( A \right) = z{\rm{ln}}A + {c_a}$ (where ${c_g}$ , ${c_a}$ and ${c_f}$ below are constants). Actually, Gleason’s model is equivalent to Fisher’s, where ${\rm{S}}\left( A \right) = \alpha {\rm{ln}}A + {c_f}$ if the number of trees is large and is proportional to the area (Engen 1977). There is no theoretical support to apply Gleason’s model at the regional scale (Gárcia Martín & Goldenfeld 2006); in other words, the regional distribution of species is not log-series.
The widest scale is that of the metacommunity, in the sense of the neutral theory. It follows a log-series distribution (Hubbell 2001) with Fisher’s $\alpha $ equal to $\theta $ , known as the fundamental biodiversity number. The log-series does not apply to the local or regional scale: the empirical estimates of Fisher’s $\alpha $ at these scales increase with the sample size (e.g. Condit et al. 1996), which again contradicts the model, since it implies that $\alpha $ is a constant. At the regional scale, Fisher’s $\alpha $ increases with area because Arrhenius’s law, and not Gleason’s law, is valid. Yet, our data fit a log-series distribution quite well (Figure 3): empirical tests are not efficient to reject the model at the regional scale (Connor & McCoy 1979). We must rely on theory.
The limits between scales are obviously not sharp. Krishnamani et al. (Reference Krishnamani, Kumar and Harte2004) found that $z$ stabilized when plots more than 1 km apart were considered. We followed them here. Increasing the regional scale makes it converge to the metacommunity. In the absence of dispersal limitation, i.e. with migration parameter equal to 1 in the neutral theory, any regional sample would represent the metacommunity and follow a log-series distribution. So the wider the sampled area, the less distinguishable from the metacommunity the data will be. However, at the scale of French Guiana, roughly 1% of Amazonia, far fewer species are present than in a sample of the same size taken across the whole metacommunity, even if we ignore environmental filtering.
The self-similarity model can be applied at the regional scale
The power law is equivalent to self-similarity (Harte et al. Reference Harte, Kinzig and Green1999a), justifying our preferred method to estimate the richness of the French Guiana forest.
The self-similarity model allows estimating the number of species of tropical forests at a regional scale. It requires a network of plots at a wide range of distances from each other to estimate Arrhenius’s power law parameter. It should be completed by a continuous inventory whose size is consistent with the smallest scale of the power law. These constraints explain why the method has not been widely applied, beyond Krishnamani et al. (Reference Krishnamani, Kumar and Harte2004).
As shown in Figure 2, the fit of the linear model is not perfect. The theory does not address habitat variation, which is well described in French Guiana (Guitet et al. Reference Guitet, Pélissier, Brunaux, Jaouen and Sabatier2015). The dissimilarity between plot pairs is thus explained by distance and habitat dissimilarity, with the latter ignored in the model. Yet, the estimation of $z$ is quite robust because the GuyaDiv network covers a wide range of habitats, which cancels out local variability. Adding more plots or describing a few more species in the existing plots may not change $z$ significantly since it is obtained from the dissimilarity between plots. Its value, 0.104, is in line with that of Krishnamani et al. (Reference Krishnamani, Kumar and Harte2004) in another tropical forest: it is very small compared to the classical 0.25 of Arrhenius (Reference Arrhenius1921) or 0.263 of Preston (Reference Preston1962). This was discussed by MacArthur & Wilson (Reference MacArthur and Wilson1967), chapter 2. The power law applies to embedded scales of the same ecosystem here, in contrast to the usual sets of isolated islands providing the data (Triantis et al. Reference Triantis, Guilhaumon and Whittaker2012): in our case, the number of species increases less with the area, leading to smaller $z$ values.
The critical aspect of the estimation is the accuracy of the starting point of the extrapolation, which mainly depends on the representativeness of the local inventories. Again, the self-similarity model assumes that $c$, the number of species per square kilometer, is the same everywhere. Local, observed values must be understood as variations around the true $c$, which should be estimated by replicating inventories across the whole region. This is of course restricted by the huge resources needed to set up a single one: three replicates are an exceptional amount of data. Paracou, Piste de Saint-Elie and Nouragues represent quite well the variability of local richness of the forest of French Guiana. We made a strict selection of the data to count the numbers of species, which are thus lower bounds. Ongoing efforts of botanists may slightly increase the value of $c$, implying a proportional increase in the estimation of the number of species.
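The proportionality between $c$ and the regional estimate can be checked numerically. The sketch below is a minimal illustration of the power law $S(A) = cA^z$, not the authors' code: the function name is made up for the example, the value of $c$ is a hypothetical placeholder, and only $z = 0.104$ and the area of 8 million hectares (80,000 km²) come from the text.

```python
def power_law_richness(c: float, area_km2: float, z: float) -> float:
    """Arrhenius power law S(A) = c * A**z, with c the richness of 1 km2."""
    if c < 0 or area_km2 <= 0:
        raise ValueError("c must be non-negative and area_km2 positive")
    return c * area_km2 ** z


Z = 0.104             # power-law exponent reported in the text
AREA_KM2 = 80_000.0   # 8 million hectares expressed in km2

# Factor by which the 1-km2 richness is multiplied at the regional scale.
multiplier = power_law_richness(1.0, AREA_KM2, Z)

# Hypothetical starting point (NOT a value taken from the paper).
c_example = 700.0
s_region = power_law_richness(c_example, AREA_KM2, Z)

# A 5% increase in c yields a 5% increase in the regional estimate.
s_region_plus = power_law_richness(1.05 * c_example, AREA_KM2, Z)
relative_change = s_region_plus / s_region - 1.0

print(f"multiplier = {multiplier:.3f}")
print(f"relative change = {relative_change:.3f}")
```

With these inputs the multiplier is a little above 3, so each species added to the 1-km² starting point translates into roughly three additional species at the scale of French Guiana.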
Chao2’s estimator is a valid alternative
Nonparametric estimators of richness are widely used to estimate the asymptotic richness of a community because they are designed to estimate the number of species missed by incomplete sampling (Colwell & Coddington Reference Colwell and Coddington1994). Their underlying assumptions are few: they do not depend on any distribution model or scale of observation. The only constraint is independent and identically distributed (iid) sampling, even though at the local scale spatial aggregation is often neglected (Picard et al. Reference Picard, Karembe and Birnbaum2004).
The asymptotic estimation based on the Chao1 or jackknife estimator is less than 1700 species, i.e. less than the total number of known species (Molino et al. Reference Molino, Sabatier, Grenand, Engel, Frame, Delprete, Fleury, Odonne, Davy, Lucas and Martin2022). As already underlined by ter Steege et al. (Reference ter Steege, Pitman, Sabatier, Baraloto, Salomão, Guevara, Phillips, Castilho, Magnusson, Molino, Monteagudo, Núñez Vargas, Montero, Feldpausch, Coronado, Killeen, Mostacedo, Vasquez, Assis, Terborgh, Wittmann, Andrade, Laurance, Laurance, Marimon, Marimon, Guimarães Vieira, Amaral, Brienen, Castellanos, Cárdenas López, Duivenvoorden, Mogollón, de Almeida Matos, Dávila, García-Villacorta, Stevenson Diaz, Costa, Emilio, Levis, Schietti, Souza, Alonso, Dallmeier, Montoya, Fernandez Piedade, Araujo-Murakami, Arroyo, Gribel, Fine, Peres, Toledo, Aymard, Baker, Cerón, Engel, Henkel, Maas, Petronelli, Stropp, Zartman, Daly, Neill, Silveira, Paredes, Chave, de Andrade Lima Filho, Jørgensen, Fuentes, Schöngart, Cornejo Valverde, Di Fiore, Jimenez, Peñuela-Mora, Phillips, Rivas, van Andel, von Hildebrand, Hoffman, Zent, Malhi, Prieto, Rudas, Ruschell, Silva, Vos, Zent, Oliveira, Schutz, Gonzales, Trindade Nascimento, Ramirez-Angulo, Sierra, Tirado, Umaña Medina, van der Heijden, Vela, Vilanova Torre, Vriesendorp, Wang, Young, Baider, Balslev, Ferreira, Mesones, Torres-Lezama, Urrego Giraldo, Zagt, Alexiades, Hernandez, Huamantupa-Chuquimaco, Milliken, Palacios Cuenca, Pauletto, Valderrama Sandoval, Valenzuela Gamarra, Dexter, Feeley, Lopez-Gonzalez and Silman2013), nonparametric asymptotic estimation of richness is not appropriate at large scales because of severe undersampling: many local communities are just not included in the data. Yet, increasing the sampling effort would not be enough: mixing local samples (the 1-ha plots) to mimic an iid sampling of a whole region is clearly not a valid approximation because each plot has its own distribution.
Cazzolla Gatti et al. (Reference Cazzolla Gatti, Reich, Gamarra, Crowther, Hui, Morera, Bastin, de-Miguel, Nabuurs, Svenning, Serra-Diaz, Merow, Enquist, Kamenetsky, Lee, Zhu, Fang, Jacobs, Pijanowski, Banerjee, Giaquinto, Alberti, Almeyda Zambrano, Alvarez-Davila, Araujo-Murakami, Avitabile, Aymard, Balazy, Baraloto, Barroso, Bastian, Birnbaum, Bitariho, Bogaert, Bongers, Bouriaud, Brancalion, Brearley, Broadbent, Bussotti, Castro da Silva, César, Češljar, Chama Moscoso, Chen, Cienciala, Clark, Coomes, Dayanandan, Decuyper, Dee, Del Aguila Pasquel, Derroire, Djuikouo, Van Do, Dolezal, Đorđević, Engel, Fayle, Feldpausch, Fridman, Harris, Hemp, Hengeveld, Herault, Herold, Ibanez, Jagodzinski, Jaroszewicz, Jeffery, Johannsen, Jucker, Kangur, Karminov, Kartawinata, Kennard, Kepfer-Rojas, Keppel, Khan, Khare, Kileen, Kim, Korjus, Kumar, Kumar, Laarmann, Labrière, Lang, Lewis, Lukina, Maitner, Malhi, Marshall, Martynenko, Monteagudo Mendoza, Ontikov, Ortiz-Malavasi, Pallqui Camacho, Paquette, Park, Parthasarathy, Peri, Petronelli, Pfautsch, Phillips, Picard, Piotto, Poorter, Poulsen, Pretzsch, Ramírez-Angulo, Restrepo Correa, Rodeghiero, Rojas Gonzáles, Rolim, Rovero, Rutishauser, Saikia, Salas-Eljatib, Schepaschenko, Scherer-Lorenzen, Šebeň, Silveira, Slik, Sonké, Souza, Stereńczak, Svoboda, Taedoumg, Tchebakova, Terborgh, Tikhonova, Torres-Lezama, van der Plas, Vásquez, Viana, Vibrans, Vilanova, Vos, Wang, Westerlund, White, Wiser, Zawiła-Niedźwiecki, Zemagho, Zhu, Zo-Bi and Liang2022) applied a similar method on a large-scale grid (100 × 100 km cells) where species occurrences were reported in each cell. Considering each cell as a plot, the Chao2 estimator (Chao Reference Chao1987) allows estimating the total richness. The practical advantage of this approach is the opportunity to combine several sources of occurrence data to improve the sampling coverage. 
Theoretically, it is far more robust than the mixture of abundance data: the local distribution of each plot is cancelled out by its transformation into incidence data. An appropriate spatial distribution of sampling plots, covering all habitats or at least a regular grid in the absence of more detailed knowledge, can be seen as a valid sampling design. When applied to our data, aggregated in 100-km square cells, the estimation is similar to that obtained directly from the abundance data of the plots because of undersampling, but this does not invalidate the method.
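To make the incidence-based approach concrete, here is a minimal sketch of the classic Chao2 estimator computed from incidence counts, i.e. the number of sampling units (cells) in which each species was recorded. The function name and the toy data are invented for the example, and the form used (with the small-sample factor $(m-1)/m$ and a fallback when no species occurs in exactly two units) is the commonly published one, not necessarily the exact variant used by the authors or by Cazzolla Gatti et al.

```python
from typing import Sequence


def chao2(incidence_counts: Sequence[int], n_units: int) -> float:
    """Chao2 richness estimate from per-species incidence counts.

    incidence_counts[i] is the number of sampling units containing species i;
    n_units is the total number of sampling units (m).
    """
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    if any(k < 0 or k > n_units for k in incidence_counts):
        raise ValueError("each count must lie between 0 and n_units")

    s_obs = sum(1 for k in incidence_counts if k > 0)
    q1 = sum(1 for k in incidence_counts if k == 1)  # uniques
    q2 = sum(1 for k in incidence_counts if k == 2)  # duplicates
    factor = (n_units - 1) / n_units

    if q2 > 0:
        return s_obs + factor * q1 * q1 / (2 * q2)
    # Bias-corrected form, used when there are no duplicates.
    return s_obs + factor * q1 * (q1 - 1) / 2


# Toy data (hypothetical): 8 species recorded across 10 cells.
toy_counts = [1, 1, 1, 1, 2, 2, 3, 5]
estimate = chao2(toy_counts, n_units=10)
print(round(estimate, 1))
```

Because only presence or absence per cell enters the calculation, the abundance distribution inside each plot plays no role, which is the robustness argument made above.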
Log-series extrapolations are not valid at the regional scale
At the scale of the metacommunity, as defined in the neutral model of biogeography, the species distribution follows a log-series (Hubbell Reference Hubbell2001; Volkov et al. Reference Volkov, Banavar, Hubbell and Maritan2003). ter Steege et al. (Reference ter Steege, Pitman, Sabatier, Baraloto, Salomão, Guevara, Phillips, Castilho, Magnusson, Molino, Monteagudo, Núñez Vargas, Montero, Feldpausch, Coronado, Killeen, Mostacedo, Vasquez, Assis, Terborgh, Wittmann, Andrade, Laurance, Laurance, Marimon, Marimon, Guimarães Vieira, Amaral, Brienen, Castellanos, Cárdenas López, Duivenvoorden, Mogollón, de Almeida Matos, Dávila, García-Villacorta, Stevenson Diaz, Costa, Emilio, Levis, Schietti, Souza, Alonso, Dallmeier, Montoya, Fernandez Piedade, Araujo-Murakami, Arroyo, Gribel, Fine, Peres, Toledo, Aymard, Baker, Cerón, Engel, Henkel, Maas, Petronelli, Stropp, Zartman, Daly, Neill, Silveira, Paredes, Chave, de Andrade Lima Filho, Jørgensen, Fuentes, Schöngart, Cornejo Valverde, Di Fiore, Jimenez, Peñuela-Mora, Phillips, Rivas, van Andel, von Hildebrand, Hoffman, Zent, Malhi, Prieto, Rudas, Ruschell, Silva, Vos, Zent, Oliveira, Schutz, Gonzales, Trindade Nascimento, Ramirez-Angulo, Sierra, Tirado, Umaña Medina, van der Heijden, Vela, Vilanova Torre, Vriesendorp, Wang, Young, Baider, Balslev, Ferreira, Mesones, Torres-Lezama, Urrego Giraldo, Zagt, Alexiades, Hernandez, Huamantupa-Chuquimaco, Milliken, Palacios Cuenca, Pauletto, Valderrama Sandoval, Valenzuela Gamarra, Dexter, Feeley, Lopez-Gonzalez and Silman2013) fitted a log-series to data provided by a network of plots to estimate the number of species in Amazonia. We applied the same method to our data. The resulting estimate is well over 4000 species in French Guiana: a very unlikely result according to current expert knowledge and the recent checklist (Molino et al. Reference Molino, Sabatier, Grenand, Engel, Frame, Delprete, Fleury, Odonne, Davy, Lucas and Martin2022).
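The principle of a log-series extrapolation can be sketched as follows. This is a simplified illustration based on Fisher's classical relation $S = \alpha \ln(1 + N/\alpha)$, not the fitting procedure of ter Steege et al. nor the authors' code; the function names and the numbers in the usage lines are hypothetical.

```python
import math


def logseries_richness(alpha: float, n_individuals: float) -> float:
    """Expected richness under Fisher's log-series: alpha * ln(1 + N / alpha)."""
    return alpha * math.log1p(n_individuals / alpha)


def fisher_alpha(s_obs: float, n_individuals: float) -> float:
    """Solve alpha * ln(1 + N / alpha) = S for alpha by bisection."""
    if not 0 < s_obs < n_individuals:
        raise ValueError("need 0 < s_obs < n_individuals")
    lo, hi = 1e-9, 1.0
    # Richness increases with alpha, so grow hi until it brackets the root.
    while logseries_richness(hi, n_individuals) < s_obs:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if logseries_richness(mid, n_individuals) < s_obs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical sample: 500 species among 50,000 trees.
alpha = fisher_alpha(500, 50_000)
# Extrapolation to a hypothetical total of 1e9 trees with alpha held constant.
s_extrapolated = logseries_richness(alpha, 1e9)
print(round(alpha, 1), round(s_extrapolated))
```

Holding $\alpha$ constant while the number of trees grows is the very assumption that, according to the discussion above, does not hold at the regional scale.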
The regional species pool does not follow a log-series distribution because of dispersal limitation (Grilli et al. Reference Grilli, Azaele, Banavar and Maritan2012). In other words, the regional community is not a sample of the metacommunity: many of the metacommunity’s species are not present. As a consequence, the log-series estimation of the richness of a regional species pool leads to severe overestimation. For the same reasons, hyperdominance is less pronounced: 4% of the species contain half the trees (Figure S1 in the appendix), compared to 1.4% in Amazonia as a whole (ter Steege et al. Reference ter Steege, Pitman, Sabatier, Baraloto, Salomão, Guevara, Phillips, Castilho, Magnusson, Molino, Monteagudo, Núñez Vargas, Montero, Feldpausch, Coronado, Killeen, Mostacedo, Vasquez, Assis, Terborgh, Wittmann, Andrade, Laurance, Laurance, Marimon, Marimon, Guimarães Vieira, Amaral, Brienen, Castellanos, Cárdenas López, Duivenvoorden, Mogollón, de Almeida Matos, Dávila, García-Villacorta, Stevenson Diaz, Costa, Emilio, Levis, Schietti, Souza, Alonso, Dallmeier, Montoya, Fernandez Piedade, Araujo-Murakami, Arroyo, Gribel, Fine, Peres, Toledo, Aymard, Baker, Cerón, Engel, Henkel, Maas, Petronelli, Stropp, Zartman, Daly, Neill, Silveira, Paredes, Chave, de Andrade Lima Filho, Jørgensen, Fuentes, Schöngart, Cornejo Valverde, Di Fiore, Jimenez, Peñuela-Mora, Phillips, Rivas, van Andel, von Hildebrand, Hoffman, Zent, Malhi, Prieto, Rudas, Ruschell, Silva, Vos, Zent, Oliveira, Schutz, Gonzales, Trindade Nascimento, Ramirez-Angulo, Sierra, Tirado, Umaña Medina, van der Heijden, Vela, Vilanova Torre, Vriesendorp, Wang, Young, Baider, Balslev, Ferreira, Mesones, Torres-Lezama, Urrego Giraldo, Zagt, Alexiades, Hernandez, Huamantupa-Chuquimaco, Milliken, Palacios Cuenca, Pauletto, Valderrama Sandoval, Valenzuela Gamarra, Dexter, Feeley, Lopez-Gonzalez and Silman2013).
The universal species–area relationship (Harte et al. Reference Harte, Zillio, Conlisk and Smith2008) allowed the extrapolation of observed richness up to the 8 million hectares of French Guiana. The number of species estimated from the Paracou, Piste de Saint-Elie and Nouragues starting points (their number of species and area) is on average 2787, and over 4500 when extrapolating from an average GuyaDiv 1-ha plot. Again, this model implies a log-series distribution as it integrates as few assumptions as possible (Harte et al. Reference Harte, Zillio, Conlisk and Smith2008). On the log–log representation of Figure 4, the species–area relationships are never linear, contrary to what the power law predicts at the regional scale. The arguments for overestimation are the same as those against the extrapolation of the log-series at the regional scale.
The number of species is around 2200
Finally, our estimations of the number of tree species in the 8-million-hectare forest of French Guiana are close to 2200, with a quite wide confidence interval due to the variability in the estimation of both the number of trees in a square kilometer and the power-law parameter. Their distribution is highly unequal: 90 species (4%) contain half the trees.
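The inequality statistic quoted here, the number of species that together account for half the trees, is simple to compute from a vector of species abundances. The sketch below uses a hypothetical function name and toy abundances; it is not the authors' code and does not reproduce their figures.

```python
from typing import Sequence


def species_holding_half(abundances: Sequence[float]) -> int:
    """Smallest number of species whose summed abundance reaches half the total."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive number")
    running = 0.0
    # Accumulate from the most abundant species downwards.
    for rank, n in enumerate(sorted(abundances, reverse=True), start=1):
        running += n
        if running >= total / 2:
            return rank
    return len(abundances)  # not reached for valid input; kept for safety


# Toy abundances (hypothetical): four species, 100 trees in total.
toy = [40, 30, 20, 10]
k = species_holding_half(toy)
share = k / len(toy)  # fraction of species holding half the trees
print(k, share)
```

Applied to an estimated regional abundance distribution, the same count divided by the total richness gives the percentage of hyperdominant species.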
A recent work (Molino et al. Reference Molino, Sabatier, Grenand, Engel, Frame, Delprete, Fleury, Odonne, Davy, Lucas and Martin2022) lists nearly 1800 species of indigenous trees in French Guiana, based on herbarium collections on the one hand, and on data from the GuyaDiv and GuyaFor plot networks (Engel Reference Engel2015) on the other. However, this checklist is only a snapshot of our current knowledge of the tree flora. Even in the most intensively explored areas, botanists conducting inventories have identified a number of entities that are morphologically distinct from all known species in French Guiana, and which they therefore consider to be still unnamed species. They gave them provisional names (e.g. Pouteria sp. A) until more information is available either to recognize them as species known in other parts of the world or to describe them and give them a valid name according to the Code of Nomenclature. The GuyaDiv and GuyaFor databases together currently list more than 300 of these unnamed species, but Molino et al. (Reference Molino, Sabatier, Grenand, Engel, Frame, Delprete, Fleury, Odonne, Davy, Lucas and Martin2022) selected only 143 of them for their checklist, namely those that were best characterized and best illustrated by good-quality herbarium specimens. Although it cannot be excluded that some of the other 150–200 unnamed species are in fact simple morphological variants of already described species, they believe that most of them represent distinct species. In other words, the number of known species in French Guiana (named and unnamed) is probably already close to 2000. Furthermore, the available data are very unevenly distributed across the territory. The south and especially the north-west of French Guiana are poorly explored botanically (few inventory plots, relatively few herbarium specimens), while their floras are significantly different from those of the better inventoried northern and central zones.
It is thus very likely that the exploration of these little-known areas will add new species to the list. Therefore, the estimate of 2234 spp. given here seems quite plausible, given the state of our knowledge.
One avenue for improvement is the aggregation of all sources of localized data at the scale of French Guiana, including all GuyaDiv plots, the GuyaFor network (Jaouen et al. Reference Jaouen, Dourdain and Derroire2021), herbarium collections and various scientific projects, to proceed with an incidence-based estimation of richness following Cazzolla Gatti et al. (Reference Cazzolla Gatti, Reich, Gamarra, Crowther, Hui, Morera, Bastin, de-Miguel, Nabuurs, Svenning, Serra-Diaz, Merow, Enquist, Kamenetsky, Lee, Zhu, Fang, Jacobs, Pijanowski, Banerjee, Giaquinto, Alberti, Almeyda Zambrano, Alvarez-Davila, Araujo-Murakami, Avitabile, Aymard, Balazy, Baraloto, Barroso, Bastian, Birnbaum, Bitariho, Bogaert, Bongers, Bouriaud, Brancalion, Brearley, Broadbent, Bussotti, Castro da Silva, César, Češljar, Chama Moscoso, Chen, Cienciala, Clark, Coomes, Dayanandan, Decuyper, Dee, Del Aguila Pasquel, Derroire, Djuikouo, Van Do, Dolezal, Đorđević, Engel, Fayle, Feldpausch, Fridman, Harris, Hemp, Hengeveld, Herault, Herold, Ibanez, Jagodzinski, Jaroszewicz, Jeffery, Johannsen, Jucker, Kangur, Karminov, Kartawinata, Kennard, Kepfer-Rojas, Keppel, Khan, Khare, Kileen, Kim, Korjus, Kumar, Kumar, Laarmann, Labrière, Lang, Lewis, Lukina, Maitner, Malhi, Marshall, Martynenko, Monteagudo Mendoza, Ontikov, Ortiz-Malavasi, Pallqui Camacho, Paquette, Park, Parthasarathy, Peri, Petronelli, Pfautsch, Phillips, Picard, Piotto, Poorter, Poulsen, Pretzsch, Ramírez-Angulo, Restrepo Correa, Rodeghiero, Rojas Gonzáles, Rolim, Rovero, Rutishauser, Saikia, Salas-Eljatib, Schepaschenko, Scherer-Lorenzen, Šebeň, Silveira, Slik, Sonké, Souza, Stereńczak, Svoboda, Taedoumg, Tchebakova, Terborgh, Tikhonova, Torres-Lezama, van der Plas, Vásquez, Viana, Vibrans, Vilanova, Vos, Wang, Westerlund, White, Wiser, Zawiła-Niedźwiecki, Zemagho, Zhu, Zo-Bi and Liang2022). The main issue with this approach is the standardization of the taxonomy, which is already well advanced thanks to Molino et al. (Reference Molino, Sabatier, Grenand, Engel, Frame, Delprete, Fleury, Odonne, Davy, Lucas and Martin2022), so it may be feasible in the near future.
Supplementary material
The supplementary material that contains the appendix of this article can be found at https://doi.org/10.1017/S0266467424000099
Financial support
This work benefited from an ‘Investissement d’Avenir’ grant managed by the Agence Nationale de la Recherche (LABEX CEBA, ref. ANR-10-LBX-25).
Competing interests
The authors declare none.