Applying Machine Learning to the Vitality Analysis of Historic Preservation Districts
Source: buildings | Authors: 公丕欣 (Gong Pixin), 黄骁然 (Huang Xiaoran) | Published: 2022-12-01
Based on eight historic districts in Beijing, this article uses machine learning to analyze and model vitality characteristics in terms of 23 factors grouped into four aspects (form, function, visual environment, and traffic), with the aim of supporting better management of historic preservation districts.

Research method

This study combines multi-source data and machine learning techniques to assess the vitality characteristics of historic preservation zones (HPZs) and to explore the related influencing factors. The eight HPZs were divided into 842 units. A buffer (radius = 250 m, chosen after several trials) was then constructed around the centroid of each unit, and every indicator was calculated within that buffer.
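The buffer step described above can be illustrated with a short sketch. It assumes the unit centroids and the point features (for example, points of interest) are already in a projected coordinate system measured in metres; the unit IDs, coordinates, and the function name `count_in_buffer` are hypothetical and are not taken from the study, which performed this step in ArcGIS.

```python
import math

BUFFER_RADIUS_M = 250.0  # buffer radius stated in the text


def count_in_buffer(centroid, points, radius=BUFFER_RADIUS_M):
    """Count points whose planar distance to the centroid is <= radius."""
    cx, cy = centroid
    return sum(1 for px, py in points if math.hypot(px - cx, py - cy) <= radius)


# Hypothetical projected coordinates in metres (illustrative values only).
unit_centroids = {"unit_001": (0.0, 0.0), "unit_002": (400.0, 0.0)}
poi_points = [(100.0, 0.0), (0.0, 250.0), (300.0, 0.0), (700.0, 0.0)]

# One indicator value per unit: the number of points inside its buffer.
indicator = {uid: count_in_buffer(c, poi_points) for uid, c in unit_centroids.items()}
print(indicator)
```

With these sample values, two points fall inside the buffer of `unit_001` and one inside the buffer of `unit_002`. A point exactly on the 250 m boundary is counted here, which is a choice of this sketch rather than something the source specifies.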

The vitality characteristics of HPZs are described along three dimensions: physical space vitality, cyberspace vitality, and emotional degree. For the influencing factors, a total of 23 indicators were constructed across four dimensions: block morphology, road traffic characteristics, functional form, and visual environment. The three vitality dimensions were then used as response variables in separate regression models built with the random forest algorithm (a supervised machine learning algorithm widely used for classification and regression problems). Finally, the degree of influence of each factor on vitality was interpreted using feature importance and correlation analysis.
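As a rough illustration of this modelling step, the sketch below fits a random forest regressor and ranks features by importance. It assumes scikit-learn and NumPy are available; the data are synthetic, the three feature names are placeholders for the 23 indicators, and the hyperparameters are not those used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Placeholder feature names standing in for the study's 23 indicators.
feature_names = ["intersection_density", "shop_count", "sky_view_ratio"]

# Synthetic data: the response depends strongly on the first feature,
# weakly on the second, and not at all on the third.
X = rng.random((300, 3))
y = -3.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.standard_normal(300)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Impurity-based importances are non-negative and sum to 1.
ranking = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Feature importance of this kind indicates how much a factor matters to the model but not the direction of its effect, which is why the text pairs it with correlation analysis.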



In this study, crowd density is derived from the Baidu Maps heat map. Baidu's mobile application services record people's trajectories, which offers a reasonable and effective way to observe crowd density in the study area [74]. We collected heat map data from Baidu Maps (http://lbsyun.baidu.com/, accessed June 19, 2021) at a sampling interval of 1 h, from 7:30 to 23:30 on 2021/06/19. The raster data were imported into ArcGIS and georeferenced against a map with known coordinates (WGS84, Universal Transverse Mercator projection, UTM Zone 50).


Cyberspace vitality in this study is derived from Sina Weibo check-in data. Check-in data reflect two things: first, that the user was physically present and active at the location; second, that the user was willing to share it, which brings corresponding online traffic and attention as the post spreads through the network. We used Weibo check-in data (https://www.beijingcitylab.com/data-released-1/, accessed June 6, 2022) as a proxy for cyberspace vitality. The data were processed in ArcGIS, and the number of check-ins within the buffer of each sampling point was taken as the indicator value.

For the emotional dimension, we used the names of the eight HPZs as search keywords and crawled the content of the matching Weibo posts (in Chinese). The posts were scored with a pretrained BERT model (https://github.com/729593736/Sentiment-Analysis, accessed June 6, 2022), whose output ranges from 0 (negative) to 1 (positive). We then set a baseline of 0.5 to separate sentiment polarity and calculated the proportion of positive samples for each block.
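The thresholding step can be sketched as follows. The scores and block names are made up, the function name `positive_share` is hypothetical, and treating a score strictly above 0.5 as positive is an assumption, since the source does not say how a score of exactly 0.5 is handled.

```python
def positive_share(scores, baseline=0.5):
    """Return the fraction of scores above the baseline, or None if empty."""
    if not scores:
        return None
    positives = sum(1 for s in scores if s > baseline)
    return positives / len(scores)


# Hypothetical sentiment scores in [0, 1], one per post, grouped by block.
scores_by_block = {
    "block_a": [0.91, 0.72, 0.35, 0.60],
    "block_b": [0.20, 0.55, 0.10],
}

emotion = {block: positive_share(s) for block, s in scores_by_block.items()}
print(emotion)
```

In this toy input, three of four posts in `block_a` and one of three posts in `block_b` count as positive.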

Visual factors were calculated in three steps. First, sampling points were generated along the street axes at 50 m intervals, which yielded a reasonably sufficient set of street view images from Baidu Maps (https://lbsyun.baidu.com/, accessed June 20, 2021). Second, the images were segmented with a pretrained DeepLab v3+ network, using MATLAB's Deep Learning Toolbox and Computer Vision Toolbox. Third, we calculated the proportion of each visual element in every image and averaged these proportions over the street view images within the buffer of each study unit. We finally selected nine categories of streetscape elements as visual indicators, including the green view ratio, the sky view ratio, and the road ratio.
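The third step can be sketched in a few lines, assuming each segmented image is available as a 2D grid of class labels. The tiny label maps, the class names, and the helper names `class_ratios` and `mean_ratios` are illustrative only; the study itself ran the segmentation in MATLAB.

```python
from collections import Counter


def class_ratios(label_map):
    """Share of pixels per class in one segmented image (2D list of labels)."""
    counts = Counter(label for row in label_map for label in row)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def mean_ratios(label_maps, classes):
    """Average each class share over all images that fall in one buffer."""
    per_image = [class_ratios(m) for m in label_maps]
    return {
        c: sum(r.get(c, 0.0) for r in per_image) / len(per_image)
        for c in classes
    }


# Two hypothetical 2 x 4 label maps for street view images in one buffer.
img1 = [["sky", "sky", "tree", "road"],
        ["building", "building", "tree", "road"]]
img2 = [["sky", "building", "building", "road"],
        ["building", "building", "tree", "road"]]

visual = mean_ratios([img1, img2], ["sky", "tree", "road", "building"])
print(visual)
```

A class that is absent from an image contributes a share of zero to the average, so images without any sky still pull the sky view ratio down.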


Research results

Factor correlation

Factor correlation analysis shows that the number of intersections, the number of shopping and consumption venues, the number of dining venues, and the sky view ratio are negatively correlated with physical space vitality, whereas the share of land occupied by buildings is positively correlated with it. In HPZs, where tourism is the main form of business, pedestrians prefer the slow-traffic system of the old city for enjoying their visit. Areas with dense road intersections have more complex traffic, which discourages people from staying and spending leisure time. A large number of shopping and entertainment venues is usually how commercial districts attract people. In HPZs, however, constraints of urban form and preservation policy usually mean there are few such venues. These factors therefore run counter to physical space vitality, which points to the difference between HPZs and commercial districts. As for the visual environment, given the street scale of the old city, a higher sky view ratio means a wider street and fewer buildings. Conversely, a higher proportion of buildings gives a more comfortable sense of enclosure and thus attracts more pedestrian flow. Because of the limitations of the Baidu heat map data, we cannot determine which people in a crowd are actually pedestrians, so the conclusions may be biased; from the standpoint of crowd volume, however, the analysis still offers useful insight.
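The sign of a relationship like the ones above can be checked with a correlation coefficient. The source does not state which coefficient was used, so the sketch below assumes Pearson's r, and the two short series are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical values per unit: intersection count and a vitality score.
intersections = [2, 4, 6, 8, 10]
vitality = [0.9, 0.8, 0.6, 0.5, 0.2]

r = pearson_r(intersections, vitality)
print(f"r = {r:.3f}")  # a negative r means vitality falls as intersections rise
```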


The random forest model can be used to rank the factors by importance. Three models were set up in this study: physical space analysis, sentiment analysis, and cyberspace vitality analysis.

Physical space analysis

Road intersection density has the greatest influence on physical space vitality. This may be because denser intersections mean heavier traffic, so areas with an excessive density of intersections hinder the attraction of on-site pedestrian flow. The number of shopping and consumption venues and road impedance are the other influential factors, and both have a negative effect on physical space vitality. By contrast, the number of hotel facilities, the number of public infrastructure facilities, and the area ratio have almost no effect on physical space vitality.

Sentiment analysis

Road impedance and the number of public infrastructure facilities have the greatest influence on crowd sentiment. Next, the sky view ratio, the number of pedestrians, the standard deviation of building height, and the average building height also have a positive effect on public satisfaction. Other factors, notably the number of hotels and the number of educational facilities, have no effect.

Cyberspace vitality analysis

For cyberspace vitality, the number of dining venues and the number of entertainment facilities have the greatest influence, and the effect is positive. Other factors, such as road impedance, the number of attractions, and the green view ratio, are also important in the model. However, the number of hotels, road intersection density, and the number of public infrastructure facilities are not important.


Research conclusions

The main contributions of this study are:

First, in terms of research method and workflow, it proposes a framework that combines multi-source data with machine learning and integrates other advanced digital analysis methods, such as CV, NLP, and GIS, to construct the vitality indicators. This offers a new perspective for urban vitality research and for quantitative research on related topics.

Second, in terms of model performance, the random forest models proposed in this study fit the data distribution well: the R-squared is 0.86 for model 1 (physical space vitality), 0.85 for model 2 (emotional degree), and 0.76 for model 3 (cyberspace vitality), and the root-mean-square error of each model is below 0.5. The three models perform well in both explanatory power and generalization, and can be further applied to large-scale measurement of other historic preservation districts to produce faster and more informative results.

Third, it identifies some of the factors that most affect the vitality of historic districts. (1) Intersection density, the number of shops, and road impedance are the three most important factors, and all are negatively correlated with vitality. (2) For sentiment, road impedance and the number of public infrastructure facilities have the greatest influence and also affect people's satisfaction. (3) The number of dining and entertainment facilities is a crucial factor that positively affects cyberspace vitality.
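For readers unfamiliar with the two fit metrics cited in the second contribution, the sketch below computes R-squared and root-mean-square error from their standard definitions. The observed and predicted values are invented, and the source does not say whether its reported figures come from training or held-out data.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Hypothetical observed and predicted vitality values.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

r2 = r_squared(y_true, y_pred)
err = rmse(y_true, y_pred)
print(f"R^2 = {r2:.2f}, RMSE = {err:.3f}")
```

RMSE is expressed in the units of the response variable, so a threshold such as 0.5 is only meaningful relative to the scale of the vitality values being predicted.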