Geohash

水深无声 2022-08-07 09:55 215阅读 0赞



Geohash

From Wikipedia, the free encyclopedia

Jump to: navigation, search

This article is about a system for encoding geographic coordinates. For the game, see Geohashing.

Geohash is a latitude/longitude geocode system invented by Gustavo Niemeyer when writing the web service at geohash.org, and put into the public domain. It is a hierarchical spatial data structure which subdivides space into buckets of grid shape.

Geohashes offer properties like arbitrary precision and the possibility of gradually removing characters from the end of the code to reduce its size (and gradually lose precision).

As a consequence of the gradual precision degradation, nearby places will often (but not always) present similar prefixes. The longer a shared prefix is, the closer the two places are.

Contents

[hide]

  • 1 Service
  • 2 Uses
  • 3 Example

    • 3.1 Decode from base 32
    • 3.2 Decode binary to decimal
    • 3.3 Worked example
  • 4 Limitations
  • 5 License and patents
  • 6 See also
  • 7 External links
  • 8 References

Service[edit]

The purpose of the geohash.org service, launched in February 2008, is to offer short URLs which uniquely identify positions on the Earth, so that referencing them in emails, forums, and websites is more convenient.

To obtain the Geohash, the user provides an address to be geocoded, or latitude and longitude coordinates, in a single input box (most commonly used formats for latitude and longitude pairs are accepted), and performs the request.

Besides showing the latitude and longitude corresponding to the given Geohash, users who navigate to a Geohash at geohash.org are also presented with an embedded map, and may download a GPX file, or transfer the waypoint directly to certain GPS receivers. Links are also provided to external sites that may provide further details around the specified location.

For example, the coordinate pair 57.64911,10.40744 (near the tip of the peninsula of Jutland, in Denmark) produces a slightly shorter hash of u4pruydqqvj, which can be used in the URL http://geohash.org/u4pruydqqvj

Uses[edit]

The main usages of Geohashes are

  • as a unique identifier.
  • represent point data e.g. in databases.

Geohashes have also been proposed to be used for geotagging.

When used in a database, the structure of geohashed data has two advantages. First, data indexed by geohash will have all points for a given rectangular area in contiguous slices (the number of slices depends on the precision required and the presence of geohash “fault lines”). This is especially useful in database systems where queries on a single index are much easier or faster than multiple-index queries. Second, this index structure can be used for a quick-and-dirty proximity search - the closest points are often among the closest geohashes. Another system known as GeoZip might provide a simpler way of achieving similar results if the application does not require the resulting code to be in a compressed format. This claim made by the GeoZip inventor has not been verified yet.

Example[edit]

Using the hash ezs42 as an example, here is how it is decoded into a decimal latitude and longitude

Decode from base 32[edit]

The first step is decoding it from base 32 using the following character map:



















































































Decimal 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Base 32 0 1 2 3 4 5 6 7 8 9 b c d e f g
 
Decimal 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Base 32 h j k m n p q r s t u v w x y z

This operation results in the bits 01101 11111 11000 00100 00010. Assuming that counting starts at 0 in the left side, the even bits are taken for the longitude code (0111110000000), while the odd bits are taken for the latitude code (101111001001).

Decode binary to decimal[edit]

Each binary code is then used in a series of divisions, considering one bit at a time, again from the left to the right side. For the latitude value, the interval -90 to +90 is divided by 2, producing two intervals: -90 to 0, and 0 to +90. Since the first bit is 1, the higher interval is chosen, and becomes the current interval. The procedure is repeated for all bits in the code. Finally, the latitude value is the center of the resulting interval. Longitudes are processed in an equivalent way, keeping in mind that the initial interval is -180 to +180.

Finishing the procedure should yield approximately latitude 42.6 and longitude -5.6.

Worked example[edit]

Here’s a worked example decoding 101111001001 into 42.6. To start with, we know the latitude is somewhere in the range −90 to 90. With no bits, we’d have to guess the latitude was 0, giving us an error of ±90. With one bit, we can decide whether its in the range −90 to 0, or 0 to 90. The first bit is high, so we know our latitude is somewhere between 0 and 90. Without any more bits, we’d guess the latitude was 45, giving us an error of ±45.

Each subsequent bit halves this error. This table shows the effect of each bit. At each stage, the relevant half of the range is highlighted in green; a low bit selects the lower range, a high bit selects the upper range.

The last column shows the latitude, simply the mean value of the range. Each subsequent bit makes this value more precise.












































































































bit min mid max val err
1 -90.000 0.000 90.000 45.000 45.000
0 0.000 45.000 90.000 22.500 22.500
1 0.000 22.500 45.000 33.750 11.250
1 22.500 33.750 45.000 39.375 5.625
1 33.750 39.375 45.000 42.188 2.813
1 39.375 42.188 45.000 43.594 1.406
0 42.188 43.594 45.000 42.891 0.703
0 42.188 42.891 43.594 42.539 0.352
1 42.188 42.539 42.891 42.715 0.176
0 42.539 42.715 42.891 42.627 0.088
0 42.539 42.627 42.715 42.583 0.044
1 42.539 42.583 42.627 42.605 0.022

(The numbers in the above table have been rounded to 3 decimal places for clarity)

Final rounding should be done carefully in a way that

\\min \\le \\mathrm\{round\}(value) \\le \\max

So if rounding 42.605 to 42.61 or 42.6 is correct, rounding to 43 it is not.












































































geohash length lat bits lng bits lat error lng error km error
1 2 3 ±23 ±23 ±2500
2 5 5 ± 2.8 ± 5.6 ±630
3 7 8 ± 0.70 ± 0.7 ±78
4 10 10 ± 0.087 ± 0.18 ±20
5 12 13 ± 0.022 ± 0.022 ±2.4
6 15 15 ± 0.0027 ± 0.0055 ±0.61
7 17 18 ±0.00068 ±0.00068 ±0.076
8 20 20 ±0.000085 ±0.00017 ±0.019

Limitations[edit]

One limitation of the Geohash algorithm is in attempting to utilize it to find points in proximity to each other based on a common prefix. Edge case locations close to each other but on opposite sides of the 180 degree meridian will result in Geohash codes with no common prefix (different longitudes for near physical locations). Points close by at the North and South poles will have very different geohashes (different latitudes for near physical locations).

Two close locations on either side of the Equator (or Greenwich meridian) will not have a long common prefix since they belong to different ‘halves’ of the world. Put simply, one location’s binary latitude (or longitude) will be 011111… and the other 100000…., so they will not have a common prefix and most bits will be flipped. This can also be seen as a consequence of relying on the Z-order curve for ordering the points, as two points close-by might be visited at very different times. However, two points with a long common prefix will be close-by.

In order to do a proximity search, one could compute the southwest corner (low geohash with low latitude and longitude) and northeast corner (high geohash with high latitude and longitude) of a bounding box and search for geohashes between those two. This will retrieve all points in the z-order curve between the two corners, which can be far too many points, this also breaks down at the 180 meridians and the poles. Solr uses a filter list of prefixes, by computing the prefixes of the nearest squares close to the geohash [1].

Thirdly, since a geohash (in this implementation) is based on coordinates of longitude and latitude the distance between two geohashes reflects the distance in latitude/longitude coordinates between two points, which does not translate to actual distance, see Haversine formula.

Example of non-linearity for latitude-longitude system:

  • At the Equator (0 Degrees) the length of a degree of longitude is 111.320 km, while a degree of latitude measures 110.574 km, an error of 0.67%.
  • At 30 Degrees (Mid Latitudes) the error is 110.852/96.486 = 14.89%
  • At 60 Degrees (High Arctic) the error is 111.412/55.800 = 99.67%, reaching infinity at the poles.

Note that these limitations are not due to geohashing, and not due to latitude-longitude coordinates, but due to the difficulty of mapping coordinates on a sphere (non linear and with wrapping of values, similar to modulo arithmetic) to two dimensional coordinates and the difficulty of exploring a two dimensional space uniformly. The first is related to Geographical coordinate system and Map projection, and the other to Hilbert curve and z-order curve. Once a coordinate system is found that represents points linearly in distance and wraps up at the edges, and can be explored uniformly, applying geohashing to those coordinates will not suffer from the limitations above.

While it is possible to apply geohashing to an area with a cartesian coordinate system, it would then only apply to the area where the coordinate system applies.

Despite those issues, there are possible workarounds, and the algorithm has been successfully used in Elasticsearch,[1] MongoDB,[2] HBase, and Accumulo[3] to implement proximity searches.

An alternative to storing Geohashes as strings in a database are Locational codes, which are also called spatial keys and similar to QuadTiles.[4][5]

License and patents[edit]

The Geohash geocode has been put in the public domain by its inventor in the public announcement date, on February 26, 2008.[6]

While comparable algorithms have been successfully patented[7] and had copyright claimed upon,[8][9] GeoHash is based on an entirely different algorithm and approach.

See also[edit]

  • Geohash-36
  • Grid (spatial index)
  • Morton number (number theory)
  • Natural Area Code
  • Maidenhead Locator System
  • Military grid reference system
  • geohash.org
  • elasticsearch: the definitive guide - Geo
  • Visualizing Geohash
  • Perl module to interact with geohash.org
  • Libraries, packages and modules for encoding and decoding geohashes without interacting with geohash.org: in C, Python, Haskell,

Perl, Ruby (Gem), Ruby, Ocaml, Clojure, Go, Objective-C, Javascript (Demo), PHP, MySQL, PostGIS, nodejs

  • Java classes for encoding and decoding geohashes: geospatialweb, jgeohash, geohash-java, and geo
  • kml file for Google Earth displaying geohash grid
  • Area Check Tool
  • [2] - GeoMesa - A spatio-temporal database using Geohashes and Accumulo
  • Simple and fast conversion from geohash to latitude/longitude and from latitude/longitude to geohash

References[edit]

  1. Jump up ^ geo_shape Type Mapping in Elasticsearch
  2. Jump up ^ Geospatial Indexing in MongoDB
  3. Jump up ^ Spatio-temporal Indexing in Non-relational Distributed Databases
  4. Jump up ^ Spatial Keys
  5. Jump up ^ QuadTiles
  6. Jump up ^ geohash.org announcement post in groundspeak.com forum
  7. Jump up ^ Compact text encoding of latitude/longitude coordinates - Patent 20050023524
  8. Jump up ^ Does Microsoft Infringe the Natural Area Coding System?
  9. Jump up ^ The Natural Area Coding System - Legal and Licensing





































































[hide]



Administrative codes
 
Airport codes
 
Country codes
 
Geodesic place codes

















Global

 




 
Postal codes
 
Telephony
 
Radio broadcasting
 
Sport

Retrieved from “ http://en.wikipedia.org/w/index.php?title=Geohash&oldid=651848024“

Categories:

  • Geocodes

发表评论

表情:
评论列表 (有 0 条评论,215人围观)

还没有评论,来说两句吧...

相关阅读

    相关 GeoHash

    1、geohash及其性质 一种空间索引技术。 (1)将二维的经纬度位置数据转换为一维的字符串(基本上hash族的算法都是这样); 其优点在于hash编码后的字符串,可以

    相关 GeoHash核心原理解析

       机机是个好动又好学的孩子,平日里就喜欢拿着手机地图点点按按来查询一些好玩的东西。某一天机机到北海公园游玩,肚肚饿了,于是乎打开手机地图,搜索北海公园附近的餐馆,并选

    相关 GeoHash索引

    GeoHash简介 GeoHash索引是一种基于B树索引,又结合了格网索引的思想的使用广泛的空间索引算法。GeoHash将空间位置编码为一串字符,通过字符串的比较可以得到

    相关 geohash算法原理

    1.geohash算法实现的功能: 在很多应用中会用到查询附近的人的功能,具体实现就是根据自己所在地点的经纬度与别人的经纬度做计算。如果在某个范围内则取出显示,但是经纬度是