By Chaoran Huang

Jun 17, 2026

Optical metasurfaces for general vision processing on the edge

Source: Nature
