Abstract of: US20260057685A1
Methods, systems, and computer readable storage media for performing operations comprising: obtaining a plurality of initial network inputs that have been classified as belonging to a corresponding ground truth class; processing each of the plurality of initial network inputs using a trained target neural network to generate a respective predicted network output for each initial network input, the respective predicted network output comprising a respective score for each of a plurality of classes, the plurality of classes comprising the ground truth class; identifying, based on the respective predicted network outputs and the ground truth class, a subset of the initial network inputs as having been misclassified by the trained target neural network; and determining, based on the subset of initial network inputs, one or more failure case latent representations, wherein each failure case latent representation is a latent representation that characterizes network inputs that belong to the ground truth class but that are likely to be misclassified by the trained target neural network.
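A minimal pure-Python sketch of the flow above; the function and variable names are hypothetical, and taking the centroid of the misclassified inputs' latents is only one plausible choice of "failure case latent representation":

```python
def failure_case_latents(latents, scores, ground_truth_class):
    """Identify misclassified inputs and summarize them in latent space.

    latents: one latent vector per input (assumed to come from an
             intermediate layer of the trained target network).
    scores:  one list of per-class scores per input.
    ground_truth_class: the class index every input actually belongs to.
    """
    # Predicted class = argmax over the per-class scores.
    predicted = [max(range(len(s)), key=s.__getitem__) for s in scores]
    misclassified = [p != ground_truth_class for p in predicted]
    failures = [z for z, m in zip(latents, misclassified) if m]
    if not failures:
        return None, misclassified
    # Centroid of the failures' latents as the failure case representation.
    centroid = [sum(dim) / len(failures) for dim in zip(*failures)]
    return centroid, misclassified
```

Inputs whose latents lie near such a centroid would then be flagged as likely failures for the ground truth class.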
Abstract of: US20260056983A1
A method and system provide an intelligent response agent based on a sophisticated reasoning and speculation function. The agent can generate and provide response data for queries related to specialized documents using a deep-learning neural network that implements a stepwise process for the sophisticated reasoning and speculation function.
Abstract of: US20260057486A1
The present disclosure provides an apparatus and method for a guided neural network model for image processing. An apparatus may comprise a guidance map generator, a synthesis network, and an accelerator. The guidance map generator may receive a first image as a content image and a second image as a style image, and generate a first plurality of guidance maps and a second plurality of guidance maps, respectively, from the first image and the second image. The synthesis network may synthesize the first plurality of guidance maps and the second plurality of guidance maps to determine guidance information. The accelerator may generate an output image by applying the style of the second image to the first image based on the guidance information.
Abstract of: US20260057234A1
A method and a device for training a graph neural network are provided. The method may be performed by a graphics processing unit (GPU), and may include: determining at least one batch of training data; transmitting batch information corresponding to the determined at least one batch to at least one memory expansion device, so that the at least one memory expansion device acquires feature data for one or more data blocks of the at least one batch based on the batch information; receiving the feature data from the at least one memory expansion device; and training the graph neural network based on the feature data.
Abstract of: US20260051318A1
A method includes receiving training utterances that include non-synthetic speech training utterances and synthetic speech training utterances. For each training utterance, the method includes: processing, using a memorized neural network, a corresponding sequence of input audio frames to generate a hotword detection output indicating a likelihood that the training utterance includes a hotword; determining a first loss based on the hotword detection output; obtaining a hidden layer feature vector for each corresponding input audio frame; processing, using a speech classification model, the hidden layer feature vectors to predict a classification output for the training utterance; and determining an adversarial loss based on the classification output predicted for the training utterance. The method also includes training the memorized neural network on the first losses and the adversarial losses to teach the memorized neural network to detect the hotword in audio while preventing overfitting to the synthetic speech training utterances.
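The loss combination can be sketched as follows; the cross-entropy losses and the sign-flipped (gradient-reversal-style) adversarial term are illustrative stand-ins, not the patent's exact formulation:

```python
import math

def binary_cross_entropy(p, y):
    """Cross-entropy of a probability p against a 0/1 label y."""
    eps = 1e-7
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def total_training_loss(hotword_probs, hotword_labels,
                        synth_probs, synth_labels, adv_weight=1.0):
    """First (hotword-detection) loss plus an adversarial term.

    The synthetic-vs-real classifier's loss enters with flipped sign, so
    minimizing the total pushes the shared hidden features to hide whether
    an utterance was synthetic, discouraging overfitting to synthetic speech.
    """
    first_loss = sum(binary_cross_entropy(p, y) for p, y in
                     zip(hotword_probs, hotword_labels)) / len(hotword_probs)
    adversarial_loss = -sum(binary_cross_entropy(p, y) for p, y in
                            zip(synth_probs, synth_labels)) / len(synth_probs)
    return first_loss + adv_weight * adversarial_loss
```

With `adv_weight=0` this reduces to plain hotword training; the adversarial term only matters when the speech classifier can still separate synthetic from real speech.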
Abstract of: US20260051148A1
A processor-implemented method for implementing graph cuts for explainability using an artificial neural network (ANN) includes receiving, via the ANN, an input. The input is represented as a graph. The graph includes nodes connected by edges. The ANN determines a graph cut between a source node and a sink node associated with the input by solving a quadratic process with equality constraints. The ANN processes a subset of the input based on the graph cut to generate a prediction.
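The patent solves the source-sink cut as a quadratic process with equality constraints; as a stand-in for illustration, the same source-side/sink-side partition can be computed with the classical Edmonds-Karp max-flow/min-cut algorithm (graph layout and names hypothetical):

```python
from collections import deque

def min_cut_partition(capacity, source, sink):
    """Edmonds-Karp max-flow; returns the set of nodes on the source side
    of a minimum s-t cut. `capacity` is a dict-of-dicts of edge capacities."""
    # Build a residual graph, adding zero-capacity reverse edges.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)

    def bfs_path():
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    if v == sink:
                        return parent
                    queue.append(v)
        return None  # no augmenting path left

    while (parent := bfs_path()) is not None:
        # Find the bottleneck capacity along the augmenting path.
        v, bottleneck = sink, float("inf")
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck flow along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u

    # Nodes still reachable in the residual graph form the source side.
    side, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        for v, cap in residual[u].items():
            if cap > 0 and v not in side:
                side.add(v)
                queue.append(v)
    return side
```

The source-side node set plays the role of the input subset the ANN then processes to generate its prediction.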
Abstract of: US20260050438A1
One embodiment provides for a compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction that specifies multiple operands, including a multi-bit input value and a one-bit weight associated with a neural network, as well as an arithmetic logic unit including a multiplier, an adder, and an accumulator register. To execute the decoded instruction, the multiplier is to perform a fused operation, including an exclusive NOT OR (XNOR) operation and a population count operation, to generate an intermediate product. The adder is configured to add the intermediate product to a value stored in the accumulator register and update the value stored in the accumulator register.
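A software emulation of the fused operation, assuming the common binary-network encoding where a set bit means +1 and a clear bit means -1 (the multi-bit input is simplified here to packed 1-bit values for illustration):

```python
def xnor_popcount_mac(input_bits, weight_bits, nbits, accumulator=0):
    """Fused XNOR + population count, then accumulate.

    input_bits, weight_bits: integers holding `nbits` packed 1-bit values.
    Returns the updated accumulator value.
    """
    mask = (1 << nbits) - 1
    xnor = ~(input_bits ^ weight_bits) & mask  # 1 wherever the bits agree
    matches = bin(xnor).count("1")             # population count
    # With the +1/-1 encoding, dot product = matches - mismatches.
    intermediate_product = 2 * matches - nbits
    return accumulator + intermediate_product
```

This is why a single XNOR/popcount instruction can replace an entire row of multiply-accumulates in a binarized layer.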
Abstract of: US2024428056A1
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing tasks. One of the methods includes obtaining a sequence of input tokens, where each token is selected from a vocabulary of tokens that includes text tokens and audio tokens, and wherein the sequence of input tokens includes tokens that describe a task to be performed and data for performing the task; generating a sequence of embeddings by embedding each token in the sequence of input tokens in an embedding space; and processing the sequence of embeddings using a language model neural network to generate a sequence of output tokens for the task, where each token is selected from the vocabulary.
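The key point, text and audio tokens drawn from one shared vocabulary and embedded through a single table, can be sketched as follows (the vocabulary, token names, and the deterministic stand-in table are all hypothetical):

```python
def embed_sequence(tokens, vocab, dim=4):
    """Map each token, text or audio alike, to a row of one shared
    embedding table indexed by the joint vocabulary."""
    table = [[float(i * dim + j) for j in range(dim)]  # stand-in for learned weights
             for i in range(len(vocab))]
    return [table[vocab.index(tok)] for tok in tokens]
```

The resulting embedding sequence is what the language model neural network consumes to produce output tokens from the same vocabulary.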
Abstract of: JP2025166199A
To provide a decoder, an encoder, a neural network controller and a method allowing for efficient representation and transmission of neural network parameters of a learning or update process.
SOLUTION: A decoder for decoding parameters of a neural network obtains a plurality of neural network parameters of the neural network on the basis of an encoded bitstream (1010), and obtains, from the encoded bitstream, node information describing a node of a parameter update tree (1020). The node information includes a parent node identifier and parameter update information. The decoder also derives one or more neural network parameters using parameter information of a parent node identified by the parent node identifier and using the parameter update information (1030).
SELECTED DRAWING: Figure 10
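Steps (1020)-(1030) can be sketched as a walk up the parameter update tree; treating the parameter update information as an additive delta is an assumption for illustration:

```python
def decode_parameters(nodes, node_id):
    """Derive a node's neural network parameters from the parameter update
    tree: recurse to the parent identified by the parent node identifier,
    then apply this node's parameter update information.

    nodes: {node_id: {"parent": parent_id or None, "update": [values]}}
    The root node's "update" holds the base parameters themselves.
    """
    node = nodes[node_id]
    if node["parent"] is None:
        return list(node["update"])  # base model at the tree root
    parent_params = decode_parameters(nodes, node["parent"])
    return [p + d for p, d in zip(parent_params, node["update"])]
```

Only deltas travel in the bitstream, which is what makes repeated learning or update rounds cheap to transmit.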
Abstract of: US2024330679A1
A method for making predictions pertaining to entities represented within a heterogeneous graph includes: identifying, for each node in the heterogeneous graph structure, a set of node-target paths that connect the node to a target node; assigning, to each of the node-target paths identified for each node, a path type identifier indicative of the number of edges and corresponding edge types in the associated node-target path; and extracting a semantic tree from the heterogeneous graph structure. The semantic tree includes the target node as a root node and defines a hierarchy of metapaths that each individually correspond to a subset of the node-target paths in the heterogeneous graph structure assigned to the same path type identifier. The semantic tree is encoded, using one or more neural networks, by generating a metapath embedding corresponding to each metapath in the semantic tree. Each of the resulting metapath embeddings encodes aggregated feature-label data for the nodes in the heterogeneous graph structure whose node-target paths share the path type identifier of the associated metapath. A label is predicted for the target node in the heterogeneous graph structure based on the set of metapath embeddings.
Abstract of: US20260038529A1
A method, apparatus, and non-transitory computer-readable medium for automatic speech recognition using conditional factorization for bilingual code-switched and monolingual speech may include receiving an audio observation sequence comprising a plurality of frames, the audio observation sequence including audio in a first language or a second language. The approach may further include mapping the audio observation sequence into a first sequence of hidden representations, the mapping being generated by a first encoder corresponding to the first language and mapping the audio observation sequence into a second sequence of hidden representations, the mapping being generated by a second encoder corresponding to the second language. The approach may further include generating a label-to-frame sequence based on the first sequence of hidden representations and the second sequence of hidden representations, using a joint neural network based model.
Abstract of: US20260038120A1
A method for analyzing an image, called an "analysis image", of a dental arch of a patient. In the method, the analysis image is submitted to a neural network in order to determine at least one value of an image attribute relating to the analysis image. The analysis image is a photograph or an image taken from a film. The image attribute relates to a position, an orientation, or a calibration of the acquisition apparatus used to acquire the analysis image, or a combination thereof, or to a quality of the analysis image, in particular its brightness, contrast, or sharpness, or a combination thereof.
Abstract of: US20260037790A1
Apparatuses, systems, and techniques to identify objects within an image. In at least one embodiment, objects are identified in an image using one or more neural networks, in which the one or more neural networks are trained using one or more decay parameters.
Abstract of: US20260037802A1
Systems and methods are provided for training a machine learning model implemented over a network configured to represent the machine learning model. One or more directed edges connect the nodes, each edge representing a connection between a first node and a second node; the second node computes an activation that depends on the activations of the first nodes and on values associated with the connections, each connection being either conforming or non-conforming. The machine learning model may be trained by iteratively adjusting parameters w and b, respectively associated with the weights and biases of edges connecting computational nodes. Connections between nodes may be sparsified by adjusting the parameter w to a first value for non-conforming connections during the training phase, either to reduce the complexity of the connections among the plurality of nodes or to ensure the input-output function of the network adheres to additional constraints.
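For a flat list of connections, the sparsification step reduces to pinning w for every non-conforming connection (taking zero as the "first value" is an assumption for illustration):

```python
def sparsify(weights, conforming, w_value=0.0):
    """Set the weight parameter w of each non-conforming connection to a
    fixed value, leaving conforming connections untouched."""
    return [w if ok else w_value for w, ok in zip(weights, conforming)]
```

Applied each training iteration, this keeps the non-conforming connections inert while the conforming ones continue to learn.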
Abstract of: US20260037770A1
In some embodiments, a natural language input directed to an entity may be obtained. In connection with obtaining the natural language input, a vector similarity search of a database may be performed based on the natural language input to obtain one or more vectors corresponding to stored data related to the natural language input. In some embodiments, in connection with obtaining the natural language input directed to the entity, a first neural network may be used to obtain a state vector representing stored data related to the natural language input. The natural language input and the state vector may be inputted into a second neural network associated with the entity to generate a conversation response of the entity to the natural language input for presentation via the user interface. In some embodiments, the state vector may be updated using the conversation response of the entity.
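One common realization of the vector similarity search step is cosine similarity over the stored vectors (a small sketch with hypothetical names; the two-network conversation flow itself is not reproduced here):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def vector_similarity_search(query, stored, top_k=1):
    """Return the indices of the top_k stored vectors most similar
    to the query vector."""
    ranked = sorted(range(len(stored)),
                    key=lambda i: cosine(query, stored[i]), reverse=True)
    return ranked[:top_k]
```

The retrieved indices would point at the stored data that the first neural network condenses into the state vector.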
Abstract of: US20260038489A1
Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transformer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a listen attend spell (LAS) decoder. Various implementations include a shared encoder shared between the RNN-T decoder and the LAS decoder.
Abstract of: US20260030910A1
In a computer-implemented workflow, a submission of an asset localized for a first location is received. The asset may be intended for dissemination to a second location. A trained neural network is applied to the asset to determine a probability of recommending localization of the asset for the second location. This determination can be based on a plurality of features indicating contextual aspects of a document, which are identified in accordance with a plurality of transformations performed on the asset utilizing the trained neural network. Responsive to determining that the probability satisfies a condition, such as being a percentage above a threshold value, a recommendation is provided to exclude the asset from being localized to the second location.
Publication No.: US20260030500A1 29/01/2026
Applicant:
New York University [US]
Yeda Research And Development Co. Ltd [IL]
Abstract of: US20260030500A1
A system for processing ultrasound images utilizes a trained orientation neural network to provide orientation information for a multiplicity of images captured around a body part, orienting each image with respect to a canonical view. In one aspect, the system includes a set creator and a generative neural network. The set creator generates sets of images and their associated transformations over time. The generative neural network then produces a summary canonical view set from these sets, showing changes during a body part cycle. In another aspect, the system includes a volume reconstructer. The volume reconstructer uses the orientation information to generate a volume representation of the body part from the oriented images using tomographic reconstruction, and to generate a canonical image from that volume representation.